Extract the first word

Question

4

I have the following data frame dat:

brand (column name)
Channel clothes
Gucci perfume
Channel shoes
LV purses
LV scarves

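I would like to create a new column that contains only the brand name, that is, the first word of the "brand" column regardless of what follows it. The output should look like this: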

brand (column name)
Channel
Gucci
Channel
LV
LV

I tried the code below, which uses sub, but it does not work. What is wrong with my code?

brand <- sub("(\\w+).*", "\\1", dat$brand)

- lbrrrr

4

Use sub(" .*$", "", dat$brand) to remove the first space and everything after it. - lmo

?stringi::stri_extract_first_words - hrbrmstr

3 Answers

3

This should do it.

dat <- data.frame(Brand = c('Channel clothes',
                           'Gucci perfume',
                           'Channel shoes',
                           'LV purses',
                           'LV scarves'))
brand <- sub('(^\\w+)\\s.+','\\1',dat$Brand)
#[1] "Channel" "Gucci"   "Channel" "LV"      "LV"
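
The question asks for a new column rather than a separate vector. A minimal sketch of that, assuming the Brand column from the data frame above; the column name brand_name is hypothetical and chosen only for illustration:

```r
# Rebuild the example data so the snippet is self-contained.
dat <- data.frame(Brand = c("Channel clothes", "Gucci perfume",
                            "Channel shoes", "LV purses", "LV scarves"))

# Remove the first space and everything after it, then store the result
# in a new column (brand_name is a hypothetical name).
dat$brand_name <- sub(" .*$", "", dat$Brand)

print(dat$brand_name)
```

If the values might start with whitespace, wrapping the column in trimws() first is probably a sensible precaution, since a leading space would otherwise leave an empty string.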

- Balter

1

I prefer the tidyverse approach.

Using this dataset:

library(tidyverse)

df <- tribble(
  ~brand,
  "Channel clothes",
  "Gucci perfume",
  "Channel shoes",
  "LV purses",
  "LV scarves"
)

We can split the column like this:

df %>% 
  separate(brand, into = c("brand", "item"), sep = " ")

Which returns:

# A tibble: 5 x 2
    brand    item
*   <chr>   <chr>
1 Channel clothes
2   Gucci perfume
3 Channel   shoes
4      LV  purses
5      LV scarves
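
With sep = " ", a value containing more than two words yields more pieces than into has names, and separate() then warns and drops the extras by default. One way to handle that is the extra = "merge" argument of tidyr::separate(). The sketch below uses made-up data (the multi-word second row and the names df2 and out are assumptions for illustration):

```r
library(tidyr)

# Hypothetical data: the second value has more than two words.
df2 <- data.frame(brand = c("Channel clothes", "Gucci eau de parfum"),
                  stringsAsFactors = FALSE)

# extra = "merge" keeps any additional pieces together in the last column,
# so only the first word ends up in brand.
out <- separate(df2, brand, into = c("brand", "item"), sep = " ", extra = "merge")

print(out)
```

If only the brand is needed, the item column can simply be ignored or dropped afterwards.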

- tyluRp

- akrun · Accepted Answer

We can use word from stringr:

library(stringr)
word(df$brand, 1)
#[1] "Channel" "Gucci"   "Channel" "LV"      "LV"