Create a new column based on string values in another column


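I have a data frame in R in which one column contains long strings. I want to create a new column based on that string and fill it with specific values.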

Here is a sample data frame:

dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)

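Now, if the column "Banner" contains the string "Watermelon" or "Vanilla", the new column "label" should hold just "Watermelon" or "Vanilla" respectively; otherwise it should be "Default". The expected data frame is below. How can I handle multiple conditions like this with grep or any other method?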
dom_output <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2", "notest_Vanilla Latte-DPI_300x250 v2", "bottle :15s", "aaaa vvvv cccc Build_Mobile_320x480"),
  label  = c("Watermelon","Vanilla","Default","Default")
)

4 Answers

library(stringr)
dom$label = str_extract(dom$Banner, "Watermelon|Vanilla")
dom$label[is.na(dom$label)] <- "Default"
dom
#      Site                              Banner      label
# 1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
# 2    beta notest_Vanilla Latte-DPI_300x250 v2    Vanilla
# 3 charlie                         bottle :15s    Default
# 4   delta aaaa vvvv cccc Build_Mobile_320x480    Default
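For comparison, here is a base R sketch of the same extract-then-default idea without stringr. It assumes the `dom` data frame from the question. `regexpr()` finds the first match in each string and returns -1 where there is none, and `regmatches()` returns only the matched substrings, so they are written back into just the rows that matched.

```r
dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2", "notest_Vanilla Latte-DPI_300x250 v2",
             "bottle :15s", "aaaa vvvv cccc Build_Mobile_320x480"),
  stringsAsFactors = FALSE
)

# Start position of the first match in each Banner (-1 where nothing matches)
m <- regexpr("Watermelon|Vanilla", dom$Banner)

# Fill every row with "Default", then overwrite the rows that matched.
# regmatches() drops non-matches, so its result lines up with m != -1.
dom$label <- "Default"
dom$label[m != -1] <- regmatches(dom$Banner, m)
dom
```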


Here is a simple solution using base R:

# Sample data:
dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)


dom$label <- ifelse(grepl("watermelon", dom$Banner, ignore.case = TRUE), "Watermelon",
                    ifelse(grepl("vanilla", dom$Banner, ignore.case = TRUE), "Vanilla", "Default"))
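With more than a couple of labels, the nested `ifelse()` calls get hard to read. One way to generalize them is sketched below with a hypothetical helper, `label_by_pattern()` (not part of base R), that loops over a vector of labels. Labels earlier in the vector take priority, as in the nested version, and matching is case-insensitive like the `grepl()` calls above. It assumes each label is a plain word; labels are used as regular expressions, so special characters would need escaping.

```r
# Hypothetical helper (not from base R): for each string in x, return the
# first entry of `labels` found in it, or `default` if none match.
label_by_pattern <- function(x, labels, default = "Default", ignore.case = TRUE) {
  out <- rep(default, length(x))
  # Loop in reverse so earlier labels overwrite later ones, giving the
  # same priority order as the nested ifelse() calls
  for (lab in rev(labels)) {
    hit <- grepl(lab, x, ignore.case = ignore.case)
    out[hit] <- lab
  }
  out
}

dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2", "notest_Vanilla Latte-DPI_300x250 v2",
             "bottle :15s", "aaaa vvvv cccc Build_Mobile_320x480"),
  stringsAsFactors = FALSE
)
dom$label <- label_by_pattern(dom$Banner, c("Watermelon", "Vanilla"))
dom
```

Adding a label then only means extending the `labels` vector rather than nesting another `ifelse()`.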

library(dplyr)
library(stringi)

dom %>% mutate(label = case_when(stri_detect_fixed(Banner, "Watermelon") ~ "Watermelon",
                                 stri_detect_fixed(Banner, "Vanilla")    ~ "Vanilla",
                                                                   TRUE  ~ "Default"))
#>      Site                              Banner      label
#> 1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
#> 2    beta notest_Vanilla Latte-DPI_300x250 v2    Vanilla
#> 3 charlie                         bottle :15s    Default
#> 4   delta aaaa vvvv cccc Build_Mobile_320x480    Default

Data:

dom <- data.frame(Site = c("alpha", "beta", "charlie", "delta"),
                  Banner = c("testing_Watermelon -DPI_300x250 v2",
                             "notest_Vanilla Latte-DPI_300x250 v2",
                             "bottle :15s",
                             "aaaa vvvv cccc Build_Mobile_320x480"))


One base R possibility:

labels <- paste(c("Watermelon", "Orange"), collapse = "|")

dom$label <- sapply(regmatches(dom$Banner, regexec(labels, dom$Banner)), "[", 1)
dom$label[is.na(dom$label)] <- "Default"

     Site                              Banner      label
1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
2    beta  notest_Orange Latte-DPI_300x250 v2     Orange
3 charlie                         bottle :15s    Default
4   delta aaaa vvvv cccc Build_Mobile_320x480    Default

This also works with dplyr/tidyr:

library(dplyr)
library(tidyr)  # for replace_na()

dom %>%
 mutate(label = sapply(regmatches(Banner, regexec(labels, Banner)), "[", 1),
        label = replace_na(label, "Default"))

Sample data:

dom <- data.frame(
 Site = c("alpha", "beta", "charlie", "delta"),
 Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Orange Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)
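To see why the `regexec()`/`regmatches()` answer needs the `"["`, 1 step and the NA replacement, the sketch below runs it on a small made-up vector (the third string is an assumption added for illustration). Non-matching strings come back as `character(0)`, which indexing with 1 turns into `NA`. Note also that when a string contains more than one label, the leftmost match in the string is returned, not the first alternative in the pattern, so this approach does not follow the priority order of the `ifelse()` or `case_when()` answers.

```r
labels <- paste(c("Watermelon", "Orange"), collapse = "|")
x <- c("testing_Watermelon -DPI_300x250 v2", "bottle :15s", "Orange_then_Watermelon")

# A list with one element per string: the matched text, or character(0) if none
m <- regmatches(x, regexec(labels, x))
str(m)

# Taking element 1 turns character(0) into NA; the leftmost match wins
first <- sapply(m, "[", 1)
first
```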
