Split a column in a data.table based on a regular expression

3

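I have a data.table with three columns. I want to split the second column on a regular expression so that I end up with four columns. Every attempt so far has produced strange results, so I would appreciate some feedback. Here is a preview of the data: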

     category                 label     count
1  Navigation     Product || Green         2 
2  Navigation      Survey || Green         5
3  Navigation       Product || Red        10
4  Navigation        Survey || Red        10

I want to split the label column at the || and create two new columns, Type and Color.


1
tidyr::separate - M--
3 Answers

4

With data.table, you can do the following:

dt[, c("type", "color") := tstrsplit(label, " || ", fixed = TRUE)]

     category            label count    type color
1: Navigation Product || Green     2 Product Green
2: Navigation  Survey || Green     5  Survey Green

Sample data:

library(data.table)

dt <- data.table(category = c("Navigation", "Navigation"),
                 label = c("Product || Green", "Survey || Green"),
                 count = c(2, 5))
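
If the spacing around the separator is not always exactly one space, the fixed string " || " will miss some rows. A minimal sketch of a regex-based variant, assuming irregular spacing (the sample labels below are made up for illustration): tstrsplit passes the pattern on to strsplit, which treats it as a regular expression when fixed is left at its default.

```r
library(data.table)

# Hypothetical data with uneven spacing around "||"
dt <- data.table(category = c("Navigation", "Navigation"),
                 label = c("Product  ||Green", "Survey || Green"),
                 count = c(2, 5))

# "\\|\\|" matches the literal "||"; "\\s*" absorbs any whitespace on either side
dt[, c("type", "color") := tstrsplit(label, "\\s*\\|\\|\\s*")]
print(dt)
```

Since the pipe is a regex metacharacter, it has to be escaped here, which is not needed when fixed = TRUE.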

3
We can use tidyr::separate:
library(data.table)

dt1 <- fread("category     label            count
              Navigation   Product || Green     2
              Navigation   Survey || Green      5
              Navigation   Product || Red      10
              Navigation   Survey || Red       10")

tidyr::separate(dt1, label, sep = "\\|\\|", into = c("Type","Color"))

#>      category    Type   Color count
#> 1: Navigation Product   Green     2
#> 2: Navigation  Survey   Green     5
#> 3: Navigation Product     Red    10
#> 4: Navigation  Survey     Red    10
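
Note that a separator of "\\|\\|" alone splits only on the pipes, so the spaces next to them stay attached to the new values. A sketch of one way around that, assuming the table is built directly with data.table() rather than read with fread: let the separator regex consume the surrounding whitespace as well.

```r
library(data.table)
library(tidyr)

# Same four rows as in the question, constructed explicitly
dt1 <- data.table(category = "Navigation",
                  label = c("Product || Green", "Survey || Green",
                            "Product || Red", "Survey || Red"),
                  count = c(2L, 5L, 10L, 10L))

# sep is a regular expression; "\\s*" on both sides trims the padding
res <- separate(dt1, label, into = c("Type", "Color"), sep = "\\s*\\|\\|\\s*")
print(res)
```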

1
This is the simplest and most effective approach. I marked it because it is very easy to use. - JAB

1
cbind(d, setNames(data.frame(do.call(rbind, strsplit(d$label, " || ", fixed = TRUE))),
         c("Type", "Color")))
#    category            label count     Type  Color
#1 Navigation Product || Green     2 Product   Green
#2 Navigation  Survey || Green     5  Survey   Green
#3 Navigation   Product || Red    10 Product     Red
#4 Navigation    Survey || Red    10  Survey     Red

Data

d = structure(list(category = c("Navigation", "Navigation", "Navigation", 
"Navigation"), label = c("Product || Green", "Survey || Green", 
"Product || Red", "Survey || Red"), count = c(2L, 5L, 10L, 10L
)), class = "data.frame", row.names = c(NA, -4L))
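
Another base R possibility, shown here only as a sketch and not as part of the answer above, is utils::strcapture, which extracts regex capture groups straight into a data frame. The pattern and the names parts and res are illustrative choices.

```r
# Same data as above, rebuilt so the snippet runs on its own
d <- data.frame(category = "Navigation",
                label = c("Product || Green", "Survey || Green",
                          "Product || Red", "Survey || Red"),
                count = c(2L, 5L, 10L, 10L),
                stringsAsFactors = FALSE)

# Group 1 is the text before "||", group 2 the text after it;
# perl = TRUE enables the lazy quantifier ".*?"
parts <- strcapture("^(.*?)\\s*\\|\\|\\s*(.*)$", d$label,
                    proto = data.frame(Type = character(), Color = character()),
                    perl = TRUE)

res <- cbind(d, parts)
print(res)
```

The proto argument sets the names and types of the new columns, so no setNames step is needed.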
