Create new columns based on the presence of a string

3

I asked a similar question yesterday, but today I need help doing the same thing in R. You can see the original question here: Create new indicator columns based on values in another column

My data looks like this:

df <- data.frame(col = c('I want an apple', 'i hate pears', 'please buy a peach and an apple', 'I want squash'))



I want it to look like this:
goal_df <- data.frame(col = c('I want an apple', 'i hate pears', 'please buy a peach and an apple', 'I want squash'), 
                      apple = c(1, 0, 1, 0), 
                      pear = c(0, 1, 0, 0), 
                      peach = c(0, 0, 1, 0))

head(goal_df)
                              col apple pear peach
1                 I want an apple     1    0     0
2                    i hate pears     0    1     0
3 please buy a peach and an apple     1    0     1
4                   I want squash     0    0     0

I tried this:
fruits <- list('apple', 'pear', 'peach')

for (i in fruits){
  df$i <- ifelse(str_detect(df$col, i), 1, 0)
}

                              col x
1                 I want an apple 0
2                    i hate pears 0
3 please buy a peach and an apple 1
4                   I want squash 0

Can anyone help me see what I'm doing wrong? I'm not sure why this only creates one column.

4 Answers

3

Define the patterns and combine them with col by calling grepl inside outer.

pa <- c('apple', 'pear', 'peach')

data.frame(df, `colnames<-`(+t(outer(pa, df$col, Vectorize(grepl))), pa))
#                               col apple pear peach
# 1                 I want an apple     1    0     0
# 2                    i hate pears     0    1     0
# 3 please buy a peach and an apple     1    0     1
# 4                   I want squash     0    0     0

df <- structure(list(col = c("I want an apple", "i hate pears", "please buy a peach and an apple", 
"I want squash")), class = "data.frame", row.names = c(NA, -4L
))
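
For readers who find the one-liner dense, the same idea can be sketched step by step. This is an equivalent sketch rather than the answerer's code: it swaps outer/Vectorize for sapply, and it adds fixed = TRUE on the assumption that the fruit names should be matched as literal strings rather than regular expressions. The names hits and res are illustrative only.

```r
# Sample data from the question
df <- data.frame(col = c("I want an apple", "i hate pears",
                         "please buy a peach and an apple", "I want squash"))
pa <- c("apple", "pear", "peach")

# One logical column per pattern; sapply names the columns after pa
hits <- sapply(pa, function(p) grepl(p, df$col, fixed = TRUE))

# Unary + converts TRUE/FALSE to 1/0 while keeping the column names
res <- cbind(df, +hits)
res
```

Note that matching is by substring, so 'pear' matches 'pears' here, just as in the answer above.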

3
You can use rowwise and map to create a list column:
library(tidyverse)

names(fruits) <- fruits # makes new column names automatic

df %>% 
  rowwise() %>% 
  mutate(fruit_test = list(map_int(fruits, ~str_detect(col, .)))) %>% 
  unnest_wider(fruit_test)

# A tibble: 4 × 4
  col                             apple  pear peach
  <fct>                           <int> <int> <int>
1 I want an apple                     1     0     0
2 i hate pears                        0     1     0
3 please buy a peach and an apple     1     0     1
4 I want squash                       0     0     0

2

Change $ to [[:

library(stringr)  # provides str_detect()

for (i in fruits) {
  df[[i]] <- ifelse(str_detect(df$col, i), 1, 0)  # [[ uses the value stored in i as the column name
}

Output:

> df
                              col apple pear peach
1                 I want an apple     1    0     0
2                    i hate pears     0    1     0
3 please buy a peach and an apple     1    0     1
4                   I want squash     0    0     0

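The output the OP got would have i as the column name rather than x (possibly a typo). This is because $ creates a column literally named i instead of using the value stored in i. That single column is overwritten on every iteration, so it ends up holding the result for the last element of 'fruits', i.e. 'peach'.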
> df
                              col i
1                 I want an apple 0
2                    i hate pears 0
3 please buy a peach and an apple 1
4                   I want squash 0
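
The difference between the two operators can be seen in isolation with a minimal sketch. The two-row data frame and the constant values 1 and 2 are made up purely for illustration.

```r
df <- data.frame(col = c("I want an apple", "i hate pears"))
i <- "apple"

# `$` takes the name literally: this adds a column called "i"
df$i <- 1

# `[[` evaluates i first: this adds a column called "apple"
df[[i]] <- 2

names(df)
# [1] "col"   "i"     "apple"
```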

1
Of course. It's been a while since I've written R. That was silly. - pkpto39

0
We can try the following base R option.
u <- with(
  df,
  regmatches(
    col,
    gregexpr(
      do.call(paste, c(fruits, sep = "|")),
      col
    )
  )
)

cbind(df,unclass(t(table(stack(setNames(u, seq_along(u)))))))

This gives

                              col apple peach pear
1                 I want an apple     1     0    0
2                    i hate pears     0     0    1
3 please buy a peach and an apple     1     1    0
4                   I want squash     0     0    0
