Create new columns based on the presence of a string

Question

3

I asked a similar question yesterday, but today I need help doing this in R. You can see the original question here: Create new indicator columns based on values in another column

My data looks like this:

df <- data.frame(col = c('I want an apple', 'i hate pears', 'please buy a peach and an apple', 'I want squash'))

I want it to look like this:

goal_df <- data.frame(col = c('I want an apple', 'i hate pears', 'please buy a peach and an apple', 'I want squash'), 
                      apple = c(1, 0, 1, 0), 
                      pear = c(0, 1, 0, 0), 
                      peach = c(0, 0, 1, 0))

head(goal_df)
                              col apple pear peach
1                 I want an apple     1    0     0
2                    i hate pears     0    1     0
3 please buy a peach and an apple     1    0     1
4                   I want squash     0    0     0

I tried this:

fruits <- list('apple', 'pear', 'peach')

for (i in fruits){
  df$i <- ifelse(str_detect(df$col, i), 1, 0)
}

                              col x
1                 I want an apple 0
2                    i hate pears 0
3 please buy a peach and an apple 1
4                   I want squash 0

Can anyone help me see what I am doing wrong? I am not sure why this creates only one column.

- pkpto39

4 Answers

3

You can use rowwise and map to create a list column:

library(tidyverse)

names(fruits) <- fruits # makes new column names automatic

df %>% 
  rowwise() %>% 
  mutate(fruit_test = list(map_int(fruits, ~str_detect(col, .)))) %>% 
  unnest_wider(fruit_test)

# A tibble: 4 × 4
  col                             apple  pear peach
  <fct>                           <int> <int> <int>
1 I want an apple                     1     0     0
2 i hate pears                        0     1     0
3 please buy a peach and an apple     1     0     1
4 I want squash                       0     0     0
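To see why naming the list makes the new column names automatic, it may help to look at what the inner map_int call returns for a single row. This is only a sketch: it assumes the purrr and stringr packages are installed, and the variable one_row is a hypothetical name introduced for illustration.

```r
library(purrr)    # map_int()
library(stringr)  # str_detect()

fruits <- list('apple', 'pear', 'peach')
names(fruits) <- fruits  # each element is now named after its own value

# one_row is a hypothetical stand-in for a single value of df$col
one_row <- 'please buy a peach and an apple'

# map_int() keeps the names of its input, so the result is a named
# integer vector with one 0/1 entry per fruit
res <- map_int(fruits, ~ str_detect(one_row, .))
res
```

Because the result carries the names apple, pear and peach, unnest_wider can spread it into one column per fruit.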

- andrew_reece

2

Change $ to [[:

library(stringr)  # provides str_detect()

for (i in fruits){
  df[[i]] <- ifelse(str_detect(df$col, i), 1, 0)
}

Output:

> df
                              col apple pear peach
1                 I want an apple     1    0     0
2                    i hate pears     0    1     0
3 please buy a peach and an apple     1    0     1
4                   I want squash     0    0     0

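The output the OP got would have i as the column name rather than x (possibly a typo). This is because $ creates a column literally named i instead of using the value stored in i, and that column is overwritten on every iteration, so it ends up holding the result for the last element of 'fruits', i.e. 'peach'.

> df
                              col i
1                 I want an apple 0
2                    i hate pears 0
3 please buy a peach and an apple 1
4                   I want squash 0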
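The difference between $ and [[ can be shown with a minimal sketch. The data frame d and the variable nm are hypothetical names used only for this illustration:

```r
# Hypothetical toy data; d and nm are illustrative names only
d <- data.frame(a = 1:2)
nm <- "apple"

d$nm <- 0L     # $ takes the name literally: adds a column called "nm"
d[[nm]] <- 1L  # [[ evaluates nm first: adds a column called "apple"

names(d)
# [1] "a"     "nm"    "apple"
```

Inside a loop, the $ form therefore keeps writing to the same column, while the [[ form writes to a different column on each pass.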

- akrun

1

Of course. I haven't written R in a while. That was silly. - pkpto39

0

We can try the following base R option.

u <- with(
  df,
  regmatches(
    col,
    gregexpr(
      do.call(paste, c(fruits, sep = "|")),
      col
    )
  )
)

cbind(df,unclass(t(table(stack(setNames(u, seq_along(u)))))))

This gives

                              col apple peach pear
1                 I want an apple     1     0    0
2                    i hate pears     0     0    1
3 please buy a peach and an apple     1     1    0
4                   I want squash     0     0    0
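As a rough sketch of the intermediate step, the snippet below builds the combined pattern and inspects the list of matches before it is tabulated. It uses a plain character vector col and a variable named pattern in place of the data frame column; both are assumptions made for illustration.

```r
fruits <- list('apple', 'pear', 'peach')
col <- c('I want an apple', 'i hate pears',
         'please buy a peach and an apple', 'I want squash')

# Collapse the fruit names into one alternation pattern
pattern <- do.call(paste, c(fruits, sep = "|"))

# gregexpr() finds every match per string; regmatches() extracts them,
# giving one character vector per row (empty when nothing matches)
u <- regmatches(col, gregexpr(pattern, col))
u
```

table then counts these matches per row, and since it orders its levels alphabetically, the columns come out as apple, peach, pear rather than in the order of fruits.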

- ThomasIsCoding

- jay.sf · Accepted Answer

Define the patterns and combine them with col using grepl inside outer.

pa <- c('apple', 'pear', 'peach')

data.frame(df, `colnames<-`(+t(outer(pa, df$col, Vectorize(grepl))), pa))
#                               col apple pear peach
# 1                 I want an apple     1    0     0
# 2                    i hate pears     0    1     0
# 3 please buy a peach and an apple     1    0     1
# 4                   I want squash     0    0     0
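The one-liner can be unpacked step by step. The sketch below uses a plain character vector col and an intermediate matrix m, both of which are names introduced here for illustration rather than taken from the answer:

```r
pa <- c('apple', 'pear', 'peach')
col <- c('I want an apple', 'i hate pears',
         'please buy a peach and an apple', 'I want squash')

# outer() calls its function once on the fully expanded argument vectors,
# and grepl() only uses the first pattern it is given, so it is wrapped
# in Vectorize() to be applied to each pattern/string pair
m <- outer(pa, col, Vectorize(grepl))  # 3 x 4 logical matrix, patterns in rows

m <- +t(m)          # transpose to 4 x 3; unary + turns TRUE/FALSE into 1/0
colnames(m) <- pa   # label the columns with the patterns
m
```

The resulting matrix has one row per string and one column per pattern, which is what gets bound to df in the answer.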

df <- structure(list(col = c("I want an apple", "i hate pears", "please buy a peach and an apple", 
"I want squash")), class = "data.frame", row.names = c(NA, -4L
))