I have this dataset:
# A tibble: 268 x 1
`Which of these social media platforms do you have an account in right now?`
<chr>
1 Facebook, Instagram, Twitter, Snapchat, Reddit, Signal
2 Reddit
3 Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora
4 Facebook, Instagram, Twitter, Snapchat
5 Facebook, Instagram, TikTok, Snapchat
6 Facebook, Instagram, Twitter, Linkedin, Snapchat
7 Facebook, Instagram, TikTok, Linkedin, Snapchat, Reddit
8 Facebook, Instagram, Snapchat
9 Linkedin, Reddit
10 Facebook, Instagram, Twitter, TikTok
# ... with 258 more rows
I would like to split this into multiple columns, one per platform, with each marked "Yes" or "No", like this:
# A tibble: 268 x 8
Id Facebook Instagram Reddit Signal Snapchat TikTok Twitter
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 No No No No No No Yes
2 2 Yes Yes No No Yes No Yes
3 3 No Yes No Yes No Yes No
4 4 No Yes No No Yes No No
5 5 No Yes No Yes Yes Yes Yes
6 6 No Yes No No No No No
7 7 No No Yes Yes No Yes Yes
8 8 No No Yes No No No Yes
9 9 No No Yes No Yes Yes No
10 10 No Yes Yes Yes Yes No Yes
So I wrote this code to do that:
library(tidyverse)
library(tidytext)
Survey %>%
mutate(Id = row_number(), HasAccount = "Yes") %>%
unnest_tokens(Network, `Which of these social media platforms do you have an account in right now?`, to_lower = F) %>%
spread(Network, HasAccount, fill = "No")
But I get this error:
Erreur : Must extract column with a single valid subscript.
x Subscript `var` has size 268 but must be size 1.
> dput(head(Survey[1:5]))
structure(list(Horodateur = structure(c(1619171956.596, 1619172695.039,
1619173104.83, 1619174548.534, 1619174557.538, 1619174735.457
), tzone = "UTC", class = c("POSIXct", "POSIXt")), `To_which_gender_you_identify_the_most?` = c("Male",
"Female", "Male", "Female", "Female", "Female"), What_is_your_age_group = c("[18-24[",
"[10,18[", "[18-24[", "[18-24[", "[18-24[", "[25,34["), How_much_time_do_you_spend_on_social_media = c("1-5 hours",
"1-5 hours", ">10 hours", "5-10 hours", "5-10 hours", "1-5 hours"
), `Which_of_these_social_media_platforms_do_you_have_an_account_in_right_now?` = c("Facebook, Instagram, Twitter, Snapchat, Reddit, Signal",
"Reddit", "Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora",
"Facebook, Instagram, Twitter, Snapchat", "Facebook, Instagram, TikTok, Snapchat",
"Facebook, Instagram, Twitter, Linkedin, Snapchat")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
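For reference, here is a minimal sketch of the reshaping on a small stand-in built from the first rows of the `dput()` output. It assumes the column really is named with underscores, as the `dput()` output shows, and it swaps `unnest_tokens()` for `tidyr::separate_rows()` and `spread()` for `pivot_wider()`. That swap is a choice made for this illustration, not something the original code used, and `wide` is only an illustrative name.

```r
library(dplyr)
library(tidyr)

# Stand-in for Survey: first three answers from the dput() output (assumption: underscore-style name)
Survey <- tibble::tibble(
  `Which_of_these_social_media_platforms_do_you_have_an_account_in_right_now?` = c(
    "Facebook, Instagram, Twitter, Snapchat, Reddit, Signal",
    "Reddit",
    "Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora"
  )
)

wide <- Survey %>%
  # Keep an Id per respondent, give the long column a short name, flag every listed platform
  transmute(
    Id = row_number(),
    Network = `Which_of_these_social_media_platforms_do_you_have_an_account_in_right_now?`,
    HasAccount = "Yes"
  ) %>%
  # One row per (respondent, platform); split on a comma plus optional whitespace
  separate_rows(Network, sep = ",\\s*") %>%
  # One column per platform; platforms a respondent did not list become "No"
  pivot_wider(
    names_from = Network,
    values_from = HasAccount,
    values_fill = list(HasAccount = "No")
  )

wide
```

Splitting on the comma keeps each platform name intact regardless of how a tokenizer would treat it, which is the main reason for using `separate_rows()` here.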
Edit: Edited the question based on @CSJCampbell's answer.
Edit: Added a snippet of the dataset I am using.
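packageVersion('dplyr'), packageVersion('tidytext') and check whether any functions are masked. - akrun
The version numbers of the dplyr and tidytext packages: dplyr is currently 1.0.6 and tidytext is currently 0.3.1. @akrun, these can be checked by running the commands above. - wageehdf
Different names. - akrun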
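The last comment appears to point at a column-name mismatch: the code refers to the column with spaces, while the `dput()` output shows underscores. Whether this is the actual cause of the error is an assumption. A quick way to check is sketched below on a stand-in tibble; `spaced_name`, `underscored_name`, `has_spaced` and `has_underscored` are illustrative names.

```r
# Stand-in tibble using the underscore-style name from the dput() output
Survey <- tibble::tibble(
  `Which_of_these_social_media_platforms_do_you_have_an_account_in_right_now?` =
    c("Reddit", "Linkedin, Reddit")
)

# The name as written in the unnest_tokens() call, and its underscore variant
spaced_name <- "Which of these social media platforms do you have an account in right now?"
underscored_name <- gsub(" ", "_", spaced_name, fixed = TRUE)

# Check which spelling actually exists among the column names
has_spaced <- spaced_name %in% names(Survey)
has_underscored <- underscored_name %in% names(Survey)

print(c(spaced = has_spaced, underscored = has_underscored))
```

If only the underscore variant is present, the column reference in the pipeline would need to use that spelling.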