How to split one string column into multiple columns with tidyr in R

Question

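I'm using tidyr in R and trying to split the data in the "pub_author" column (attached below) into 3 separate columns: "website_title", "year", and "author". I tried the separate() function, as separate('pub_author', c('website_title', 'year', 'author'), '-'), but because R splits on every "-" it encounters, it only returns the first three words. Does anyone know how to group the words of the title and of the author so that they land in the appropriate columns, or is there another way to do this?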

- George Coumantaros

1 Answer

- akrun · Accepted Answer

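With separate(), the sep argument can be a regular expression that uses lookahead and lookbehind. In this case, it matches a - that is immediately followed by 4 digits, or a - that is immediately preceded by 4 digits. The hyphens inside the title and the author name satisfy neither condition, so they are left intact.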
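To see what the pattern does on a single string, here is a minimal sketch that uses base R's strsplit() with perl = TRUE. Base R is swapped in here purely for illustration; it is not a claim about what separate() calls internally, and the names x and parts are hypothetical.

```r
# One sample value from the pub_author column
x <- "nfl-draft-geek-2018-justin-miller"

# Split only on a "-" followed by four digits, or preceded by four digits.
# perl = TRUE is needed for lookahead / lookbehind support in base R.
parts <- strsplit(x, "-(?=\\d{4})|(?<=\\d{4})-", perl = TRUE)[[1]]

print(parts)
```

The hyphens in "nfl-draft-geek" and "justin-miller" are not adjacent to a 4-digit run, so only the two hyphens around the year act as split points.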

library(tidyr)
# Split on a "-" that is followed by 4 digits, or a "-" that is preceded by 4 digits
separate(df1, pub_author, into = c('website_title', 'year', 'author'),
         sep = "-(?=\\d{4})|(?<=\\d{4})-")
#        website_title year        author
#1       nfl-draft-geek 2018 justin-miller
#2                  cbs 2019   pete-prisco
#3            sb-nation 2020     dan-kadar
#4    football-fan-spot 2019 steven-lourie
#5             fanspeak 2018       william
#6 acme-packing-company 2020  shawn-wagner
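As an alternative to splitting on the delimiter, the three pieces can be captured directly with tidyr's extract() function and one capture group per output column. This is a sketch of a different technique from the one in the answer, and it assumes every value contains exactly one 4-digit year surrounded by hyphens. The name res is hypothetical, and the data frame is rebuilt here only to keep the example self-contained.

```r
library(tidyr)

# Same sample data as in the answer, rebuilt with data.frame()
df1 <- data.frame(pub_author = c(
  "nfl-draft-geek-2018-justin-miller", "cbs-2019-pete-prisco",
  "sb-nation-2020-dan-kadar", "football-fan-spot-2019-steven-lourie",
  "fanspeak-2018-william", "acme-packing-company-2020-shawn-wagner"
))

# Three capture groups: everything before "-YYYY-", the 4-digit year,
# and everything after it
res <- tidyr::extract(df1, pub_author,
                      into = c("website_title", "year", "author"),
                      regex = "^(.*)-(\\d{4})-(.*)$")

print(res)
```

By default the year column comes back as character with both approaches; passing convert = TRUE to either function asks tidyr to convert it to a more specific type such as integer.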

Data

df1 <- structure(list(pub_author = c("nfl-draft-geek-2018-justin-miller",
    "cbs-2019-pete-prisco", "sb-nation-2020-dan-kadar",
    "football-fan-spot-2019-steven-lourie",
    "fanspeak-2018-william", "acme-packing-company-2020-shawn-wagner"
)), class = "data.frame", row.names = c(NA, -6L))