Using the spread function in tidyr to create two value columns

Question

5

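I have a data frame that looks something like this (see link). I would like to take the output produced below one step further by spreading the tone variable across both the n and the average variables. This topic seems relevant, but I can't get it to work: Is it possible to use spread on multiple columns in tidyr similar to dcast? I want the final table to have the source variable in one column, followed by the tone-n and tone-avg variables in columns. So the column headers should be "source" - "For - n" - "Against - n" - "For - Avg" - "Against - Avg". This is for publication, not for further calculation, so it is about how the data are presented. I think presenting the data this way is more intuitive. Thanks.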

library(dplyr)
library(tidyr)

# variable 1
Politician.For <- sample(seq(0, 4, 1), 50, replace = TRUE)
# variable 2
Politician.Against <- sample(seq(0, 4, 1), 50, replace = TRUE)
# variable 3
Activist.For <- sample(seq(0, 4, 1), 50, replace = TRUE)
# variable 4
Activist.Against <- sample(seq(0, 4, 1), 50, replace = TRUE)
# data frame
df <- data.frame(Politician.For, Politician.Against, Activist.For, Activist.Against)

# tidyr
df %>%
  # gather all columns; the key column is named "df", the values go to "value"
  gather(df) %>%
  # separate at the period character
  # (the default separator is any non-alphanumeric character)
  separate(col = df, into = c('source', 'tone')) %>%
  # group by both source and tone
  group_by(source, tone) %>%
  # summarise to create counts and averages
  summarise(n = sum(value), avg = mean(value)) %>%
  # try to spread (this is the step that does not work)
  spread(tone, c('n', 'value'))

- spindoctor

1

Please show the desired output. - Steven Beaupré

2 Answers

1

Using `data.table` syntax (thanks to @akrun):

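library(data.table)
dcast(
  # melt to long form, then split e.g. "Politician.For" into source and tone
  setDT(melt(df))[, c('source', 'tone') :=
      tstrsplit(variable, '[.]')
    ][, list(
      N   = sum(value),
      avg = mean(value)),
    by = .(source, tone)],
  # one row per source; N and avg each get one column per tone
  source ~ tone,
  value.var = c('N', 'avg'))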
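
For readers who want to run the steps one at a time, here is a minimal sketch of the same idea written as separate statements. It is a variant, not the answer's exact code: it assumes a reasonably recent `data.table`, converts with `as.data.table()` so that `df` is left untouched, and passes `measure.vars` explicitly. The `set.seed()` call and the names `long`, `stats` and `wide_dt` are assumptions added for illustration.

```r
library(data.table)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(
  Politician.For     = sample(0:4, 50, replace = TRUE),
  Politician.Against = sample(0:4, 50, replace = TRUE),
  Activist.For       = sample(0:4, 50, replace = TRUE),
  Activist.Against   = sample(0:4, 50, replace = TRUE)
)

# Long form: one row per (original column, value) pair.
# as.data.table() makes a copy, so df itself is not modified.
long <- melt(as.data.table(df), measure.vars = names(df),
             variable.name = "variable", value.name = "value")

# Split e.g. "Politician.For" into source = "Politician" and tone = "For"
long[, c("source", "tone") := tstrsplit(as.character(variable), ".", fixed = TRUE)]

# Sum and mean per source/tone combination
stats <- long[, .(N = sum(value), avg = mean(value)), by = .(source, tone)]

# Cast both value columns at once: one row per source,
# one column per (value column, tone) combination
wide_dt <- dcast(stats, source ~ tone, value.var = c("N", "avg"))
print(wide_dt)
```

With several `value.var` columns, `dcast` names the result columns by joining the value column and the tone with an underscore, for example `N_For` and `avg_Against`.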

- Frank

1

@akrun OK, thanks for the tip! If the plans for dplyr don't include dcast and melt (given that they originated in the hadleyverse), that seems a bit odd (and limiting). - Frank

1

Maybe tidyr is meant to handle some of those things. I guess the next version of reshape2 (? reshape3) will correct these issues. - akrun

I would prefer melt(setDT(df)) so that we don't need to load reshape2, although it gives a friendly warning message that may scare people. - akrun

@akrun Feel free to edit; I don't know much about that stuff yet. - Frank

@Arun Feel free to edit, to remove my 1.9.4 fix, or anything else. - Frank


- user295691 · Accepted Answer

I think what you want is another gather, to break out the count and the mean as separate observations: gather(type, val, -source, -tone)

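library(dplyr)
library(tidyr)

gather(df, who, value) %>%
    separate(who, into=c('source', 'tone')) %>%
    group_by(source, tone) %>%
    summarise(n=sum(value), avg=mean(value)) %>%
    # stack n and avg into a single value column, labelled by type
    gather(type, val, -source, -tone) %>%
    # combine tone and type into one key, e.g. "For_n"
    unite(stat, c(tone, type)) %>%
    spread(stat, val)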

which yields:

Source: local data frame [2 x 5]

      source Against_avg Against_n For_avg For_n
1   Activist        1.82        91    1.84    92
2 Politician        1.94        97    1.70    85
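
In tidyr 1.0.0 and later, `gather` and `spread` are superseded by `pivot_longer` and `pivot_wider`, and `pivot_wider` accepts several value columns at once. The sketch below reproduces the same table under that assumption; it also assumes dplyr 1.0.0 or later for the `.groups` argument. The `set.seed()` call and the name `wide` are added for illustration, and the `names_glue` pattern is chosen only to match the `tone_type` column names shown above.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(
  Politician.For     = sample(0:4, 50, replace = TRUE),
  Politician.Against = sample(0:4, 50, replace = TRUE),
  Activist.For       = sample(0:4, 50, replace = TRUE),
  Activist.Against   = sample(0:4, 50, replace = TRUE)
)

wide <- df %>%
  # long form; split each column name at the period into source and tone
  pivot_longer(everything(),
               names_to = c("source", "tone"),
               names_sep = "\\.") %>%
  group_by(source, tone) %>%
  summarise(n = sum(value), avg = mean(value), .groups = "drop") %>%
  # spread both n and avg across tone in a single step
  pivot_wider(names_from = tone,
              values_from = c(n, avg),
              names_glue = "{tone}_{.value}")

print(wide)
```

Because the data are sampled at random, the numbers will differ from the output shown above; only the layout (one row per source, one column per tone and statistic) is the same.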