How to use do() correctly in dplyr 0.3.0.9000

6

I am trying to reproduce the result from this Stack Overflow question: dplyr: How to apply do() on result of group_by?

Here is the data:

person = c('Grace', 'Grace', 'Grace', 'Rob', 'Rob', 'Rob')
foods = c('apple', 'banana', 'cucumber', 'spaghetti', 'cucumber', 'banana')
eaten <- data.frame(person, foods, stringsAsFactors = FALSE)

The result I am trying to reproduce is:

[[1]]
     [,1]     [,2]       [,3]      
[1,] "apple"  "apple"    "banana"  
[2,] "banana" "cucumber" "cucumber"

[[2]]
     [,1]        [,2]        [,3]      
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber"  "banana"    "banana" 

The original code that produced this result is below, but it no longer works:

> eaten %>% group_by(person) %>% do(function(x) combn(x$foods, m = 2))
Error: Results are not data frames at positions: 1, 2

I have tried several variations, but none of them gets do() to work:

> eaten %>% group_by(person) %>% do(combn(.$foods, m = 2))
Error: Results are not data frames at positions: 1, 2

> eaten %>% group_by(person) %>% do(.$foods, combn, m =2)
Error: Arguments to do() must either be all named or all unnamed

> eaten %>% group_by(person) %>% do((combn(.$foods, m=2)))
Error: Results are not data frames at positions: 1, 2
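The "Results are not data frames" errors above are consistent with the rule that an unnamed expression passed to do() has to evaluate to a data frame, whereas combn() returns a matrix. A minimal base R sketch of that check (the variable names `pairs` and `pairs_df` are made up for illustration):

```r
# combn() on a character vector returns a character matrix, not a data frame
foods <- c("apple", "banana", "cucumber")
pairs <- combn(foods, m = 2)

is.matrix(pairs)      # TRUE
is.data.frame(pairs)  # FALSE
dim(pairs)            # 2 rows (pair members) by 3 columns (one per pair)

# Wrapping the matrix in as.data.frame() gives something an unnamed do() accepts
pairs_df <- as.data.frame(pairs, stringsAsFactors = FALSE)
is.data.frame(pairs_df)  # TRUE
```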

Only the following seems to work, but it produces warnings:

> eaten %>% group_by(person) %>% do(as.data.frame(combn(.$foods, m = 2)))
#   person        V1        V2       V3
# 1  Grace     apple     apple   banana
# 2  Grace    banana  cucumber cucumber
# 3    Rob spaghetti spaghetti cucumber
# 4    Rob  cucumber    banana   banana
# Warning messages:
# 1: In rbind_all(out[[1]]) : Unequal factor levels: coercing to character
# 2: In rbind_all(out[[1]]) : Unequal factor levels: coercing to character
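A likely reason for the warning, assuming an R version in which as.data.frame() converted character columns to factors by default (the default before R 4.0.0): each group ends up with factor columns that have different levels, so they have to be coerced to character when the per-group results are row-bound. The base R sketch below reproduces the mismatch by requesting factors explicitly; `grace`, `rob`, and `as_factor_df` are illustrative names only.

```r
# Per-group pair matrices, as combn() would produce them inside do()
grace <- combn(c("apple", "banana", "cucumber"), m = 2)
rob   <- combn(c("spaghetti", "cucumber", "banana"), m = 2)

# Request factor conversion explicitly to mimic the old default behaviour
as_factor_df <- function(m) as.data.frame(m, stringsAsFactors = TRUE)

levels(as_factor_df(grace)$V1)  # levels come from Grace's foods only
levels(as_factor_df(rob)$V1)    # levels come from Rob's foods only, so they differ

# Keeping the columns as character avoids the mismatch entirely
grace_chr <- as.data.frame(grace, stringsAsFactors = FALSE)
sapply(grace_chr, class)        # every column is "character"
```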

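I believe the behaviour of do() must have changed in the new version. What changed, and what is the correct way to use do()? Thanks.

Edit: I installed the latest dplyr and ran the code suggested by @hadley: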

packageVersion("dplyr")
[1] ‘0.3.0.2’

eaten %>% group_by(person) %>% do(x = combn(.$foods, m = 2))
# Source: local data frame [2 x 2]
# Groups: <by row>
#   
#   person          x
# 1  Grace <chr[2,3]>
# 2    Rob <chr[2,3]>

Edit 2: as @hadley suggested, the "x" column needs to be extracted:

eaten2 <- eaten %>% group_by(person) %>% do(x = combn(.$foods, m = 2))
eaten2[["x"]]
# [[1]]
#      [,1]     [,2]       [,3]      
# [1,] "apple"  "apple"    "banana"  
# [2,] "banana" "cucumber" "cucumber"
# 
# [[2]]
#      [,1]        [,2]        [,3]      
# [1,] "spaghetti" "spaghetti" "cucumber"
# [2,] "cucumber"  "banana"    "banana"  

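I only tested this in dplyr 0.2 and got the same warnings about unequal factor levels. To get rid of them (at least in 0.2), you can change your do() call to: do(as.data.frame(combn(.$foods, m = 2), stringsAsFactors = FALSE)). Hope that helps. - talat
That looks rather unidiomatic; having a stringsAsFactors argument turn up again inside do() seems odd. In any case, I tried it and it solves the problem. Still, I would like to learn the proper idiom for do() and why the behaviour changed (or whether it actually changed at all). - KFB
@hadley, it doesn't work. - KFB
@hadley, I have added the output of the code to the post. - KFB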
1
@KFB Extract the x column and you will get what you want. - hadley
2 Answers

2

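Moving Edit 2 from the question into an answer so the question can be closed:

With the latest dplyr (0.3.0.2+), the "x" column needs to be extracted, as @hadley suggested.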

eaten2 <- eaten %>% group_by(person) %>% do(x = combn(.$foods, m = 2))
eaten2[["x"]]
# [[1]]
#      [,1]     [,2]       [,3]      
# [1,] "apple"  "apple"    "banana"  
# [2,] "banana" "cucumber" "cucumber"
# 
# [[2]]
#      [,1]        [,2]        [,3]      
# [1,] "spaghetti" "spaghetti" "cucumber"
# [2,] "cucumber"  "banana"    "banana"  
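For comparison, the same list of per-person pair matrices can be built without do() at all. This is a plain base R sketch using split() and lapply(), not the dplyr approach from the thread; `pairs_by_person` is an illustrative name, and unlike the list-column above, the resulting list is named by person.

```r
person <- c("Grace", "Grace", "Grace", "Rob", "Rob", "Rob")
foods  <- c("apple", "banana", "cucumber", "spaghetti", "cucumber", "banana")
eaten  <- data.frame(person, foods, stringsAsFactors = FALSE)

# Split the foods by person, then take all 2-element combinations per person
pairs_by_person <- lapply(split(eaten$foods, eaten$person), combn, m = 2)

names(pairs_by_person)  # one element per person
pairs_by_person$Grace   # 2 x 3 character matrix of Grace's food pairs
pairs_by_person$Rob     # 2 x 3 character matrix of Rob's food pairs
```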

With magrittr 1.5, you can also do eaten %>% group_by(person) %>% do(x = combn(.$foods, m = 2)) %$% x - talat
@docendodiscimus, thanks for the idea! - KFB

0

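This is obviously a matter of preference and of what the data is for, but I think one of the possibilities above is quite neat, because it can produce a usable, tidy data frame. With tidyr::gather, I think this returns an object that makes it clear who ate what at which meal, without having to extract anything.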

library(dplyr)  # group_by(), do(), %>%
library(tidyr)  # gather()

person = c('Grace', 'Grace', 'Grace', 'Rob', 'Rob', 'Rob')
foods  = c('apple', 'banana', 'cucumber', 'spaghetti', 'cucumber', 'banana')
eaten <- data.frame(person, foods, stringsAsFactors = FALSE)
eaten %>% group_by(person) %>% do(as.data.frame(combn(.$foods, m = 2))) %>% gather(meal, foods, -1)

This returns:

# Groups:   person [2]
   person meal  foods    
   <chr>  <chr> <chr>    
 1 Grace  V1    apple    
 2 Grace  V1    banana   
 3 Rob    V1    spaghetti
 4 Rob    V1    cucumber 
 5 Grace  V2    apple    
 6 Grace  V2    cucumber 
 7 Rob    V2    spaghetti
 8 Rob    V2    banana   
 9 Grace  V3    banana   
10 Grace  V3    cucumber 
11 Rob    V3    cucumber 
12 Rob    V3    banana   
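A comparable long table can also be sketched in base R alone, for readers who want to see what the reshaping amounts to. This is not the tidyr code from the answer: the names `pairs_by_person` and `long` are illustrative, and the rows come out ordered by person rather than by meal.

```r
person <- c("Grace", "Grace", "Grace", "Rob", "Rob", "Rob")
foods  <- c("apple", "banana", "cucumber", "spaghetti", "cucumber", "banana")
eaten  <- data.frame(person, foods, stringsAsFactors = FALSE)

# One 2 x 3 pair matrix per person
pairs_by_person <- lapply(split(eaten$foods, eaten$person), combn, m = 2)

# Flatten each matrix column by column; col(m) labels every cell with its pair index
long <- do.call(rbind, lapply(names(pairs_by_person), function(p) {
  m <- pairs_by_person[[p]]
  data.frame(
    person = p,
    meal   = paste0("V", as.vector(col(m))),
    foods  = as.vector(m),
    stringsAsFactors = FALSE
  )
}))

long  # 12 rows: 2 people x 3 pairs x 2 foods per pair
```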