Why doesn't tidyr's `complete()` function complete the data in R?

library(tidyverse)
complete.test <- tibble(col1 = c("a", "a", "b", "b"),
                        col2 = c(as.Date("2019-01-01"),
                                 as.Date("2019-01-02"),
                                 as.Date("2019-01-03"),
                                 as.Date("2019-01-04")),
                        col3 = runif(4),
                        col4 = runif(4))
complete.test %>% complete(col1, col2)
#> # A tibble: 8 x 4
#>   col1  col2         col3   col4
#>   <chr> <date>      <dbl>  <dbl>
#> 1 a     2019-01-01  0.154  0.143
#> 2 a     2019-01-02  0.746  0.526
#> 3 a     2019-01-03 NA     NA    
#> 4 a     2019-01-04 NA     NA    
#> 5 b     2019-01-01 NA     NA    
#> 6 b     2019-01-02 NA     NA    
#> 7 b     2019-01-03  0.997  0.772
#> 8 b     2019-01-04  0.989  0.460
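The tidyr documentation describes `complete()` as a wrapper around `expand()`, `dplyr::full_join()` and `replace_na()`. The following minimal sketch shows that equivalence on data shaped like `complete.test`. Two assumptions: `small`, `grid`, `manual` and `auto` are illustrative names, and the values in `col3` are fixed so the result does not depend on `runif()`. The sketch uses `left_join()` because the grid already contains every combination present in the data, so a full join would add no extra rows here.

```r
library(dplyr)
library(tidyr)

# Same shape as complete.test, but with fixed values instead of runif()
small <- tibble(col1 = c("a", "a", "b", "b"),
                col2 = as.Date("2019-01-01") + 0:3,
                col3 = c(0.1, 0.2, 0.3, 0.4))

# Step 1: every combination of the distinct col1 and col2 values
grid <- small %>% expand(col1, col2)

# Step 2: join the original rows back; combinations not in the data get NA
manual <- grid %>% left_join(small, by = c("col1", "col2"))

# The built-in version
auto <- small %>% complete(col1, col2)

nrow(grid)             # 8: 2 values of col1 x 4 dates
sum(is.na(auto$col3))  # 4 combinations did not exist in the data
```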

In the case above, tidyr's complete() works as usual. With the specific data set shown below, however, the function "stops" working. It may well be user error. Read on.

library(tidyverse)
df <- structure(list(`Business Group` = c("ABC", "ABC", "ABC", 
"ABC", "ABC", "ABC", "ABC", "ABC", "ABC", 
"ABC", "DEF", "DEF", "DEF", "DEF", "DEF", 
"DEF", "DEF", "DEF", "GHI", "GHI", 
"GHI", "GHI", "GHI", 
"GHI", "GHI", "GHI", 
"GHI", "GHI", "GHI", 
"GHI"), Month = structure(c(17866, 17897, 17928, 
17956, 17987, 18017, 18048, 18078, 18109, 18140, 17956, 17987, 
18017, 18048, 18078, 18109, 18140, 18170, 13970, 14000, 14031, 
14061, 14092, 14123, 14153, 14184, 14214, 14245, 14276, 14304
), class = "Date"), SumChange = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, -3, 0, 0, 0, 0, 2, 0, -12, 3, 4, 3, 4, 
3), `Qty Items Open 90 Days` = c(0, 0, 0, 1, 2, 3, 4, 5, 5, 5, 
0, 0, 0, 0, 0, 0, 0, -3, 0, 0, 0, 0, 2, 2, -10, -7, -3, 0, 4, 
7)), row.names = c(NA, -30L), groups = structure(list(`Business Group` = c("ABC", 
"DEF", "GHI"), .rows = list(1:10, 11:18, 19:30)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"), .drop = FALSE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

#> # A tibble: 30 x 4
#> # Groups:   Business Group [3]
#>    `Business Group` Month      SumChange `Qty Items Open 90 Days`
#>    <chr>            <date>         <dbl>                    <dbl>
#>  1 ABC              2018-12-01         0                        0
#>  2 ABC              2019-01-01         0                        0
#>  3 ABC              2019-02-01         0                        0
#>  4 ABC              2019-03-01         1                        1
#>  5 ABC              2019-04-01         1                        2
#>  6 ABC              2019-05-01         1                        3
#>  7 ABC              2019-06-01         1                        4
#>  8 ABC              2019-07-01         1                        5
#>  9 ABC              2019-08-01         0                        5
#> 10 ABC              2019-09-01         0                        5
#> 11 DEF              2019-03-01         0                        0
#> 12 DEF              2019-04-01         0                        0
#> 13 DEF              2019-05-01         0                        0
#> 14 DEF              2019-06-01         0                        0
#> 15 DEF              2019-07-01         0                        0
#> 16 DEF              2019-08-01         0                        0
#> 17 DEF              2019-09-01         0                        0
#> 18 DEF              2019-10-01        -3                       -3
#> 19 GHI              2008-04-01         0                        0
#> 20 GHI              2008-05-01         0                        0
#> 21 GHI              2008-06-01         0                        0
#> 22 GHI              2008-07-01         0                        0
#> 23 GHI              2008-08-01         2                        2
#> 24 GHI              2008-09-01         0                        2
#> 25 GHI              2008-10-01       -12                      -10
#> 26 GHI              2008-11-01         3                       -7
#> 27 GHI              2008-12-01         4                       -3
#> 28 GHI              2009-01-01         3                        0
#> 29 GHI              2009-02-01         4                        4
#> 30 GHI              2009-03-01         3                        7

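The data frame above has three groups: "ABC", "DEF" and "GHI". The call complete(`Business Group`, Month) does not behave as I expect, because it does not complete the data frame for the missing combinations of `Business Group` and `Month`. The "GHI" business group has dates in 2008 and early 2009, but the "ABC" and "DEF" groups are not filled in for those months. In fact, nothing gets completed at all. Any idea what is wrong?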
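It helps to work out what a full completion should produce. The following is a rough sketch that assumes only the key columns matter. The months are rebuilt with `seq()` from each group's first date and row count in the `dput()` output rather than copied from it, and `keys` is a hypothetical name. ABC and DEF overlap from 2019-03 to 2019-09, so there are 23 distinct months in total.

```r
library(dplyr)

# Hypothetical reconstruction of the key columns of df:
# ABC covers 10 months from 2018-12, DEF 8 months from 2019-03,
# GHI 12 months from 2008-04
keys <- tibble(
  `Business Group` = rep(c("ABC", "DEF", "GHI"), times = c(10, 8, 12)),
  Month = c(seq(as.Date("2018-12-01"), by = "month", length.out = 10),
            seq(as.Date("2019-03-01"), by = "month", length.out = 8),
            seq(as.Date("2008-04-01"), by = "month", length.out = 12))
)

n_groups <- n_distinct(keys$`Business Group`)  # 3
n_months <- n_distinct(keys$Month)             # 23: ABC and DEF share 7 months
n_groups * n_months                            # 69 rows expected after complete()
```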
df %>% complete(`Business Group`, Month) 
#> # A tibble: 30 x 4
#> # Groups:   Business Group [3]
#>    `Business Group` Month      SumChange `Qty Items Open 90 Days`
#>    <chr>            <date>         <dbl>                    <dbl>
#>  1 ABC              2018-12-01         0                        0
#>  2 ABC              2019-01-01         0                        0
#>  3 ABC              2019-02-01         0                        0
#>  4 ABC              2019-03-01         1                        1
#>  5 ABC              2019-04-01         1                        2
#>  6 ABC              2019-05-01         1                        3
#>  7 ABC              2019-06-01         1                        4
#>  8 ABC              2019-07-01         1                        5
#>  9 ABC              2019-08-01         0                        5
#> 10 ABC              2019-09-01         0                        5
#> 11 DEF              2019-03-01         0                        0
#> 12 DEF              2019-04-01         0                        0
#> 13 DEF              2019-05-01         0                        0
#> 14 DEF              2019-06-01         0                        0
#> 15 DEF              2019-07-01         0                        0
#> 16 DEF              2019-08-01         0                        0
#> 17 DEF              2019-09-01         0                        0
#> 18 DEF              2019-10-01        -3                       -3
#> 19 GHI              2008-04-01         0                        0
#> 20 GHI              2008-05-01         0                        0
#> 21 GHI              2008-06-01         0                        0
#> 22 GHI              2008-07-01         0                        0
#> 23 GHI              2008-08-01         2                        2
#> 24 GHI              2008-09-01         0                        2
#> 25 GHI              2008-10-01       -12                      -10
#> 26 GHI              2008-11-01         3                       -7
#> 27 GHI              2008-12-01         4                       -3
#> 28 GHI              2009-01-01         3                        0
#> 29 GHI              2009-02-01         4                        4
#> 30 GHI              2009-03-01         3                        7

Not directly related, but complete() comes from tidyr, not dplyr. One problem with loading the tidyverse as a bundle of packages is that it hides where functions actually come from. - camille
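To check which package a function belongs to, inspect the environment it was defined in, or call it with an explicit namespace prefix. The following is a small sketch. The two-column data frame is made up for illustration.

```r
# Which package namespace does complete() live in?
environmentName(environment(tidyr::complete))  # "tidyr"

# An explicit prefix makes the source package obvious and does not
# depend on which packages were attached with library()
tidyr::complete(data.frame(x = c(1, 2), y = c("a", "b")), x, y)
```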
1 Answer


It is a grouped tbl_df:

library(dplyr)
library(tidyr)
df %>% 
   group_vars()
 #[1] "Business Group"
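The grouping is also visible in the object's class and metadata. The `dput()` output in the question includes `class = c("grouped_df", ...)` and a `groups` attribute. The following is a minimal sketch on a made-up tibble, where `toy`, `g` and `v` are illustrative names.

```r
library(dplyr)

toy <- tibble(g = c("x", "y"), v = 1:2) %>% group_by(g)

class(toy)          # "grouped_df" "tbl_df" "tbl" "data.frame"
is_grouped_df(toy)  # TRUE
group_vars(toy)     # "g"

# ungroup() removes the grouping metadata
is_grouped_df(ungroup(toy))  # FALSE
```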

Calling ungroup() first should make it work:

df %>% 
   ungroup %>% 
   complete(`Business Group`, Month)
# A tibble: 69 x 4
#   `Business Group` Month      SumChange `Qty Items Open 90 Days`
#   <chr>            <date>         <dbl>                    <dbl>
# 1 ABC              2008-04-01        NA                       NA
# 2 ABC              2008-05-01        NA                       NA
# 3 ABC              2008-06-01        NA                       NA
# 4 ABC              2008-07-01        NA                       NA
# 5 ABC              2008-08-01        NA                       NA
# 6 ABC              2008-09-01        NA                       NA
# 7 ABC              2008-10-01        NA                       NA
# 8 ABC              2008-11-01        NA                       NA
# 9 ABC              2008-12-01        NA                       NA
#10 ABC              2009-01-01        NA                       NA
# … with 59 more rows
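You can confirm that the ungrouped result has every month for every group by counting rows per group. The following sketch rebuilds only the key columns of df with `seq()`, which is an assumption (`keys`, `grouped` and `filled` are hypothetical names). The value columns are left out because they would only be `NA` in the new rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical reconstruction of the key columns of df
keys <- tibble(
  `Business Group` = rep(c("ABC", "DEF", "GHI"), times = c(10, 8, 12)),
  Month = c(seq(as.Date("2018-12-01"), by = "month", length.out = 10),
            seq(as.Date("2019-03-01"), by = "month", length.out = 8),
            seq(as.Date("2008-04-01"), by = "month", length.out = 12))
)

# Simulate the leftover grouping from the question
grouped <- keys %>% group_by(`Business Group`)

filled <- grouped %>% ungroup() %>% complete(`Business Group`, Month)
nrow(filled)                     # 69
count(filled, `Business Group`)  # n = 23 for each of ABC, DEF and GHI
```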

How did a group_by() end up applied to my code? Was it forced on it? - Display name
@JasonHunter It is probably from an earlier step you ran on the input data, for example df <- df %>% group_by(`Business Group`) %>% mutate(rn = row_number()). The grouping attribute stays on the result unless you ungroup() it. - akrun
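The following sketch reproduces the scenario akrun describes on made-up data (`toy`, `toy2`, `toy3`, `g` and `v` are illustrative names). After `group_by()` and then `mutate()`, the result is still grouped, so later verbs operate per group.

```r
library(dplyr)

toy <- tibble(g = c("x", "x", "y"), v = c(10, 20, 30))

toy2 <- toy %>%
  group_by(g) %>%
  mutate(rn = row_number())  # numbering restarts within each group

toy2$rn              # 1 2 1
is_grouped_df(toy2)  # TRUE: mutate() keeps the grouping

toy3 <- toy2 %>% ungroup() %>% mutate(rn_all = row_number())
toy3$rn_all          # 1 2 3: numbered over the whole table
```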
I may have forgotten this: is it always best practice to ungroup() after grouping? - Display name
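It depends. If you want to use the existing grouping for further summarise/mutate steps, keep the group attribute. Otherwise, I would recommend ungrouping. - akrun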
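The following sketch illustrates the "it depends" advice. It assumes dplyr 1.0.0 or later, where `summarise()` accepts a `.groups` argument. By default, `summarise()` drops only the last grouping level when every group yields one row, so some grouping can remain. `.groups = "drop"` returns an ungrouped result in the same step. The data and names are made up for illustration.

```r
library(dplyr)

toy <- tibble(a = c("p", "p", "q"), b = c("u", "v", "u"), v = 1:3)

by_ab <- toy %>% group_by(a, b)

# Default: the last grouping variable (b) is dropped, but a remains
s1 <- by_ab %>% summarise(total = sum(v))
group_vars(s1)     # "a"

# Drop all grouping explicitly in the same step
s2 <- by_ab %>% summarise(total = sum(v), .groups = "drop")
is_grouped_df(s2)  # FALSE
```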
