Unable to remove a column with dplyr's select()

15

I am using dplyr and have a grouped data frame. I tried to drop a column from this grouped data frame with select, but I get an error message.

> tbl %>% select(-names)
Error: corrupt 'grouped_df', contains 42 rows, and 965 rows in groups

My data looks like this.

> print(tbl_df(tbl), n = 1000)
Source: local data frame [42 x 15]

                     household                                       names x2003 x2004 x2005 x2006 x2007  x2008  x2009  x2012 last.avail last.avail.year absChange.last annChange.last           translation
                         (chr)                                      (fctr) (int) (int) (int) (int) (int)  (int)  (int)  (int)      (int)           (dbl)          (int)          (dbl)                (fctr)
1               all households                                      bostad 59280 61850 62760 63210 66950  73340  72350  77750      77750            2012          18470    0.030594980          Accomodation
2               all households                           fritid och kultur 45140 46140 49260 48640 49720  55120  53970  61170      61170            2012          16030    0.034341864   Leisure and culture
3               all households                                   transport 41930 40430 45870 48850 47280  50250  42650  49940      49940            2012           8010    0.019614408        Transportation
4               all households                             köpta livsmedel 28420 30000 29130 30420 30750  34130  34780  34570      34570            2012           6150    0.022004509      Bought Groceries
5               all households hyra/avgift för hyres-/borätt (inkl garage) 27310 27720 28860 30000 28990  29660  30740     NA      30740            2009           3430    0.019914330 Rent for accomodation
6               all households                            hushållstjänster 11360 12030 13200 12390  8520  10250  13530  22900      22900            2012          11540    0.081007165    Household services
7           cohabit with child                                      bostad 78240 83040 81390 79180 90490  95630 100060 100980     100980            2012          22740    0.028754709          Accomodation
8           cohabit with child                           fritid och kultur 67110 67640 67290 64600 74290  71890  77200  81180      81180            2012          14070    0.021373640   Leisure and culture
9           cohabit with child                                   transport 58350 62440 70010 69560 68730  75290  65510  71340      71340            2012          12990    0.022584342        Transportation
10          cohabit with child                             köpta livsmedel 45190 45660 45720 44980 48250  52880  52770  52710      52710            2012           7520    0.017250361      Bought Groceries
11          cohabit with child                            hushållstjänster 19840 21380 25690 21430 17190  19060  24730  37440      37440            2012          17600    0.073108900    Household services
12          cohabit with child                             räntor (brutto) 27090 25230 24390 24500 28510  36030  33080     NA      33080            2009           5990    0.033854485           Rents (net)
13       cohabit without child                                      bostad 60340 63230 63560 61760 67100  74160  70440  78510      78510            2012          18170    0.029679783          Accomodation
14       cohabit without child                           fritid och kultur 51120 48780 57700 57320 57620  67220  62460  68400      68400            2012          17280    0.032884345   Leisure and culture
15       cohabit without child                                   transport 49740 46310 55580 57730 56770  54910  52720  59360      59360            2012           9620    0.019839931        Transportation
16       cohabit without child                             köpta livsmedel 31130 33700 31900 33000 33990  37330  37980  37090      37090            2012           5960    0.019654591      Bought Groceries
17       cohabit without child                                drift av bil 24370 21790 25170 27530 25140  28180  26650     NA      26650            2009           2280    0.015017696          Car expenses
18       cohabit without child                            hushållstjänster 11650 12400 12260 12310  8580  11920  13950  26370      26370            2012          14720    0.095016005    Household services
19    other cohabit with child                           fritid och kultur 67680 75550 78020 75800 88870  80070  84490 116020     116020            2012          48340    0.061715253   Leisure and culture
20    other cohabit with child                                      bostad 73850 68740 84800 86510 89290 106540  89650 100580     100580            2012          26730    0.034920030          Accomodation
21    other cohabit with child                                   transport 66950 79620 75730 77800 81010  93790  77960  98660      98660            2012          31710    0.044022982        Transportation
22    other cohabit with child                             köpta livsmedel 54070 53790 50680 51440 53720  64170  62050  63690      63690            2012           9620    0.018360752      Bought Groceries
23    other cohabit with child                                drift av bil 32690 34180 37530 36200 38280  38990  36390     NA      36390            2009           3700    0.018031437          Car expenses
24    other cohabit with child                            hushållstjänster 15690 21000 20810 20370  9990  11880  19710  32460      32460            2012          16770    0.084128145    Household services
25            other households                                      bostad 62860 68680 69950 72840 70700  91510  84480  86020      86020            2012          23160    0.035466655          Accomodation
26            other households                           fritid och kultur 49940 48530 55280 57970 54470  61130  65280  67920      67920            2012          17980    0.034758001   Leisure and culture
27            other households                                   transport 50590 41980 57370 64960 52780  61460  59770  59630      59630            2012           9040    0.018435074        Transportation
28            other households                             köpta livsmedel 35370 35210 35360 41560 35040  43770  45940  43270      43270            2012           7900    0.022652258      Bought Groceries
29            other households                                drift av bil 21440 21580 25640 30070 28260  30070  32010     NA      32010            2009          10570    0.069079862          Car expenses
30            other households hyra/avgift för hyres-/borätt (inkl garage) 29550 32320 25170 24600 29480  35290  25920     NA      25920            2009          -3630   -0.021607942 Rent for accomodation
31               single parent                                      bostad 67890 67250 71200 75210 71000  73490  74710  81820      81820            2012          13930    0.020953501          Accomodation
32               single parent                           fritid och kultur 34900 35860 43600 46770 43540  46160  45840  51000      51000            2012          16100    0.043049627   Leisure and culture
33               single parent hyra/avgift för hyres-/borätt (inkl garage) 43360 44020 45160 49430 45370  44090  48740     NA      48740            2009           5380    0.019685026 Rent for accomodation
34               single parent                                   transport 27230 30810 28810 28410 30500  30390  29360  34890      34890            2012           7660    0.027925124        Transportation
35               single parent                             köpta livsmedel 26420 27910 28160 29100 28310  33020  35910  33740      33740            2012           7320    0.027546212      Bought Groceries
36               single parent                            hushållstjänster  9490 11690 13770  8650  7250  10390  11490  17140      17140            2012           7650    0.067891620    Household services
37 single parent without child                                      bostad 45660 47110 48750 50850 51610  55720  56020  61090      61090            2012          15430    0.032876143          Accomodation
38 single parent without child                           fritid och kultur 28270 31890 31140 30210 28480  35650  32840  41770      41770            2012          13500    0.044329701   Leisure and culture
39 single parent without child hyra/avgift för hyres-/borätt (inkl garage) 31900 32160 33010 36300 34300  35330  37800     NA      37800            2009           5900    0.028687635 Rent for accomodation
40 single parent without child                                   transport 26730 22980 24530 29310 28440  31680  20150  28800      28800            2012           2070    0.008322088        Transportation
41 single parent without child                             köpta livsmedel 15330 16930 16150 17630 17280  18390  19370  19580      19580            2012           4250    0.027561531      Bought Groceries
42 single parent without child                            hushållstjänster  6570  6590  6840  7080  3780   4300   7000  12310      12310            2012           5740    0.072257733    Household services

What is the problem, and how can I solve it?


6
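Try ungroup(), i.e. tbl %>% ungroup() %>% select(-names). This removes the grouping and then selects every column except names. - akrun
That works. What is the mechanism behind this behavior? Do you know where I can read more about it? - uncool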
2
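The error message tells you what is wrong: your object is corrupt, probably in the attributes related to grouped_df. Using ungroup removes those attributes. This open bug may also be a clue: https://github.com/hadley/dplyr/issues/1385 If that doesn't help, you could file a new bug report. - Frank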
2
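If the data is grouped by that column, you need to ungroup. Even if the object were not corrupt, select cannot drop a grouping column from a grouped data frame. - akrun
Sure, it says it is corrupt. But I don't understand how that could have happened... Aren't mutate and select in dplyr meant to work on grouped data in the first place? - uncool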
1
We can't be sure, because this is not a reproducible example. mutate is commonly used on grouped datasets. - akrun
1 Answer

23
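If the variable you want to drop is used as a grouping variable, you need to ungroup before dropping it with select. This is the case in the current dplyr version (dplyr_0.4.3), but it may change in future dplyr versions.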
tbl %>% 
    ungroup() %>%
    select(-names)
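
For anyone who wants to try this without the OP's data, here is a minimal sketch. The data frame 'toy' is a made-up stand-in for 'tbl' (the real 'tbl' was never posted), and I am assuming it is grouped by 'names'. group_vars() is only available in more recent dplyr versions than the one mentioned above.

```r
library(dplyr)

# Hypothetical toy data standing in for the OP's 'tbl' (not the real data).
toy <- data.frame(
  household = c("single parent", "single parent",
                "other households", "other households"),
  names     = c("bostad", "transport", "bostad", "transport"),
  x2003     = c(67890L, 27230L, 62860L, 50590L),
  stringsAsFactors = FALSE
)

# Assumption: the data frame is grouped by the column we want to drop.
grouped <- toy %>% group_by(names)

# Show which columns the data frame is currently grouped by.
group_vars(grouped)

# Remove the grouping first, then drop the column.
result <- grouped %>%
  ungroup() %>%
  select(-names)

colnames(result)
```

Checking group_vars() first tells you whether the column you are about to drop is a grouping column at all.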

As an example of corrupt grouped data, suppose we try to drop the column 'y' from 'dat3'.
dat3 %>% 
  select(-y)
#Error: corrupt 'grouped_df', contains 1100 rows, and 1000 rows in groups

Inspecting the structure with str(dat3):

str(dat3)
#Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1100 obs. of  2 variables:
# $ group: Factor w/ 3 levels "A","B","C": 2 3 2 2 2 2 1 2 2 1 ...
# $ y    : num  1.396 -0.892 1.065 0.801 -0.368 ...
# - attr(*, "vars")=List of 1
#  ..$ : symbol group
# - attr(*, "drop")= logi TRUE
# - attr(*, "indices")=List of 3
#  ..$ : int  6 9 12 13 14 16 18 21 25 27 ...
#  ..$ : int  0 2 3 4 5 7 8 10 11 15 ...
#  ..$ : int  1 17 24 28 35 37 39 43 47 49 ...
# - attr(*, "group_sizes")= int  323 365 312
# - attr(*, "biggest_group_size")= int 365
# - attr(*, "labels")='data.frame':      3 obs. of  1 variable:
#  ..$ group: Factor w/ 3 levels "A","B","C": 1 2 3
#  ..- attr(*, "vars")=List of 1
#  .. ..$ : symbol group
#  ..- attr(*, "drop")= logi TRUE

We can see the grouping attributes that came along when the data was combined with rbind. If we use bind_rows instead, they are not there.
dat4 <- bind_rows(dat1, dat2)
str(dat4)
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame':       1100 obs. of  2 variables:
# $ group: chr  "B" "C" "B" "B" ...
# $ y    : num  1.396 -0.892 1.065 0.801 -0.368 ...

We can drop the 'y' column from 'dat4'.
 dat4 %>%
    select(-y)

Since the OP did not show how 'tbl' was created, we can only assume it was built in some way that corrupted the dataset by adding attributes.

The thing is, the only grouping variable is the 'household' column. - uncool
1
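@uncool, you haven't shown the code used to get tbl, or the dput output of a small dataset with the corrupt grouped data frame. That would help us explain this better. In general, if you want to drop a grouping variable, you need to ungroup first. - akrun
Sometimes we may be using the select function from the wrong package, so make sure to write dplyr::select(-names) - Jason Goal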
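
A small sketch of that last suggestion, assuming the cause is another attached package masking select (MASS is a common one). The data frame 'toy' is hypothetical.

```r
library(dplyr)

# Hypothetical toy data; not the OP's 'tbl'.
toy <- data.frame(
  names = c("bostad", "transport"),
  x2003 = c(59280L, 41930L)
)

# Report which package's select() is found first on the search path.
environmentName(environment(select))

# Qualifying the call with dplyr:: avoids any masking.
result <- toy %>% dplyr::select(-names)

colnames(result)
```

If the first call reports a package other than dplyr, the unqualified select() is not the one you meant to use.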
