ddply error: attributes(out) <- attributes(col): 'names' attribute must be the same length as the vector

Question

I am trying to apply ddply to a large data frame (38000 rows / 10 variables), but I get an error:

ddply(uncertainty.long, .(Species), "nrow")

This returns the error:

Error in attributes(out) <- attributes(col) : 
  'names' attribute [38000] must be the same length as the vector [3800]
> traceback()
11: FUN(1:10[[5L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: (function (i) 
   {
       piece <- pieces[[i]]
       if (.inform) {
           res <- try(.fun(piece, ...))
           if (inherits(res, "try-error")) {
               piece <- paste(capture.output(print(piece)), collapse = "\n")
               stop("with piece ", i, ": \n", piece, call. = FALSE)
           }
       }
       else {
           res <- .fun(piece, ...)
       }
       progress$step()
       res
   })(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(uncertainty.long, .(Species), "nrow")

More details about my data frame:

> head(uncertainty.long)
                Stack Variable PARun Model             Species    value year scenario   GCM                    sp
1        sync_current    Total   PA1   GLM Arctosafulvolineata 100.0000   NA     <NA>  <NA> Arctosa\nfulvolineata
2 sync_cgcm2_B2A_2020    Total   PA1   GLM Arctosafulvolineata 134.6840 2020      B2A cgcm2 Arctosa\nfulvolineata
3 sync_cgcm2_B2A_2050    Total   PA1   GLM Arctosafulvolineata 153.7617 2050      B2A cgcm2 Arctosa\nfulvolineata
4 sync_cgcm2_B2A_2080    Total   PA1   GLM Arctosafulvolineata 195.7176 2080      B2A cgcm2 Arctosa\nfulvolineata
5   sync_mk2_B2A_2020    Total   PA1   GLM Arctosafulvolineata 172.2967 2020      B2A   mk2 Arctosa\nfulvolineata
6   sync_mk2_B2A_2050    Total   PA1   GLM Arctosafulvolineata 198.9391 2050      B2A   mk2 Arctosa\nfulvolineata
> str(uncertainty.long)
'data.frame':   38000 obs. of  10 variables:
 $ Stack   : Factor w/ 19 levels "sync_cgcm2_B2A_2020",..: 7 1 2 3 14 15 16 11 12 13 ...
 $ Variable: Factor w/ 5 levels "Lost","NetChange",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ PARun   : Factor w/ 5 levels "PA1","PA2","PA3",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Model   : Factor w/ 8 levels "CTA","FDA","GAM",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr  "1" "1" "1" "1" ...
 $ value   : num  100 135 154 196 172 ...
 $ year    : num  NA 2020 2050 2080 2020 2050 2080 2020 2050 2080 ...
 $ scenario: chr  NA "B2A" "B2A" "B2A" ...
 $ GCM     : chr  NA "cgcm2" "cgcm2" "cgcm2" ...
 $ sp      : chr  "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" ...

Here is my sessionInfo() output:

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
 [1] parallel  splines   grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape2_1.2.2      Hmisc_3.12-2        Formula_1.1-1       RCurl_1.95-4.1      bitops_1.0-6        biomod2_3.0.3       pROC_1.5.4          plyr_1.8           
 [9] rpart_4.1-3         randomForest_4.6-7  mda_0.4-4           class_7.3-9         gbm_2.1             survival_2.37-4     nnet_7.3-7          rasterVis_0.21     
[17] hexbin_1.26.2       latticeExtra_0.6-26 RColorBrewer_1.0-5  lattice_0.20-23     abind_1.4-0         raster_2.1-49       sp_1.0-13           ggplot2_0.9.3.1    

loaded via a namespace (and not attached):
 [1] cluster_1.14.4   colorspace_1.2-2 dichromat_2.0-0  digest_0.6.3     gtable_0.1.2     labeling_0.2     MASS_7.3-29      munsell_0.4.2    proto_0.3-10     scales_0.2.3    
[11] stringr_0.6.2    tools_3.0.1      zoo_1.7-10

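I tried to reproduce the error with fewer columns (2), but that changed nothing. However, if I reduce the number of rows so that the grouping variable Species takes only one level, it works: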

> small.df <- uncertainty.long[1:3800, ]
> unique(small.df$Species)
[1] Arctosafulvolineata
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis 
> ddply(small.df, .(Species), "nrow")
              Species nrow
1 Arctosafulvolineata 3800

But with one more row:

> small.df <- uncertainty.long[1:3801, ]
> unique(small.df$Species)
[1] Arctosafulvolineata Argyronetaaquatica 
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis
> small.df[3800:3801, ]
                    Stack Variable PARun  Model             Species     value year scenario    GCM                    sp
3800 sync_hadcm3_A1B_2080     Lost   PA5 MAXENT Arctosafulvolineata -54.90872 2080      A1B hadcm3 Arctosa\nfulvolineata
3801         sync_current    Total   PA1    GLM  Argyronetaaquatica 100.00000   NA     <NA>   <NA>  Argyroneta\naquatica
> ddply(small.df, .(Species), "nrow")
Error in attributes(out) <- attributes(col) : 
  'names' attribute [3801] must be the same length as the vector [3800]

I found that someone else had a similar problem: https://dev59.com/yGYq5IYBdhLWcg3w2EBi#14162351.

However, their workaround (reinstalling plyr 1.7 instead of 1.8) did not work for me. Does anyone know what the problem is and/or how to solve it?

Thanks!

Solved: the problem came from the "names" attribute of the Species column. I removed it with the following code, and ddply then worked:

> names(uncertainty.long$Species) <- NULL
> ddply(uncertainty.long, .(Species), "nrow")
               Species nrow
1  Arctosafulvolineata 3800
2   Argyronetaaquatica 3800
3  Dolomedesplantarius 3800
4   Enoplognathamordax 3800
5      Iciussubinermis 3800
6       Neonvalentulus 3800
7    Pardosabifasciata 3800
8     Pardosaoreophila 3800
9     Piratauliginosus 3800
10 Trochosaspinipalpis 3800

- Boris Leroy

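You could try two things: 1. Add sessionInfo(). 2. Produce a minimal working example. For instance, can you reproduce the error with a dataset of only 3 columns and 5 rows? - csgillespie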

Could you use dput instead of, or in addition to, str and head? - cianius

The dput of my table is unfortunately very long, so I don't think I can post it here. str and head are already included. - Boris Leroy

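Glad you found the problem! As a matter of site etiquette, please write your solution as an answer (rather than as an edit to the question) and accept it. That keeps the site tidier and shows that your question has been resolved. - Blue Magister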

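Yes, thank you, I will do so as soon as possible. However, since I am new to Stack Overflow, I have to wait: "Users with less than 10 reputation can't answer their own question for 8 hours after asking. You may self-answer in 7 hours. Until then please use comments, or edit your question instead." - Boris Leroy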

1 Answer

- Boris Leroy · Accepted Answer

The problem came from the "names" attribute of the Species column:

$ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr  "1" "1" "1" "1" ...

I removed it with the following code, and ddply then worked:

> names(uncertainty.long$Species) <- NULL
> ddply(uncertainty.long, .(Species), "nrow")
               Species nrow
1  Arctosafulvolineata 3800
2   Argyronetaaquatica 3800
3  Dolomedesplantarius 3800
4   Enoplognathamordax 3800
5      Iciussubinermis 3800
6       Neonvalentulus 3800
7    Pardosabifasciata 3800
8     Pardosaoreophila 3800
9     Piratauliginosus 3800
10 Trochosaspinipalpis 3800
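
For anyone unsure which columns carry a stray "names" attribute, a check along the following lines may help. This is only a minimal base R sketch on toy data: `species`, `cols`, `has_names`, `cleaned` and `still_named` are hypothetical names introduced for illustration, and the values are made up. It does not call plyr, so it shows how to detect and strip the attribute rather than reproducing the ddply error itself.

```r
# Toy stand-in for the Species column: a factor with a stray names attribute.
species <- factor(c("Arctosafulvolineata", "Arctosafulvolineata",
                    "Argyronetaaquatica", "Argyronetaaquatica"))
names(species) <- c("1", "1", "2", "2")

# Hypothetical list of columns (a data frame is also a list of columns).
cols <- list(Species = species, value = c(100, 135, 100, 120))

# Flag every column that still carries a names attribute.
has_names <- vapply(cols, function(col) !is.null(names(col)), logical(1))
print(has_names)

# Strip the names from every column; unname() keeps factor levels and class.
cleaned <- lapply(cols, unname)
still_named <- vapply(cleaned, function(col) !is.null(names(col)), logical(1))
print(still_named)
```

On a real data frame, the same idea could presumably be applied with `uncertainty.long[] <- lapply(uncertainty.long, unname)`, which should clear the attribute from all columns at once rather than only from Species.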