I have the following data frame in R:
c1 c2
1 10 a
2 20 a
3 30 b
4 40 b
I then split it as follows:
z = lapply(split(test$c1, test$c2), function(x) {cut(x,2)})
At this point, z looks like this:
$a
[1] (9.99,15] (15,20]
Levels: (9.99,15] (15,20]
$b
[1] (30,35] (35,40]
Levels: (30,35] (35,40]
I would like to merge these factors back together by unsplitting the list:
unsplit(z, test$c2)
This produces a warning:
[1] (9.99,15] (15,20] <NA> <NA>
Levels: (9.99,15] (15,20]
Warning message:
In `[<-.factor`(`*tmp*`, i, value = 1:2) :
invalid factor level, NAs generated
I would like to take the union of all the factor levels and then unsplit, so that this warning does not occur:
z$a = factor(z$a, levels=c(levels(z$a), levels(z$b)))
unsplit(z, test$c2)
[1] (9.99,15] (15,20] (30,35] (35,40]
Levels: (9.99,15] (15,20] (30,35] (35,40]
In my real data frame I have a very large list, so I need to iterate over all of the list elements (not just two). What is the best way to do this?
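One possible way to generalise the two-element fix above to a list of any length is sketched below. It assumes the data frame is named test as in the example, and the names all_levels, z_common, and result are introduced here purely for illustration. The idea is to collect the levels of every list element, rebuild each factor on that shared level set, and only then call unsplit.

```r
# Rebuild the example data frame from the question.
test <- data.frame(c1 = c(10, 20, 30, 40), c2 = c("a", "a", "b", "b"))
z <- lapply(split(test$c1, test$c2), function(x) cut(x, 2))

# Union of the levels of every list element, in order of first appearance.
all_levels <- unique(unlist(lapply(z, levels)))

# Re-create each factor on the shared level set, then put the pieces back.
z_common <- lapply(z, factor, levels = all_levels)
result <- unsplit(z_common, test$c2)
result
```

Because every element of z_common carries the same levels, unsplit has no unknown level to assign, so no NAs should be generated regardless of how many groups there are.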
(test2 <- transform(test, newvar = unlist(lapply(split(c1, c2), cut, 2))))
[but your code is a bit shorter] - Ben Bolker
function(x) {z <- cut(x,2); levels(z)<-c("bucket1", "bucket2")}
instead of using cut, but it does not seem to work. Thanks! - Alex
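A likely reason the function in the last comment does not behave as hoped: its final expression is the assignment levels(z) <- ..., and in R the value of an assignment is its right-hand side, so the function returns the character vector of labels rather than the factor z. The sketch below shows two possible fixes; the function names bucket and bucket2 are hypothetical, and the data frame is assumed to be test from the question.

```r
test <- data.frame(c1 = c(10, 20, 30, 40), c2 = c("a", "a", "b", "b"))

# Option 1: relabel the levels, then return the factor explicitly.
bucket <- function(x) {
  z <- cut(x, 2)
  levels(z) <- c("bucket1", "bucket2")
  z
}

# Option 2: let cut() assign the labels via its labels argument.
bucket2 <- function(x) cut(x, 2, labels = c("bucket1", "bucket2"))

z1 <- unsplit(lapply(split(test$c1, test$c2), bucket), test$c2)
z2 <- unsplit(lapply(split(test$c1, test$c2), bucket2), test$c2)
```

With either option every group shares the same two levels, so unsplit can combine the pieces without first taking a union of levels.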