I have a dataset containing names, dates, and a few categorical columns. For example:
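library(data.table)
# The last two category values are "A", "B" so the data matches the printed table and the expected output below
data <- data.table(name = c('Anne', 'Ben', 'Cal', 'Anne', 'Ben', 'Cal', 'Anne', 'Ben', 'Ben', 'Ben', 'Cal'),
period = c(1,1,1,1,1,1,2,2,2,3,3),
category = c("A","A","A","B","B","B","A","B","A","A","B"))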
This looks like:
name period category
Anne 1 A
Ben 1 A
Cal 1 A
Anne 1 B
Ben 1 B
Cal 1 B
Anne 2 A
Ben 2 B
Ben 2 A
Ben 3 A
Cal 3 B
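For each period and each category, I want to count how many of the names in that group also appeared in the same category in the previous period. The output should look like this: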
period category recurrence_count
2 A 2 # due to Anne and Ben being on A, period 1
2 B 1 # due to Ben being on B, period 1
3 A 1 # due to Ben being on A, period 2
3 B 0 # no match from B, period 2
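I know about the .I and .GRP special symbols in data.table, but I don't know how to express the concept of "the next group" (or, here, the previous one) in the j entry of the statement. I imagine something like this might be a reasonable path, but I can't work out the correct syntax: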
data[, .(recurrence_count = length(intersect(name, name[last(.GRP)]))), by = .(category, period)]
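For comparison, here is a minimal sketch of one way to get these counts without referring to a neighbouring group inside j. It swaps the .GRP idea for an update join against a copy of the table shifted forward by one period. It assumes "previous period" means exactly `period - 1`, and it types the sample data to match the printed table above. The names `current`, `previous`, `seen_before`, and `result` are made up for illustration.

```r
library(data.table)

data <- data.table(
  name     = c("Anne", "Ben", "Cal", "Anne", "Ben", "Cal", "Anne", "Ben", "Ben", "Ben", "Cal"),
  period   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  category = c("A", "A", "A", "B", "B", "B", "A", "B", "A", "A", "B")
)

# One row per (name, period, category), so repeated rows are not double-counted.
current <- unique(data[, .(name, period, category)])

# Shift a copy forward by one period: a row here means
# "this name was in this category in the period before `period`".
previous <- copy(current)[, period := period + 1]

# Flag every current row that has a match in the shifted copy
# (`seen_before` is a hypothetical helper column).
current[, seen_before := FALSE]
current[previous, on = .(name, period, category), seen_before := TRUE]

# Count flagged names per group; the first period has no predecessor, so drop it.
result <- current[period > min(period),
                  .(recurrence_count = sum(seen_before)),
                  keyby = .(period, category)]
print(result)
```

On the sample data this should give the counts 2, 1, 1, 0 for (2, A), (2, B), (3, A), (3, B). If periods can have gaps, the `period + 1` shift would need to be replaced by a lookup of the preceding distinct period.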