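I am trying to cross join a data table on three variables (group, id, and date). The R code below does exactly what I want: within each group, it expands every id to cover all of the wanted dates. Is there a more efficient way to do the same thing with the excellent data.table package?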
library(data.table)
data <- data.table(
  group = c(rep("A", 10), rep("B", 10)),
  id = c(rep("frank", 5), rep("tony", 5), rep("arthur", 5), rep("edward", 5)),
  date = seq(as.IDate("2020-01-01"), as.IDate("2020-01-20"), by = "day")
)
data
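# Full range of dates that every id should cover
dates_wanted <- seq(as.IDate("2020-01-01"), as.IDate("2020-01-31"), by = "day")
# ids that belong to each group
names_A <- data[group == "A"][["id"]]
names_B <- data[group == "B"][["id"]]
# Cross join each group's unique ids with all wanted dates
names_A <- CJ(group = "A", id = names_A, date = dates_wanted, unique = TRUE)
names_B <- CJ(group = "B", id = names_B, date = dates_wanted, unique = TRUE)
alldates <- rbind(names_A, names_B)
alldates
# Join the original data onto the full grid (one row per group/id/date combination)
data[alldates, on = .(group, id, date)]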
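No need for do.call; just use data[, CJ(id, date = dates_wanted, unique = TRUE), group]. This builds, for each group, the cross join of that group's unique ids with all of the wanted dates. - IceCreamToucan
... .SD, and then realized that "date" was changed to "dates_wanted". - akrun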
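For reference, the by-group suggestion from the comments can be put together with the question's data as a minimal sketch. It assumes the same data and dates_wanted objects as in the question; the names alldates_by_group and result are introduced here for illustration only. The id column is named explicitly in CJ() so the sketch does not depend on automatic column naming in a particular data.table version.

```r
library(data.table)

# Same example data as in the question
data <- data.table(
  group = c(rep("A", 10), rep("B", 10)),
  id = c(rep("frank", 5), rep("tony", 5), rep("arthur", 5), rep("edward", 5)),
  date = seq(as.IDate("2020-01-01"), as.IDate("2020-01-20"), by = "day")
)
dates_wanted <- seq(as.IDate("2020-01-01"), as.IDate("2020-01-31"), by = "day")

# Within each group, cross join that group's unique ids with all wanted dates.
# Inside j, `id` refers to the ids of the current group only.
alldates_by_group <- data[, CJ(id = id, date = dates_wanted, unique = TRUE), by = group]

# Join the original data onto the full grid, as in the question
result <- data[alldates_by_group, on = .(group, id, date)]
result
```

This replaces the separate per-group CJ() calls and the rbind() with a single grouped expression, so it should keep working unchanged if more groups are added.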