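I have a data.table that contains distances. I want to run various operations on it grouped by my "id" variable and by inclusive distance thresholds (e.g. Dist<=1, Dist<=2, etc.).
I know how to run operations by id and exact distance with "by=list(id,Dist)", but what I really want is a by variable that looks more like "by=list(id,c(Dist<=1,Dist<=2,Dist<=3,Dist<=4,Dist<=5))".
Below is an example of my data structure and my goal.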
#load library
library(data.table)
#create data
set.seed(123L)
dt<-data.table(id=factor(rep(1:10,5)),V1=rnorm(50,5,5),Dist=sample(1:5,50,replace=TRUE))
#calculate mean of V1 by id and distance (wrong results)
dt2<-dt[,.(MeanV1=mean(V1)),by=list(id,Dist)]
#calculate mean of V1 by id and conditional distance (right results, wrong method)
dt2.1<-dt[Dist<=1,.(MeanV1=mean(V1)),by=id]
dt2.2<-dt[Dist<=2,.(MeanV1=mean(V1)),by=id]
dt2.3<-dt[Dist<=3,.(MeanV1=mean(V1)),by=id]
dt2.4<-dt[Dist<=4,.(MeanV1=mean(V1)),by=id]
dt2.5<-dt[Dist<=5,.(MeanV1=mean(V1)),by=id]
dt2<-rbind(dt2.1,dt2.2,dt2.3,dt2.4,dt2.5)
#ideal methods if either were valid
#syntax 1
dt2<-dt[,.(MeanV1=mean(V1)),by=list(id,c(Dist<=1,Dist<=2,Dist<=3,Dist<=4,Dist<=5))]
#syntax 2
rowindices<-list(dt$Dist<=1,dt$Dist<=2,dt$Dist<=3,dt$Dist<=4,dt$Dist<=5)
dt2<-dt[,.(MeanV1=mean(V1)),by=list(id,rowindices)]
Thanks in advance.
dt[.(cutid = 1:5, dcut = 1:5), on=.(Dist <= dcut), allow.cartesian=TRUE][, mean(V1), keyby=.(cutid, id)]
Does this work for your example? If you are computing means, though, there are more efficient ways. - Frank
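The comment does not say which more efficient way it has in mind. One possible reading, offered only as a sketch, is to aggregate once per exact (id, Dist) pair and then take running totals within each id, since the mean over Dist<=d is the cumulative sum divided by the cumulative count. The names agg, S and N are made up for this illustration. Note that a threshold only gets a row here if that id has at least one observation at exactly that Dist, so the set of rows can differ from the join above; for a missing threshold the mean equals that of the previous available threshold for the same id.

```r
library(data.table)

# Same example data as in the question
set.seed(123L)
dt <- data.table(id = factor(rep(1:10, 5)),
                 V1 = rnorm(50, 5, 5),
                 Dist = sample(1:5, 50, replace = TRUE))

# Sum and count of V1 for each exact (id, Dist) pair;
# keyby sorts the result by id and then by Dist
agg <- dt[, .(S = sum(V1), N = .N), keyby = .(id, Dist)]

# Running totals within each id cover all rows with Dist <= the current
# value, so their ratio is the mean of V1 for that threshold
agg[, MeanV1 := cumsum(S) / cumsum(N), by = id]

# Each row now reads: mean of V1 for this id over rows with Dist <= Dist
print(agg[, .(id, Dist, MeanV1)])
```

This makes a single pass over the data instead of materialising one copy of the matching rows per threshold, which is why it may scale better, but it only works for statistics that can be built from running totals such as sums, counts and means.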