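I could do this by looping over the dataset several times, but I think there should be a more efficient way to do it with data.table.
The dataset looks like this: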
CaseID  Won  OwnerID  Time_period  Finished
1       yes  A        1            no
1       yes  A        3            no
1       yes  A        5            yes
2       no   A        4            no
2       no   A        6            yes
3       yes  A        2            yes
4       no   A        3            yes
5       15   B        2            no
For each row, I want to generate the share of cases won among that owner's cases that finished before the row's time period.
CaseID  Won  OwnerID  Time_period  Finished  AvgWonByOwner
1       yes  A        1            no        NA
1       yes  A        3            no        1
1       yes  A        5            yes       .5
2       no   A        4            no        .5
2       no   A        6            yes       2/3
3       yes  A        2            yes       NA
4       no   A        3            yes       1
5       15   B        2            no        NA
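On closer inspection, this seems quite complicated. I thought it could be done with some kind of rolling merge, but I don't know how to set the conditions so that the average is taken only over outcomes from before the row's date, and only for the same owner ID.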
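One way the two conditions (same owner, strictly earlier time) might be expressed is a data.table non-equi join rather than a rolling merge. The following is only a sketch under assumptions: the helper table `finished` and the column names `FinishTime` and `WonFlag` are made up for illustration, and the `Won` value `15` for case 5 is copied as it appears in the example data, so it simply counts as "not won".

```r
library(data.table)

# Example data copied from the question; Won = "15" for case 5 is kept verbatim.
dt <- data.table(
  CaseID      = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 5L),
  Won         = c("yes", "yes", "yes", "no", "no", "yes", "no", "15"),
  OwnerID     = c("A", "A", "A", "A", "A", "A", "A", "B"),
  Time_period = c(1L, 3L, 5L, 4L, 6L, 2L, 3L, 2L),
  Finished    = c("no", "no", "yes", "no", "yes", "yes", "yes", "no")
)

# One row per finished case: when it finished and whether it was won.
# FinishTime and WonFlag are illustrative names, not part of the question.
finished <- dt[Finished == "yes",
               .(OwnerID,
                 FinishTime = Time_period,
                 WonFlag    = as.integer(Won == "yes"))]

# Non-equi join: for every row of dt, match the finished cases of the same
# owner whose FinishTime is strictly earlier than that row's Time_period,
# and average WonFlag per row of dt (by = .EACHI).
# Rows with no earlier finished case have no match and end up as NA.
avg <- finished[dt,
                on = .(OwnerID, FinishTime < Time_period),
                .(AvgWonByOwner = mean(WonFlag)),
                by = .EACHI]

# The join result keeps the row order of dt, so it can be assigned directly.
dt[, AvgWonByOwner := avg$AvgWonByOwner]
print(dt)
```

On the example data this should reproduce the `AvgWonByOwner` column shown above. If 0 is preferred over NA when nothing has finished yet, the NA values could be replaced afterwards.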
Edit 1: Explanation of the numbers in the last column
AvgWonByOwner  Explanation
NA             t = 1, no cases finished yet; this could be 0 too
1              t = 3, case 3 finished and is won, so average wins is 1
.5             t = 5, case 3 finished, won; case 4 finished, lost; average = .5
.5             t = 4, case 3 finished, won; case 4 finished, lost; average = .5
2/3            t = 6, case 3 finished, won; case 4 finished, lost; case 1 finished, won; average = 2/3
NA             t = 2, no cases finished yet; this could be 0 too
1              t = 3, case 3 finished and is won, so average wins is 1
NA             t = 2, no cases finished yet; this could be 0 too
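The rule spelled out in the table above can also be written as a plain row-by-row check in base R. This is the slow looping baseline rather than a data.table solution, and it is meant only as a reference for validating a faster approach. The function name `avg_won_before` is hypothetical, and the `Won` value `15` for case 5 is again copied as it appears in the example data.

```r
# Example data copied from the question; Won = "15" for case 5 is kept verbatim.
df <- data.frame(
  CaseID      = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 5L),
  Won         = c("yes", "yes", "yes", "no", "no", "yes", "no", "15"),
  OwnerID     = c("A", "A", "A", "A", "A", "A", "A", "B"),
  Time_period = c(1L, 3L, 5L, 4L, 6L, 2L, 3L, 2L),
  Finished    = c("no", "no", "yes", "no", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# For row i: select the finished rows of the same owner with a strictly
# earlier Time_period and return the share of those that were won.
# avg_won_before is an illustrative helper name.
avg_won_before <- function(i) {
  earlier <- df$Finished == "yes" &
    df$OwnerID == df$OwnerID[i] &
    df$Time_period < df$Time_period[i]
  if (!any(earlier)) {
    return(NA_real_)  # no case has finished yet for this owner
  }
  mean(df$Won[earlier] == "yes")
}

# Apply the rule to every row; this is quadratic in the number of rows.
df$AvgWonByOwner <- vapply(seq_len(nrow(df)), avg_won_before, numeric(1))
print(df)
```

Because every row scans the whole data frame, this only makes sense for small inputs or as a correctness check.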