我想知道自上次事件以来每个ID的天数。 数据如下:
df <- data.frame(date=as.Date(
c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001",
"06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"), "%d/%m/%Y"),
event=c(0,0,1,0,1, 1,0,0,0,1),id = c(rep(1,5),rep(2,5)))
date event id
1 2000-07-06 0 1
2 2000-09-15 0 1
3 2000-10-15 1 1
4 2001-01-03 0 1
5 2001-03-17 1 1
6 2010-08-06 1 2
7 2010-09-15 0 2
8 2010-10-15 0 2
9 2011-01-03 0 2
10 2011-03-17 1 2
我在很大程度上借鉴了一个数据表解决方案(此处),但这并未考虑到ID。
library(data.table)
setDT(df)
setkey(df, date,id)
df = df[event == 1, .(lastevent = date), key = date][df, roll = TRUE]
df[, tae := difftime(lastevent, shift(lastevent, 1L, "lag"), unit = "days")]
df[event == 0, tae:= difftime(date, lastevent, unit = "days")]
它会生成以下输出。
date lastevent event id tae
1: 2000-07-06 <NA> 0 1 NA days
2: 2000-09-15 <NA> 0 1 NA days
3: 2000-10-15 2000-10-15 1 1 NA days
4: 2001-01-03 2000-10-15 0 1 80 days
5: 2001-03-17 2001-03-17 1 1 153 days
6: 2010-08-06 2010-08-06 1 2 3429 days
7: 2010-09-15 2010-08-06 0 2 40 days
8: 2010-10-15 2010-08-06 0 2 70 days
9: 2011-01-03 2010-08-06 0 2 150 days
10: 2011-03-17 2011-03-17 1 2 223 days
然而,我希望输出如下:
date lastevent event id tae
1: 2000-07-06 <NA> 0 1 NA days
2: 2000-09-15 <NA> 0 1 NA days
3: 2000-10-15 2000-10-15 1 1 NA days
4: 2001-01-03 2000-10-15 0 1 80 days
5: 2001-03-17 2001-03-17 1 1 153 days
6: 2010-08-06 2010-08-06 1 2 NA days
7: 2010-09-15 2010-08-06 0 2 40 days
8: 2010-10-15 2010-08-06 0 2 70 days
9: 2011-01-03 2010-08-06 0 2 150 days
10: 2011-03-17 2011-03-17 1 2 223 days
唯一的区别在于第6行和列tae中的NA。 这篇文章是一个相关的未回答的帖子。我已经在这里查找过,但解决方案在我的情况下不起作用。有很多类似的问题,但不是针对每个ID的计算。谢谢!
} - HOSS_JFL