This is a follow-up to a previous question. The answer given there breaks on my data because of NA values:
require(data.table)
ID <- c(rep(1,4), rep(3, 5), rep(4,4),rep(5,5))
Begin <- c(0,2.5,NA,3,7,8,7,25,25,10,15,0,0,1,NA,10,11,13)
End <- c(1.5,3.5,NA,6,12,8,11,29,35, 12,19,NA,28,5,20,30,20,25)
df <- data.table(ID, Begin, End)
df[, Begin_New := {
  high_so_far = shift(cummax(End), fill=Begin[1L])
  w = which(Begin < high_so_far)
  Begin[w] = high_so_far[w]
  Begin
}, by=ID]
    ID Begin  End Begin_New
 1:  1   0.0  1.5       0.0
 2:  1   2.5  3.5       2.5
 3:  1    NA   NA        NA
 4:  1   3.0  6.0       3.0*  # <~~ expected 3.5
 5:  3   7.0 12.0       7.0
 6:  3   8.0  8.0      12.0
 7:  3   7.0 11.0      12.0
 8:  3  25.0 29.0      25.0
 9:  3  25.0 35.0      29.0
10:  4  10.0 12.0      10.0
11:  4  15.0 19.0      15.0
12:  4   0.0   NA      19.0
13:  4   0.0 28.0       0.0*  # <~~ expected 19.0
14:  5   1.0  5.0       1.0
15:  5    NA 20.0        NA
16:  5  10.0 30.0      20.0
17:  5  11.0 20.0      30.0
18:  5  13.0 25.0      30.0
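I am trying to detect overlaps within each ID: if a row's Begin is less than the previous End, Begin_New should be set to that previous End, and the check should carry on down the group until Begin is greater than End. When End is NA, however, the code cannot handle it; it should skip the NA and keep comparing against the numeric values. I have tried several variations of the code, but none of them worked.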
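One possible direction, offered as a sketch rather than a confirmed fix: `cummax()` propagates `NA`, so once an `End` is `NA`, every later running maximum in that group is `NA` as well, and `which()` silently drops those comparisons. The helper `cummax_na` below is hypothetical (it is not part of data.table or of the earlier answer). It treats a missing `End` as `-Inf`, so a missing value never raises the running maximum. The sketch also assumes that rows whose `Begin` is `NA` should stay `NA`.

```r
library(data.table)

ID    <- c(rep(1, 4), rep(3, 5), rep(4, 4), rep(5, 5))
Begin <- c(0, 2.5, NA, 3, 7, 8, 7, 25, 25, 10, 15, 0, 0, 1, NA, 10, 11, 13)
End   <- c(1.5, 3.5, NA, 6, 12, 8, 11, 29, 35, 12, 19, NA, 28, 5, 20, 30, 20, 25)
df <- data.table(ID, Begin, End)

# Hypothetical helper: running maximum that ignores NA.
# An NA is replaced by -Inf, so it can never become the maximum.
cummax_na <- function(x) {
  x[is.na(x)] <- -Inf
  cummax(x)
}

df[, Begin_New := {
  # Highest End seen in earlier rows of this ID (NA rows skipped)
  high_so_far <- shift(cummax_na(End), fill = Begin[1L])
  # Rows that start before that point overlap; which() drops NA comparisons,
  # so rows with an NA Begin are left untouched (assumption)
  w <- which(Begin < high_so_far)
  Begin[w] <- high_so_far[w]
  Begin
}, by = ID]

print(df)
```

On the sample data this gives 3.5 for row 4 and 19.0 for row 13, while the other rows keep the values shown in the output above. If a group's first `End` is `NA`, the running maximum starts at `-Inf`, so no row is shifted until a real `End` appears.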