I have a data frame of hospital admission records containing patient IDs and dates.
Problem
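I want to merge any row whose HospNum_Id is the same as the previous row's
and whose date is no more than 3 days away from the previous row's date.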
Input
A synthetic dataset is shown here:
structure(list(HospNum_Id = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A791697", "V682805", "X608693"
), class = "factor"), VisitDate = structure(c(17181, 17183, 17192,
17168, 17169, 17186, 17189, 17212, 17215, 17167, 17173, 17190
), class = "Date"), diffDate = structure(c(-2, -9, NA, -1, -17,
-3, -23, -3, NA, -6, -17, NA), class = "difftime", units = "days")), .Names = c("HospNum_Id",
"VisitDate", "diffDate"), row.names = c(NA, -12L), class = "data.frame")
My attempt
The steps I have taken so far are:
1. Sort the rows by patient ID and visit date
Mydf <- Mydf[order(Mydf$HospNum_Id, Mydf$VisitDate), ]
2. Add a date-difference column
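library(rlang)
library(dplyr)
# Add diffDate: days from each visit to the same patient's next visit
# (negative values; NA for a patient's last visit)
SurveilTimeByRow <- function(dataframe, HospNum_Id, VisitDate) {
  # Turn the column names (strings) into symbols for tidy evaluation
  HospNum_Ida <- sym(HospNum_Id)
  VisitDatea <- sym(VisitDate)
  ret <- dataframe %>%
    arrange(!!HospNum_Ida, !!VisitDatea) %>%
    group_by(!!HospNum_Ida) %>%
    mutate(diffDate = difftime(as.Date(!!VisitDatea),
                               lead(as.Date(!!VisitDatea), 1),
                               units = "days"))
  # Convert the grouped tibble back to a plain data frame
  dataframe <- data.frame(ret)
  return(dataframe)
}
Mydf <- SurveilTimeByRow(Mydf, "HospNum_Id", "VisitDate")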
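As a quick illustration of the sign convention in step 2, here is a minimal sketch using only the three visit dates of patient A791697 from the synthetic data. The variable name `visits` is just for illustration. Because each date has the following date subtracted from it, the differences come out negative, and the last visit has no successor, so it gets NA.

```r
library(dplyr)

# The three visits of patient A791697 from the synthetic dataset
visits <- as.Date(c("2017-01-15", "2017-01-17", "2017-01-26"))

# Current visit minus next visit, in days
diffDate <- difftime(visits, lead(visits, 1), units = "days")

as.numeric(diffDate)
# -2 -9 NA
```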
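3. If the date difference between two consecutive rows is between -3 and 3 days (inclusive), append the later row to the earlier one as extra columns
This is the part where I am stuck.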
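One possible way to approach step 3 is sketched below. It is not a definitive solution. It assumes that "merging" means placing the following visit's columns next to the current visit's, and that only pairs at most 3 days apart are kept. The `.1` column names follow the desired output, while the name `paired` and the rebuilt `Mydf` are only for illustration (the IDs and dates are copied from the synthetic data).

```r
library(dplyr)

# Rebuild the synthetic data so the sketch is self-contained
Mydf <- data.frame(
  HospNum_Id = c(rep("A791697", 3), rep("V682805", 6), rep("X608693", 3)),
  VisitDate = as.Date(c(17181, 17183, 17192,
                        17168, 17169, 17186, 17189, 17212, 17215,
                        17167, 17173, 17190), origin = "1970-01-01"),
  stringsAsFactors = FALSE
)

paired <- Mydf %>%
  arrange(HospNum_Id, VisitDate) %>%
  group_by(HospNum_Id) %>%
  mutate(
    # Days from this visit to the same patient's next visit (negative)
    diffDate = difftime(VisitDate, lead(VisitDate, 1), units = "days"),
    # Columns of the following row within the same patient
    HospNum_Id.1 = lead(HospNum_Id, 1),
    VisitDate.1 = lead(VisitDate, 1),
    # The following row's own difference to the visit after it
    diffDate.1 = difftime(VisitDate.1, lead(VisitDate, 2), units = "days")
  ) %>%
  ungroup() %>%
  # Keep only rows whose next visit is at most 3 days away
  filter(!is.na(diffDate), abs(as.numeric(diffDate, units = "days")) <= 3)

print(as.data.frame(paired))
```

Note that if a patient has three or more visits in a row that are each within 3 days of the next, this sketch produces overlapping pairs rather than one combined row. Whether that is acceptable depends on the intended analysis.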
Desired output
HospNum_Id  VisitDate   diffDate  HospNum_Id.1  VisitDate.1  diffDate.1
A791697     2017-01-15  -2 days   A791697       2017-01-17   -9 days
V682805     2017-01-02  -1 days   V682805       2017-01-03   -17 days
V682805     2017-01-20  -3 days   V682805       2017-01-23   -23 days
V682805     2017-02-15  -3 days   V682805       2017-02-18   NA days
I will drop the last column, diffDate.1, since it will ultimately be redundant.