我有以下数据框记录政策的演变:
Df <- data.frame(Id_policy = c("A_001", "A_002", "A_003","B_001","B_002"),
date_new = c("20200101","20200115","20200304","20200110","20200215"),
date_end = c("20200503","20200608","20210101","20200403","20200503"),
expend = c("","A_001","A_002","",""))
它看起来像这样:
Id_policy date_new date_end expend
A_001 20200101 20200503
A_002 20200115 20200608 A_001
A_003 20200304 20210101 A_002
B_001 20200110 20200403
B_002 20200215 20200503
"Id_policy"指特定政策,"date_new"是政策发布日期,"date_end"是政策结束日期。但有时政策会被延长。在这种情况下,将设置一个新政策,并且变量"expend"提供更改前政策的名称。
这里的想法是压缩数据集,只保留与不同政策对应的行。因此,输出将类似于以下内容:
Id_policy date_new date_end expend
A_001 20200101 20210101
B_001 20200110 20200403
B_002 20200215 20200503
是否有人遇到过类似的问题?