I have a dataset that looks like this:
Study_ID Recurrent_Status
1 100 1
2 100 NA
3 100 NA
4 200 1
5 300 NA
6 400 3
7 400 NA
8 500 3
9 500 NA
10 600 NA
11 700 1
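I want to remove duplicate study IDs but keep the entries that have Recurrent_Status data. In other words, for each duplicated study ID, I want to delete the rows where Recurrent_Status is NA. Recurrent_Status can be 1 or 3 (or NA for some patients who are not duplicated).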
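The rule above can be sketched in base R as: keep a row if it has a status, or if its Study_ID has no status on any row; then keep one row per ID. This is only a sketch, assuming the "NA" strings in the reproducible data are meant to be real missing values (they are converted first). The names `has_status`, `keep` and `result` are illustrative.

```r
# Reproducible data from the question. "NA" is stored as a string there,
# so convert it to a real missing value first (assumption).
data <- data.frame(
  Study_ID = c("100","100","100","200","300","400","400","500","500","600","700"),
  Recurrent_Status = c("1","NA","NA","1","NA","3","NA","3","NA","NA","1"),
  stringsAsFactors = FALSE
)
data$Recurrent_Status[data$Recurrent_Status == "NA"] <- NA

# For every row: does its Study_ID have at least one non-missing status?
has_status <- ave(!is.na(data$Recurrent_Status), data$Study_ID, FUN = any)

# Keep rows that carry a status, plus rows whose ID never has a status.
keep <- !is.na(data$Recurrent_Status) | !has_status
result <- data[keep, ]

# An ID with several all-NA rows would still be duplicated; keep its first row.
result <- result[!duplicated(result$Study_ID), ]
rownames(result) <- NULL
print(result)
```

On the question's data this yields one row per ID: 100/1, 200/1, 300/NA, 400/3, 500/3, 600/NA, 700/1. If one ID had two different non-NA statuses (say 1 and 3), the `duplicated()` step would keep only the first of them.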
My desired output should look something like this:
Study_ID Recurrent_Status
1 100 1
2 200 1
3 300 NA
4 400 3
5 500 3
6 600 NA
7 700 1
I tried the code below, but of course it deletes the individuals with recurrent status 1 or 3 instead of keeping them.
full_data<-filter(full_data, !duplicated(MRN, fromLast = TRUE) | Recurrence_status !="1")
full_data<-filter(full_data, !duplicated(MRN, fromLast = TRUE) | Recurrence_status !="3")
When I tried removing the exclamation mark, I got the following error:
full_data<-filter(full_data, !duplicated(MRN, fromLast = TRUE) | Recurrence_status ="1")
Error: unexpected '=' in "full_data<-filter(full_data, !duplicated(MRN, fromLast = TRUE) | Recurrence_status ="
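The error comes from R's syntax: the equality test is `==`, while a single `=` inside a function call is only accepted in the form `name = value` (naming an argument), so the parser rejects `... | Recurrence_status = "1"`. A small sketch on a hypothetical three-row data frame also shows that comparing against NA gives NA, which is why missing statuses need `is.na()` rather than `==`:

```r
# Hypothetical toy data, using real missing values.
data <- data.frame(
  Study_ID = c("100", "100", "200"),
  Recurrent_Status = c("1", NA, "3"),
  stringsAsFactors = FALSE
)

# `==` is the equality test; the comparison with NA is itself NA.
is_one <- data$Recurrent_Status == "1"
print(is_one)

# is.na() is the reliable way to detect missing statuses.
missing_status <- is.na(data$Recurrent_Status)
print(missing_status)
```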
How should I go about this?
Reproducible data:
data<-data.frame(Study_ID=c("100","100","100","200","300","400","400","500","500","600","700"),Recurrent_Status=c("1","NA","NA","1","NA","3","NA","3","NA","NA","1"))
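Another way to get the desired output from this data, sketched in base R: sort so that within each Study_ID the rows with a status come before NA rows, then keep the first row of each ID with `duplicated()`. As before, this assumes the "NA" strings should be real missing values; `sorted` and `result` are illustrative names.

```r
data <- data.frame(
  Study_ID = c("100","100","100","200","300","400","400","500","500","600","700"),
  Recurrent_Status = c("1","NA","NA","1","NA","3","NA","3","NA","NA","1"),
  stringsAsFactors = FALSE
)
# Assumption: the "NA" strings stand for missing values.
data$Recurrent_Status[data$Recurrent_Status == "NA"] <- NA

# Order by ID, then by is.na(): FALSE (has a status) sorts before TRUE (NA).
sorted <- data[order(data$Study_ID, is.na(data$Recurrent_Status)), ]

# The first row per ID is a row with a status whenever the ID has one.
result <- sorted[!duplicated(sorted$Study_ID), ]
rownames(result) <- NULL
print(result)
```

Note that `order()` also reorders the IDs themselves (here they are already sorted, so nothing changes). With dplyr, the same idea is usually written as `arrange(Study_ID, is.na(Recurrent_Status))` followed by `distinct(Study_ID, .keep_all = TRUE)`, which keeps the first row for each ID.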