I have a list like this:
mylist <- list(PP = c("PP 1", "OMITTED"),
IN01 = c("DID NOT PARTICIPATE", "PARTICIPATED", "OMITTED"),
RD1 = c("YES", "NO", "NOT REACHED", "INVALID", "OMITTED"),
RD2 = c("YES", "NO", "NOT REACHED", "NOT AN OPTION", "OMITTED"),
LOS = c("LESS THAN 3", "3 TO 100", "100 TO 500", "MORE THAN 500", "LOGICALLY NOT APPLICABLE", "OMITTED"),
COM = c("BAN", "SBAN", "RAL"),
VR1 = c("WITHIN 30", "WITHIN 200", "NOT AVAILABLE", "OMITTED"),
INF = c("A LOT", "SOME", "LITTLE OR NO", "NOT APPLICABLE", "OMITTED"),
IST = c("FULL-TIME", "PART-TIME", "FULL STAFFED", "NOT STAFFED", "LOGICALLY NOT APPLICABLE", "OMITTED"),
CMP = c("ALL", "MOST", "SOME", "NONE", "LOGICALLY NOT APPLICABLE", "OMITTED"))
I have another, similar list:
matchlist <- list("INVALID", c("INVALID", "OMITTED OR INVALID"),
c("INVALID", "OMITTED"), "OMITTED", c("NOT REACHED", "INVALID", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "INVALID", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "INVALID", "OMITTED OR INVALID"),
c("Not applicable", "Not stated"), c("Not reached", "Not administered/missing by design", "Presented but not answered/invalid"),
c("Not administered/missing by design", "Presented but not answered/invalid"),
"OMITTED OR INVALID",
c("LOGICALLY NOT APPLICABLE", "OMITTED OR INVALID"),
c("NOT REACHED", "OMITTED"),
c("NOT APPLICABLE", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "NOT REACHED", "OMITTED"),
"NOT EXCLUDED", c("Default", "Not applicable", "Not stated"), c("Valid Skip", "Not Reached", "Not Applicable", "Invalid", "No Response"),
c("Not administered", "Omitted"),
c("NOT REACHED", "INVALID RESPONSE", "OMITTED"),
c("INVALID RESPONSE", "OMITTED"))
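As you can see, some vectors in matchlist partially match vectors in mylist. In some cases, a vector in matchlist exactly matches part of a vector in mylist. For example, the last three values of RD1 in mylist match the vector in the fifth component of matchlist, but RD2 does not match it, although the two share common values. Taken together and in this order, the RD2 values ("NOT REACHED", "NOT AN OPTION", "OMITTED") do not match the values in any vector in matchlist. The same holds for the COM values in mylist.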
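What I want to achieve is to compare the elements of each vector in mylist against every vector in matchlist, extract the values that are common and appear in the same order as in matchlist, and store them in another list. The desired result should look like this: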
$PP
[1] "OMITTED"
$IN01
[1] "OMITTED"
$RD1
[1] "NOT REACHED" "INVALID" "OMITTED"
$RD2
character(0)
$LOS
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"
$COM
character(0)
$VR1
[1] "OMITTED"
$INF
[1] "NOT APPLICABLE" "OMITTED"
$IST
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"
$CMP
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"
What I have tried:
Using intersect:
lapply(mylist, function(i) {
intersect(i, lapply(matchlist, function(i) {i}))
})
It returns only the last value of each vector in matchlist ("OMITTED").
Using match via %in%:
lapply(mylist, function(i) {
i[which(i %in% matchlist)]
})
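It returns the desired result only for RD1 ("INVALID", "OMITTED"); for the rest it returns only the last value ("OMITTED"), except for COM, which is correct.
Using mapply and intersect:
mapply(intersect, mylist, matchlist)
This returns a long list that mixes up nearly everything, including combinations that should not exist, along with a warning about unequal lengths. Can anyone help?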
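A minimal sketch, on hypothetical toy data rather than the lists above, of why the last two attempts behave as described. When the right-hand side of %in% is a list, R coerces it to character first, so only its length-one elements can equal a single string. mapply, in turn, pairs its arguments by position instead of comparing every vector against every other. The names x, tbl, hits and paired are made up for illustration.

```r
# Toy data (hypothetical, much shorter than the question's lists)
x   <- c("YES", "NO", "INVALID", "OMITTED")
tbl <- list("INVALID", c("INVALID", "OMITTED"), "OMITTED")

# %in% against a list: the list is coerced to character, so the two-element
# vector becomes one deparsed string and can never equal "INVALID" or "OMITTED".
# Only the length-one elements of tbl produce hits.
hits <- x[x %in% tbl]
print(hits)

# mapply walks its arguments in parallel: first with first, second with second.
# It never compares "a" with the second element or "b" with the first.
paired <- mapply(intersect,
                 list(a = c("x", "y"), b = c("y", "z")),
                 list("x", "z"),
                 SIMPLIFY = FALSE)
print(paired)
```

With lists of different lengths, as in the question, the shorter one is recycled, which is where the length warning comes from.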
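Taking RD1 as an example, what do you expect when there are multiple matches? The longest one (by vector length)? Also, mapply is not what you want here: it runs intersect(mylist[[1]], matchlist[[1]]), then intersect(mylist[[2]], matchlist[[2]]), and so on. - r2evans
The strings in mylist should match a whole vector in matchlist. That is, the values in RD1 should match only the fifth vector in matchlist (c("NOT REACHED", "INVALID", "OMITTED")), and nothing else. - panman
RD1 matches one word in indices 1, 2, 4, 7, 14, 15, and 22 of matchlist; two words in indices 3, 6, 13, 16, and 21; and three words in index 5. Clearly you want the longest of those, right? - r2evans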
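One possible reading of the comments, sketched under stated assumptions and not an accepted answer: treat a matchlist vector as a match only when it equals the trailing elements of a mylist vector, in the same order, and keep the longest such match. The helper name longest_tail_match is hypothetical. On the question's data this reproduces the desired output for every component except RD2, where it returns "OMITTED" rather than character(0), because the single-element vector "OMITTED" in matchlist also equals the tail of RD2. The overlap counts for RD1 from the last comment are computed at the end.

```r
mylist <- list(PP = c("PP 1", "OMITTED"),
  IN01 = c("DID NOT PARTICIPATE", "PARTICIPATED", "OMITTED"),
  RD1 = c("YES", "NO", "NOT REACHED", "INVALID", "OMITTED"),
  RD2 = c("YES", "NO", "NOT REACHED", "NOT AN OPTION", "OMITTED"),
  LOS = c("LESS THAN 3", "3 TO 100", "100 TO 500", "MORE THAN 500", "LOGICALLY NOT APPLICABLE", "OMITTED"),
  COM = c("BAN", "SBAN", "RAL"),
  VR1 = c("WITHIN 30", "WITHIN 200", "NOT AVAILABLE", "OMITTED"),
  INF = c("A LOT", "SOME", "LITTLE OR NO", "NOT APPLICABLE", "OMITTED"),
  IST = c("FULL-TIME", "PART-TIME", "FULL STAFFED", "NOT STAFFED", "LOGICALLY NOT APPLICABLE", "OMITTED"),
  CMP = c("ALL", "MOST", "SOME", "NONE", "LOGICALLY NOT APPLICABLE", "OMITTED"))

matchlist <- list("INVALID", c("INVALID", "OMITTED OR INVALID"),
  c("INVALID", "OMITTED"), "OMITTED", c("NOT REACHED", "INVALID", "OMITTED"),
  c("LOGICALLY NOT APPLICABLE", "INVALID", "OMITTED"),
  c("LOGICALLY NOT APPLICABLE", "INVALID", "OMITTED OR INVALID"),
  c("Not applicable", "Not stated"),
  c("Not reached", "Not administered/missing by design", "Presented but not answered/invalid"),
  c("Not administered/missing by design", "Presented but not answered/invalid"),
  "OMITTED OR INVALID",
  c("LOGICALLY NOT APPLICABLE", "OMITTED OR INVALID"),
  c("NOT REACHED", "OMITTED"),
  c("NOT APPLICABLE", "OMITTED"),
  c("LOGICALLY NOT APPLICABLE", "OMITTED"),
  c("LOGICALLY NOT APPLICABLE", "NOT REACHED", "OMITTED"),
  "NOT EXCLUDED", c("Default", "Not applicable", "Not stated"),
  c("Valid Skip", "Not Reached", "Not Applicable", "Invalid", "No Response"),
  c("Not administered", "Omitted"),
  c("NOT REACHED", "INVALID RESPONSE", "OMITTED"),
  c("INVALID RESPONSE", "OMITTED"))

# Hypothetical helper (assumption: a "match" means a candidate vector equals
# the last length(m) elements of x, element by element and in order).
# Returns the longest such candidate, or character(0) if there is none.
longest_tail_match <- function(x, candidates) {
  is_tail <- vapply(candidates, function(m) {
    length(m) <= length(x) && identical(utils::tail(x, length(m)), m)
  }, logical(1))
  hits <- candidates[is_tail]
  if (length(hits) == 0L) return(character(0))
  hits[[which.max(lengths(hits))]]
}

result <- lapply(mylist, longest_tail_match, candidates = matchlist)
print(result)

# Number of RD1 values found in each matchlist vector, ignoring order
overlap <- vapply(matchlist, function(m) sum(m %in% mylist$RD1), integer(1))
print(which(overlap == 3L))
```

Getting character(0) for RD2 would need an extra rule that this sketch does not guess at, for example a list of values such as "NOT AN OPTION" that must not be left unmatched next to a match.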