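I am trying to solve a tricky R problem that keyword searches have not helped me with. Specifically, I want to select the subset of one data frame whose rows do not appear in another data frame. Here is an example: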
> test
number fruit ID1 ID2
item1 "number1" "apples" "22" "33"
item2 "number2" "oranges" "13" "33"
item3 "number3" "peaches" "44" "25"
item4 "number4" "apples" "12" "13"
> test2
number fruit ID1 ID2
item1 "number1" "papayas" "22" "33"
item2 "number2" "oranges" "13" "33"
item3 "number3" "peaches" "441" "25"
item4 "number4" "apples" "123" "13"
item5 "number3" "peaches" "44" "25"
item6 "number4" "apples" "12" "13"
item7 "number1" "apples" "22" "33"
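I have two data frames, test and test2, and the goal is to select every row of test2 that does not appear in full in test, even if some of its values are the same.
The output I want should look like this: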
item1 "number1" "papayas" "22" "33"
item2 "number3" "peaches" "441" "25"
item3 "number4" "apples" "123" "13"
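There may be any number of rows or columns, but in my particular case one data frame is a direct subset of the other.
I have made extensive use of the subset(), merge(), and which() functions in R, but I cannot figure out how to combine them, if that is even possible, to get the result I want.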
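For illustration, here is a minimal base R sketch of the kind of selection described above. It assumes the tables are ordinary data frames with identical column names, and it rebuilds the example data with read.table() purely for the demonstration. The helper row_key() is a hypothetical name, not a built-in function, and the separator it uses is assumed never to occur in the data.

```r
# Example data mirroring the printed tables (rebuilt here for illustration only).
test <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
number  fruit   ID1 ID2
number1 apples  22  33
number2 oranges 13  33
number3 peaches 44  25
number4 apples  12  13
")
test2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
number  fruit   ID1 ID2
number1 papayas 22  33
number2 oranges 13  33
number3 peaches 441 25
number4 apples  123 13
number3 peaches 44  25
number4 apples  12  13
number1 apples  22  33
")

# Hypothetical helper: collapse each row into one string key.
# Assumes the separator "\r" never appears inside a value.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Keep the rows of test2 whose full-row key is absent from test.
only_in_test2 <- test2[!(row_key(test2) %in% row_key(test)), , drop = FALSE]
rownames(only_in_test2) <- NULL
print(only_in_test2)
```

Comparing whole-row keys means a row counts as present only when every column matches, which is why rows sharing just some values with test are still kept.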
Edit: here is the R code I used to generate the two tables.
test <- data.frame(c("number1", "apples", 22, 33), c("number2", "oranges", 13, 33),
c("number3", "peaches", 44, 25), c("number4", "apples", 12, 13))
test <- t(test)
rownames(test) = c("item1", "item2", "item3", "item4")
colnames(test) = c("number", "fruit", "ID1", "ID2")
test2 <- data.frame(c("number1", "papayas", 22, 33), c("number2", "oranges", 13, 33),
c("number3", "peaches", 441, 25), c("number4", "apples", 123, 13),
c("number3", "peaches", 44, 25), c("number4", "apples", 12, 13))
test2 <- t(test2)
rownames(test2) = c("item1", "item2", "item3", "item4", "item5", "item6")
colnames(test2) = c("number", "fruit", "ID1", "ID2")
Thanks in advance!
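As a side note on the construction code, the sketch below contrasts two ways of building such a table. It assumes the intent is a data frame with numeric ID columns. The name as_matrix is only an illustrative variable.

```r
# Building the table column by column keeps ID1 and ID2 numeric.
test <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("apples", "oranges", "peaches", "apples"),
  ID1    = c(22, 13, 44, 12),
  ID2    = c(33, 33, 25, 13),
  row.names = paste0("item", 1:4),
  stringsAsFactors = FALSE
)
str(test)

# By contrast, c() on mixed values coerces everything to character,
# and t() applied to a data frame returns a matrix, not a data frame.
as_matrix <- t(data.frame(c("number1", "apples", 22, 33),
                          c("number2", "oranges", 13, 33)))
print(is.matrix(as_matrix))
print(typeof(as_matrix))
```

With the column-wise version, numeric comparisons on ID1 and ID2 work without any later conversion.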
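Note that t() turns your data frames into matrices. Fortunately, merge() is smart enough to convert your matrices back into data frames. Unfortunately, it cannot convert the IDs, which are now of class factor, back to numeric. - Hong Ooi
Negate match_df from plyr. - tumultous_rooster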
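One possible way to follow the merge() route mentioned in the comments is sketched below. It is only a sketch: it assumes the inputs are character matrices shaped like the ones the question's code produces (six rows in test2), and the marker column in_test is a hypothetical name introduced for the example. stringsAsFactors = FALSE is passed explicitly because versions of R before 4.0.0 would otherwise convert the strings to factors.

```r
# Character matrices shaped like the question's test and test2 (rebuilt for illustration).
cols <- c("number", "fruit", "ID1", "ID2")
test <- matrix(c("number1", "apples",  "22", "33",
                 "number2", "oranges", "13", "33",
                 "number3", "peaches", "44", "25",
                 "number4", "apples",  "12", "13"),
               ncol = 4, byrow = TRUE,
               dimnames = list(paste0("item", 1:4), cols))
test2 <- matrix(c("number1", "papayas", "22",  "33",
                  "number2", "oranges", "13",  "33",
                  "number3", "peaches", "441", "25",
                  "number4", "apples",  "123", "13",
                  "number3", "peaches", "44",  "25",
                  "number4", "apples",  "12",  "13"),
                ncol = 4, byrow = TRUE,
                dimnames = list(paste0("item", 1:6), cols))

# Convert to data frames and tag every row of test with a marker column.
test_df  <- as.data.frame(test,  stringsAsFactors = FALSE)
test2_df <- as.data.frame(test2, stringsAsFactors = FALSE)
test_df$in_test <- TRUE  # hypothetical marker column

# Left join on all shared columns; rows of test2 with no partner in test
# end up with NA in the marker column.
merged <- merge(test2_df, test_df, all.x = TRUE)
result <- merged[is.na(merged$in_test), cols]

# The IDs are still character after the round trip, so convert them explicitly.
result$ID1 <- as.numeric(result$ID1)
result$ID2 <- as.numeric(result$ID2)
rownames(result) <- NULL
print(result)
```

The marker column is what distinguishes matched from unmatched rows after the join, so it should use a name that does not already exist in either table.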