Remove duplicate rows in R (based on 2 columns)

9

I have a dataset in R which looks like this:

   x1   x2 x3
1:  A Away  2
2:  A Home  2
3:  B Away  2
4:  B Away  1
5:  B Home  2
6:  B Home  1
7:  C Away  1
8:  C Home  1

I want to remove rows that are duplicated with respect to the values in columns x1 and x2. I tried the following:

df[!duplicated(df[,c('x1', 'x2')]),]

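It should remove rows 4 and 6. Unfortunately it does not work: it returns exactly the same data, with the duplicates still present. What do I need to use to remove rows 4 and 6?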


1
Related but different: https://dev59.com/YWgt5IYBdhLWcg3w0wvB/ - Frank
3 Answers

7

I would simply do:

unique(df, by=c("x1", "x2")) # where df is a data.table

This becomes clear if you look at ?unique.

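PS: Given the syntax in your question, I wonder whether you are aware of the basic differences between data.table and data.frame syntax. I suggest reading the documentation first.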
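
To make the syntax point concrete, here is a minimal sketch, assuming the data in the question is a data.table. The name dt and the reconstruction of the table are assumptions for illustration. As far as I recall, data.table releases before 1.9.8 evaluated df[, c('x1', 'x2')] to the character vector itself rather than selecting columns, which would explain why the original attempt returned every row. The forms below avoid that ambiguity.

```r
library(data.table)

# Data reconstructed from the table in the question (dt is a hypothetical name)
dt <- data.table(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# Keep the first row of each (x1, x2) pair
res_unique <- unique(dt, by = c("x1", "x2"))

# Same filter written with duplicated() and its `by` argument
res_dup <- dt[!duplicated(dt, by = c("x1", "x2"))]

# Same filter with an explicit data.table-style column selection
res_sel <- dt[!duplicated(dt[, .(x1, x2)])]

print(res_unique)
```

All three results should keep six rows and drop rows 4 and 6 of the original data.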


3
library("data.table")
setDT(df)[, .SD[1], by = .(x1, x2)]

#     x1   x2 x3
# 1:  A Away  2
# 2:  A Home  2
# 3:  B Away  2
# 4:  B Home  2
# 5:  C Away  1
# 6:  C Home  1
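
A short sketch of what .SD[1] does here, under the assumption that the data matches the table in the question. The names dt, first_rows and last_rows are hypothetical. .SD is the subset of rows for each (x1, x2) group, so .SD[1] keeps the first row of every group, while .SD[.N] would keep the last one instead. Note that setDT(df) converts df in place; the sketch uses as.data.table() so the original data.frame is left untouched.

```r
library(data.table)

# Data reconstructed from the table in the question
df <- data.frame(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# as.data.table() makes a copy; setDT(df) would convert df by reference
dt <- as.data.table(df)

# First row of each (x1, x2) group, as in the answer above
first_rows <- dt[, .SD[1], by = .(x1, x2)]

# Last row of each group, for comparison (.N is the group size)
last_rows <- dt[, .SD[.N], by = .(x1, x2)]

print(first_rows)
print(last_rows)
```

Which row survives only matters when the non-key columns differ within a group, as x3 does for the B rows.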

1
Alternatively, you can use the dplyr package.
library("dplyr")
df <- data.frame(x1 = c("A","A","B","B","B","B","C","C"), x2 = c("Away","Home","Away","Away","Home","Home","Away","Home"), x3 = c(2,2,2,1,2,1,1,1))

distinct(df,x1,x2,.keep_all = TRUE)
#      x1   x2 x3
#    1  A Away  2
#    2  A Home  2
#    3  B Away  2
#    4  B Home  2
#    5  C Away  1
#    6  C Home  1
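
For illustration, a small sketch of why .keep_all = TRUE is needed, plus the base R form from the question applied to a plain data.frame. The variable names keys_only, full_rows and base_rows are hypothetical. Without .keep_all, distinct() returns only the columns it was asked to deduplicate on.

```r
library(dplyr)

# Same data as above, as a plain data.frame
df <- data.frame(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# Only the key columns are returned; x3 is dropped
keys_only <- distinct(df, x1, x2)

# All columns are kept, taking the first row of each (x1, x2) pair
full_rows <- distinct(df, x1, x2, .keep_all = TRUE)

# Base R equivalent; this indexing works as expected on a data.frame
base_rows <- df[!duplicated(df[, c("x1", "x2")]), ]

print(keys_only)
print(full_rows)
```

On a plain data.frame the expression from the question gives the same rows as distinct() with .keep_all = TRUE.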
