Keep the first row by multiple columns in an R data.table

Question

5

I'd like to get only the first row of a data.table, grouped by multiple columns.

This is straightforward with a single column, e.g.:

(dt <- data.table(x = c(1, 1, 1, 2),
                  y = c(1, 1, 2, 2),
                  z = c(1, 2, 1, 2)))
#    x y z
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 1
# 4: 2 2 2
dt[!duplicated(x)] # Remove rows 2-3
#    x y z
# 1: 1 1 1
# 2: 2 2 2

But none of these approaches works when trying to remove rows based on two columns, i.e. in this case removing only the second row:

dt[!duplicated(x, y)] # Returns the original data set unchanged
#    x y z
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 1
# 4: 2 2 2
dt[!duplicated(list(x, y))] # Same as above
dt[!duplicated(c("x", "y"))] # Same as above
dt[!duplicated(list("x", "y"))] # Same as above
dt[!duplicated(c(x, y))] # Only removes duplicates from first column
#    x y z
# 1: 1 1 1
# 2: 2 2 2
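
One possible reading of why the first attempt returns everything: in base R, the second positional argument of `duplicated()` is `incomparables`, so `duplicated(x, y)` treats the values in `y` as values that can never count as duplicates. A minimal base-R sketch of that behaviour, with a data frame of both columns shown for contrast (the variable names are only for illustration):

```r
# Same x and y as in the question, as plain vectors
x <- c(1, 1, 1, 2)
y <- c(1, 1, 2, 2)

# y is matched to the `incomparables` argument, so the values 1 and 2
# are never flagged as duplicates and every row is kept
flag_incomparables <- duplicated(x, y)

# Row-wise comparison over both columns via a data frame
flag_rows <- duplicated(data.frame(x, y))

print(flag_incomparables)
print(flag_rows)
```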

Beyond these, the following works only in some cases:

dt[!duplicated(paste0(x, y))]
#    x y z
# 1: 1 1 1
# 2: 1 2 1
# 3: 2 2 2
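
A small sketch of one case where the `paste0` approach can break down: without a separator, different (x, y) pairs can collapse into the same string. The values below are made up for illustration and are not from the data set above.

```r
# Two distinct pairs: (1, 11) and (11, 1)
x <- c(1, 11)
y <- c(11, 1)

# Both rows paste to "111", so the second row is wrongly flagged
collided <- duplicated(paste0(x, y))

# A separator keeps these two pairs distinct ("1_11" vs "11_1")
separated <- duplicated(paste(x, y, sep = "_"))

print(collided)
print(separated)
```

A separator only helps as long as it cannot occur inside the values themselves, so this remains a workaround rather than a general solution.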

- Max Ghenis

2 Answers

6

data.table's duplicated method works by key. From ?duplicated.data.table:

 ‘duplicated’ returns a logical vector indicating which rows of a
 ‘data.table’ have duplicate rows (by key).

setkey(dt, x, y)
dt[!duplicated(dt)]
##    x y z
## 1: 1 1 1
## 2: 1 2 1
## 3: 2 2 2
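
A minimal sketch of the same result without setting a key, passing the columns through the `by` argument of data.table's `duplicated` method. As far as I know, newer data.table releases (1.9.8 onward) default `by` to all columns rather than the key, so naming the columns explicitly may be the safer choice across versions.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# Flag rows whose (x, y) pair has already appeared; no key is needed
dup <- duplicated(dt, by = c("x", "y"))

# Keep only the first row of each (x, y) pair
first_rows <- dt[!dup]
print(first_rows)
```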

- Jake Burkhead

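`by` defaults to the key; you can also specify the `by` variables yourself. - mnel

@mnel, yes, I upvoted your answer. I just wanted to show why this behavior makes sense, even though it may look odd. - Jake Burkhead

dt[!duplicated(dt[,c("x","y"),with=F])] # seems to work - akrun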

- mnel · Accepted Answer

data.table provides S3 methods for unique, duplicated, and anyDuplicated.

unique(dt, by = c('x','y'))

will give you what you want.
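
For reference, a short sketch that runs `unique` with `by` on the data from the question, next to a grouped alternative using `.SD[1L]`. The grouped form is not part of the answer above; it is shown only as another way to take the first row per (x, y) pair.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# First row for each distinct (x, y) pair
first_by_unique <- unique(dt, by = c("x", "y"))

# Alternative: take the first row of each group explicitly
first_by_group <- dt[, .SD[1L], by = .(x, y)]

print(first_by_unique)
print(first_by_group)
```

On larger tables the `unique(dt, by = ...)` form is usually the simpler one to read, while the grouped form generalizes more easily to picking something other than the first row.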