I have a data table dt, sorted first by the column date (my grouping variable) and then by the column age:
library(data.table)
setkeyv(dt, c("date", "age")) # Sorts table first by column "date" then by "age"
> dt
date age name
1: 2000-01-01 3 Andrew
2: 2000-01-01 4 Ben
3: 2000-01-01 5 Charlie
4: 2000-01-02 6 Adam
5: 2000-01-02 7 Bob
6: 2000-01-02 8 Campbell
My question is: can I extract the first two rows for each unique date? Or, more generally: how do I extract the first n rows within each group?
In this example, the result dt.f would be:
> dt.f = ???????? # function of dt to extract the first 2 rows per unique date
> dt.f
date age name
1: 2000-01-01 3 Andrew
2: 2000-01-01 4 Ben
3: 2000-01-02 6 Adam
4: 2000-01-02 7 Bob
Code to create the data table above:
install.packages("data.table")
library(data.table)
date <- c("2000-01-01","2000-01-01","2000-01-01",
"2000-01-02","2000-01-02","2000-01-02")
age <- c(3,4,5,6,7,8)
name <- c("Andrew","Ben","Charlie","Adam","Bob","Campbell")
dt <- data.table(date, age, name)
setkeyv(dt,c("date","age")) # Sorts table first by column "date" then by "age"
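For the general "first n rows per group" case, one widely used data.table idiom is head(.SD, n) evaluated with by. This comes from general data.table usage, not from this thread. The sketch below assumes the rows are already in the desired order (here via setkeyv). The helper name first_n_per_group is hypothetical.

```r
library(data.table)

# Rebuild the example table from the question, keyed (sorted) by date, then age
dt <- data.table(
  date = c("2000-01-01", "2000-01-01", "2000-01-01",
           "2000-01-02", "2000-01-02", "2000-01-02"),
  age  = c(3, 4, 5, 6, 7, 8),
  name = c("Andrew", "Ben", "Charlie", "Adam", "Bob", "Campbell")
)
setkeyv(dt, c("date", "age"))

# Hypothetical helper: keep the first n rows of each group, in current row order.
# head() simply returns the whole group when it has fewer than n rows.
first_n_per_group <- function(x, n, group_cols) {
  x[, head(.SD, n), by = group_cols]
}

dt.f <- first_n_per_group(dt, 2, "date")
print(dt.f)
```

The grouping column comes first in the result, followed by the columns of .SD. This gives the date, age, name layout shown in the expected output.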
dt[dt[, .I[1:2], by = date]$V1]  # inner call: row numbers of the first 2 rows per date; outer call: subset dt by them
This is much faster. - eddi
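The .I version above indexes with 1:2. For a date with fewer than two rows, this yields an NA index and therefore an all-NA row in the result. The minimal variation below uses head(.I, n) instead, which returns at most n row numbers per group. It also shows rowid(), available in data.table 1.9.8 and later, which numbers rows 1, 2, 3, ... within each group. This is a sketch under those assumptions. No timings were measured for it.

```r
library(data.table)

dt <- data.table(
  date = c("2000-01-01", "2000-01-01", "2000-01-01",
           "2000-01-02", "2000-01-02", "2000-01-02"),
  age  = c(3, 4, 5, 6, 7, 8),
  name = c("Andrew", "Ben", "Charlie", "Adam", "Bob", "Campbell")
)
setkeyv(dt, c("date", "age"))
n <- 2L

# .I holds dt's row numbers for each group; head(.I, n) keeps at most n of them,
# so small groups add no NA rows. The unnamed j result becomes column V1.
idx  <- dt[, head(.I, n), by = date]$V1
dt.f <- dt[idx]

# Alternative: rowid(date) is 1, 2, 3, ... within each date, so a plain
# logical filter keeps the first n rows per date.
dt.f2 <- dt[rowid(date) <= n]
print(dt.f)
```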
Very nice. I learned it on SO myself. - Ricardo Saporta