Conditionally removing rows from a data frame

3
How can I remove rows from a data table based on a condition?
For example, I have the following data table:
Apple, 2001
Apple, 2002
Apple, 2003
Apple, 2004
Banana, 2001
Banana, 2002
Banana, 2003
Candy, 2001
Candy, 2002
Candy, 2003
Candy, 2004
Dog, 2001
Dog, 2002
Dog, 2004
Water, 2002
Water, 2003
Water, 2004

Next, I want to keep only the rows of groups that contain every year from 2001 to 2004, i.e.:

Apple, 2001
Apple, 2002
Apple, 2003
Apple, 2004
Candy, 2001
Candy, 2002
Candy, 2003
Candy, 2004
3 Answers

3
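Using data.table, check for each 'Col1' group whether all of 2001:2004 are present in the 'year' column, then return the Subset of Data.table (.SD) for the groups that pass.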
library(data.table)
setDT(df1)[, if(all(2001:2004 %in% year)) .SD, by = Col1]
#    Col1 year
#1: Apple 2001
#2: Apple 2002
#3: Apple 2003
#4: Apple 2004
#5: Candy 2001
#6: Candy 2002
#7: Candy 2003
#8: Candy 2004
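
One thing worth knowing about the call above: setDT() converts df1 to a data.table by reference, so df1 itself is changed. If the original data frame should stay untouched, a minimal sketch along these lines may help. The names dt, required and result are illustrative and not part of the original answer, and the data is rebuilt here only so the snippet runs on its own.

```r
library(data.table)

# Same values as df1 in the Data section of this answer
df1 <- data.frame(
  Col1 = c(rep("Apple", 4), rep("Banana", 3), rep("Candy", 4),
           rep("Dog", 3), rep("Water", 3)),
  year = c(2001:2004, 2001:2003, 2001:2004, 2001L, 2002L, 2004L, 2002:2004),
  stringsAsFactors = FALSE
)

# as.data.table() returns a copy, so df1 remains a plain data.frame
dt <- as.data.table(df1)

# Years that every retained group must contain (illustrative name)
required <- 2001:2004

# Return .SD only for groups that cover all required years;
# groups that fail the test return NULL and are dropped
result <- dt[, if (all(required %in% year)) .SD, by = Col1]
```

Keeping the required years in a variable also makes it easy to reuse the same filter for a different range.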

Data

df1 <- structure(list(Col1 = c("Apple", "Apple", "Apple", "Apple", "Banana", 
"Banana", "Banana", "Candy", "Candy", "Candy", "Candy", "Dog", 
"Dog", "Dog", "Water", "Water", "Water"), year = c(2001L, 2002L, 
 2003L, 2004L, 2001L, 2002L, 2003L, 2001L, 2002L, 2003L, 2004L, 
 2001L, 2002L, 2004L, 2002L, 2003L, 2004L)), .Names = c("Col1", 
 "year"), class = "data.frame", row.names = c(NA, -17L))

2

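In base R we can use ave to get the desired result. ave applies the function to each Col1 group and recycles the single TRUE/FALSE to every row of that group; because year is an integer vector, the result is stored as 1 or 0, hence the comparison with 1. (df1 is the data frame defined in the data.table answer.)

df1[ave(df1$year, df1$Col1, FUN = function(x) all(2001:2004 %in% x)) == 1, ]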

#   Col1 year
#1  Apple 2001
#2  Apple 2002
#3  Apple 2003
#4  Apple 2004
#8  Candy 2001
#9  Candy 2002
#10 Candy 2003
#11 Candy 2004
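
If the 1/0 coercion inside ave feels opaque, the same idea can be written in two explicit steps. This is only a sketch of an alternative, not the answerer's method: it uses tapply to get one logical per group and then subsets with %in%. The names required, complete, keep and result are illustrative.

```r
# Same values as df1 in the data.table answer, rebuilt so this runs standalone
df1 <- data.frame(
  Col1 = c(rep("Apple", 4), rep("Banana", 3), rep("Candy", 4),
           rep("Dog", 3), rep("Water", 3)),
  year = c(2001:2004, 2001:2003, 2001:2004, 2001L, 2002L, 2004L, 2002:2004),
  stringsAsFactors = FALSE
)

# Years that every retained group must contain
required <- 2001:2004

# One TRUE/FALSE per group: does the group cover all required years?
complete <- tapply(df1$year, df1$Col1, function(x) all(required %in% x))

# Names of the groups that passed the test
keep <- names(complete)[complete]

# Keep only rows belonging to those groups; original row names are preserved
result <- df1[df1$Col1 %in% keep, ]
```

The intermediate vector complete can also be printed to see which groups were rejected.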

2

A dplyr approach:

library(dplyr) # or library(tidyverse)
df1 %>% 
    group_by(Col1) %>% 
    filter(all(2001:2004 %in% year))

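Within each group the condition evaluates to a single TRUE or FALSE: . %>% filter(TRUE) keeps every row of the group, while . %>% filter(FALSE) drops them all.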

Output:

Source: local data frame [8 x 2]
Groups: Col1 [2]

   Col1  year
  <chr> <int>
1 Apple  2001
2 Apple  2002
3 Apple  2003
4 Apple  2004
5 Candy  2001
6 Candy  2002
7 Candy  2003
8 Candy  2004
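
As the "Groups: Col1 [2]" line in the output shows, the result is still grouped by Col1. If later steps should operate on the whole table rather than per group, one option is to add ungroup() at the end. A minimal sketch, with the data rebuilt so it runs on its own and result as an illustrative name:

```r
library(dplyr)

# Same values as df1 in the data.table answer
df1 <- data.frame(
  Col1 = c(rep("Apple", 4), rep("Banana", 3), rep("Candy", 4),
           rep("Dog", 3), rep("Water", 3)),
  year = c(2001:2004, 2001:2003, 2001:2004, 2001L, 2002L, 2004L, 2002:2004),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  group_by(Col1) %>%                      # evaluate the condition per group
  filter(all(2001:2004 %in% year)) %>%    # keep groups covering all four years
  ungroup()                               # drop the grouping from the result
```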
