Filter a different number of rows per group

Question



I want to filter x rows for each id, but x is different for each id.

Example dataset:

df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2',
                        'P2', 'P3', 'P3'),
                 points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))

The following code and data come from here.

library(dplyr)

df %>%
  group_by(id) %>%
  filter(row_number() %in% c(1, 2))

This keeps the first two rows for each id. So far so good.

But I want to keep a different number of rows for each id, based on the values stored in the following vector.

nrowtofilter <- c(3, 2, 1)

So I want to keep 3 rows for P1, 2 rows for P2, and 1 row for P3.

But when I run

df %>%
  group_by(id) %>%
  filter(row_number() %in% nrowtofilter)

I get the first 3 rows of every id.
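A minimal base R sketch of why this happens: `%in%` treats `nrowtofilter` as a set of allowed row positions ({1, 2, 3}), not as one limit per id, so every group keeps its first three positions. The 4-row group below is an illustrative stand-in for P1.

```r
# Within-group row positions for a group with 4 rows (like P1)
pos <- 1:4
nrowtofilter <- c(3, 2, 1)

# %in% only asks "is this position one of 1, 2 or 3?",
# so positions 1-3 are kept in every group, whatever its id
keep <- pos %in% nrowtofilter
print(keep)
```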

How can I filter each id according to nrowtofilter?

- August Nilsson

4 Answers


You can split the data by group, then use map2 (or mapply) to slice the first n rows of each group:

library(dplyr)
library(purrr)  # map2() comes from purrr
nrowtofilter <- c(3, 2, 1)
df %>% 
  group_split(id) %>% 
  map2(nrowtofilter, ~ slice_head(.x, n = .y)) %>% 
  bind_rows()

Output

# A tibble: 6 × 2
  id    points
  <chr>  <dbl>
1 P1        56
2 P1        94
3 P1        17
4 P2        55
5 P2        15
6 P3        55

In base R, using the same logic:

split(df, df$id) |>
  Map(f = function(x, y) head(x, y), y = nrowtofilter) |>
  do.call(what = "rbind")
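Both versions pair groups with `nrowtofilter` by position. A minimal base R sketch, using a hypothetical data frame `df2` in which P2 appears before P1, showing that `split()` orders the groups by sorted id rather than by first appearance, so the limits must follow that sorted order:

```r
# Hypothetical data in which P2 appears before P1
df2 <- data.frame(id = c("P2", "P2", "P1", "P1", "P1"), points = 1:5)

# split() orders the groups by the sorted id values
groups <- split(df2, df2$id)
print(names(groups))

# Map() pairs the first limit (3) with "P1" and the second (2) with "P2"
res <- do.call(rbind, Map(function(x, n) head(x, n), groups, c(3, 2)))
print(res$points)
```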

- Maël


First create a lookup table:

nrowtofilter <- setNames(c(3, 2, 1), c('P1', 'P2', 'P3'))
# P1 P2 P3 
#  3  2  1
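Looking limits up by name, rather than by position, is what makes such a table independent of group order. A quick base R illustration (the ids queried are arbitrary examples):

```r
nrowtofilter <- setNames(c(3, 2, 1), c('P1', 'P2', 'P3'))

# Indexing by name returns the limit for each requested id, in the order
# the ids are given, regardless of the vector's own order
limits <- nrowtofilter[c('P3', 'P1')]
print(limits)
```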

Then use group_modify():

library(dplyr)

df %>%
  group_by(id) %>%
  group_modify(~ slice_head(.x, n = nrowtofilter[.y$id])) %>%
  ungroup()

# # A tibble: 6 × 2
#   id    points
#   <chr>  <dbl>
# 1 P1        56
# 2 P1        94
# 3 P1        17
# 4 P2        55
# 5 P2        15
# 6 P3        55

.x is the subset of rows for the given group, and .y is a one-row tibble with one column per grouping variable that identifies the group.
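For comparison, a base R sketch of the same idea (not how group_modify() is implemented): each group's rows play the role of `.x`, and the group's name plays the role of `.y$id`, which is used to look up its limit in the named vector.

```r
df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2', 'P3', 'P3'),
                 points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))
nrowtofilter <- setNames(c(3, 2, 1), c('P1', 'P2', 'P3'))

groups <- split(df, df$id)
# x: the group's rows (like .x); key: the group's id (like .y$id)
res <- do.call(rbind, Map(function(x, key) head(x, nrowtofilter[[key]]),
                          groups, names(groups)))
rownames(res) <- NULL  # drop the "P1.1"-style row names from rbind
print(res)
```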

- Darren Tsai


Another option

library(dplyr)
tibble(id = unique(df$id), nrowtofilter) %>% 
  left_join(df, .) %>%
  filter(row_number() <= first(nrowtofilter), .by = 'id') %>% 
  select(-nrowtofilter)

-output

  id points
1 P1     56
2 P1     94
3 P1     17
4 P2     55
5 P2     15
6 P3     55
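The same join-then-filter logic can be sketched in base R, assuming a lookup data frame `lookup` (a name introduced here for illustration): `match()` attaches each row's limit like the `left_join()`, and `ave()` with `seq_along` numbers rows within each id like `row_number()`.

```r
df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2', 'P3', 'P3'),
                 points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))
lookup <- data.frame(id = c('P1', 'P2', 'P3'), nrowtofilter = c(3, 2, 1))

# Per-row limit, looked up by id
n <- lookup$nrowtofilter[match(df$id, lookup$id)]
# Row number within each id: 1, 2, 3, 4, 1, 2, ...
rn <- ave(seq_along(df$id), df$id, FUN = seq_along)

res <- df[rn <= n, ]
print(res)
```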

- akrun

Page content provided by Stack Overflow.

- M--ßţřịƙïñĝ · Accepted Answer

A different approach using cur_group_id(), which does not require splitting the dataset into a list of data frames:

library(dplyr)

df %>% 
  group_by(id) %>% 
  filter(row_number() <= nrowtofilter[cur_group_id()])

#> # A tibble: 6 x 2
#> # Groups:   id [3]
#>   id    points
#>   <chr>  <dbl>
#> 1 P1        56
#> 2 P1        94
#> 3 P1        17
#> 4 P2        55
#> 5 P2        15
#> 6 P3        55
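This relies on `nrowtofilter` being in the same order as the group ids. Assuming the default `group_by()` behaviour of sorting groups, the group number matches the integer code of the sorted id, which base R exposes via `factor()`. A minimal base R sketch of that correspondence:

```r
df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2', 'P3', 'P3'),
                 points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))
nrowtofilter <- c(3, 2, 1)

# factor() sorts the ids, so the codes are 1 for P1, 2 for P2, 3 for P3
gid <- as.integer(factor(df$id))
# Row number within each group
rn <- ave(seq_along(gid), gid, FUN = seq_along)

# Keep rows whose within-group position is within that group's limit
res <- df[rn <= nrowtofilter[gid], ]
print(res)
```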