Remove rows meeting a condition by group, except the last such row

Question

I have a data frame called df (dput below):

  group indicator value
1     A     FALSE     2
2     A     FALSE     1
3     A     FALSE     2
4     A      TRUE     4
5     B     FALSE     5
6     B     FALSE     1
7     B      TRUE     3

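Within each group, I want to remove every row with indicator == FALSE except the last one. For df, that means rows 1, 2, and 5 should be removed, because they are not the last FALSE row in their group. Here is the desired output: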

  group indicator value
1     A     FALSE     2
2     A      TRUE     4
3     B     FALSE     1
4     B      TRUE     3

Does anyone know how to remove, per group in R, all rows meeting a condition except the last one?

dput of df:

df <- structure(list(group = c("A", "A", "A", "A", "B", "B", "B"), 
    indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE
    ), value = c(2, 1, 2, 4, 5, 1, 3)), class = "data.frame", row.names = c(NA, 
-7L))

- Quinten

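@AnoushiravanR, because row 3 is the last row with FALSE. I want to keep the last FALSE row in each group, which is why rows 3 and 6 are not removed. - Quinten

Is the indicator in the last row always TRUE, or can it also be FALSE? - Anoushiravan R

@AnoushiravanR, yes, TRUE is always the last row in each group! Thanks. - Quinten

Is it always F, F, F, ..., F, T? - zx8754

@zx8754, if a group has no F, the T should still be returned. - Quinten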

4 Answers

Another approach:

library(dplyr)

df %>%
  group_by(group) %>%
  slice_max(cumsum(!indicator))

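Note: this handles the example and the OP's clarification that T always comes last, but it will not work for a sequence such as T, F, F, T, where you would want to keep both Ts and not only the one that follows the Fs.

Output: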

# A tibble: 4 x 3
# Groups:   group [2]
  group indicator value
  <chr> <lgl>     <dbl>
1 A     FALSE         2
2 A     TRUE          4
3 B     FALSE         1
4 B     TRUE          3
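
To make the caveat above concrete, here is a minimal base R sketch on a single hypothetical group with the sequence T, F, F, T. It mimics the cumsum logic by keeping the positions where the running count of FALSE values is at its maximum; the vector x and the variable names are made up for illustration.

```r
# Hypothetical single group with the sequence T, F, F, T
x <- c(TRUE, FALSE, FALSE, TRUE)

# Running count of FALSE values: 0 1 2 2
cs <- cumsum(!x)

# Keeping only the positions with the maximal count drops the leading TRUE
keep_cumsum <- cs == max(cs)

# Intended result: every TRUE plus the last FALSE
keep_wanted <- x | seq_along(x) == max(which(!x))

keep_cumsum  # FALSE FALSE  TRUE  TRUE
keep_wanted  #  TRUE FALSE  TRUE  TRUE
```

The first TRUE is lost because its running count is still 0, so the approach is only safe when every TRUE comes after the last FALSE in its group.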

- arg0naut91

Here are some possible alternatives:

The "dumb" solution

should_be_kept <- logical(nrow(df))
for(row in seq_len(nrow(df))) {
  if(df[row, "indicator"]) {
    # Rows where indicator is TRUE are always kept
    should_be_kept[row] <- TRUE
  } else if(row == max(which(!df[, "indicator"] & df$group == df[row, "group"]))) {
    # This is the last FALSE row of its group
    should_be_kept[row] <- TRUE
  } else {
    should_be_kept[row] <- FALSE
  }
}
df[should_be_kept, ]

A solution using a custom function to find the last FALSE indicator in each group.

rows_to_keep <- logical(nrow(df)) # A TRUE/FALSE vector with one entry per row of df, initially all FALSE
rows_to_keep[df$indicator] <- TRUE # If indicator is TRUE, we mark that row as "selectable"

get_last_false_in_group <- function(df, group) {
  # Index of the last row where the condition inside which() is met
  # (assumes the group has at least one FALSE row)
  return(max(which(df$group == group & !df$indicator)))
}

# The following chunk does a group-by-group search of the last FALSE indicator. There's probably some apply magic that simplifies this but I'm too dumb to come up with it.
groups <- levels(factor(df$group))
for(current_group in groups) {
  rows_to_keep[get_last_false_in_group(df, current_group)] <- TRUE
}

# Now that our rows_to_keep vector is ready, we can filter the corresponding rows and get the intended result:
df[rows_to_keep, ]

With the data.table package, the calls to max(which(...)) can be replaced with a call to its last function.
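
As a possible take on the "apply magic" mentioned in the code comment, here is a sketch in base R that uses tapply to find the last FALSE row index per group without an explicit loop. It is not part of the original answer; the names last_false and keep are made up for illustration, and it assumes df has the lowercase columns group and indicator from the question.

```r
df <- data.frame(
  group     = c("A", "A", "A", "A", "B", "B", "B"),
  indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  value     = c(2, 1, 2, 4, 5, 1, 3)
)

# Row indices of all FALSE rows, reduced to the largest index within each group
false_rows <- which(!df$indicator)
last_false <- tapply(false_rows, df$group[false_rows], max)

# Keep every TRUE row plus the last FALSE row of each group
keep <- df$indicator | seq_len(nrow(df)) %in% last_false
df[keep, ]
```

On the question's data this selects rows 3, 4, 6, and 7, the same rows the loop-based versions keep.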

- David

You can do this with the lead function by checking whether the following indicator is TRUE.

library(tidyverse)
df <- structure(list(group = c("A", "A", "A", "A", "B", "B", "B"), 
    indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE
    ), value = c(2, 1, 2, 4, 5, 1, 3)), class = "data.frame", row.names = c(NA, 
-7L))
df |> 
  group_by(group) |> 
  # slicer is 1 when the next row in the group is FALSE, NA on the group's last row
  mutate(slicer = if_else(lead(indicator) == FALSE, 1, 0)) |> 
  # the last row of each group has no following row, so mark it as a row to keep
  mutate(slicer = if_else(is.na(slicer), 0, slicer)) |> 
  filter(slicer == 0) |> 
  select(-slicer)
#> # A tibble: 4 × 3
#> # Groups:   group [2]
#>   group indicator value
#>   <chr> <lgl>     <dbl>
#> 1 A     FALSE         2
#> 2 A     TRUE          4
#> 3 B     FALSE         1
#> 4 B     TRUE          3

- MarBlo

- zephryl · Accepted Answer

Filter using last(which()) to find the row number of the last FALSE row in each group:

library(dplyr)

df %>%
  group_by(group) %>%
  filter(indicator | row_number() == last(which(!indicator))) %>%
  ungroup()

# A tibble: 4 × 3
  group indicator value
  <chr> <lgl>     <dbl>
1 A     FALSE         2
2 A     TRUE          4
3 B     FALSE         1
4 B     TRUE          3
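
For readers without dplyr, the same "keep every TRUE plus the last FALSE per group" rule can be sketched in base R with ave. This is an illustrative variant and not the answer's code; df2 is hypothetical data chosen to cover the two edge cases raised in the comments, a T, F, F, T group and a group with no FALSE at all.

```r
# Hypothetical data: group A is T, F, F, T; group C has no FALSE rows
df2 <- data.frame(
  group     = c("A", "A", "A", "A", "C", "C"),
  indicator = c(TRUE, FALSE, FALSE, TRUE, TRUE, TRUE),
  value     = 1:6
)

# Per group: TRUE only at the position of the last FALSE.
# max(..., 0L) yields 0 when a group has no FALSE, so no position matches.
is_last_false <- ave(!df2$indicator, df2$group, FUN = function(is_false) {
  seq_along(is_false) == max(which(is_false), 0L)
})

keep <- df2$indicator | is_last_false
df2[keep, ]
```

Here only row 2 is dropped: both TRUE rows of group A survive, and group C is returned unchanged.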