Delete all data matching an ID if the first row meets a specific condition

Question

3

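I want to delete all data for a client_id when the score of that client's first item (ordered by date) meets a specific condition. As far as I know, data.table can do this, and I am close to a solution.

Here is some sample data: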

client_id <- c(1,1,1,2,2,3,3,3,3,4,4)
date <- c("1/1/2021", "1/2/2021", "1/3/2021", "5/1/2021", "10/1/2021", "10/1/2021", "11/1/2021", "1/2/2021", "10/9/2021", "15/9/2021", "16/10/2021")
date <- as.Date(date, '%d/%m/%Y')
score <- c(15,10,19,20,10,25,20,15,10,30,5)
df <- data.frame(client_id, date, score)

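I tried this:

library(data.table)

setDT(df)  # converts df to a data.table by reference
df[client_id %in% df[score > 16, client_id], ]

I hoped this would remove client_id 1, because its first score is < 16. However, it only seems to remove everything when all of the scores are > 16.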

- Stinky_Goat

3 Answers

1

It is perhaps safer to use which.min(date):

df[,.SD[score[which.min(date)]>16],by=client_id]
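
To illustrate why row order matters here, the following is a minimal sketch that rebuilds the question's sample data and shuffles the rows before filtering. The seed and the names `shuffled`, `kept` and `kept_ids` are assumptions introduced only for this illustration.

```r
library(data.table)

# Sample data from the question
df <- data.table(
  client_id = c(1,1,1,2,2,3,3,3,3,4,4),
  date = as.Date(c("1/1/2021", "1/2/2021", "1/3/2021", "5/1/2021", "10/1/2021",
                   "10/1/2021", "11/1/2021", "1/2/2021", "10/9/2021",
                   "15/9/2021", "16/10/2021"), "%d/%m/%Y"),
  score = c(15,10,19,20,10,25,20,15,10,30,5)
)

# Hypothetical: shuffle the rows so dates are no longer ascending within a client
set.seed(1)
shuffled <- df[sample(.N)]

# Test each client's score on its earliest date, regardless of row order
kept <- shuffled[, .SD[score[which.min(date)] > 16], by = client_id]
kept_ids <- sort(unique(kept$client_id))
```

Because which.min(date) locates the earliest date inside each group, the set of retained clients should not depend on how the rows happen to be ordered.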

- هنروقتان

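I'm sharing this here as a solution for anyone trying to do something similar. This is what I ran into when working with dates. If a client ID's first date is earlier than a specified date, remove all data for that client ID:

df[!client_id %in% df[, .(first_date = min(date)), by = client_id][first_date < as.Date("2021-09-01"), client_id]]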
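
The date-based variant in that comment can be split into steps, which may be easier to read. This is only a sketch that assumes the question's sample data and the comment's cutoff of 2021-09-01. The names `first_dates`, `drop_ids` and `result` are introduced here for illustration.

```r
library(data.table)

# Sample data from the question
df <- data.table(
  client_id = c(1,1,1,2,2,3,3,3,3,4,4),
  date = as.Date(c("1/1/2021", "1/2/2021", "1/3/2021", "5/1/2021", "10/1/2021",
                   "10/1/2021", "11/1/2021", "1/2/2021", "10/9/2021",
                   "15/9/2021", "16/10/2021"), "%d/%m/%Y"),
  score = c(15,10,19,20,10,25,20,15,10,30,5)
)

# Earliest date per client
first_dates <- df[, .(first_date = min(date)), by = client_id]

# Clients whose earliest date falls before the cutoff
drop_ids <- first_dates[first_date < as.Date("2021-09-01"), client_id]

# Keep only rows belonging to the remaining clients
result <- df[!client_id %in% drop_ids]
```

With this sample data, only client 4 has a first date on or after the cutoff, so only its rows remain.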

0

A tidyverse option:

library(tidyverse)

df %>% 
  arrange(client_id, date) %>% 
  group_by(client_id) %>% 
  filter(first(score) > 16)

Output

  client_id date       score
      <dbl> <date>     <dbl>
1         2 2021-01-05    20
2         2 2021-01-10    10
3         3 2021-01-10    25
4         3 2021-01-11    20
5         3 2021-02-01    15
6         3 2021-09-10    10
7         4 2021-09-15    30
8         4 2021-10-16     5

Or another data.table option:

df[df[, .I[first(score)>16], by=client_id]$V1]

   client_id       date score
1:         2 2021-01-05    20
2:         2 2021-01-10    10
3:         3 2021-01-10    25
4:         3 2021-01-11    20
5:         3 2021-02-01    15
6:         3 2021-09-10    10
7:         4 2021-09-15    30
8:         4 2021-10-16     5
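
To see what the inner expression does, here is a sketch that splits the one-liner into two steps, assuming the question's sample data with dates already ascending within each client. `.I` holds the row numbers of the original table for the current group, so the inner call returns the row indices of every group that passes the test. The names `idx` and `result` are introduced for illustration.

```r
library(data.table)

# Sample data from the question
df <- data.table(
  client_id = c(1,1,1,2,2,3,3,3,3,4,4),
  date = as.Date(c("1/1/2021", "1/2/2021", "1/3/2021", "5/1/2021", "10/1/2021",
                   "10/1/2021", "11/1/2021", "1/2/2021", "10/9/2021",
                   "15/9/2021", "16/10/2021"), "%d/%m/%Y"),
  score = c(15,10,19,20,10,25,20,15,10,30,5)
)

# Step 1: row numbers of all rows in groups whose first score exceeds 16
idx <- df[, .I[first(score) > 16], by = client_id]$V1

# Step 2: subset the original table by those row numbers
result <- df[idx]
```

Here `idx` covers rows 4 through 11, which are the rows of clients 2, 3 and 4.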

- AndrewGB

- Waldi · Accepted Answer

If the dates in df are in ascending order, you can use .SD and first:

df[,.SD[first(score)>16],by=client_id]
   client_id       date score
       <num>     <Date> <num>
1:         2 2021-01-05    20
2:         2 2021-01-10    10
3:         3 2021-01-10    25
4:         3 2021-01-11    20
5:         3 2021-02-01    15
6:         3 2021-09-10    10
7:         4 2021-09-15    30
8:         4 2021-10-16     5