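I'm trying to build a churn model that includes the maximum number of consecutive UX failures for each customer, and I'm having trouble. Here is my simplified data and the desired output: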
library(dplyr)
df <- data.frame(customerId = c(1,2,2,3,3,3), date = c('2015-01-01','2015-02-01','2015-02-02', '2015-03-01','2015-03-02','2015-03-03'),isFailure = c(0,0,1,0,1,1))
> df
  customerId       date isFailure
1          1 2015-01-01         0
2          2 2015-02-01         0
3          2 2015-02-02         1
4          3 2015-03-01         0
5          3 2015-03-02         1
6          3 2015-03-03         1
Desired result:
> desired.df
  customerId maxConsecutiveFailures
1          1                      0
2          2                      1
3          3                      2
I'm flailing, and going through other rle questions hasn't helped me. This is what I'm "expecting" the solution to resemble:
df %>%
group_by(customerId) %>%
summarise(maxConsecutiveFailures =
max(rle(isFailure[isFailure == 1])$lengths))
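A likely reason the attempt above misbehaves is that filtering with isFailure == 1 before calling rle() removes the successes that separate one run of failures from the next, and leaves an empty vector for customers who never failed. The small base R sketch below illustrates both effects; the vector x is a hypothetical sequence for a single customer, not part of the data above.

```r
# Hypothetical sequence for one customer: two failures separated by a success
x <- c(1, 0, 1)

# Filtering first drops the 0, so the two separate failures merge into one run
merged <- rle(x[x == 1])$lengths

# Running rle() on the full vector keeps the runs apart;
# the lengths of the runs of 1s are selected afterwards
r <- rle(x)
separate <- r$lengths[r$values == 1]

# A customer with no failures leaves nothing after filtering,
# so max() on the result would return -Inf with a warning
y <- c(0)
none <- rle(y[y == 1])$lengths
```

Here merged holds a single run of length 2 even though the longest real streak is 1, while separate holds the two runs of length 1.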
sapply(split(df$isFailure, df$customerId), function(x) {tmp <- with(rle(x==1), lengths[values]); if(length(tmp)==0) 0 else max(tmp)}) - akrun
Another option with data.table is setDT(df)[, {tmp <- rleid(isFailure)*isFailure; tmp2 <- table(tmp[.N==1|tmp!=0]); max((names(tmp2)!=0)*tmp2)}, customerId][] - akrun
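For completeness, the rle() idea from the comments can probably be kept in the dplyr style of the question. This is a minimal sketch, assuming rows should be ordered by date within each customer and that isFailure only contains 0 and 1. The helper max_run is a hypothetical name introduced here, not something from dplyr or from the original post.

```r
library(dplyr)

# Hypothetical helper: length of the longest run of 1s in a 0/1 vector.
# Including 0L in max() covers vectors that contain no 1s at all.
max_run <- function(x) {
  r <- rle(x)
  max(r$lengths[r$values == 1], 0L)
}

df <- data.frame(
  customerId = c(1, 2, 2, 3, 3, 3),
  date = c('2015-01-01', '2015-02-01', '2015-02-02',
           '2015-03-01', '2015-03-02', '2015-03-03'),
  isFailure = c(0, 0, 1, 0, 1, 1)
)

result <- df %>%
  arrange(customerId, date) %>%   # ISO dates stored as strings sort chronologically
  group_by(customerId) %>%
  summarise(maxConsecutiveFailures = max_run(isFailure))
```

On the sample data this should give 0, 1 and 2 for customers 1, 2 and 3, matching desired.df. Calling rle() on the unfiltered column is the design choice that matters: the successes stay in place to break up the failure streaks.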