Creating a rolling count variable in R

3

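I have a dataset of about 21k observations, each with a categorical variable whose options are A, B, and C. I want to create an experience variable for countries that chose option C in earlier observations (put simply, the t-1 cases). I was told this is called a rolling count. I have not been able to work out how to approach this or which package is best suited. Any suggestions would be very helpful!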

dispute=c("1","1","1","2","2","2","2","3","3","3")
partner=c("1","2","3","1","2","3","4","2","1","3")
position=c("A","C","C","B","C","A","C","B","C","C")

Currently my data looks like this:

Dispute Partner Position
1        1       A
1        2       C
1        3       C
2        1       B
2        2       C
2        3       A
2        4       C
3        1       B
3        2       C
3        3       C

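Ideally, I would create a variable that cumulatively counts how many times each unique partner has taken the value C (producing an "experience" count for each unique partner):
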
Dispute Partner Position Experience
1        1       A       NA
1        2       C       1
1        3       C       1
2        1       B       NA
2        2       C       2
2        3       A       NA
2        4       C       1
3        1       B       NA
3        2       C       3
3        3       C       2

1 Answer

5

Using data.table

library(data.table)
setDT(df)[, experience:=cumsum(position=="C")*(position=="C"), by=partner] 

    dispute partner position experience
 1:       1       1        A          0
 2:       1       2        C          1
 3:       1       3        C          1
 4:       2       1        B          0
 5:       2       2        C          2
 6:       2       3        A          0
 7:       2       4        C          1
 8:       3       2        B          0
 9:       3       1        C          1
10:       3       3        C          2   
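
To make the one-liner above easier to follow, here is a minimal base R sketch of the same idea, split into steps. It assumes the `df` from the Data section; the intermediate names `is_c` and `running` are introduced here purely for illustration and are not part of the original answer.

```r
# Same data as in the Data section of the answer
df <- data.frame(
  dispute  = c("1","1","1","2","2","2","2","3","3","3"),
  partner  = c("1","2","3","1","2","3","4","2","1","3"),
  position = c("A","C","C","B","C","A","C","B","C","C")
)

# Indicator: TRUE where the partner chose C (illustrative helper name)
is_c <- df$position == "C"

# ave() applies cumsum() within each partner and keeps the original row order
running <- ave(as.integer(is_c), df$partner, FUN = cumsum)

# Multiplying by the indicator sets the count to 0 on rows that are not C
df$experience <- running * is_c
df
```

The running count alone would carry a partner's last count forward onto A and B rows; the multiplication by `position == "C"` is what resets those rows to 0.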

Using dplyr

library(dplyr)
df %>% 
  group_by(partner) %>% 
  mutate(experience=cumsum(position=="C")*(position=="C"))

   dispute partner position experience
1        1       1        A          0
2        1       2        C          1
3        1       3        C          1
4        2       1        B          0
5        2       2        C          2
6        2       3        A          0
7        2       4        C          1
8        3       2        B          0
9        3       1        C          1
10       3       3        C          2
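
Both versions above return 0 on rows where position is not C, whereas the question's desired output shows NA there. A possible adjustment, sketched in base R and assuming the same `df`, is to replace the multiplication with `ifelse()`. The helper names `is_c` and `running` are illustrative only.

```r
# Same data as in the Data section of the answer
df <- data.frame(
  dispute  = c("1","1","1","2","2","2","2","3","3","3"),
  partner  = c("1","2","3","1","2","3","4","2","1","3"),
  position = c("A","C","C","B","C","A","C","B","C","C")
)

is_c    <- df$position == "C"                               # TRUE where position is C
running <- ave(as.integer(is_c), df$partner, FUN = cumsum)  # per-partner running count

# Keep the count on C rows, use NA everywhere else
df$experience <- ifelse(is_c, running, NA_integer_)
df
```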

Data

df <- data.frame(dispute=c("1","1","1","2","2","2","2","3","3","3"),
                     partner=c("1","2","3","1","2","3","4","2","1","3"),
                     position=c("A","C","C","B","C","A","C","B","C","C"))
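
The question also mentions "t-1 cases". If that means experience should count only a partner's earlier C choices and exclude the current row, one way to sketch it is to subtract the current indicator from the running count. This reading of the question is an assumption, and the column name `prior_experience` is hypothetical.

```r
# Same data as above
df <- data.frame(
  dispute  = c("1","1","1","2","2","2","2","3","3","3"),
  partner  = c("1","2","3","1","2","3","4","2","1","3"),
  position = c("A","C","C","B","C","A","C","B","C","C")
)

is_c <- df$position == "C"

# Running count of C per partner, including the current row
running <- ave(as.integer(is_c), df$partner, FUN = cumsum)

# Subtracting the current row's indicator leaves only strictly earlier C choices
df$prior_experience <- running - is_c
df
```

Unlike the answer's `experience` column, this value is also populated on A and B rows, since a partner's earlier experience does not depend on the current choice.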

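Great! Thank you very much! - Carefreewritingsonthewall
One question: if these dispute, partner, and position variables come from a large dataset (rather than individual objects), would the code change much? - Carefreewritingsonthewall
As long as your columns are named dispute, partner, and position, I don't think you need to change anything. - ExperimenteR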
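
As a rough illustration of the comment above, the sketch below applies the same logic to a data frame that has an extra column and unsorted rows. The data frame `big` and its `region` column are made up for this example, and `dispute` is stored as a number here so that sorting is numeric. One caveat worth stating: `cumsum()` follows row order, so the rows should be in dispute order before counting.

```r
# Hypothetical larger data frame: the three columns plus an unrelated one
big <- data.frame(
  dispute  = c(3, 1, 2, 1, 3, 2),
  partner  = c("2", "2", "2", "1", "1", "1"),
  position = c("C", "C", "B", "A", "C", "C"),
  region   = c("x", "x", "x", "y", "y", "y")   # extra column, not used below
)

# Sort by dispute so the running count accumulates in the intended order
big <- big[order(big$dispute), ]

is_c <- big$position == "C"
big$experience <- ave(as.integer(is_c), big$partner, FUN = cumsum) * is_c
big
```

Extra columns are simply carried along; only the column names used in the expression need to match.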
