Calculate the ratio of consecutive values by group in R

I'd like to calculate the ratio between consecutive values within each group. For differences this is easy with diff:
mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))   
mdata$diff <- unlist(by(mdata$x, mdata$group, function(x){c(NA, diff(x))}))
mdata

  group x diff
1     A 2   NA
2     A 3    1
3     A 5    2
4     B 6   NA
5     B 3   -3
6     C 7   NA
7     C 6   -1

Is there an equivalent function for ratios? The desired output is:

  group x     ratio
1     A 2        NA
2     A 3 1.5000000
3     A 5 1.6666667
4     B 6        NA
5     B 3 0.5000000
6     C 7        NA
7     C 6 0.8571429
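Because x[i] / x[i-1] = exp(log(x[i]) - log(x[i-1])), a ratio of consecutive values is just a difference on the log scale, so the diff() idea from the question carries over. A minimal sketch, assuming every x is strictly positive (log() of zero or negative values breaks it). It uses ave() rather than by() so each result lands back in its original row, and the values may differ from direct division in the last floating-point digits.

```r
mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))

# x[i] / x[i-1] == exp(log(x[i]) - log(x[i-1])): take diff() on the log scale,
# then exp() back. Only valid when every x is strictly positive.
mdata$ratio <- ave(mdata$x, mdata$group,
                   FUN = function(v) c(NA, exp(diff(log(v)))))
mdata
```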
4 Answers

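Try dplyr:
install.packages("dplyr")  # the package name must be quoted
library(dplyr)
mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))
mdata <- group_by(mdata, group)
# dplyr::lag() shifts x down by one row within each group (first row becomes NA)
mutate(mdata, ratio = x / lag(x))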

# Source: local data frame [7 x 3]
# Groups: group

#   group x     ratio
# 1     A 2        NA
# 2     A 3 1.5000000
# 3     A 5 1.6666667
# 4     B 6        NA
# 5     B 3 0.5000000
# 6     C 7        NA
# 7     C 6 0.8571429

Your diff calculation then simplifies to:

mutate(mdata, diff = x - lag(x))

# Source: local data frame [7 x 3]
# Groups: group

#   group x diff
# 1     A 2   NA
# 2     A 3    1
# 3     A 5    2
# 4     B 6   NA
# 5     B 3   -3
# 6     C 7   NA
# 7     C 6   -1
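With dplyr 1.1.0 or later, the per-operation `.by` argument of mutate() does the same grouping without leaving a grouped tibble behind. A sketch assuming that dplyr version is installed; dplyr::lag() is called explicitly as a guard against stats::lag(), which is what a bare lag() resolves to when dplyr is not attached.

```r
library(dplyr)  # .by needs dplyr >= 1.1.0

mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))

# .by groups only for this mutate() call; the result keeps the input class
# (a plain data.frame here) and its original row order.
out <- mutate(mdata, ratio = x / dplyr::lag(x), .by = group)
out
```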

Perfect use case for lag() :) - hadley

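Same idea, using data.table:
library(data.table)
dt <- as.data.table(mdata)

# shift() is data.table's built-in lag; unlike a bare lag(), it cannot fall
# back to stats::lag() (which does not shift a plain vector's values)
dt[, ratio := x / shift(x), by = group]
dt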
#   group x     ratio
#1:     A 2        NA
#2:     A 3 1.5000000
#3:     A 5 1.6666667
#4:     B 6        NA
#5:     B 3 0.5000000
#6:     C 7        NA
#7:     C 6 0.8571429
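data.table can also add several derived columns in one grouped `:=` call. A sketch using shift(), whose defaults (`n = 1L`, `type = "lag"`, `fill = NA`) give the previous value within each group; it builds a fresh data.table instead of reusing `mdata`.

```r
library(data.table)

dt <- data.table(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))

# Both columns are added by reference in a single grouped call;
# shift(x) is NA for the first row of each group.
dt[, c("diff", "ratio") := list(x - shift(x), x / shift(x)), by = group]
dt
```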

Another option, using ave:

transform(mdata, 
          ratio=ave(x, group, FUN=function(y) c(NA, tail(y, -1) / head(y, -1))))
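ave() writes each group's result back into the rows that group came from, so the rows do not need to be sorted by group. A small sketch with the same values interleaved across groups; "previous" still means the previous row of the same group in the current row order.

```r
# Same values as mdata, but with the groups interleaved
mdata <- data.frame(group = c("A","B","A","C","B","A","C"), x = c(2,6,3,7,3,5,6))

# tail(y, -1) / head(y, -1) divides each value by its predecessor within the group
mdata$ratio <- ave(mdata$x, mdata$group,
                   FUN = function(y) c(NA, tail(y, -1) / head(y, -1)))
mdata
```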

Using by:
do.call(rbind, by(mdata, mdata$group, function(dat) {
  dat$ratio <- dat$x / c(NA, head(dat$x, -1))
  dat
  }))

#     group x     ratio
# A.1     A 2        NA
# A.2     A 3 1.5000000
# A.3     A 5 1.6666667
# B.4     B 6        NA
# B.5     B 3 0.5000000
# C.6     C 7        NA
# C.7     C 6 0.8571429
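The by() result is bound back together in group order and gets compound row names such as "A.1". A base R sketch that applies the same per-group function through split() and unsplit() instead, which restores the original row order and row names; `add_ratio` is a hypothetical helper name.

```r
mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6))

# Hypothetical helper: same per-group logic as the by() answer
add_ratio <- function(dat) {
  dat$ratio <- dat$x / c(NA, head(dat$x, -1))
  dat
}

# split() makes one data frame per group; unsplit() puts the rows back
# where they came from in mdata
res <- unsplit(lapply(split(mdata, mdata$group), add_ratio), mdata$group)
res
```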

Web content provided by Stack Overflow.