Aggregation and missing values

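I'm trying to average my data by group, but two things are giving me trouble: (1) getting the right layout and (2) keeping the missing values in the result.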

#My input data:
Stock <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
Soil <- c("Blank", "Blank", "Control", "Control", "Clay", "Clay", "Blank", "Blank", "Control", "Control", "Clay", "Clay")
Nitrogen <- c(NA, NA, 0, 0, 20, 20, NA, NA, 0, 0, 20, 20)
Respiration <- c(112, 113, 124, 126, 139, 137, 109, 111, 122, 124, 134, 136)
d <- as.data.frame(cbind(Stock, Soil, Nitrogen, Respiration))

#The outcome I'd like to get:
Stockr <- c("A", "A", "A", "B", "B", "B")
Soilr <- c("Blank", "Control", "Clay", "Blank", "Control", "Clay")
Nitrogenr <- c(NA, 0, 20, NA, 0, 20)
Respirationr <- c(112.5, 125, 138, 110, 123, 135)
result <- as.data.frame(cbind(Stockr, Soilr, Nitrogenr, Respirationr))

Thanks a lot for your help!

3 Answers

You can use a combination of aggregate and merge:
d <- data.frame(Stock=Stock, Soil=Soil, 
                Nitrogen=Nitrogen, Respiration=Respiration)

## aggregate values; don't remove NAs (na.action=NULL)
nitrogen <- aggregate(Nitrogen ~ Stock + Soil, data=d, FUN=mean, na.action=NULL)
respiration <- aggregate(Respiration ~ Stock + Soil, data=d, FUN=mean)

## merge results
merge(nitrogen, respiration)

#  Stock    Soil Nitrogen Respiration
#1     A   Blank       NA       112.5
#2     A    Clay       20       138.0
#3     A Control        0       125.0
#4     B   Blank       NA       110.0
#5     B    Clay       20       135.0
#6     B Control        0       123.0
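
A possible variation (a minimal sketch, not part of the original answer): putting both columns on the left-hand side with cbind() lets a single aggregate() call average them together, so no merge is needed. It assumes the data are built with data.frame() so Nitrogen and Respiration are numeric, and it uses na.action = na.pass (from the stats package) to hand the NA values through to mean(), so the Blank groups come out as NA. Making Soil a factor with the levels Blank, Control, Clay is an extra assumption of this sketch, used only to reproduce the row order shown in the question.

```r
# Input data from the question
Stock <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
Soil <- c("Blank", "Blank", "Control", "Control", "Clay", "Clay",
          "Blank", "Blank", "Control", "Control", "Clay", "Clay")
Nitrogen <- c(NA, NA, 0, 0, 20, 20, NA, NA, 0, 0, 20, 20)
Respiration <- c(112, 113, 124, 126, 139, 137, 109, 111, 122, 124, 134, 136)

# Assumption: a factor with explicit levels keeps the Blank/Control/Clay order
d <- data.frame(Stock, Soil = factor(Soil, levels = c("Blank", "Control", "Clay")),
                Nitrogen, Respiration)

# Average both numeric columns per Stock/Soil group in one call;
# na.pass keeps the NA rows, so mean() returns NA for the Blank groups
res <- aggregate(cbind(Nitrogen, Respiration) ~ Stock + Soil,
                 data = d, FUN = mean, na.action = na.pass)

# Sort by Stock, then by the Soil factor levels, to match the desired layout
res <- res[order(res$Stock, res$Soil), ]
rownames(res) <- NULL
res
```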

Here is a solution using the ddply function from the plyr package:

library(plyr)
ddply(d, .(Stock, Soil, Nitrogen), summarise,
      Respiration = mean(as.numeric(as.character(Respiration))))

#   Stock    Soil Nitrogen Respiration
# 1     A   Blank     <NA>       112.5
# 2     A    Clay       20       138.0
# 3     A Control        0       125.0
# 4     B   Blank     <NA>       110.0
# 5     B    Clay       20       135.0
# 6     B Control        0       123.0

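Note that cbind is not a good way to create a data frame: it first builds a matrix, which can hold only one type, so every value is converted to character. Use data.frame(Stock, Soil, Nitrogen, Respiration) instead. Because of your approach, all the columns of d are character (factors in R versions before 4.0), which is why I use as.numeric(as.character(Respiration)) to get the numeric values of this column.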
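
To make the note above concrete, the sketch below builds the data both ways and compares the column classes. It also shows why as.character() has to come before as.numeric() when a column is a factor; the two-value factor f is a made-up example, not data from the question.

```r
Stock <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
Soil <- c("Blank", "Blank", "Control", "Control", "Clay", "Clay",
          "Blank", "Blank", "Control", "Control", "Clay", "Clay")
Nitrogen <- c(NA, NA, 0, 0, 20, 20, NA, NA, 0, 0, 20, 20)
Respiration <- c(112, 113, 124, 126, 139, 137, 109, 111, 122, 124, 134, 136)

# cbind() builds a character matrix first, so the numeric columns lose their type
d_cbind <- as.data.frame(cbind(Stock, Soil, Nitrogen, Respiration))
# data.frame() keeps each vector's own type
d_df <- data.frame(Stock, Soil, Nitrogen, Respiration)

sapply(d_cbind, class)  # all "character" (all "factor" before R 4.0)
sapply(d_df, class)     # Nitrogen and Respiration stay "numeric"

# On a factor, as.numeric() returns the internal level codes, not the values
f <- factor(c("112", "109"))
as.numeric(f)                # 2 1
as.numeric(as.character(f))  # 112 109
```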

Another option is data.table:

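library(data.table)
d1 <- as.data.table(d)
# Nitrogen and Respiration are character (or factor) columns because d was built with cbind;
# convert them back to numbers via as.character() so factor level codes are never used
num_cols <- c("Nitrogen", "Respiration")
d1[, (num_cols) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = num_cols]
d1[, list(AVG_Nitro = mean(Nitrogen, na.rm = TRUE), AVG_Resp = mean(Respiration, na.rm = TRUE)),
   by = "Stock,Soil"]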

  Stock    Soil AVG_Nitro AVG_Resp
1:     A   Blank       NaN    112.5
2:     A Control         0    125.0
3:     A    Clay        20    138.0
4:     B   Blank       NaN    110.0
5:     B Control         0    123.0
6:     B    Clay        20    135.0
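
The NaN in AVG_Nitro comes from na.rm = TRUE: once the NA values are removed, a Blank group has no values left, and the mean of an empty vector is NaN (0/0). The base R lines below are a small illustration of that behaviour; by the same reasoning, dropping na.rm = TRUE from the Nitrogen mean in the call above should give NA for the Blank groups instead, which is closer to the result the question asks for.

```r
# The two Nitrogen values of one Blank group
blank_nitrogen <- c(NA_real_, NA_real_)

mean(blank_nitrogen, na.rm = TRUE)  # NaN: nothing is left after removing the NAs
mean(blank_nitrogen)                # NA: the missing values propagate
```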

Content provided by Stack Overflow.