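Suppose I have two data frames, "values" and "weights", and I want to compute the weighted median of each year column (year1, year2) per category (the TICKER column: A, B, C), using the matching column of "weights" as the weights: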
values <- data.frame(TICKER=c("A","A","B","B","B","C","C","C","C"), year1=c(1,2,3,4,5,6,7,8,9), year2=c(9,8,7,6,5,4,3,2,1))
weights <- data.frame(TICKER=c("A","A","B","B","B","C","C","C","C"), year1=c(0.3,0.7,0.25,0.25,0.5,0.1,0.1,0.6,0.2), year2=c(0.6,0.4,0.3,0.5,0.2,0.4,0.2,0.1,0.3))
To do this, I would like to use ddply (plyr package) together with the weightedMedian function (matrixStats package).
output <- ddply(values, .(TICKER), colwise(weightedMedian(values, weights), na.rm=TRUE))
However, I get the following error message:
"(list) object cannot be coerced to type 'double'"
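Does anyone know how to adjust the code to get a working solution?
I tried converting the data frames to matrices (via as.matrix), since I assumed weightedMedian needs a matrix as input. However, that did not help. The only solution I have found so far is a loop over subsets (but it is very slow and not very elegant).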
# Result matrix: one row per ticker, one column per year
output <- matrix(data=0, nrow=3, ncol=2)
for (i in 2:ncol(values)) {                      # loop over the year columns
  for (j in 1:length(unique(values$TICKER))) {   # loop over the tickers
    # Rows of values and weights that belong to the j-th ticker
    values.j <- subset(values, values$TICKER == as.character(unique(values$TICKER)[j]))
    weights.j <- subset(weights, weights$TICKER == as.character(unique(values$TICKER)[j]))
    output[j,(i-1)] <- weightedMedian(values.j[,i], weights.j[,i], na.rm=TRUE)
  }
}
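For comparison, the same per-ticker, per-column computation can probably be sketched without explicit for loops using base R's split() and sapply(). This is only a rough sketch and not the ddply solution I am looking for: it assumes the rows of "values" and "weights" line up one-to-one, and the names year_cols and idx are introduced here purely for illustration.

```r
library(matrixStats)  # provides weightedMedian()

values <- data.frame(TICKER=c("A","A","B","B","B","C","C","C","C"),
                     year1=c(1,2,3,4,5,6,7,8,9),
                     year2=c(9,8,7,6,5,4,3,2,1))
weights <- data.frame(TICKER=c("A","A","B","B","B","C","C","C","C"),
                      year1=c(0.3,0.7,0.25,0.25,0.5,0.1,0.1,0.6,0.2),
                      year2=c(0.6,0.4,0.3,0.5,0.2,0.4,0.2,0.1,0.3))

# Assumption: row k of weights holds the weights for row k of values
stopifnot(identical(as.character(values$TICKER), as.character(weights$TICKER)))

# All columns except the grouping column (illustrative name)
year_cols <- setdiff(names(values), "TICKER")

# Row indices belonging to each ticker (illustrative name)
idx <- split(seq_len(nrow(values)), values$TICKER)

# For every year column, compute the weighted median within each ticker;
# the result is a matrix with one row per ticker and one column per year
output <- sapply(year_cols, function(col) {
  sapply(idx, function(rows) {
    weightedMedian(values[[col]][rows], weights[[col]][rows], na.rm=TRUE)
  })
})
```

The idea is to pass weightedMedian one numeric vector of values and one of weights per group, rather than whole data frames.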
Any help would be greatly appreciated. Thank you very much.