Merge multiple data frames and calculate the average

12

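I have three data frames, shown below. I would like to combine them into a single data frame keyed on Lon & Lat, averaging the 3 values for each 'cell'. I read this (calculate average over multiple data frames) and tried using aggregate, but to no avail... any pointers would be appreciated.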

> head(CSR.GRACE[,c(1:14)],10)
    Lon  Lat   January  February     March     April       May     June        July     August  September   October    November  December
1  28.5 -4.5 17.710425 13.855327 12.385712 13.558101 12.789865 6.913783  1.03770075 -5.3901741 -6.6351015 -7.661375 -3.09337944 6.0659410
2  29.5 -4.5 14.010154 10.257435  9.009641 10.275778  9.598241 5.166972  0.73570247 -4.2733162 -5.0861417 -5.850192 -2.93521806 4.1240150
3  30.5 -4.5 16.288443 10.467614  9.275714 10.904162 10.228808 5.364853  0.50089883 -4.7478741 -5.4320069 -6.316568 -3.80160315 3.8494745
4  31.5 -4.5 18.560677  9.932461  9.239592 11.037748 10.551886 5.281853  0.01181973 -4.9034324 -5.3504391 -6.438050 -4.41695714 3.3432301
5  32.5 -4.5 10.171202  4.476512  4.509140  5.448872  5.338991 2.556262 -0.22646611 -2.3274204 -2.4376636 -3.103697 -2.27586145 1.3641930
6  33.5 -4.5 14.040068  5.349344  5.772618  7.158792  7.121341 3.407587 -0.30616689 -2.6800099 -2.7955420 -3.803622 -2.77898997 1.4021380

> head(GFZ.GRACE[,c(1:14)],10)
    Lon  Lat   January  February     March     April       May      June     July     August September   October   November  December
1  28.5 -4.5 15.642782 15.521720 11.823875 19.825865 17.335761 11.208188 5.080615 -3.0897644 -5.733351 -4.196604 -1.6697661 10.744696
2  29.5 -4.5 12.164074 10.931418  8.622238 15.341911 12.969769  8.521280 4.072790 -2.4301791 -4.551170 -3.055914 -1.2260079  7.592880
3  30.5 -4.5 13.579305 10.267520  8.787406 16.567715 13.745143  9.121496 4.497849 -2.6723491 -5.022949 -3.269881 -1.0691039  7.377143
4  31.5 -4.5 14.501465  8.600480  8.259757 16.981533 14.054429  9.318550 4.582672 -2.7917893 -5.249895 -3.636936 -0.5141342  6.770836
5  32.5 -4.5  7.311216  3.249596  3.513870  8.430777  6.941659  4.572560 2.203461 -1.4106516 -2.661226 -2.113089  0.2459282  3.049897
6  33.5 -4.5  9.121348  3.113245  3.584976 11.040761  8.732950  5.772059 2.811168 -1.8554437 -3.524447 -3.272863  1.2493973  3.750694

> head(JPL.GRACE[,c(1:14)],10)
    Lon  Lat   January  February     March     April       May     June     July     August  September    October   November   December
1  28.5 -4.5 19.559790 14.544438 12.035112 13.944141 11.931011 7.513007 3.095003 -3.6165702 -6.5945043 -7.2498567 -4.5402436  6.3935236
2  29.5 -4.5 15.740160 11.192191  8.549782 10.783359  9.401173 5.834498 2.267822 -2.6354346 -4.8939197 -5.5912996 -3.7295148  4.1461123
3  30.5 -4.5 18.984714 12.014807  8.510139 11.628697 10.635699 6.448064 2.260429 -2.6979695 -5.2102337 -6.2646164 -4.2713238  3.5089825
4  31.5 -4.5 22.794356 11.993054  8.162500 11.813746 11.747350 6.955983 2.164615 -2.5707902 -5.3448873 -6.7473006 -4.5777496  2.5609555
5  32.5 -4.5 13.233634  5.606305  3.880347  5.753024  6.388978 3.742596 1.096214 -1.1103189 -2.6367831 -3.4102675 -2.2860237  0.7826054
6  33.5 -4.5 19.260989  6.761722  4.978247  7.373498  9.135645 5.421030 1.706414 -1.0796434 -3.3122886 -4.2114588 -2.8110246  0.4825075
5 Answers

16

You can do:

library(data.table)

rbindlist(list(JPL.GRACE,GFZ.GRACE,CSR.GRACE))[,lapply(.SD,mean), list(Lon, Lat)]

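Explanation:

Your data.frames are put into a list and stacked row-wise with rbindlist (which returns a data.table). This works because your data.frames share the same structure (same number and names of columns, same data types). An alternative is do.call(rbind, list(JPL.GRACE,GFZ.GRACE,CSR.GRACE)).

We then group by each distinct Lon, Lat pair. .SD is the subset of the data.table belonging to each Lon, Lat pair (without the grouping columns). You can inspect it by running: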

dt = rbindlist(list(JPL.GRACE,GFZ.GRACE,CSR.GRACE))
dt[,print(.SD), list(Lon, Lat)]

For each .SD, we simply loop over the columns and compute the mean.
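To make the grouping step concrete, here is a minimal sketch on toy data. The data frames a, b and d are hypothetical stand-ins for the three GRACE data frames (two cells and two months only), not the asker's actual data.

```r
library(data.table)

# Hypothetical toy inputs: identical columns and identical Lon/Lat cells
a <- data.frame(Lon = c(28.5, 29.5), Lat = c(-4.5, -4.5),
                January = c(1, 2), February = c(10, 20))
b <- data.frame(Lon = c(28.5, 29.5), Lat = c(-4.5, -4.5),
                January = c(3, 4), February = c(30, 40))
d <- data.frame(Lon = c(28.5, 29.5), Lat = c(-4.5, -4.5),
                January = c(5, 6), February = c(50, 60))

# Stack the three inputs row-wise: 3 x 2 = 6 rows
stacked <- rbindlist(list(a, b, d))

# For each (Lon, Lat) group, take the mean of every remaining column
avg <- stacked[, lapply(.SD, mean), by = .(Lon, Lat)]
print(avg)
```

Because the rows are matched by their Lon and Lat values rather than by position, this approach should still work if the three inputs list their cells in different orders.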


1
Works great - thank you, @ColonelBeauvel. Out of curiosity, and so I learn for next time, would it be possible to explain how this code works? ...still a newbie - Darren J
1
This is fantastic, @ColonelBeauvel - thanks again for your efforts! - Darren J
This is awesome! Thank you! - philiporlando

6

This is very easy with a 3-dimensional array, using 1:2 as the "MARGIN":

install.packages('abind')
library(abind)
temp_array <- abind(CSR.GRACE, GFZ.GRACE, JPL.GRACE, along=3)
res <- apply(temp_array, 1:2, mean)

Here is a simple example:
 x <- matrix(1:12,3,4)
 y <- x+100; z= y-50
 apply( abind::abind(x,y,z, along=3),  1:2, mean)
     [,1] [,2] [,3] [,4]
[1,]   51   54   57   60
[2,]   52   55   58   61
[3,]   53   56   59   62

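I got the following error: NameError: name 'library' is not defined - Mahmud
The error message makes me wonder whether you forgot to quote the name inside install.packages(.). The first argument to install.packages does need to be quoted. If it is not, the unquoted argument is treated as the R name of a character data object. - IRTFM
Actually, I first tried this in Python and only later realized the solution is for R. - Mahmud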

4

Since your data are all numeric, you can put them in a three-dimensional array and use the rowMeans function.

library(abind)

arr = abind(JPL.GRACE, GFZ.GRACE, CSR.GRACE, along = 3)
rowMeans(arr, dims = 2)
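As a rough check that rowMeans(arr, dims = 2) gives the same result as apply(arr, 1:2, mean), here is a small sketch on toy matrices. It builds the 3-dimensional array with base R's array() instead of abind, purely so the example has no package dependency; x, y and z are made-up inputs.

```r
# Toy inputs: three 3 x 4 numeric matrices
x <- matrix(1:12, 3, 4)
y <- x + 100
z <- y - 50

# Stack them along a third dimension: dim = rows, columns, layers
arr <- array(c(x, y, z), dim = c(3, 4, 3))

# Mean over the third dimension, computed two ways
via_apply    <- apply(arr, 1:2, mean)
via_rowmeans <- rowMeans(arr, dims = 2)

print(all.equal(via_apply, via_rowmeans))
```

Note that any array-based approach pairs values by position, so it assumes the three data frames list the same Lon/Lat cells in the same row order.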

2
Maybe sqldf can help you:
library(sqldf)
df1 <- CSR.GRACE[,c(1:14)]
df2 <- GFZ.GRACE[,c(1:14)]
df3 <- JPL.GRACE[,c(1:14)]
# This could be done with rbind(), but I'll use sqldf()
# I'm assuming all data frames have the same columns:
df <- sqldf('select * from df1 
             union all select * from df2 
             union all select * from df3')
# The average can be done also with sqldf (just a demo)
sqldf('select Lon, Lat, avg(January) as jan, avg(February) as feb
       from df
       group by Lon, Lat')

There may be better solutions, but this is a simple approach.

Hope this helps.


Thanks for the help, @Barranka - that's a useful package to know about :) - Darren J
@DarrenJ Glad to help. - Barranka

0
Here is a base R solution:
r <- do.call(rbind, list(JPL.GRACE,GFZ.GRACE,CSR.GRACE))
r <- aggregate(r[-c(1,2)], list(r$Lon, r$Lat), mean)
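One detail of the call above is that aggregate names the grouping columns Group.1 and Group.2 when given an unnamed list. As a possible variant, the formula interface keeps the Lon and Lat names. A minimal sketch on hypothetical toy data (a, b and d stand in for the three GRACE data frames):

```r
# Hypothetical toy inputs with identical columns
a <- data.frame(Lon = c(28.5, 29.5), Lat = c(-4.5, -4.5),
                January = c(1, 2), February = c(10, 20))
b <- data.frame(Lon = c(28.5, 29.5), Lat = c(-4.5, -4.5),
                January = c(3, 4), February = c(30, 40))
d <- data.frame(Lon = c(28.5, 29.5), Lat = c(-4.5, -4.5),
                January = c(5, 6), February = c(50, 60))

# Stack row-wise, then average every other column per (Lon, Lat) cell
stacked <- do.call(rbind, list(a, b, d))
avg <- aggregate(. ~ Lon + Lat, data = stacked, FUN = mean)
print(avg)
```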
