How to calculate the difference between the same month in different years in R

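I have a data frame called df in which each row is one month of one year: YEAR holds the year, M the month abbreviation and Freq the value. I want to sort the months in calendar order, January to December, and calculate the percent difference between the same month in different years. For example, I want the percent difference between January 2009 and January 2008, between February 2009 and February 2008, and so on for every month.
Here is my df: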
df <- structure(list(YEAR = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 
6L), .Label = c("2008", "2009", "2010", "2011", "2012", "2013"
), class = "factor"), M = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 
7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 
10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 
12L, 12L, 12L, 12L, 12L), .Label = c("Apr", "Aug", "Dec", "Feb", 
"Jan", "Jul", "Jun", "Mar", "May", "Nov", "Oct", "Sep"), class = "factor"), 
    Freq = c(93221016, 124800455, 224127360, 287150001, 318228530, 
    387573710, 98811936, 171940117, 239581603, 294965702, 336269471, 
    406584525, 112958413, 215853263, 282293439, 314483537, 355561387, 
    386086538, 89354868, 109900379, 206640377, 268944957, 322896485, 
    356774443, 91007916, 113469678, 220743958, 284697404, 324823553, 
    373885187, 96887316, 158230269, 242175673, 284271058, 335464023, 
    397269760, 90091044, 143862802, 232512479, 262275285, 324988644, 
    388064866, 93936288, 139665422, 213302607, 297847827, 329044914, 
    386372600, 99646750, 139195786, 229651074, 277779620, 324395065, 
    397346365, 106477407, 197698621, 256559666, 242683830, 347193478, 
    430880720, 100909236, 185392147, 258317251, 238847338, 349017727, 
    422523576, 96888876, 170467493, 240815506, 285132804, 324063033, 
    389471906)), .Names = c("YEAR", "M", "Freq"), row.names = c(NA, 
-72L), class = "data.frame")
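In this df, M is a factor whose levels are in alphabetical order (Apr, Aug, Dec, Feb, ...), which is why sorting on M does not give calendar order. A minimal sketch of one way to fix that, using base R's built-in month.abb vector on a four-row excerpt of df (the variable name d is just for illustration):

```r
# Four-row excerpt of df (Jan and Feb, 2008-2009), with M stored as a factor
d <- data.frame(
  YEAR = factor(c("2008", "2009", "2008", "2009")),
  M    = factor(c("Feb", "Feb", "Jan", "Jan")),
  Freq = c(89354868, 109900379, 91007916, 113469678)
)
levels(d$M)  # alphabetical: "Feb" "Jan"

# month.abb is base R's constant c("Jan", "Feb", ..., "Dec");
# re-leveling the factor with it makes sorting follow the calendar
d$M <- factor(d$M, levels = month.abb)

# Sort by month first, then by year within each month
d <- d[order(d$M, d$YEAR), ]
d
```

On the full df, the same factor() call followed by df[order(df$M, df$YEAR), ] should put all 72 rows in January-to-December order, with the years ascending inside each month.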

Is there a simple way to do this, perhaps with an R package?

That isn't a data frame, and we can't paste it into our R session because we don't have the y object. - Joshua Ulrich
@JoshuaUlrich, I've updated the original post. - user1471980
3 Answers


Try this:

library(zoo)

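# Combine the YEAR and M columns into a monthly yearmon index
# (%b expects month abbreviations in the current locale, e.g. English "Jan")
z <- read.zoo(df, index = 1:2, FUN = function(y, m) as.yearmon(paste(y, m), "%Y %b") )
# Ratio of each month to the same month one year (12 observations) earlier
diff(z, 12, arithmetic = FALSE)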
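With arithmetic = FALSE, diff() returns the ratio of each value to the value 12 observations earlier, not a percentage, and the lag counts observations rather than calendar months, so it assumes no months are missing. A minimal sketch of converting the ratio to a percent change, on a toy series (hypothetical values, not the question's data) where every month is 100 in 2008 and 110 in 2009, so every year-over-year change should come out as 10%:

```r
library(zoo)

# Toy monthly series, Jan 2008 to Dec 2009 (hypothetical values)
z <- zoo(c(rep(100, 12), rep(110, 12)),
         as.yearmon(2008 + seq(0, 23) / 12))

# Ratio of each month to the same month one year earlier
ratio <- diff(z, lag = 12, arithmetic = FALSE)

# Turn the ratio into a percent change
pct <- 100 * (ratio - 1)
pct
```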

Or, slightly more compact (only the ## line has changed):
library(zoo)
library(gsubfn)
z <- fn$read.zoo(df, index = 1:2, FUN = ~ as.yearmon(paste(y, m), "%Y %b") ) ##
diff(z, 12, arithmetic = FALSE)

Added the compact form.


This command adds a new column with the percent change from the previous year for the same month (the first year gets 0):
transform(df, diff = ave(Freq, M, FUN = function(x) 
  c(0, (diff(x) / head(x, -1)) * 100)))
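The one-liner relies on the rows already being in year order within each month, which holds for the dput above, because ave() applies FUN to each month's values in row order. A minimal variation that sorts explicitly and uses NA instead of 0 for the first year (which has nothing to compare against), shown on a six-row excerpt of df whose rows are deliberately shuffled:

```r
# Excerpt of df (Apr and Aug, 2008-2010), rows intentionally out of year order
d <- data.frame(
  YEAR = factor(c("2010", "2008", "2009", "2009", "2010", "2008")),
  M    = factor(c("Apr", "Apr", "Apr", "Aug", "Aug", "Aug")),
  Freq = c(224127360, 93221016, 124800455, 171940117, 239581603, 98811936)
)

# Sort by month, then year, so diff() compares consecutive years
d <- d[order(d$M, d$YEAR), ]

# Percent change versus the previous year for the same month;
# NA for the first year of each month
d$pct <- ave(d$Freq, d$M,
             FUN = function(x) c(NA, 100 * diff(x) / head(x, -1)))
d
```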

In dplyr:
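library(dplyr)
# %>% is used here; the original %.% chaining operator was deprecated in favour of it
df %>% 
  arrange(M, YEAR) %>% 
  group_by(M) %>% 
  # z is a proportion; multiply by 100 for a percentage
  mutate(lag_Freq = lag(Freq), z = (Freq - lag_Freq)/lag_Freq)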

Source: local data frame [72 x 5]
Groups: M

   YEAR   M      Freq  lag_Freq           z
1  2008 Apr  93221016        NA          NA
2  2009 Apr 124800455  93221016  0.33875879
3  2010 Apr 224127360 124800455  0.79588576
4  2011 Apr 287150001 224127360  0.28119120
5  2012 Apr 318228530 287150001  0.10823099
6  2013 Apr 387573710 318228530  0.21791000
7  2008 Aug  98811936        NA          NA
8  2009 Aug 171940117  98811936  0.74007437
9  2010 Aug 239581603 171940117  0.39340142
10 2011 Aug 294965702 239581603  0.23117008
11 2012 Aug 336269471 294965702  0.14002906
12 2013 Aug 406584525 336269471  0.20910329

I can't use dplyr because it doesn't support R 3.0.0. - user1471980
Looks like you're right (CRAN link). You can always upgrade to 3.0.2 :) - Vincent
