I am working with time series data. The dataset looks like this:
datALL <- read.table(header=TRUE, text="
ID Year Align
A01 2017 329
A01 2016 NA
A01 2015 NA
A01 2014 314
A01 2013 NA
A01 2012 NA
A01 2011 432
A02 2017 4536
A02 2016 NA
A02 2015 NA
A02 2014 2345
A02 2013 NA
A02 2012 NA
A02 2011 1932
")
datALL
ID Year Align
1 A01 2017 329
2 A01 2016 NA
3 A01 2015 NA
4 A01 2014 314
5 A01 2013 NA
6 A01 2012 NA
7 A01 2011 432
8 A02 2017 4536
9 A02 2016 NA
10 A02 2015 NA
11 A02 2014 2345
12 A02 2013 NA
13 A02 2012 NA
14 A02 2011 1932
I would like to use the imputeTS package to fill in the missing values. For a single ID, the package works very well.
datA01 <- read.table(header=TRUE, text="
ID Year Align
A01 2017 329
A01 2016 NA
A01 2015 NA
A01 2014 314
A01 2013 NA
A01 2012 NA
A01 2011 432
")
datA01
ID Year Align
1 A01 2017 329
2 A01 2016 NA
3 A01 2015 NA
4 A01 2014 314
5 A01 2013 NA
6 A01 2012 NA
7 A01 2011 432
# install.packages("imputeTS")  # run once if the package is not installed
library(imputeTS)
datA01$Year <- ts(datA01[, 2])
datA01$Align1 <- na_kalman(datA01$Align)  # Kalman-smoothing imputation of the NAs
datA01
ID Year Align Align1
1 A01 2017 329 329.0000
2 A01 2016 NA 318.9847
3 A01 2015 NA 312.7852
4 A01 2014 314 314.0000
5 A01 2013 NA 347.2150
6 A01 2012 NA 387.7720
7 A01 2011 432 432.0000
It also works perfectly for A02:
datA02 <- read.table(header=TRUE, text="
ID Year Align
A02 2017 4536
A02 2016 NA
A02 2015 NA
A02 2014 2345
A02 2013 NA
A02 2012 NA
A02 2011 1932
")
datA02$Year <- ts(datA02[, c(2)])
datA02$Align1 <- na_kalman(datA02$Align)
datA02
ID Year Align Align1
1 A02 2017 4536 4536.000
2 A02 2016 NA 3510.613
3 A02 2015 NA 3168.817
4 A02 2014 2345 2345.000
5 A02 2013 NA 2485.226
6 A02 2012 NA 2143.431
7 A02 2011 1932 1932.000
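Putting all the data together does not work, because na_kalman then treats all 14 rows as one continuous time series. The imputation should instead be done separately for each ID, in groups of seven years. I need help writing a loop or function to solve this.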
datALL$Year <- ts(datALL[, c(2)])
datALL$Align1 <- na_kalman(datALL$Align)
#### WRONG IMPUTATION DUE TO FAILURE TO SEPARATE YEARS BY ID
datALL
ID Year Align Align1
1 A01 2017 329 329.0000
2 A01 2016 NA 808.8287
3 A01 2015 NA 968.7716
4 A01 2014 314 314.0000
5 A01 2013 NA 1288.6573
6 A01 2012 NA 1448.6002
7 A01 2011 432 432.0000
8 A02 2017 4536 4536.0000
9 A02 2016 NA 1928.4289
10 A02 2015 NA 2088.3718
11 A02 2014 2345 2345.0000
12 A02 2013 NA 2408.2575
13 A02 2012 NA 2568.2004
14 A02 2011 1932 1932.0000
The correct result should look like this:
ID Year Align Align1
1 A01 2017 329 329.0000
2 A01 2016 NA 318.9847
3 A01 2015 NA 312.7852
4 A01 2014 314 314.0000
5 A01 2013 NA 347.2150
6 A01 2012 NA 387.7720
7 A01 2011 432 432.0000
8 A02 2017 4536 4536.000
9 A02 2016 NA 3510.613
10 A02 2015 NA 3168.817
11 A02 2014 2345 2345.000
12 A02 2013 NA 2485.226
13 A02 2012 NA 2143.431
14 A02 2011 1932 1932.000
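One possible way to get this per-ID result without an explicit loop is base R's ave(), which applies a function within each group and returns the values in the original row order. This is only a sketch: it assumes every ID has enough observed values for na_kalman to fit a model (three per ID here), and that the rows within each ID are already in consecutive year order.

```r
library(imputeTS)

# Same data as datALL above
datALL <- read.table(header = TRUE, text = "
ID Year Align
A01 2017 329
A01 2016 NA
A01 2015 NA
A01 2014 314
A01 2013 NA
A01 2012 NA
A01 2011 432
A02 2017 4536
A02 2016 NA
A02 2015 NA
A02 2014 2345
A02 2013 NA
A02 2012 NA
A02 2011 1932
")

# Split Align by ID, run na_kalman on each piece separately,
# and put the results back in the original row positions
datALL$Align1 <- ave(datALL$Align, datALL$ID, FUN = na_kalman)
datALL
```

An explicit loop over split(datALL, datALL$ID) would do the same thing; ave() just keeps it to one line and avoids having to reassemble the pieces.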