R将列表嵌套转换为数据框。

4
我正在通过 library(blsR) 从BLS.gov检索有关美国通货膨胀的月度数据,我的问题是检索到的数据是一个列表的列表。
library(blsR)
cpi <- bls_request(query_n_series(c("CUUR0000SA0",         # CPI for All-Items
                                    "CUUR0000SETA02",      # CPI for Used Cars and Trucks
                                    "CUUR0000SAF11211")),  # CPI for Meats
               api_key = "6b732f6c133d48669aadd2790cb71ccc")
map(cpi, as.data.table)

它返回以下内容,而每个V1、V2和V3变量的36条记录仍需要组织成一个data.frame
$series
            V1             V2               V3
1: CUUR0000SA0 CUUR0000SETA02 CUUR0000SAF11211
2:  <list[36]>     <list[36]>       <list[36]>

预期的 data.frame 格式为:

V                year    period    periodName   value
CUUR0000SA0      2022    M12       December     100
CUUR0000SETA02   2022    M12       December     100.5
CUUR0000SAF11211 2022    M12       December     100.75

5
尝试使用 library(rrapply);rrapply(cpi, how = 'bind') %>% type.convert(as.is = TRUE) - akrun
@akrun,非常感谢您宝贵的建议!如果您能给出如何创建一个包含5列的数据框的提示,我将不胜感激:V, year, period, periodName, value - Alex
4个回答

2

我们可以使用rrapply绑定数据,然后使用pivot_longer将其重塑为“long”格式。

library(rrapply)
library(dplyr)
library(tidyr)
rrapply(cpi, how = 'bind') %>%
   pivot_longer(cols = everything(), names_to = ".value", 
      names_pattern = ".*\\.([^.]+$)") %>% 
   fill(seriesID)%>%
   type.convert(as.is = TRUE) 

-输出

# A tibble: 108 × 5
   seriesID     year period periodName value
   <chr>       <int> <chr>  <chr>      <dbl>
 1 CUUR0000SA0  2022 M12    December    297.
 2 CUUR0000SA0  2022 M11    November    298.
 3 CUUR0000SA0  2022 M10    October     298.
 4 CUUR0000SA0  2022 M09    September   297.
 5 CUUR0000SA0  2022 M08    August      296.
 6 CUUR0000SA0  2022 M07    July        296.
 7 CUUR0000SA0  2022 M06    June        296.
 8 CUUR0000SA0  2022 M05    May         292.
 9 CUUR0000SA0  2022 M04    April       289.
10 CUUR0000SA0  2022 M03    March       288.
# … with 98 more rows

2
你可以试试这个。
data_x <- lapply(1:length(cpi[["series"]]), \(x) as.data.frame(do.call(rbind, cpi[["series"]][[x]][["data"]]))[-5])

data_xx <- lapply(1:length(data_x), \(x) cbind(v = cpi[["series"]][[x]][["seriesID"]], data_x[[x]]))

do.call(rbind, data_xx)

会给我们

                   v year period periodName   value
1        CUUR0000SA0 2022    M12   December 296.797
2        CUUR0000SA0 2022    M11   November 297.711
3        CUUR0000SA0 2022    M10    October 298.012
4        CUUR0000SA0 2022    M09  September 296.808
5        CUUR0000SA0 2022    M08     August 296.171
6        CUUR0000SA0 2022    M07       July 296.276
7        CUUR0000SA0 2022    M06       June 296.311
8        CUUR0000SA0 2022    M05        May 292.296
9        CUUR0000SA0 2022    M04      April 289.109
10       CUUR0000SA0 2022    M03      March 287.504
11       CUUR0000SA0 2022    M02   February 283.716
12       CUUR0000SA0 2022    M01    January 281.148
13       CUUR0000SA0 2021    M12   December 278.802
14       CUUR0000SA0 2021    M11   November 277.948
15       CUUR0000SA0 2021    M10    October 276.589
16       CUUR0000SA0 2021    M09  September 274.310
17       CUUR0000SA0 2021    M08     August 273.567
18       CUUR0000SA0 2021    M07       July 273.003
19       CUUR0000SA0 2021    M06       June 271.696
20       CUUR0000SA0 2021    M05        May 269.195
21       CUUR0000SA0 2021    M04      April 267.054
22       CUUR0000SA0 2021    M03      March 264.877
23       CUUR0000SA0 2021    M02   February 263.014
24       CUUR0000SA0 2021    M01    January 261.582
25       CUUR0000SA0 2020    M12   December 260.474
26       CUUR0000SA0 2020    M11   November 260.229
27       CUUR0000SA0 2020    M10    October 260.388
28       CUUR0000SA0 2020    M09  September 260.280
29       CUUR0000SA0 2020    M08     August 259.918
30       CUUR0000SA0 2020    M07       July 259.101
31       CUUR0000SA0 2020    M06       June 257.797
32       CUUR0000SA0 2020    M05        May 256.394
33       CUUR0000SA0 2020    M04      April 256.389
34       CUUR0000SA0 2020    M03      March 258.115
35       CUUR0000SA0 2020    M02   February 258.678
36       CUUR0000SA0 2020    M01    January 257.971
37    CUUR0000SETA02 2022    M12   December 188.864
38    CUUR0000SETA02 2022    M11   November 193.530
39    CUUR0000SETA02 2022    M10    October 199.176
40    CUUR0000SETA02 2022    M09  September 203.867
41    CUUR0000SETA02 2022    M08     August 212.895
42    CUUR0000SETA02 2022    M07       July 213.683
43    CUUR0000SETA02 2022    M06       June 212.980
44    CUUR0000SETA02 2022    M05        May 208.373
45    CUUR0000SETA02 2022    M04      April 206.981
46    CUUR0000SETA02 2022    M03      March 208.216
47    CUUR0000SETA02 2022    M02   February 212.040
48    CUUR0000SETA02 2022    M01    January 210.293
49    CUUR0000SETA02 2021    M12   December 207.164
50    CUUR0000SETA02 2021    M11   November 200.209
51    CUUR0000SETA02 2021    M10    October 195.213
52    CUUR0000SETA02 2021    M09  September 190.207
53    CUUR0000SETA02 2021    M08     August 197.535
54    CUUR0000SETA02 2021    M07       July 200.425
55    CUUR0000SETA02 2021    M06       June 198.855
56    CUUR0000SETA02 2021    M05        May 179.532
57    CUUR0000SETA02 2021    M04      April 168.647
58    CUUR0000SETA02 2021    M03      March 153.873
59    CUUR0000SETA02 2021    M02   February 150.221
60    CUUR0000SETA02 2021    M01    January 149.659
61    CUUR0000SETA02 2020    M12   December 150.891
62    CUUR0000SETA02 2020    M11   November 152.324
63    CUUR0000SETA02 2020    M10    October 154.381
64    CUUR0000SETA02 2020    M09  September 152.887
65    CUUR0000SETA02 2020    M08     August 149.765
66    CUUR0000SETA02 2020    M07       July 141.493
67    CUUR0000SETA02 2020    M06       June 136.915
68    CUUR0000SETA02 2020    M05        May 138.383
69    CUUR0000SETA02 2020    M04      April 139.411
70    CUUR0000SETA02 2020    M03      March 140.689
71    CUUR0000SETA02 2020    M02   February 137.452
72    CUUR0000SETA02 2020    M01    January 136.064
73  CUUR0000SAF11211 2022    M12   December 318.440
74  CUUR0000SAF11211 2022    M11   November 320.049
75  CUUR0000SAF11211 2022    M10    October 323.412
76  CUUR0000SAF11211 2022    M09  September 322.857
77  CUUR0000SAF11211 2022    M08     August 321.534
78  CUUR0000SAF11211 2022    M07       July 322.115
79  CUUR0000SAF11211 2022    M06       June 320.989
80  CUUR0000SAF11211 2022    M05        May 321.060
81  CUUR0000SAF11211 2022    M04      April 319.884
82  CUUR0000SAF11211 2022    M03      March 317.642
83  CUUR0000SAF11211 2022    M02   February 313.597
84  CUUR0000SAF11211 2022    M01    January 311.191
85  CUUR0000SAF11211 2021    M12   December 312.139
86  CUUR0000SAF11211 2021    M11   November 316.520
87  CUUR0000SAF11211 2021    M10    October 314.292
88  CUUR0000SAF11211 2021    M09  September 308.306
89  CUUR0000SAF11211 2021    M08     August 301.283
90  CUUR0000SAF11211 2021    M07       July 300.549
91  CUUR0000SAF11211 2021    M06       June 296.597
92  CUUR0000SAF11211 2021    M05        May 285.991
93  CUUR0000SAF11211 2021    M04      April 280.945
94  CUUR0000SAF11211 2021    M03      March 276.764
95  CUUR0000SAF11211 2021    M02   February 275.065
96  CUUR0000SAF11211 2021    M01    January 273.843
97  CUUR0000SAF11211 2020    M12   December 272.013
98  CUUR0000SAF11211 2020    M11   November 272.939
99  CUUR0000SAF11211 2020    M10    October 274.473
100 CUUR0000SAF11211 2020    M09  September 273.730
101 CUUR0000SAF11211 2020    M08     August 277.383
102 CUUR0000SAF11211 2020    M07       July 283.826
103 CUUR0000SAF11211 2020    M06       June 298.033
104 CUUR0000SAF11211 2020    M05        May 286.185
105 CUUR0000SAF11211 2020    M04      April 270.149
106 CUUR0000SAF11211 2020    M03      March 261.489
107 CUUR0000SAF11211 2020    M02   February 260.417
108 CUUR0000SAF11211 2020    M01    January 259.503

1
你可以使用 jsonlite 解析 cpi 数据。
cpi <- do.call(rbind, cpi)

library(jsonlite)

out <- lapply(seq(cpi), function(x)
        fromJSON(toJSON(cpi[[x]])))

out[[1]] 

提供,

# $seriesID
# [1] "CUUR0000SA0"

# $data
#    year period periodName   value footnotes
# 1  2022    M12   December 296.797      NULL
# 2  2022    M11   November 297.711      NULL
# 3  2022    M10    October 298.012      NULL
# 4  2022    M09  September 296.808      NULL
# 5  2022    M08     August 296.171      NULL

同样地,你可以使用 out[[2]]out[[3]] 访问其他系列。如果你只想获取数据,可以输入 out[[1]]$data


1
这里有另一种使用 rrapply() 的方法:首先解析seriesID值,然后解析所有其他嵌套的值,最后将这两个数据框合并在一起。
library(rrapply)

## bind seriesIDs to data.frame
id <- rrapply(
  cpi, 
  condition = \(x, .xname) .xname == "seriesID", 
  how = "bind", 
  options = list(namecols = TRUE)
)
## bind other data to data.frame
other <- rrapply(
  cpi, 
  how = "bind", 
  options = list(coldepth = 5, namecols = TRUE)
)
## merge into single long data.frame
merge(id, other, all = TRUE, by = c("L1", "L2"))[, c(3, 6:9)] |>
  head(n = 10)

#>       seriesID year period periodName   value
#> 1  CUUR0000SA0 2022    M12   December 296.797
#> 2  CUUR0000SA0 2022    M11   November 297.711
#> 3  CUUR0000SA0 2022    M10    October 298.012
#> 4  CUUR0000SA0 2022    M09  September 296.808
#> 5  CUUR0000SA0 2022    M08     August 296.171
#> 6  CUUR0000SA0 2022    M07       July 296.276
#> 7  CUUR0000SA0 2022    M06       June 296.311
#> 8  CUUR0000SA0 2022    M05        May 292.296
#> 9  CUUR0000SA0 2022    M04      April 289.109
#> 10 CUUR0000SA0 2022    M03      March 287.504

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接