R: Convert a data frame to a nested list

3
I want to convert a data frame of this form (tbl) into the following nested list (tbllst):
library(tidyr)

tbl <- tribble(
  ~Col1, ~Col2, ~Col3,
  "Var1", "Var1_1", "Var1_1_1", 
  "Var1", "Var1_1", "Var1_1_2", 
  "Var1", "Var1_2", "Var1_2_1", 
  "Var1", "Var1_2", "Var1_2_2", 
)

tbllst <- list(
  Col1 = list(
    "Var1" = list(
      Col2 = list(
        "Var1_1" = list(
          Col3 = c(
            "Var1_1_1", 
            "Var1_1_2"
          )
        ),
        "Var1_2" = list(
          Col3 = c(
            "Var1_2_1", 
            "Var1_2_2"
          )
        )
      )
    )
  )
)

Is there an automated way to do this?

2 Answers

3

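The rrapply() function in the rrapply package has an option how = "unmelt" that converts a melted data frame into a nested list, where each row of the data frame becomes a node path in the nested list.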
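To make the "one row per node path" idea concrete, here is a minimal sketch on a toy data frame. The column names (L1, L2, value) and the values are illustrative assumptions, not part of the question's data; the sketch assumes the leading columns give the path and the last column holds the leaf value.

```r
library(rrapply)

# Toy melted data frame (names and values are illustrative only):
# the leading columns spell out the node path, the last column is the leaf value
melted <- data.frame(
  L1 = c("a", "a"),
  L2 = c("x", "y"),
  value = c("1", "2")
)

# Each row becomes one path: a -> x and a -> y
nested <- rrapply(melted, how = "unmelt")
str(nested)
```

Both rows share the first path element, so they end up as two leaves under the same top-level node.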

To apply this function, we first need to bring the tbl data frame into the input format that rrapply() expects:

library(purrr)
library(dplyr)
library(rrapply)

## put data.frame in format for rrapply-function
tbl1 <- imap_dfc(tbl, ~bind_cols(.y, .x)) %>%
  group_by(across(num_range(prefix = "...", range = 1:5))) %>%
  summarize(`...6` = list(c(`...6`)))

tbl1
#> # A tibble: 2 x 6
#> # Groups:   ...1, ...2, ...3, ...4 [2]
#>   ...1  ...2  ...3  ...4   ...5  ...6     
#>   <chr> <chr> <chr> <chr>  <chr> <list>   
#> 1 Col1  Var1  Col2  Var1_1 Col3  <chr [2]>
#> 2 Col1  Var1  Col2  Var1_2 Col3  <chr [2]>

## unmelt to nested list
ls_tbl <- rrapply(tbl1, how = "unmelt")

str(ls_tbl)
#> List of 1
#>  $ Col1:List of 1
#>   ..$ Var1:List of 1
#>   .. ..$ Col2:List of 2
#>   .. .. ..$ Var1_1:List of 1
#>   .. .. .. ..$ Col3: chr [1:2] "Var1_1_1" "Var1_1_2"
#>   .. .. ..$ Var1_2:List of 1
#>   .. .. .. ..$ Col3: chr [1:2] "Var1_2_1" "Var1_2_2"

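Note that the group_by() and summarize() steps are only there to collect the multiple Var1_%_% values under a single Col3 node. The following approach is simpler (but does not produce exactly the same result):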
ls_tbl <- rrapply(imap_dfc(tbl, ~bind_cols(.y, .x)), how = "unmelt")

str(ls_tbl)
#> List of 1
#>  $ Col1:List of 1
#>   ..$ Var1:List of 1
#>   .. ..$ Col2:List of 2
#>   .. .. ..$ Var1_1:List of 2
#>   .. .. .. ..$ Col3: chr "Var1_1_1"
#>   .. .. .. ..$ Col3: chr "Var1_1_2"
#>   .. .. ..$ Var1_2:List of 2
#>   .. .. .. ..$ Col3: chr "Var1_2_1"
#>   .. .. .. ..$ Col3: chr "Var1_2_2"
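If an extra dependency is not wanted, the same grouping idea can also be sketched in base R. This is not taken from either answer: nest_df is a hypothetical helper name, and the sketch assumes every column is a plain character vector without missing values.

```r
# Hypothetical helper (not from the original answers): recursively split a
# data frame on its first column until only one column is left.
nest_df <- function(df) {
  col <- names(df)[1]
  if (ncol(df) == 1) {
    # Last column: keep all remaining values as one vector under the column name
    return(setNames(list(df[[1]]), col))
  }
  # One sub-data-frame per distinct value of the first column
  groups <- split(df[-1], df[[1]])
  setNames(list(lapply(groups, nest_df)), col)
}

# Same data as tbl in the question, built with base R only
tbl <- data.frame(
  Col1 = c("Var1", "Var1", "Var1", "Var1"),
  Col2 = c("Var1_1", "Var1_1", "Var1_2", "Var1_2"),
  Col3 = c("Var1_1_1", "Var1_1_2", "Var1_2_1", "Var1_2_2")
)

tbllst <- nest_df(tbl)
str(tbllst)
```

One design caveat: split() orders the groups by sorted value rather than by first appearance, so the node order can differ from the row order of the input.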

2

Here is another option using data.table + rrapply

library(data.table)
library(rrapply)

dt <- setDT(tbl)[, Map(function(...) list2DF(.(...)), names(.SD), .SD)]
rrapply(dt[, lapply(.SD, list), c(head(names(dt), -1))], how = "unmelt")

which gives

$Col1
$Col1$Var1
$Col1$Var1$Col2
$Col1$Var1$Col2$Var1_1
$Col1$Var1$Col2$Var1_1$Col3
[1] "Var1_1_1" "Var1_1_2"


$Col1$Var1$Col2$Var1_2
$Col1$Var1$Col2$Var1_2$Col3
[1] "Var1_2_1" "Var1_2_2"
