Converting a list of lists of data frames into a single data frame in R

Question

4

I have a list with several data.frames nested inside it (see L below).

I was wondering whether it is possible to convert L into my desired output, a single data.frame (shown below)?

L <- list(A = list(Short = data.frame(d = 1:2, SD = 3:4)), 
          B = list(Short = data.frame(d = 2:3, SD = 1:2), Long1 = data.frame(d = 7:8, SD = 6:7)),
          C = list(Short = data.frame(d = 5:6, SD = 3:4), Long1 = data.frame(d = 8:9, SD = 1:2), 
               Long2 = data.frame(d = 4:5, SD = 6:7)))

Desired output (a single data.frame):

- rnorouzian

4 Answers

1

We can use lapply/Map in base R. Loop over the list with lapply and rbind the nested list elements, then use Map to add a new column to each result and rbind the outer list elements.

out <- do.call(rbind, Map(cbind, lapply(L, function(x) 
              do.call(rbind, x)), id = seq_along(L)))
row.names(out) <- NULL
out
#   d SD id
#1  1  3  1
#2  2  4  1
#3  2  1  2
#4  3  2  2
#5  7  6  2
#6  8  7  2
#7  5  3  3
#8  6  4  3
#9  8  1  3
#10 9  2  3
#11 4  6  3
#12 5  7  3
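
The nested one-liner above can be easier to follow when split into its three steps. The following is a minimal sketch of that decomposition; the intermediate names `inner` and `tagged` are introduced here purely for illustration and are not part of the original answer.

```r
# Same example list as in the question
L <- list(A = list(Short = data.frame(d = 1:2, SD = 3:4)),
          B = list(Short = data.frame(d = 2:3, SD = 1:2),
                   Long1 = data.frame(d = 7:8, SD = 6:7)),
          C = list(Short = data.frame(d = 5:6, SD = 3:4),
                   Long1 = data.frame(d = 8:9, SD = 1:2),
                   Long2 = data.frame(d = 4:5, SD = 6:7)))

# Step 1: collapse each inner list into one data frame
inner <- lapply(L, function(x) do.call(rbind, x))

# Step 2: add the position of each outer element as an id column
tagged <- Map(cbind, inner, id = seq_along(inner))

# Step 3: stack the tagged data frames and drop the generated row names
out <- do.call(rbind, tagged)
row.names(out) <- NULL
```

Splitting the call this way also makes it possible to inspect `inner` and `tagged` separately when the input is less regular than the example.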

Based on the comments, if we also need a column built from the names of the inner lists:

out1 <- do.call(rbind, Map(cbind, lapply(L, function(dat)
   do.call(rbind, Map(cbind, dat, es.type = names(dat)))), id = seq_along(L)))
row.names(out1) <- NULL

out1
#   d SD es.type id
#1  1  3   Short  1
#2  2  4   Short  1
#3  2  1   Short  2
#4  3  2   Short  2
#5  7  6   Long1  2
#6  8  7   Long1  2
#7  5  3   Short  3
#8  6  4   Short  3
#9  8  1   Long1  3
#10 9  2   Long1  3
#11 4  6   Long2  3
#12 5  7   Long2  3

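If the names carry a trailing numeric suffix (optionally preceded by dots, i.e. anything matching \\.*\\d+$) that we want to remove: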

out1 <- do.call(rbind, Map(cbind, lapply(L, function(dat)
   do.call(rbind, Map(cbind, dat, 
     es.type = sub("\\.*\\d+$", "", names(dat))))), id = seq_along(L)))
row.names(out1) <- NULL
out1
#   d SD es.type id
#1  1  3   Short  1
#2  2  4   Short  1
#3  2  1   Short  2
#4  3  2   Short  2
#5  7  6    Long  2
#6  8  7    Long  2
#7  5  3   Short  3
#8  6  4   Short  3
#9  8  1    Long  3
#10 9  2    Long  3
#11 4  6    Long  3
#12 5  7    Long  3

- akrun

@rnorouzian You can do

do.call(rbind, Map(cbind, lapply(L, function(dat) do.call(rbind, Map(cbind, dat, es.type = names(dat)))), id = seq_along(L)))

- akrun

As stated here, under es.type, Long1 and Long2 must be just Long. - rnorouzian

1

@Reza I think you need to use do.call(rbind, c(Map(....), list(make.row.names = FALSE))). - akrun

1

@Reza plot(table(D$time), col = g) is very neat :=) - akrun

1

@Reza Please try do.call(rbind, Filter(Negate(is.null), Map(cbind, lapply(L, function(x) do.call(rbind, x)), id = seq_along(L)))). - akrun

0

Here is another possible approach, using purrr's flatten_dfr:

library(purrr)

# repeat each outer index once per row contributed by its data frames
transform(flatten_dfr(L), id = rep(seq_along(L), times = map_int(L, ~ sum(map_int(.x, nrow)))))
#>    d SD id
#> 1  1  3  1
#> 2  2  4  1
#> 3  2  1  2
#> 4  3  2  2
#> 5  7  6  2
#> 6  8  7  2
#> 7  5  3  3
#> 8  6  4  3
#> 9  8  1  3
#> 10 9  2  3
#> 11 4  6  3
#> 12 5  7  3

Note: I used base R's transform here; dplyr's mutate could be used instead.
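
The `times` argument of `rep()` has to be the number of rows each outer element contributes. In the question's data every data frame has two rows and two columns, so counting columns gives the same numbers as counting rows. A minimal base R sketch on a hypothetical list `L2` (not from the question), whose data frames differ in row count, shows where the two counts part ways:

```r
# Hypothetical input with unequal row counts
L2 <- list(A = list(Short = data.frame(d = 1:3, SD = 4:6)),
           B = list(Short = data.frame(d = 1L, SD = 2L),
                    Long1 = data.frame(d = 7:8, SD = 6:7)))

# Rows contributed by each outer element: A has 3, B has 1 + 2
rows_per_group <- vapply(L2, function(x) sum(vapply(x, nrow, integer(1))),
                         integer(1))

# lengths() on a list of data frames counts columns, not rows
cols_per_group <- vapply(L2, function(x) sum(lengths(x)), integer(1))

# One id value per row of the combined data frame
id <- rep(seq_along(L2), times = rows_per_group)
```

Here `rows_per_group` and `cols_per_group` disagree, so only the row-based count yields an `id` vector whose length matches the combined data frame.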

- Joris C.

0

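rbindlist() is a convenience function that builds one data.table from a list of many. For this nested list, it has to be applied twice, once at each level.

It also has the idcol parameter, which adds a column to the result showing which list item each row came from.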

library(data.table)
rbindlist(lapply(L, rbindlist, idcol = "es.type"), idcol = "id")

    id es.type d SD
 1:  A   Short 1  3
 2:  A   Short 2  4
 3:  B   Short 2  1
 4:  B   Short 3  2
 5:  B   Long1 7  6
 6:  B   Long1 8  7
 7:  C   Short 5  3
 8:  C   Short 6  4
 9:  C   Long1 8  1
10:  C   Long1 9  2
11:  C   Long2 4  6
12:  C   Long2 5  7

Now, the OP has requested that id be numeric and that Long1 and Long2 become Long. This can be done with subsequent operations on the result columns:

rbindlist(lapply(L, rbindlist, idcol = "es.type"), idcol = "id")[
  , id := rleid(id)][
    , es.type := sub("\\d+$", "", es.type)][]

    id es.type d SD
 1:  1   Short 1  3
 2:  1   Short 2  4
 3:  2   Short 2  1
 4:  2   Short 3  2
 5:  2    Long 7  6
 6:  2    Long 8  7
 7:  3   Short 5  3
 8:  3   Short 6  4
 9:  3    Long 8  1
10:  3    Long 9  2
11:  3    Long 4  6
12:  3    Long 5  7

In base R, we can achieve the same with:

do.call("rbind", lapply(L, do.call, what = "rbind"))

which returns

          d SD
A.Short.1 1  3
A.Short.2 2  4
B.Short.1 2  1
B.Short.2 3  2
B.Long1.1 7  6
B.Long1.2 8  7
C.Short.1 5  3
C.Short.2 6  4
C.Long1.1 8  1
C.Long1.2 9  2
C.Long2.1 4  6
C.Long2.2 5  7

id and es.type can be obtained by parsing the row names, e.g.,

DF <- do.call("rbind", lapply(L, do.call, what = "rbind"))
id <- stringr::str_extract(row.names(DF), "^[^.]*")
# create sequence number (that's what data.table::rleid() does)
DF$id <- c(1L, cumsum(head(id, -1L) != tail(id, -1L)) + 1L)
DF$es.type <- stringr::str_extract(row.names(DF), "(?<=\\.)[^.0-9]*")
row.names(DF) <- NULL
DF

   d SD id es.type
1  1  3  1   Short
2  2  4  1   Short
3  2  1  2   Short
4  3  2  2   Short
5  7  6  2    Long
6  8  7  2    Long
7  5  3  3   Short
8  6  4  3   Short
9  8  1  3    Long
10 9  2  3    Long
11 4  6  3    Long
12 5  7  3    Long
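
The `cumsum()` expression used for `DF$id` numbers runs of consecutive equal values. The sketch below, on a made-up vector `key`, compares it with two other base R ways of deriving an integer id; it is meant as an illustration of the difference, not as part of the original answer.

```r
# Hypothetical key in which "A" appears in two separate runs
key <- c("A", "A", "B", "B", "A")

# Run-based id, as in the answer: increments whenever the value changes
run_id <- c(1L, cumsum(head(key, -1L) != tail(key, -1L)) + 1L)

# The same run-based id computed via rle()
r <- rle(key)
run_id_rle <- rep(seq_along(r$lengths), r$lengths)

# Value-based id: equal values share an id wherever they occur
value_id <- match(key, unique(key))

run_id    # 1 1 2 2 3
value_id  # 1 1 2 2 1
```

For the question's data, where each name forms a single contiguous block, both kinds of id coincide. They differ only when a name recurs after a different one.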

- Uwe

- Ronak Shah · Accepted Answer

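We can rbind the data frames within each element of L, add a new column holding the element's position in the list, and finally combine everything into one data frame using do.call and rbind.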

output <- do.call(rbind, lapply(seq_along(L), function(x) 
                          transform(do.call(rbind, L[[x]]), id = x)))
rownames(output) <- NULL

output
#   d SD id
#1  1  3  1
#2  2  4  1
#3  2  1  2
#4  3  2  2
#5  7  6  2
#6  8  7  2
#7  5  3  3
#8  6  4  3
#9  8  1  3
#10 9  2  3
#11 4  6  3
#12 5  7  3

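Using dplyr's bind_rows with purrr::map may be shorter, but it fills the id variable with the list names (A, B, C) rather than a sequence. That is easy to change, for example with match():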

library(dplyr)
bind_rows(purrr::map(L, bind_rows), .id = "id")  %>%
          mutate(id = match(id, unique(id)))