Filter an R list based on a condition on the objects in the list

Question

3

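This is a trivial problem, but I'm stuck. How can I filter a list of data frames by the length of each data frame? The list is nested, meaning it is a list that contains both data frames and lists of data frames, all of different lengths. An example is below. I want to filter or subset the list so that it only includes the objects whose length is n, say 3.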

Here is an example and my current approach.

library(tidyverse)

# list of list with arbitrary lengths 

star.wars_ls <- list(starwars[1:5], 
                     list(starwars[1:8], starwars[4:6]), 
                     starwars[1:2], 
                     list(starwars[1:7], starwars[2:6]), 
                     starwars[1:3])


# I want to filter the list by dataframes that are 3 variables long (i.e. length(df) == 3).

# Here is my attempt, I'm stuck at how to obtain 
# the number of variables in each dataframe and then filter by it. 

map(star.wars_ls, function(x){
    map(x, function(x){ ## Incorrectly returns 20 for all 
        length(y)
    })

})

- elliot

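How do you want to handle the nesting? Unnest? Ignore it? Apply the filter level by level? - Gregor Thomas

I'd like to keep the nesting where applicable, so I think applying the filter level by level would be best. - elliot

I'll edit now. Thanks. - elliot

Hmm, I think "recursive filtering" may be a better term than "level-by-level filtering". This looks promising. - Gregor Thomas

I think you also need to edit so that you (a) define n and (b) define y. It would be even better to see the desired result. For example, if n is 3, do you want list(NULL, list(NULL, starwars[4:6]), list(NULL, NULL), starwars[1:3])? Or something else? - Gregor Thomas

I've updated to include n = 3, but I'm not sure how to go about defining y, since my code above doesn't yet give the correct length of each data frame. - elliot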

4 Answers

1

You should be able to check whether each item in star.wars_ls is a list or a data frame, and then check the number of columns in each item. Try:

library(tidyverse)

# list of list with arbitrary lengths 

star.wars_ls <- list(starwars[1:5], 
                     list(starwars[1:8], starwars[4:6]), 
                     starwars[1:2], 
                     list(starwars[1:7], starwars[2:6]), 
                     starwars[1:3])


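# I want to filter the list by dataframes that are 3 variables long (i.e. length(df) == 3).

datacols <- map(star.wars_ls, function(X) {
  if (is.data.frame(X)) {
    # top-level data frame: record its number of columns
    ncol(X)
  } else {
    # nested list: record the number of columns of each data frame inside it
    map(X, function(Y) {
      ncol(Y)
    })
  }
})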

# > datacols
# [[1]]
# [1] 5
# 
# [[2]]
# [[2]][[1]]
# [1] 8
# 
# [[2]][[2]]
# [1] 3
# 
# 
# [[3]]
# [1] 2
# 
# [[4]]
# [[4]][[1]]
# [1] 7
# 
# [[4]][[2]]
# [1] 5
# 
# 
# [[5]]
# [1] 3

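This only gives you the length (number of columns) of each data frame. To get the indices (I'm sure there are more efficient ways, and perhaps someone else can help with that):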

indexlist <- c()
for (i in seq_along(datacols)) {
  if (length(datacols[[i]]) == 1) {
    # a single column count: this entry was a top-level data frame
    if (datacols[[i]][1] == 3) {
      index <- i 
      indexlist <- c(indexlist, as.character(index))
    }
  } else {
    # several column counts: this entry was a nested list of data frames
    for (j in seq_along(datacols[[i]])) {
      if (datacols[[i]][[j]][1] == 3) {
        index <- str_c(i, ",", j)
        indexlist <- c(indexlist, index)
      }
    }
  }
}

# > indexlist
# [1] "2,2" "5"
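
As a follow-up, the index strings could be used to pull the matching data frames back out. This is only a minimal base-R sketch: `nested` is a stand-in built from the built-in mtcars data (an assumption, so it runs without the tidyverse), and `get_by_index` is a hypothetical helper name, not something from the code above.

```r
# Stand-in for star.wars_ls, built from mtcars (assumption: same shape,
# different data, so the sketch runs with base R only).
nested <- list(mtcars[1:5],
               list(mtcars[1:8], mtcars[4:6]),
               mtcars[1:2],
               list(mtcars[1:7], mtcars[2:6]),
               mtcars[1:3])

# Index strings in the same "i" or "i,j" form as indexlist above.
indexlist <- c("2,2", "5")

# Hypothetical helper: turn "2,2" into c(2, 2) and let [[ ]] walk the
# nested list one level per number.
get_by_index <- function(lst, index) {
  path <- as.integer(strsplit(index, ",", fixed = TRUE)[[1]])
  lst[[path]]
}

matches <- lapply(indexlist, get_by_index, lst = nested)
sapply(matches, ncol)
```

Passing a vector to `[[ ]]` indexes a list recursively, so "2,2" resolves to the second data frame inside the second element.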

- Felix T.

1

You can use recursion. It doesn't matter how deeply the list is nested:

ff = function(x) {
  # keep data frames with exactly 3 columns, recurse into anything else
  map(x, ~ if (is.data.frame(.x)) {
    if (length(.x) == 3) .x
  } else {
    ff(.x)
  })
}
ff(star.wars_ls)
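
Note that this kind of recursion leaves a NULL in place of every data frame that doesn't match. If you'd rather drop those entries, something along these lines might work. It is a base-R sketch under assumptions: `filter_ncol` is a hypothetical name, and `nested` is a stand-in built from mtcars rather than the starwars data.

```r
# Stand-in for star.wars_ls, built from mtcars (assumption).
nested <- list(mtcars[1:5],
               list(mtcars[1:8], mtcars[4:6]),
               mtcars[1:2],
               list(mtcars[1:7], mtcars[2:6]),
               mtcars[1:3])

# Hypothetical helper: keep data frames with exactly n columns, recurse
# into sub-lists, then drop NULLs and sub-lists that ended up empty.
filter_ncol <- function(x, n = 3) {
  out <- lapply(x, function(el) {
    if (is.data.frame(el)) {
      if (ncol(el) == n) el else NULL
    } else {
      filter_ncol(el, n)
    }
  })
  Filter(function(el) is.data.frame(el) || length(el) > 0, out)
}

res <- filter_ncol(nested, n = 3)
str(res, max.level = 2, give.attr = FALSE)
```

The nesting is preserved for whatever survives: the matching data frame from the second element stays inside its own sub-list.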

- Onyambu

0

Since you mentioned subsetting/filtering, you can use base R's Reduce function recursively to build your subset:

my_reduce = function(x) {
  Reduce(function(acc, b) {
    if (is.data.frame(b)) {
      # could transform before appending b as well
      if (length(b) == 3) append(acc, list(b)) else acc
    } else {
      append(acc, my_reduce(b))
    }
  }, x, list())
}

my_reduce(star.wars_ls)

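Or you can flatten the list with purrr's list_flatten(), available since version 1.0.0, and then filter (although this can't handle multiple levels of nesting):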

tibble(x = star.wars_ls %>% list_flatten()) %>% 
  filter(map_lgl(x, ~ length(.x) == 3))
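
If the list is nested more than one level deep, a recursive flatten is one possible workaround. The following is only a base-R sketch: `flatten_dfs` is a hypothetical helper, and `deep` is made-up example data built from mtcars, not the list from the question.

```r
# Made-up example with two levels of nesting (assumption: mtcars stands
# in for starwars so the sketch needs no extra packages).
deep <- list(mtcars[1:3],
             list(mtcars[1:8],
                  list(mtcars[4:6], mtcars[1:2])))

# Hypothetical helper: wrap each data frame in a list, recurse into
# sub-lists, and splice the pieces together one level at a time.
flatten_dfs <- function(x) {
  if (is.data.frame(x)) return(list(x))
  unlist(lapply(x, flatten_dfs), recursive = FALSE)
}

flat <- flatten_dfs(deep)

# Once the list is flat, a plain Filter() is enough.
three <- Filter(function(d) ncol(d) == 3, flat)
length(three)
```

Unlike the recursive answers, this discards the nesting entirely, so it only fits if you don't need to keep the original structure.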

- qix


- akrun · Accepted Answer

We can do:

  map(star.wars_ls, ~ if(is.data.frame(.x)) .x[length(.x) == 3] else map(.x, ~ .x[length(.x) == 3]))
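
One thing worth knowing about the `.x[length(.x) == 3]` idiom: it indexes the columns with a single TRUE or FALSE, so a data frame that doesn't match comes back with zero columns rather than being removed. A small base-R illustration, using mtcars as a stand-in for starwars (an assumption):

```r
wide  <- mtcars[1:5]   # 5 columns, should not match
three <- mtcars[1:3]   # 3 columns, should match

# A logical index of TRUE is recycled and keeps every column.
kept <- three[length(three) == 3]

# A logical index of FALSE selects no columns at all.
dropped <- wide[length(wide) == 3]

ncol(kept)
ncol(dropped)
is.data.frame(dropped)
```

So the result has the same shape as the input list, with zero-column data frames as placeholders; you could remove them afterwards by checking `ncol(.x) > 0`.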