Split a list based on the number of rows of its items

Question

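I am trying to split my list of data frames into sub-groups, for example a nested list or several separate lists. The split should be based on the number of rows of each data frame, so that data frames with the same number of rows end up in the same list.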

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)

Here two of the data frames have nrow() == 10, so they should go into their own list or sub-list.

I tried something like this, but I don't think split is meant for lists:

sublist <- lapply(full_list, function(x) split(full_list, f = nrow(x)))

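By the way: the larger goal is to split all data frames into training and test sets for machine learning, using the function below. sample is used to create the subsets, but I want data frames of the same length to use the same sample_vector. That is why I want to split the full list into sub-lists beforehand. Afterwards I will recombine all data frames for further processing (similar to split-apply-combine). I mention this in case I am overcomplicating things.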

# function to split data frames in each sub list into train and test data frames 
counter <- 0
train_test_list <- list()
for (x_table in sublist) {
  counter <- counter + 1
  current_name <- paste(names(sublist)[counter], sep = "_")

  sample_vector <- sample.int(n = nrow(x_table), 
    size = floor(0.8 * nrow(x_table)), replace = FALSE)
  train_set <- x_table[sample_vector, ]
  test_set  <- x_table[-sample_vector, ]

  train_test_list[[current_name]] <- list(
    train_set = train_set, test_set = test_set, 
    table_name = names(sublist)[counter]
  )
}
# combine all lists with test and train pairs back into one list 
full_train_test_list <- c(train_test_list1, train_test_list2, train_test_list3, ...)

- crazysantaclaus

1 Answer

- akrun · Accepted Answer

We can use sapply to get the number of rows of each list element and then split the list on that information.

new_list <- split(full_list, sapply(full_list, nrow))
str(new_list)
#List of 3
# $ 10:List of 2
#  ..$ df1: int [1:10, 1:10] 1 0 0 1 1 0 1 0 0 1 ...
#  ..$ df4: int [1:10, 1:10] 1 0 1 1 1 0 0 0 1 1 ...
# $ 15:List of 1
#  ..$ df2: int [1:15, 1:10] 0 1 1 0 0 0 0 0 0 1 ...
# $ 20:List of 1
#  ..$ df3: int [1:20, 1:10] 1 1 0 1 0 1 1 1 0 1 ...
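
To make the grouping step more explicit, here is a minimal sketch that first stores the row counts in a named vector and then passes that vector to split. The set.seed call and the name row_counts are assumptions added for illustration, and vapply is swapped in for sapply only because it enforces an integer result per element.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)

# One row count per list element (hypothetical helper name: row_counts)
row_counts <- vapply(full_list, nrow, integer(1))
print(row_counts)

# split() groups the list elements by the distinct values of row_counts
new_list <- split(full_list, row_counts)

# The group names are the row counts, converted to character
print(names(new_list))
print(lapply(new_list, names))
```

Because the group names are the row counts as strings, a single group can be selected with, for example, new_list[["10"]].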

Because the result is a nested list, we can process the inner lists by calling lapply inside an outer lapply:

traintestlst <- lapply(new_list, function(sublst) {
  lapply(sublst, function(x_table) {
    sample_vector <- sample.int(n = nrow(x_table),
      size = floor(0.8 * nrow(x_table)), replace = FALSE)
    train_set <- x_table[sample_vector, ]
    test_set  <- x_table[-sample_vector, ]
    list(train_set = train_set, test_set = test_set)
  })
})
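
The code above draws a new sample_vector for every table. The question asks for tables of the same length to share one sample_vector, so a possible variant, sketched here under that reading, draws the vector once per group in the outer lapply and reuses it in the inner one. The names traintest_shared and n_rows, the set.seed call, and the drop = FALSE arguments are assumptions added for illustration.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
new_list <- split(full_list, sapply(full_list, nrow))

traintest_shared <- lapply(new_list, function(sublst) {
  # All tables in a group have the same number of rows,
  # so the first one is enough to size the sample
  n_rows <- nrow(sublst[[1]])
  sample_vector <- sample.int(n = n_rows,
    size = floor(0.8 * n_rows), replace = FALSE)

  # Reuse the same sample_vector for every table in this group
  lapply(sublst, function(x_table) {
    list(
      train_set = x_table[sample_vector, , drop = FALSE],
      test_set  = x_table[-sample_vector, , drop = FALSE]
    )
  })
})

# Train/test sizes for one table of the 10-row group
print(dim(traintest_shared[["10"]]$df1$train_set))
print(dim(traintest_shared[["10"]]$df4$test_set))
```

With this layout df1 and df4 are split on identical row indices, while df2 and df3 each get their own draw because they are alone in their groups.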

- Checking the output

traintestlst[[1]]$df1
#$train_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    1    1    0    1    0    0    1    1    1     0
#[2,]    1    0    1    1    1    0    0    0    1     0
#[3,]    0    1    0    0    1    1    0    1    1     0
#[4,]    1    1    0    1    0    0    1    0    0     1
#[5,]    0    0    0    1    0    0    1    0    1     0
#[6,]    0    1    1    0    1    0    1    0    1     0
#[7,]    1    0    1    1    0    0    0    0    0     1
#[8,]    0    1    0    0    0    1    0    0    1     0

#$test_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    0    0    0    0    0    1    0    1    0     1
#[2,]    1    0    0    0    0    0    0    1    1     0
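
The question also mentions combining everything back into one list afterwards. One way this could be done, sketched here as an assumption rather than as part of the accepted answer, is to drop the group names and concatenate the sub-lists with do.call(c, ...). The name full_train_test_list is taken from the question; the set.seed call and the final reordering step are additions for illustration.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
new_list <- split(full_list, sapply(full_list, nrow))

traintestlst <- lapply(new_list, function(sublst) {
  lapply(sublst, function(x_table) {
    sample_vector <- sample.int(n = nrow(x_table),
      size = floor(0.8 * nrow(x_table)), replace = FALSE)
    list(train_set = x_table[sample_vector, ],
         test_set  = x_table[-sample_vector, ])
  })
})

# Remove the outer group names ("10", "15", "20") and concatenate
# the sub-lists, so only the table names remain
full_train_test_list <- do.call(c, unname(traintestlst))

# Optional: restore the order of the original list
full_train_test_list <- full_train_test_list[names(full_list)]

print(names(full_train_test_list))
```

Each element of full_train_test_list is still a list with train_set and test_set, so later steps can loop over the tables without caring about the row-count groups.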