Merging rows of a list in R

4

I have a list of the format:

[[1]]
 [1] "10"  "719" "99"  

[[2]]
 [1] "10"  "624" "85"  "888" "624" 

[[3]]
 [1] "1"   "894" "110" "344" "634"  

I would like to merge the entries that share the same first element, i.e.:

[[1]]
 [1] "10"  "719" "99" "624" "85"  "888" "624" 

[[2]]
 [1] "1"   "894" "110" "344" "634"

Is there a way to do this using as little memory as possible?
2 Answers

2
I would do it like this:

x <- list(c("10",  "719", "99"),
          c("10",  "624", "85",  "888", "624"),
          c("1",   "894", "110", "344", "634"))
first_elems <- sapply(x, "[", 1) # get 1st elem of each vector
(first_elems <- as.factor(first_elems)) # factorize (i.a. find all unique elems)
## [1] 10 10 1 
## Levels: 1 10
(group <- split(x, first_elems)) # split by 1st elem (divide into groups)
## $`1`
## $`1`[[1]]
## [1] "1"   "894" "110" "344" "634"
## 
## 
## $`10`
## $`10`[[1]]
## [1] "10"  "719" "99" 
## 
## $`10`[[2]]
## [1] "10"  "624" "85"  "888" "624"
## 
(result <- lapply(group, unlist)) # combine vectors in each group (list of vectors -> an atomic vector)
## $`1`
## [1] "1"   "894" "110" "344" "634"
## 
## $`10`
## [1] "10"  "719" "99"  "10"  "624" "85"  "888" "624"

EDIT: To keep the key from being repeated within each group, use:

(result <- lapply(group, function(x) {
      c(x[[1]][1], unlist(lapply(x, "[", -1)))
   }))
## $`1`
## [1] "1"   "894" "110" "344" "634"
## 
## $`10`
## [1] "10"  "719" "99"  "624" "85"  "888" "624"
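
Note that split() orders the groups by factor level, so "1" comes before "10" here, while the question lists "10" first. If the order of first appearance matters, one possible variation (a sketch, not part of the original answer) is to set the factor levels explicitly with unique(). The names keys, groups and result are just illustrative:

```r
x <- list(c("10",  "719", "99"),
          c("10",  "624", "85",  "888", "624"),
          c("1",   "894", "110", "344", "634"))

# First element of every vector; vapply() guarantees a character vector
keys <- vapply(x, `[`, character(1), 1)

# Levels in order of first appearance instead of sorted order
groups <- split(x, factor(keys, levels = unique(keys)))

# Keep the key once, then append everything but the key from each vector
result <- lapply(groups, function(g) c(g[[1]][1], unlist(lapply(g, `[`, -1))))

# Drop the names to get a plain list, as in the question
result <- unname(result)
```

This assumes every vector in x has at least one element; an empty vector would produce an NA key, which split() drops.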

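This does not need much extra memory. Apart from the result of as.factor (number of levels + number of elements in x), we only have to store the resulting list. split needs very little extra memory, since the vectors in x are not deep-copied.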
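
To get a rough feel for the size of the factor relative to the data, object.size() can be compared on both. This is only a sketch on made-up data (1000 vectors of 50 elements each, an assumption chosen to keep it quick), and object.size() is an estimate that does not account for memory shared between objects:

```r
set.seed(42)

# Hypothetical test data: 1000 character vectors of length 50
x <- replicate(1000, as.character(sample(1:20, 50, replace = TRUE)),
               simplify = FALSE)

# Grouping factor: one integer code per vector plus the level labels
keys <- as.factor(vapply(x, `[`, character(1), 1))
groups <- split(x, keys)

object.size(x)     # estimated size of the input list
object.size(keys)  # estimated size of the factor, small by comparison
```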

As for performance, on a fairly large list:

set.seed(1L)
n <- 100000
x <- vector('list', n)
for (i in 1:n)
   x[[i]] <- as.character(sample(1:1000, ceiling(runif(1, 1, 1000)), replace=TRUE))
object.size(x) # 2GB
## 2175165880 bytes

I get the following timings on my old Linux laptop:

system.time(local({
   first_elems <- as.factor(sapply(x, "[", 1))
   group <- split(x, first_elems)
   result <- lapply(group, function(x) {
     c(x[[1]][1], unlist(lapply(x, "[", -1)))
   })
}))

##    user  system elapsed 
##   4.119   0.001   4.149 

I think that is quite reasonable.


Thanks! But "10" appears twice in the final list... I only want to merge the other elements... Please help me with this... - Divi

0

I'm not sure about speed, but here is a for-loop approach (which I rarely use) that shows the logic needed to process your list.

x <- list(c("10",  "719", "99"),
          c("10",  "624", "85" , "888", "624"),
          c("1",   "894", "110", "344", "634"))  

y <- vector('list', length(x)) # allocate a list at least as long as x

for(i in 2:length(x)){
  if (x[[i-1]][1] == x[[i]][1]) {  # compare the keys (first elements) of neighbouring vectors
    y[[i-1]] <- c(unlist(x[[i-1]]), unlist(x[[i]][-1]))
  } else {
    y[[i-1]] <- x[[i]]
  }
}

z <- y[!sapply(y, is.null)]
z
# [[1]]
# [1] "10"  "719" "99"  "624" "85"  "888" "624"
# 
# [[2]]
# [1] "1"   "894" "110" "344" "634"
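
The loop above only looks at neighbouring vectors, so it relies on equal keys being adjacent. A possible generalisation (a sketch, not the original answer's code) is to accumulate into a list named by key. The name merged is illustrative, and growing vectors with c() inside a loop can get slow on large inputs:

```r
x <- list(c("10",  "719", "99"),
          c("10",  "624", "85" , "888", "624"),
          c("1",   "894", "110", "344", "634"))

merged <- list()  # named list, one entry per distinct first element

for (v in x) {
  key <- v[1]
  if (is.null(merged[[key]])) {
    merged[[key]] <- v                         # first time this key is seen: keep the whole vector
  } else {
    merged[[key]] <- c(merged[[key]], v[-1])   # later occurrences: append everything but the key
  }
}

z <- unname(merged)  # drop the names to match the format in the question
z
```

Because `[[` matches names exactly, the key "1" is not confused with "10", and the groups come out in order of first appearance.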
