Extract rows from a data frame based on common values in a list

Question


r list filtering

7


I am looking for a simple way to filter rows of a data frame based on a list of numeric sequences.

Here is an example:

My initial data frame:

data <- data.frame(x=c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13),y="other_data")

My list:

list1 <- list(1:5,10:13)

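My goal is to keep only the rows of "data" where the "x" column contains exactly the same number sequences as those in "list1". The output data frame should therefore be: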

finaldata <- data.frame(x=c(1:5,10:13),y="other_data")

Any ideas on how to do this?

- jeff6868

If column y contained c("other_data", "data", rep("other_data",14)), what would the desired output be? - Colonel Beauvel

Please use data <- data.frame(x=c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13),y=letters[1:16]) as the example and show the expected result. - Roland

4 Answers

1

Why not use rollapply from zoo:

library(zoo)

ind = lapply(list1, function(x) {
    n = length(x)
    which(rollapply(data$x, n, function(y) all(y==x))) + 0:(n-1)
})

data[unlist(ind),]
#    x          y
#5   1 other_data
#6   2 other_data
#7   3 other_data
#8   4 other_data
#9   5 other_data
#13 10 other_data
#14 11 other_data
#15 12 other_data
#16 13 other_data
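
The same sliding-window idea can be sketched in base R without zoo. This is only an illustration under stated assumptions: find_runs is a hypothetical helper name that does not appear in the answer above, and it uses embed() to build the windows. Unlike the "+ 0:(n-1)" step, it expands every matching start, so a sequence that occurs more than once in x is also handled.

```r
# Example data from the question
data <- data.frame(x = c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13), y = "other_data")
list1 <- list(1:5, 10:13)

# Hypothetical helper: row indices of every complete occurrence of `pattern` in `v`
find_runs <- function(pattern, v) {
  n <- length(pattern)
  if (n == 0 || n > length(v)) return(integer(0))
  # embed() stores each window in a row, latest value first, so reverse the columns
  windows <- embed(v, n)[, n:1, drop = FALSE]
  # A window matches only if every value agrees, in order
  starts <- which(apply(windows, 1, function(w) all(w == pattern)))
  # Expand each start position into the full run of row indices
  as.integer(unlist(lapply(starts, function(s) s + seq_len(n) - 1L)))
}

idx <- sort(unique(unlist(lapply(list1, find_runs, v = data$x))))
result <- data[idx, ]
```

Here idx holds the row numbers to keep, and result is the filtered data frame.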

- Colonel Beauvel

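I know thank-you comments like this are discouraged, but I had been struggling to work out how to do this with rollapply, so thank you. - Tensibai

I have learned a lot myself from other people using functions that were unknown to me. - Colonel Beauvel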

1

extract_fun <- function(x, dat){
  # Index where the sequences start
  ind <- which(dat == x[1])
  # Indexes (within dat) where the sequence should be
  ind_seq <- lapply(ind, seq, length.out = length(x))
  # Extract the values from dat at the position
  dat_val <- mapply(`[`, list(dat), ind_seq)
  # Check if values within dat == those in list1
  i <- which(as.logical(apply(dat_val, 2, all.equal, x))) # which one is equal?
  # Return the correct indices
  ind_seq[[i]]
}

Get the indices for each item in list1 and combine them into the required index vector.

all_ind <- do.call(c, lapply(list1, extract_fun, data$x))
data[all_ind,]

Result:

    x          y
5   1 other_data
6   2 other_data
7   3 other_data
8   4 other_data
9   5 other_data
13 10 other_data
14 11 other_data
15 12 other_data
16 13 other_data
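
One detail of extract_fun is worth spelling out: all.equal() returns TRUE on a match but a character description on a mismatch, never FALSE. The function works because as.logical() turns that description into NA, which which() then ignores. The short sketch below only illustrates this behavior; the variable names are made up for the example.

```r
x <- 1:5

# On a mismatch, all.equal() returns a character description rather than FALSE
mismatch <- all.equal(c(1, 2, 0, 1, 2), x)
is.character(mismatch)

# as.logical() cannot parse that description, so it becomes NA
as.logical(mismatch)

# isTRUE() is a common way to get a plain TRUE/FALSE out of all.equal()
is_match <- isTRUE(all.equal(c(1, 2, 3, 4, 5), x))
is_mismatch <- isTRUE(mismatch)
```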

- Rentrop

0

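The match2 function walks through each position in x and checks whether the n values starting there match a vector of length n. Reduce is then used to build the sequence of indices.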

match2 <- function(vec) {
  start <- which(sapply(1:nrow(data), function(i) all(data$x[i:(i+length(vec)-1)] == vec)))
  Reduce(':', c(start,start+length(vec)-1))
}

With that, we can use sapply to repeat the process for each element of list1.

s <- sapply(list1, match2)
data[unlist(s),]
#     x          y
# 5   1 other_data
# 6   2 other_data
# 7   3 other_data
# 8   4 other_data
# 9   5 other_data
# 13 10 other_data
# 14 11 other_data
# 15 12 other_data
# 16 13 other_data

- Pierre L


- Heroka · Accepted Answer

I first wrote a custom function that gets the subset for a single sequence, then extended it to the whole list with lapply.

#function that takes sequence and a vector
#and returns indices of vector that have complete sequence
get_row_indices<- function(sequence,v){
  #get run lengths of whether vector is in sequence
  rle_d <- rle(v %in% sequence)
  #test if it's complete, so both v in sequence and length of 
  #matches is length of sequence
  select <- rep(length(sequence)==rle_d$lengths &rle_d$values,rle_d$lengths)

  return(select)

}


#add row ID to data to show selection
data$row_id <- 1:nrow(data)
res <- do.call(rbind,lapply(list1,function(x){
  return(data[get_row_indices(sequence=x,v=data$x),])
}))

res

> res
    x          y row_id
5   1 other_data      5
6   2 other_data      6
7   3 other_data      7
8   4 other_data      8
9   5 other_data      9
13 10 other_data     13
14 11 other_data     14
15 12 other_data     15
16 13 other_data     16
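
Note that v %in% sequence tests membership only, so a run of the right length whose values are out of order (for example 5,4,3,2,1) would also be selected. A minimal sketch of a stricter variant follows, assuming an order check is wanted; get_row_indices_strict is a hypothetical name, not part of the answer. Like the original, it still requires the run to be bounded by values that are not in the sequence.

```r
# Hypothetical stricter variant: also requires the values to be in the same order
get_row_indices_strict <- function(sequence, v) {
  rle_d <- rle(v %in% sequence)
  ends <- cumsum(rle_d$lengths)
  starts <- ends - rle_d$lengths + 1L
  # Same test as the original: run of members with the right length
  keep <- rle_d$values & rle_d$lengths == length(sequence)
  # Extra check: compare the run element by element with the sequence
  for (k in which(keep)) {
    keep[k] <- all(v[starts[k]:ends[k]] == sequence)
  }
  rep(keep, rle_d$lengths)
}

# Made-up vector: a reversed run followed by a correctly ordered run
v <- c(0, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5)
strict_rows <- which(get_row_indices_strict(1:5, v))
```

With this test vector, only the second, correctly ordered run is selected.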