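I have a vector of values (x).
I want to determine the length of its overlap with each set in a list (y), but without running a loop or lapply. Is that possible?
I would really like to speed up execution.
Many thanks! Here is an example implemented with a loop: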
x <- c(1:5)
y <- list(1:5, 2:6, 3:7, 4:8, 5:9, 6:10)
overlaps <- rep(0, length(y))
for (i in seq_along(y)) {
  # overlaps[i] <- length(intersect(x, y[[i]]))  # slower than %in%
  overlaps[i] <- sum(x %in% y[[i]])
}
overlaps
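Here is a comparison of some of the approaches suggested in the answers below. As you can see, the loop is still the fastest, but I am hoping to find something faster: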
# Function with the loop:
myloop <- function(x, y) {
  overlaps <- rep(0, length(y))
  for (i in seq(length(y))) overlaps[i] <- sum(x %in% y[[i]])
  overlaps
}
# Function with sapply:
mysapply <- function(x, y) sapply(y, function(e) sum(e %in% x))
# Function with map_dbl:
library(purrr)
mymap <- function(x, y) {
  map_dbl(y, ~sum(. %in% x))
}
library(microbenchmark)
microbenchmark(myloop(x, y), mysapply(x, y), mymap(x, y), times = 30000)
# Unit: microseconds
#            expr  min   lq     mean median   uq      max neval
#    myloop(x, y) 17.2 19.4 26.64801   21.2 22.6   9348.6 30000
#  mysapply(x, y) 27.1 29.5 39.19692   31.0 32.9  20176.2 30000
#     mymap(x, y) 59.8 64.1 88.40618   66.0 70.5 114776.7 30000
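For reference, one loop-free formulation might look like the sketch below: flatten y once with unlist, test membership in a single %in% call, and count the hits per set with tabulate. The name overlap_flat is made up for illustration. Like mysapply above, it counts the elements of each set that occur in x, so it matches the loop only when x and every element of y are free of duplicates. It has not been benchmarked here, so whether it beats the loop on inputs this small is an open question.

```r
# Hypothetical helper (not from any package): loop-free overlap counts.
# Assumes x and each element of y contain no duplicated values.
overlap_flat <- function(x, y) {
  # Group index: which element of y each flattened value came from
  grp <- rep.int(seq_along(y), lengths(y))
  # One vectorized membership test over all values of y at once
  hits <- unlist(y, use.names = FALSE) %in% x
  # Count hits per group; nbins keeps trailing zero counts
  tabulate(grp[hits], nbins = length(y))
}

x <- 1:5
y <- list(1:5, 2:6, 3:7, 4:8, 5:9, 6:10)
overlap_flat(x, y)
# [1] 5 4 3 2 1 0
```

The result is an integer vector, whereas the loop version returns doubles because it starts from rep(0, length(y)).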
*apply functions? - iod