在R中何时建议使用稀疏矩阵?

4

最近我一直在研究电力模拟,以下是我的代码:

library(MASS)
library(Matrix)

simdat <- data.frame(mmm = rep(rep(factor(1:2,
                                          labels=c("m1", "m2")),
                                   each = 2),
                               times = 2800),
                 ttt = rep(factor(1:2,
                                  labels = c("t1", "t2")),
                           times = 5600),
                 sss = rep(factor(1:70),
                            each = 160),
                 iii = rep(rep(factor(1:40),
                               each = 4),
                           times = 70))

beta <- c(1, 2)

X1 <- model.matrix(~ mmm,
                   data = simdat)

Z1 <- model.matrix(~ ttt,
                   data = simdat)

X1Z111200x2矩阵。在Stackoverflow的帮助下,我成功地使我的计算比以前更加高效:

funab <- function(){
    ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    Mb <- as.vector(X1 %*% beta)

    M1 <- rowSums(Z1 * ran_sub[rep(1:70,
                                        each = 160),])

    M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
                                        times = 70),])

    Mout <- Mb + M1 + M2

    Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}

Y 将会是一个长度为 11200 的向量。随后我会大量地复制这个函数 (比如说 1000 次):

sim <- replicate(n        = 1000,
                 expr     = funab()},
                 simplify = FALSE)

sim将是一个11200x1000的列表。考虑到我想要频繁地这样做,可能还要在funab()中加入更多的代码,我想知道在当前的计算中是否建议对X1Z1使用稀疏矩阵?


你熟悉 microbenchmark 吗?你可以使用它来比较函数的性能,也就是基准测试。只需简单地执行 install.packages(c("microbenchmark"), dependencies = TRUE)require(microbenchmark)example(microbenchmark),你就知道该怎么做了。我在 这个 SO 回答 中使用了 microbenchmark - Eric Fail
直到现在我才知道。 :) 我今天会去检查一下! - lord.garbage
如果你进行了测试,将其添加到你的问题中会很有趣。 - Eric Fail
我会做到的。希望我能在这个周末完成它! - lord.garbage
1个回答

1

好的,我尝试按照评论中给出的建议并使用microbenchmark包进行了测试。为了方便复制粘贴,我将重复上面的代码:

library(MASS)
library(Matrix)

simdat <- data.frame(mmm = rep(rep(factor(1:2,
                                          labels=c("m1", "m2")),
                                   each = 2),
                               times = 2800),
                 ttt = rep(factor(1:2,
                                  labels = c("t1", "t2")),
                           times = 5600),
                 sss = rep(factor(1:70),
                            each = 160),
                 iii = rep(rep(factor(1:40),
                               each = 4),
                           times = 70))

beta <- c(1, 2)

X1 <- model.matrix(~ mmm,
                   data = simdat)

Z1 <- model.matrix(~ ttt,
                   data = simdat)

我现在创建与稀疏矩阵相同的矩阵:
sparseX1 <- sparse.model.matrix(~ mmm,
                                data = simdat)

sparseZ1 <- sparse.model.matrix(~ ttt,
                                data = simdat)

我接着设置了这两个函数:

funab_sparse <- function(){
    ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    Mb <- as.vector(sparseX1 %*% beta)

    M1 <- Matrix::rowSums(sparseZ1 * ran_sub[rep(1:70,
                                        each = 160),])

    M2 <- Matrix::rowSums(sparseZ1 * ran_ite[rep(rep(1:40, each = 4),
                                        times = 70),])

    Mout <- Mb + M1 + M2

    Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}

funab <- function(){
    ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    Mb <- as.vector(X1 %*% beta)

    M1 <- rowSums(Z1 * ran_sub[rep(1:70,
                                        each = 160),])

    M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
                                        times = 70),])

    Mout <- Mb + M1 + M2

    Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}

library(microbenchmark)
res <- microbenchmark(funab(), funab_sparse(), times = 1000)

并获得结果:

> res <- microbenchmark(funab(), funab_sparse(), times = 1000)
> res
Unit: milliseconds
           expr      min       lq   median       uq      max neval
        funab() 2.200342 2.277006 2.309587 2.481627 69.99895  1000
 funab_sparse() 8.419564 8.568157 9.666248 9.874024 75.88907  1000

假设我没有犯任何实质性的错误,我可以得出结论:使用稀疏矩阵进行计算不会加速我的代码。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接