Create a sparse matrix from a list of sparse vectors

Question


I have a list of sparse vectors (in R) and need to convert it into a sparse matrix. Doing the conversion with a for loop takes a very long time:

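library(Matrix)

# Pre-allocate an empty length(tc2) x n.col sparse matrix, then fill it row by row
sm <- spMatrix(length(tc2), n.col)
for (i in seq_along(tc2)) {
    sm[i, ] <- tc2[[i]]   # copy the i-th sparse vector into row i
}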

Is there a better way?

- DAF

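@DAF -- Did my answer solve your problem? If so, you can accept it by clicking the check mark on the left. If not, could you give some examples of the sparse vectors you want to combine into a sparse matrix? Thanks. - Josh O'Brien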

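@iterator - I can take a step back and start from a list of "itemsets", i.e. each entry is a list of numbers identifying the items/words that occur in that row. I want a sparse matrix representation of this data. Josh's solution works on small examples, but on a sample with 10K rows and 10K items I run out of memory (16 GB). - DAF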

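@DAF -- If I had that data, I would probably set it up as input to the sparseMatrix() constructor. You need three vectors (perhaps organized as columns of a data frame) giving the row index, column index, and value of each non-zero entry. Run this toy example to see how it works, then let me know how it goes: sparseMatrix(i=1:4, j=4:1, x=c(2,4,5,9)). Good luck! - Josh O'Brien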
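A minimal sketch of that triplet approach for the itemset data described above. It assumes each list element holds the integer column (item) indices present in that row and that every non-zero entry should be 1; the names `itemsets` and `n.items` and the sample data are hypothetical.

```r
library(Matrix)

# Hypothetical itemsets: row k lists the item (column) indices that occur in it
itemsets <- list(c(1, 3), integer(0), c(2, 3, 5))
n.items <- 5  # assumed total number of distinct items

# Row index of every entry: row k repeated once per item it contains
i <- rep(seq_along(itemsets), lengths(itemsets))
# Column index of every entry, in the same order
j <- unlist(itemsets)

# x = 1 is recycled to every entry; dims keeps the empty second row
m <- sparseMatrix(i = i, j = j, x = 1, dims = c(length(itemsets), n.items))
print(m)
```

Because sparseMatrix() adds up the values of repeated (i, j) pairs, an item listed twice in the same row would end up with a value of 2.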

@Josh - Thanks! This seems to be the most efficient solution. I posted a function that does this below. - DAF

Is there a way to do this in Python? - Abhishek Thakur


3 Answers

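Cbinding many sparse vectors together is a case well suited to writing their contents directly into a compressed sparse column matrix (class dgCMatrix).

Here is a function that does this: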

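library(Matrix)

# Column-bind sparse vectors by filling the slots of a dgCMatrix directly
sv.cbind <- function (...) {
    input <- lapply( list(...), as, "dsparseVector" )
    # all vectors must have the same length (= number of rows)
    thelength <- unique(sapply(input,length))
    stopifnot( length(thelength)==1 )
    return( sparseMatrix( 
            x=unlist(lapply(input,slot,"x")),   # non-zero values, column by column
            i=unlist(lapply(input,slot,"i")),   # their (1-based) row indices
            # column pointers: 0 followed by the running count of non-zeros
            p=c(0,cumsum(sapply(input,function(x){length(x@x)}))),
            dims=c(thelength,length(input))
        ) )
}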
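To see how the `p` argument above encodes the columns, here is a small hand-checkable sketch using two hypothetical vectors of length 5. `p` starts at 0 and then accumulates the number of non-zeros per vector, so column k occupies positions `p[k] + 1` through `p[k + 1]` of the `i` and `x` vectors.

```r
library(Matrix)

# Two hypothetical sparse vectors of length 5
v1 <- sparseVector(x = c(1, 2), i = c(1, 4), length = 5)
v2 <- sparseVector(x = 3, i = 5, length = 5)
vs <- list(v1, v2)

# Column pointers: 0, then the running total of non-zeros per vector
p <- c(0, cumsum(sapply(vs, function(v) length(v@x))))
print(p)  # 0 2 3

m <- sparseMatrix(i = unlist(lapply(vs, slot, "i")),  # rows 1, 4 | 5
                  x = unlist(lapply(vs, slot, "x")),  # values 1, 2 | 3
                  p = p,
                  dims = c(5, length(vs)))
print(m)
```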

A quick test suggests this is about 10 times faster than the coerce-and-cBind approach:

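library(Matrix)
library(microbenchmark)
# Ten random sparse vectors of length 1e4, each with 100 non-zero entries
xx <- lapply( 1:10, function (k) {
            sparseVector( x=rep(1,100), i=sample.int(1e4,100), length=1e4 )
        } )
# Note: cBind() is deprecated in newer Matrix versions, where base cbind() replaces it
microbenchmark( do.call( sv.cbind, xx ), do.call( cBind, lapply(xx,as,"sparseMatrix") ) )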
# Unit: milliseconds
#                                            expr       min        lq      mean   median       uq       max neval cld
#                           do.call(sv.cbind, xx)  1.398565  1.464517  1.540172  1.49487  1.55911  3.455421   100  a 
#  do.call(cBind, lapply(xx, as, "sparseMatrix")) 16.037890 16.356268 16.956326 16.59854 17.49956 20.256253   100   b

- petrelharp

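cbind ultimately calls an S4 method. See here. So that code is what actually runs, and it may perform checks that your function skips. I'm not sure whether it matters, but it could make a difference. - Benjamin Christoffersen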

@petrelharp. Excellent, this is the most intuitive fast solution. It does all the coding I didn't want to do, and does it well. - zdebruine

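Thanks to Josh O'Brien for the solution: build three vectors (row indices, column indices, values), then create the sparseMatrix from them. Here is my code: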

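library(Matrix)

# Build a sparse matrix from a list of 1-row sparse matrices.
# The code appears to assume each element is a TsparseMatrix (e.g. dgTMatrix),
# whose @j slot holds 0-based column indices and @x holds the values.
vectorList2Matrix <- function(vectorList) {
  nzCount <- vapply(vectorList, function(x) length(x@j), integer(1))
  nz <- sum(nzCount)                      # total number of non-zero entries
  rows <- integer(nz)                     # row indices
  cols <- integer(nz)                     # column indices
  vals <- numeric(nz)                     # values (numeric, not integer)
  ind <- 1
  for (i in seq_along(vectorList)) {
    ln <- nzCount[i]
    if (ln > 0) {
      idx <- ind:(ind + ln - 1)
      rows[idx] <- i                      # every entry of vector i goes in row i
      cols[idx] <- vectorList[[i]]@j + 1  # convert 0-based to 1-based columns
      vals[idx] <- vectorList[[i]]@x
      ind <- ind + ln
    }
  }
  # dims keeps trailing empty rows/columns that would otherwise be dropped
  sparseMatrix(i = rows, j = cols, x = vals,
               dims = c(length(vectorList), ncol(vectorList[[1]])))
}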
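The loop can also be replaced by vectorized rep()/unlist() calls. Below is a minimal sketch of the same triplet idea, under the same assumption that each element is a 1-row TsparseMatrix with 0-based @j; the function name `vectorList2MatrixVec`, its `ncol` argument, and the sample data are hypothetical.

```r
library(Matrix)

# Hypothetical loop-free variant of vectorList2Matrix()
vectorList2MatrixVec <- function(vectorList, ncol) {
  nzCount <- vapply(vectorList, function(m) length(m@x), integer(1))
  rows <- rep(seq_along(vectorList), nzCount)              # row i, once per non-zero
  cols <- unlist(lapply(vectorList, function(m) m@j)) + 1L # 0-based -> 1-based
  vals <- unlist(lapply(vectorList, function(m) m@x))
  sparseMatrix(i = rows, j = cols, x = vals,
               dims = c(length(vectorList), ncol))
}

# Three 1 x 6 rows built with spMatrix() (whose i, j arguments are 1-based)
vectorList <- list(spMatrix(1, 6, i = c(1, 1), j = c(2, 5), x = c(1, 4)),
                   spMatrix(1, 6),                        # an all-zero row
                   spMatrix(1, 6, i = 1, j = 6, x = 7))
sm <- vectorList2MatrixVec(vectorList, ncol = 6)
print(sm)
```

Passing the column count explicitly avoids relying on the first element, which matters if the list could be empty.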

- DAF

This helped me a lot! Since I was combining vectors of the same size, though, my solution needs slightly less code: http://stackoverflow.com/a/32525837/1075993 - Andrey Sapegin


- Josh O'Brien · Accepted Answer

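Here is a two-step solution:

Use lapply() and as(..., "sparseMatrix") to convert the list of sparseVectors into a list of one-column sparseMatrices.
Use do.call() and cBind() to combine those sparseMatrices into a single sparseMatrix (cBind() is deprecated in newer Matrix versions, where base cbind() does the same job).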

library(Matrix)

# Create a list of three identical sparseVectors
ss <- as(c(0,0,3, 3.2, 0,0,0,-3), "sparseVector")
l <- replicate(3, ss, simplify = FALSE)

# Step 1: turn each sparseVector into a one-column sparseMatrix
l <- lapply(l, as, "sparseMatrix")
# Step 2: bind the columns (older Matrix versions used cBind() here)
do.call(cbind, l)

# 8 x 3 sparse Matrix of class "dgCMatrix"
#                    
# [1,]  .    .    .  
# [2,]  .    .    .  
# [3,]  3.0  3.0  3.0
# [4,]  3.2  3.2  3.2
# [5,]  .    .    .  
# [6,]  .    .    .  
# [7,]  .    .    .  
# [8,] -3.0 -3.0 -3.0
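The question fills `sm` row by row, whereas this approach places each vector in a column. If each vector should become a row, one option is to transpose the combined result. A minimal sketch, rebuilding the example vector so the block runs on its own (the name `rowsMat` is hypothetical):

```r
library(Matrix)

ss <- as(c(0, 0, 3, 3.2, 0, 0, 0, -3), "sparseVector")
l <- replicate(3, ss, simplify = FALSE)

# Combine as columns (8 x 3), then transpose so each vector is a row (3 x 8)
rowsMat <- t(do.call(cbind, lapply(l, as, "sparseMatrix")))
print(dim(rowsMat))
```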