Creating a sparse diagonal matrix from a row of a sparse matrix

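I am working with fairly large matrices in Python/SciPy. I need to extract rows from a large matrix (loaded as a coo_matrix) and use each of them as the diagonal of a new matrix. At the moment I do it like this: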

import profile
import numpy as np
from scipy import sparse

def computation(A):
  for i in range(A.shape[0]):
    diag_elems = np.array(A[i,:].todense())
    ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc")
    #...

#create some random matrix
A = (sparse.rand(1000,100000,0.02,format="csc")*5).astype(np.ubyte)
#get timings
profile.run('computation(A)')

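From the profile output I can see that most of the time is spent in the get_csr_submatrix function while extracting diag_elems. That makes me think I am either using an inefficient sparse representation for the initial data or extracting rows from the sparse matrix the wrong way. Can you suggest a better way to extract a row from a sparse matrix and represent it as a diagonal matrix?
Edit
The following variant removes the row-extraction bottleneck (note that simply changing "csc" to "csr" is not enough; A[i,:] must also be replaced with A.getrow(i)). But the main question remains how to skip the densification (.todense()) and build the diagonal matrix directly from the sparse representation of the row.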
import profile
import numpy as np
from scipy import sparse

def computation(A):
  for i in range(A.shape[0]):
    diag_elems = np.array(A.getrow(i).todense())
    ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc")
    #...

#create some random matrix
A = (sparse.rand(1000,100000,0.02,format="csr")*5).astype(np.ubyte)
#get timings
profile.run('computation(A)')

If I build the diagonal matrix directly from the 1-row CSR matrix, like this:
diag_elems = A.getrow(i)
ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1])

then I can neither pass the format="csc" argument nor convert ith_diag to CSC format afterwards:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/profile.py", line 70, in run
    prof = prof.run(statement)
  File "/usr/local/lib/python2.6/profile.py", line 456, in run
    return self.runctx(cmd, dict, dict)
  File "/usr/local/lib/python2.6/profile.py", line 462, in runctx
    exec cmd in globals, locals
  File "<string>", line 1, in <module>
  File "<stdin>", line 4, in computation
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/construct.py", line 56, in spdiags
    return dia_matrix((data, diags), shape=(m,n)).asformat(format)
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/base.py", line 211, in asformat
    return getattr(self,'to' + format)()
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/dia.py", line 173, in tocsc
    return self.tocoo().tocsc()
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/coo.py", line 263, in tocsc
    data    = np.empty(self.nnz, dtype=upcast(self.dtype))
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/sputils.py", line 47, in upcast
    raise TypeError,'no supported conversion for types: %s' % args
TypeError: no supported conversion for types: object

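Have you tried format="csr"? - cyborg
Using 'csr' for the initial data and replacing A[i,:] with A.getrow(i) gave me a significant speedup. But what I really want is to avoid densifying the row before creating the diagonal matrix. Any good suggestions? - savenkov
1 Answer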

Here is what I came up with:
def computation(A):
    # A must be in CSR format for indptr/indices/data to describe rows
    n = A.shape[1]
    for i in range(A.shape[0]):
        # slice of the underlying arrays that belongs to row i
        idx_begin = A.indptr[i]
        idx_end = A.indptr[i+1]
        diag_elems = A.data[idx_begin:idx_end]
        diag_indices = A.indices[idx_begin:idx_end]
        # place each stored value at (column, column), i.e. on the main diagonal
        ith_diag = sparse.csc_matrix((diag_elems, (diag_indices, diag_indices)), shape=(n, n))
        ith_diag.eliminate_zeros()

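The Python profiler reports 1.464 seconds, versus 5.574 seconds before. The code takes advantage of the underlying dense arrays (indptr, indices, data) that define a CSR sparse matrix. Here is my crash course: A.indptr[i]:A.indptr[i+1] defines which elements of those dense arrays correspond to the nonzero values in row i. A.data is a dense 1D array of the nonzero values of A, and A.indices holds the column each of those values belongs to.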
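To make the CSR layout concrete, here is a minimal sketch on a tiny matrix. The matrix values and the helper name row_parts are made up for illustration; they are not part of the code above.

```python
import numpy as np
from scipy import sparse

# Tiny example matrix (illustrative values only)
dense = np.array([[0, 2, 0, 3],
                  [0, 0, 0, 0],
                  [4, 0, 5, 0]])
A = sparse.csr_matrix(dense)

def row_parts(A, i):
    """Return (column indices, values) of the stored entries in row i of a CSR matrix.

    Hypothetical helper, written only to illustrate the indptr slicing.
    """
    start, end = A.indptr[i], A.indptr[i + 1]
    return A.indices[start:end], A.data[start:end]

# indptr has one entry per row plus one: here 0, 2, 2, 4
# row 0 owns slots 0..1, row 1 owns nothing, row 2 owns slots 2..3
cols, vals = row_parts(A, 0)   # columns 1 and 3, values 2 and 3
empty_cols, empty_vals = row_parts(A, 1)   # both empty: row 1 has no stored entries
```

An empty row simply yields two empty slices, so the diagonal built from it is an all-zero matrix without any special casing.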

I will run more tests to make sure this does the same thing as before. I have only checked a few cases.
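One possible way to run that check, as a sketch rather than a thorough test: build each diagonal both ways on a small random CSR matrix and compare the results. The function names below are hypothetical, and the matrix is kept small so that comparing dense copies is affordable.

```python
import numpy as np
from scipy import sparse

def diag_via_dense(A, i):
    # Approach from the question: densify the row, then call spdiags
    n = A.shape[1]
    diag_elems = A.getrow(i).toarray()
    return sparse.spdiags(diag_elems, 0, n, n, format="csc")

def diag_via_indices(A, i):
    # Approach from the answer: slice the CSR arrays directly
    n = A.shape[1]
    start, end = A.indptr[i], A.indptr[i + 1]
    vals = A.data[start:end]
    cols = A.indices[start:end]
    return sparse.csc_matrix((vals, (cols, cols)), shape=(n, n))

def same_diagonals(A):
    # Compare dense copies row by row; fine for small test matrices only
    return all(
        np.array_equal(diag_via_dense(A, i).toarray(),
                       diag_via_indices(A, i).toarray())
        for i in range(A.shape[0])
    )

# Small random CSR matrix with a fixed seed so the check is repeatable
A = sparse.random(20, 50, density=0.1, format="csr", random_state=0)
print(same_diagonals(A))
```

For matrices of the size in the question, the dense comparison would be far too expensive; checking a handful of rows, or comparing the sparse results directly, would be a more practical variant.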
