稀疏矩阵按行求和的最快方法

4

I have a big csr_matrix(1M*1K) and I want to add over rows and obtain a new csr_matrix with the same number of columns but reduced number of rows. Actually my problem is exactly same as this Sum over rows in scipy.sparse.csr_matrix. The only thing is I find the accepted solution to be slow for my purpose. Let me state what I have

map_fn = np.random.randint(0, 10000, 1000000)

map_fn here tells me how my input rows(1M) are mapped into my output rows(10K). For example ith input row gets added up into map_fn[i] output row. I tried the two approaches mentioned in the above question, namely forming a sparse matrix and using sparse sum. Although the sparse matrix approach looks way better than sparse sum approach but I find it slow for my purpose. Here is the code comparing two approaches:

import scipy.sparse
import numpy as np 
import time

print "Setting up input"
s=10000
n=1000000
d=1000
density=1.0/500

X=scipy.sparse.rand(n,d,density=density,format="csr")
map_fn=np.random.randint(0, s, n)

# Approach 1
start_time=time.time()
col = scipy.arange(n)
val = np.ones(n)
S = scipy.sparse.csr_matrix( (val, (map_fn, col)), shape = (s,n))
print "Approach 1 Creation time : ",time.time()-start_time
SX = S.dot(X)
print "Approach 1 Total time : ",time.time()-start_time

#Approach 2
start_time=time.time()
SX = np.zeros((s,X.shape[1]))
for i in range(SX.shape[0]):
    SX[i,:] = X[np.where(map_fn==i)[0],:].sum(axis=0)

print "Approach 2 Total time : ",time.time()-start_time

which gives following numbers:

Approach 1 Creation time :  0.187678098679
Approach 1 Total time :  0.286989927292
Approach 2 Total time :  10.208632946

So my question is this is there a better way of doing this? I find forming sparse matrix to be an overkill as it takes more than half of the time. Are there any better alternatives? Any suggestions are greatly appreciated. Thanks


你尝试过使用Sparse-sum吗:https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csr_matrix.sum.html? - Divakar
1
哇!仅仅运行一百亿个项目就需要433毫秒!你可能想要编写自己的子程序(用低级语言?)来处理这样的事情。 - SIGSTACKFAULT
@Divakar 我试过了,请看编辑。谢谢。 - user1131274
如果您提供一个完整的工作示例,测试替代方案将更容易。我开始设置一些东西,但在尺寸变得太混乱后放弃了。 - hpaulj
@hpaulj 对不起,我的表述不够清晰。我已经添加了一个可行的例子。如果您有任何问题,请告诉我。非常感谢。 - user1131274
显示剩余2条评论
1个回答

4

起步方法

采用这篇帖子中的稀疏解决方案 -

def sparse_matrix_mult_sparseX_mod1(X, rows):   
    nrows = rows.max()+1
    ncols = X.shape[1]
    nelem = nrows * ncols

    a,b = X.nonzero()
    ids = rows[a] + b*nrows
    sums = np.bincount(ids, X[a,b].A1, minlength=nelem)
    out = sums.reshape(ncols,-1).T
    return out

基准测试

原始方法 #1 -

def app1(X, map_fn):
    col = scipy.arange(n)
    val = np.ones(n)
    S = scipy.sparse.csr_matrix( (val, (map_fn, col)), shape = (s,n))
    SX = S.dot(X)
    return SX

时序和验证 -

In [209]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/500
     ...: 
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...: 

In [210]: out1 = app1(X, map_fn)
     ...: out2 = sparse_matrix_mult_sparseX_mod1(X, map_fn)
     ...: print np.allclose(out1.toarray(), out2)
     ...: 
True

In [211]: %timeit app1(X, map_fn)
1 loop, best of 3: 517 ms per loop

In [212]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
10 loops, best of 3: 147 ms per loop

公平起见,我们应该计时来自app1的最终密集数组版本 -

In [214]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 584 ms per loop

迁移到 Numba

我们可以将计算分箱计数的步骤迁移到 Numba,这可能对于输入矩阵更密集的情况有益。其中一种方法是 -

from numba import njit

@njit
def bincount_mod2(out, rows, r, C, V):
    N = len(V)
    for i in range(N):
        out[rows[r[i]], C[i]] += V[i]
    return out

def sparse_matrix_mult_sparseX_mod2(X, rows):
    nrows = rows.max()+1
    ncols = X.shape[1]
    r,C = X.nonzero()

    V = X[r,C].A1
    out = np.zeros((nrows, ncols))
    return bincount_mod2(out, rows, r, C, V)

时间 -

In [373]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/100 # Denser now!
     ...: 
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...: 

In [374]: %timeit app1(X, map_fn)
1 loop, best of 3: 787 ms per loop

In [375]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
1 loop, best of 3: 906 ms per loop

In [376]: %timeit sparse_matrix_mult_sparseX_mod2(X, map_fn)
1 loop, best of 3: 705 ms per loop

app1得到的密集输出 -

In [379]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 910 ms per loop

感谢您向我介绍numba。 - user1131274

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接