矩阵乘法子集,快速且稀疏

4
将协同过滤代码转换为使用稀疏矩阵的方式,我正在思考以下问题:给定两个完整的矩阵X(m x l)和Theta(n x l),以及一个稀疏矩阵R(m x n),有没有一种快速计算稀疏内积的方法。大尺寸为m和n(约为100000),而l很小(约为10)。这可能是处理大数据的一个相当常见的操作,因为它出现在大多数线性回归问题的成本函数中,所以我期望scipy.sparse内置了解决方案,但我还没有发现任何明显的解决方案。
在Python中完成这个任务的朴素方法是R.multiply(XTheta.T),但这将导致评估完整的矩阵X Theta.T (m x n,约为100000 ** 2),这将占用太多内存,然后丢弃大部分条目,因为R是稀疏的。
stackoverflow上已经有一个伪解决方案,但它在一步中不是稀疏的:pseudo solution already here on stackoverflow
def sparse_mult_notreally(a, b, coords):
    rows, cols = coords
    rows, r_idx = np.unique(rows, return_inverse=True)
    cols, c_idx = np.unique(cols, return_inverse=True)
    C = np.array(np.dot(a[rows, :], b[:, cols])) # this operation is dense
    return sp.coo_matrix( (C[r_idx,c_idx],coords), (a.shape[0],b.shape[1]) )

这对我小规模的数组来说工作得很好,而且速度也很快,但是当我处理大数据集时它会出现以下错误:

... in sparse_mult(a, b, coords)
      132     rows, r_idx = np.unique(rows, return_inverse=True)
      133     cols, c_idx = np.unique(cols, return_inverse=True)
  --> 134     C = np.array(np.dot(a[rows, :], b[:, cols])) # this operation is not sparse
      135     return sp.coo_matrix( (C[r_idx,c_idx],coords), (a.shape[0],b.shape[1]) )

ValueError: array is too big.

一个实际上是稀疏的解决方案,但非常缓慢,是:
def sparse_mult(a, b, coords):
    rows, cols = coords
    n = len(rows)
    C = np.array([ float(a[rows[i],:]*b[:,cols[i]]) for i in range(n) ]) # this is sparse, but VERY slow
    return sp.coo_matrix( (C,coords), (a.shape[0],b.shape[1]) )

有人知道一种快速的、完全稀疏的方法来实现这个吗?

1
这个操作使用向量化方法已经尽可能地稀疏了。将有问题的代码拆分成三行来查看内存错误发生的时间,例如 aa = a[rows, :]; bb = b[:, cols]; C = np.dot(aa, bb)。您不需要调用 np.array,因为它实际上会复制数组,所以它甚至可能是内存错误的罪魁祸首。在生成 R 的过程中,您会破坏 XTheta 吗? - Jaime
我的输入a、b是np.matrix,因此没有np.array的结果C[r_idx,c_idx]是一个2D(n x 1)矩阵,而不是一个1D数组。这导致在sp.coo_matrix调用中出现错误,因此我在其中放置了np.array。事先转换为数组可能会节省一些时间。 - Alexander Tronchin-James
X和Theta都是完整的矩阵,但我不明白为什么你要销毁它们?请注意,虽然我的矩阵R是稀疏的,但其稀疏模式使得每一行至少有一个条目,每一列也至少有一个条目,因此唯一调用的结果是m和n的完整范围。这导致np.dot的结果是我们试图稀疏相乘的两个数组的完整密集乘积。我尝试按照你的建议定义aa和bb,但在“C = ...”处出现了相同的错误。也许cython是解决方案? - Alexander Tronchin-James
3个回答

3
我为您的问题提供了4种不同的解决方案,并且看起来无论数组的大小如何,Numba jit解决方案是最好的。接近第二的是@Alexander's Cython解决方案。
以下是结果(M是x数组中行数):
M = 1000
function sparse_dense    took 0.03 sec.
function sparse_loop     took 0.07 sec.
function sparse_numba    took 0.00 sec.
function sparse_cython   took 0.09 sec.
M = 10000
function sparse_dense    took 2.88 sec.
function sparse_loop     took 0.68 sec.
function sparse_numba    took 0.00 sec.
function sparse_cython   took 0.01 sec.
M = 100000
function sparse_dense    ran out of memory
function sparse_loop     took 6.84 sec.
function sparse_numba    took 0.09 sec.
function sparse_cython   took 0.12 sec.

我使用的脚本对这些方法进行分析,如下所示:

import numpy as np
from scipy.sparse import coo_matrix
from numba import autojit, jit, float64, int32
import pyximport
pyximport.install(setup_args={"script_args":["--compiler=mingw32"],
                              "include_dirs":np.get_include()},
                  reload_support=True)

def sparse_dense(a,b,c):
    return coo_matrix(c.multiply(np.dot(a,b)))

def sparse_loop(a,b,c):
    """Multiply sparse matrix `c` by np.dot(a,b) by looping over non-zero
    entries in `c` and using `np.dot()` for each entry."""
    N = c.size
    data = np.empty(N,dtype=float)
    for i in range(N):
        data[i] = c.data[i]*np.dot(a[c.row[i],:],b[:,c.col[i]])
    return coo_matrix((data,(c.row,c.col)),shape=(a.shape[0],b.shape[1]))

#@autojit
def _sparse_mult4(a,b,cd,cr,cc):
    N = cd.size
    data = np.empty_like(cd)
    for i in range(N):
        num = 0.0
        for j in range(a.shape[1]):
            num += a[cr[i],j]*b[j,cc[i]]
        data[i] = cd[i]*num
    return data

_fast_sparse_mult4 = \
    jit(float64[:,:](float64[:,:],float64[:,:],float64[:],int32[:],int32[:]))(_sparse_mult4)

def sparse_numba(a,b,c):
    """Multiply sparse matrix `c` by np.dot(a,b) using Numba's jit."""
    assert c.shape == (a.shape[0],b.shape[1])
    data = _fast_sparse_mult4(a,b,c.data,c.row,c.col)
    return coo_matrix((data,(c.row,c.col)),shape=(a.shape[0],b.shape[1]))

def sparse_cython(a, b, c):
    """Computes c.multiply(np.dot(a,b)) using cython."""
    from sparse_mult_c import sparse_mult_c

    data = np.empty_like(c.data)
    sparse_mult_c(a,b,c.data,c.row,c.col,data)
    return coo_matrix((data,(c.row,c.col)),shape=(a.shape[0],b.shape[1]))

def unique_rows(a):
    a = np.ascontiguousarray(a)
    unique_a = np.unique(a.view([('', a.dtype)]*a.shape[1]))
    return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))

if __name__ == '__main__':
    import time

    for M in [1000,10000,100000]:
        print 'M = %i' % M
        N = M + 2
        L = 10

        x = np.random.rand(M,L)
        t = np.random.rand(N,L).T

        # number of non-zero entries in sparse r matrix
        S = M*10

        row = np.random.randint(M,size=S)
        col = np.random.randint(N,size=S)

        # remove duplicate rows and columns       
        row, col = unique_rows(np.dstack((row,col)).squeeze()).T

        data = np.random.rand(row.size)

        r = coo_matrix((data,(row,col)),shape=(M,N))

        a2 = sparse_loop(x,t,r)

        for f in [sparse_dense,sparse_loop,sparse_numba,sparse_cython]:
            t0 = time.time()
            try:
                a = f(x,t,r)
            except MemoryError:
                print 'function %s ran out of memory' % f.__name__
                continue
            elapsed = time.time()-t0
            try:
                diff = abs(a-a2)
                if diff.nnz > 0:
                    assert np.max(abs(a-a2).data) < 1e-5
            except AssertionError:
                print f.__name__
                raise
            print 'function %s took %.2f sec.' % (f.__name__,elapsed)

cython函数是@Alexander代码的略微修改版本:

# working from tutorial at: http://docs.cython.org/src/tutorial/numpy.html
cimport numpy as np

# Turn bounds checking back on if there are ANY problems!
cimport cython
@cython.boundscheck(False) # turn of bounds-checking for entire function
def sparse_mult_c(np.ndarray[np.float64_t, ndim=2] a,
                  np.ndarray[np.float64_t, ndim=2] b,
                  np.ndarray[np.float64_t, ndim=1] data,
                  np.ndarray[np.int32_t, ndim=1] rows,
                  np.ndarray[np.int32_t, ndim=1] cols,
                  np.ndarray[np.float64_t, ndim=1] out):

    cdef int n = rows.shape[0]
    cdef int k = a.shape[1]
    cdef int i,j

    cdef double num

    for i in range(n):
        num = 0.0
        for j in range(k):
            num += a[rows[i],j] * b[j,cols[i]]
        out[i] = data[i]*num

我喜欢Numba解决方案既更快,又不需要外部编译步骤。胜利! - Alexander Tronchin-James

1

根据评论中的额外信息,我认为让你感到困惑的是对np.unique的调用。请尝试以下方法:

import numpy as np
import scipy.sparse as sps
from numpy.core.umath_tests import inner1d

n = 100000
x = np.random.rand(n, 10)
theta = np.random.rand(n, 10)
rows = np.arange(n)
cols = np.arange(n)
np.random.shuffle(rows)
np.random.shuffle(cols)


def sparse_multiply(x, theta, rows, cols):
    data = inner1d(x[rows], theta[cols])
    return sps.coo_matrix((data, (rows, cols)),
                          shape=(x.shape[0], theta.shape[0]))

我得到了以下时间:

n = 1000
%timeit sparse_multiply(x, theta, rows, cols)
1000 loops, best of 3: 465 us per loop

n = 10000
%timeit sparse_multiply(x, theta, rows, cols)
100 loops, best of 3: 4.29 ms per loop

n = 100000
%timeit sparse_multiply(x, theta, rows, cols)
10 loops, best of 3: 61.5 ms per loop

当然,当 n = 100 时:
>>> np.allclose(sparse_multiply(x, theta, rows, cols).toarray()[rows, cols],
                x.dot(theta.T)[rows, cols])
>>> True

0

还没有测试过Jaime的答案(再次感谢!),但与此同时,我使用cython实现了另一个可行的答案。

文件sparse_mult_c.pyx:

# working from tutorial at: http://docs.cython.org/src/tutorial/numpy.html
cimport numpy as np

# Turn bounds checking back on if there are ANY problems!
cimport cython
@cython.boundscheck(False) # turn of bounds-checking for entire function
def sparse_mult_c(np.ndarray[np.float64_t, ndim=2] a,
                  np.ndarray[np.float64_t, ndim=2] b,
                  np.ndarray[np.int32_t, ndim=1] rows,
                  np.ndarray[np.int32_t, ndim=1] cols,
                  np.ndarray[np.float64_t, ndim=1] C ):

    cdef int n = rows.shape[0]
    cdef int k = a.shape[1]
    cdef int i,j

    for i in range(n):
        for j in range(k):
            C[i] += a[rows[i],j] * b[j,cols[i]]

然后按照http://docs.cython.org/src/userguide/tutorial.html进行编译。

接着在我的Python代码中,我包含了以下内容:

def sparse_mult(a, b, coords):
    #a,b are np.ndarrays
    from sparse_mult_c import sparse_mult_c
    rows, cols = coords
    C = np.zeros(rows.shape[0])
    sparse_mult_c(a,b,rows,cols,C)
    return sp.coo_matrix( (C,coords), (a.shape[0],b.shape[1]) )

这个方法完全支持稀疏矩阵,并且比原始的(对我来说内存效率低下的)解决方案运行得更快。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接