Element-wise maximum of two sparse matrices

Question

12

Is there an easy/built-in way to get the element-wise maximum of two (or more) sparse matrices? That is, a sparse equivalent of np.maximum.

- Maarten

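What do you mean by "element-wise"? If I go to the sparse coo_matrix page, I see functions like "arcsin(): Element-wise arcsin." But there is no max. Do you want the maximum within each matrix, the maximum along some dimension, or the maximum across a set of matrices? - hpaulj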

3

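No offense, but I think an element-wise operation is perfectly well defined. Input: two matrices A, B with the same dimensions. Output: a matrix C where C[i,j] = max(A[i,j], B[i,j]). - Maarten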

7 Answers

3

There is no built-in way to do this in scipy.sparse. The easy solution is:

np.maximum(X.toarray(), Y.toarray())

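However, when the matrices have large dimensions, this obviously consumes a lot of memory and may crash your machine. A memory-efficient (but not fast) solution is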

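from scipy.sparse import coo_matrix

# convert to COO, if necessary
X = X.tocoo()
Y = Y.tocoo()

# map (row, col) -> value for the stored entries of each matrix
Xdict = dict(((i, j), v) for i, j, v in zip(X.row, X.col, X.data))
Ydict = dict(((i, j), v) for i, j, v in zip(Y.row, Y.col, Y.data))

# union of the stored positions (keys() replaces the Python 2 iterkeys())
keys = list(set(Xdict.keys()).union(Ydict.keys()))

XmaxY = [max(Xdict.get((i, j), 0), Ydict.get((i, j), 0)) for i, j in keys]
# materialize the index tuples and pass the shape so empty trailing rows/columns are kept
rows, cols = zip(*keys)
XmaxY = coo_matrix((XmaxY, (rows, cols)), shape=X.shape)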

Note that this uses pure Python rather than vectorized idioms. You can try to cut the running time by vectorizing parts of it.
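One possible way to vectorize the dictionary approach is sketched here. This is not the answerer's code: the function name coo_maximum is hypothetical, the sketch assumes both matrices have the same shape, and it casts values to float so that -inf can serve as a starting value. It flattens each (row, col) pair into a single key, takes the per-key maximum with np.maximum.at, and then compares against the implicit zero wherever only one matrix stores a value.

```python
import numpy as np
from scipy import sparse


def coo_maximum(X, Y):
    """Element-wise maximum of two same-shaped sparse matrices (sketch)."""
    # Work on COO copies with duplicate coordinates merged, so that each
    # position appears at most once per matrix.
    X = X.tocoo(copy=True)
    Y = Y.tocoo(copy=True)
    X.sum_duplicates()
    Y.sum_duplicates()

    n_cols = X.shape[1]
    # Flatten (row, col) into one integer key; int64 avoids overflow.
    flat = np.concatenate((
        X.row.astype(np.int64) * n_cols + X.col,
        Y.row.astype(np.int64) * n_cols + Y.col,
    ))
    vals = np.concatenate((X.data, Y.data)).astype(float)

    keys, inverse, counts = np.unique(
        flat, return_inverse=True, return_counts=True
    )

    # Per-key maximum of the stored values.
    out = np.full(keys.shape, -np.inf)
    np.maximum.at(out, inverse, vals)

    # A key seen only once is stored in one matrix and is an implicit
    # zero in the other, so the result there is max(value, 0).
    single = counts == 1
    out[single] = np.maximum(out[single], 0)

    result = sparse.coo_matrix(
        (out, (keys // n_cols, keys % n_cols)), shape=X.shape
    ).tocsr()
    result.eliminate_zeros()
    return result
```

The memory used is proportional to the number of stored entries rather than to the full matrix size, which is the point of avoiding the dense conversion.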

- Fred Foo

Maybe I should have mentioned that I tried this earlier, but it crashed my MacBook because it ran out of disk space. - Maarten

1

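Here is another memory-efficient solution that is somewhat faster than larsmans's. It finds the set of unique indices of the non-zero elements in the two arrays, using code from Jaime's excellent answer here.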

import numpy as np
from scipy import sparse

def sparsemax(X, Y):

    # the indices of all non-zero elements in both arrays
    idx = np.hstack((X.nonzero(), Y.nonzero()))

    # find the set of unique non-zero indices
    idx = tuple(unique_rows(idx.T).T)

    # take the element-wise max over only these indices
    # (this assigns into X, so the caller's matrix is modified in place)
    X[idx] = np.maximum(X[idx].toarray(), Y[idx].toarray())

    return X

def unique_rows(a):
    void_type = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    b = np.ascontiguousarray(a).view(void_type)
    idx = np.unique(b, return_index=True)[1]
    return a[idx]

Test:

def setup(n=1000, fmt='csr'):
    return sparse.rand(n, n, format=fmt), sparse.rand(n, n, format=fmt)

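X, Y = setup()
# compute the reference first, because sparsemax modifies X in place
expected = np.maximum(X.toarray(), Y.toarray())
Z = sparsemax(X, Y)
print(np.all(Z.toarray() == expected))
# True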

%%timeit X, Y = setup()
sparsemax(X, Y)
# 100 loops, best of 3: 4.92 ms per loop

- ali_m

1

Ah, never mind: Maarten's solution is much faster than mine! - ali_m

1

The latest scipy (0.13.0) defines element-wise boolean comparisons for sparse matrices. So:

BisBigger = B>A
A - A.multiply(BisBigger) + B.multiply(BisBigger)

np.maximum does not work (yet) because it uses np.where, which still tries to get the truth value of an array.

Curiously, B>A returns a boolean dtype, while B>=A returns float64.

- hpaulj

Unfortunately, scipy 0.13.0 is not on PyPI yet. - Maarten

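I'm not sure what B>=A is supposed to mean. But if it does what you would expect, and you have two very sparse matrices A and B, you will get a very dense matrix as the result. - Maarten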
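The density concern can be illustrated with a toy case. This is only a sketch with made-up 3x3 matrices: every position where both matrices hold an implicit zero satisfies 0 >= 0, so the >= result is true almost everywhere, while the > result stays as sparse as its inputs.

```python
import warnings

import numpy as np
from scipy import sparse

# Two 3x3 matrices with a single stored entry each (illustrative values).
A = sparse.csr_matrix(([1], ([0], [0])), shape=(3, 3))
B = sparse.csr_matrix(([2], ([1], [1])), shape=(3, 3))

# ">=" triggers a SparseEfficiencyWarning; silence it for the demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", sparse.SparseEfficiencyWarning)
    ge = A >= B
gt = A > B

# Count how many positions are true in each comparison result.
n_true_ge = int(np.count_nonzero(ge.toarray()))
n_true_gt = int(np.count_nonzero(gt.toarray()))
print(n_true_ge, n_true_gt)
```

In this case the >= comparison holds at 8 of the 9 positions and the > comparison at only one, which is why strict comparisons are the better building block for a mask.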

Good point. It explains the warning I got: "SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is inefficient, try using <, >, or !=, instead." - hpaulj

1

Here is a function that returns the element-wise maximum of two sparse matrices. It implements hpaulj's answer.

def sparse_max(A, B):
    """
    Return the element-wise maximum of sparse matrices `A` and `B`.
    """
    AgtB = (A > B).astype(int)
    M = AgtB.multiply(A - B) + B
    return M

Test:

import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.random.randint(-9, 10, 25).reshape((5, 5)))
B = sparse.csr_matrix(np.random.randint(-9, 10, 25).reshape((5, 5)))

M = sparse_max(A, B)
M2 = sparse_max(B, A)

# Test symmetry:
print((M.toarray() == M2.toarray()).all())
# Test that M is larger or equal to A and B, element-wise:
print((M.toarray() >= A.toarray()).all())
print((M.toarray() >= B.toarray()).all())

- hsxavier

0

from scipy import sparse
from numpy import array
I = array([0,3,1,0])
J = array([0,3,1,2])
V = array([4,5,7,9])
A = sparse.coo_matrix((V,(I,J)),shape=(4,4))

A.data.max()
9

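If you haven't tried IPython yet, it can save you time: create a sparse matrix A, then type A. and press Tab to see the list of methods you can call on A. From that list you will find that A.data returns the non-zero entries as an array, so you only need to take the maximum of that.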

- Greg

Thanks for your answer, but you misunderstood the question. I have edited it to make it clearer. - Maarten

Do you mean comparing matrices A and B element by element to find which one has the larger maximum element? - Greg

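In that case, unless your arrays are very large, I would convert them to arrays with 'A.toarray()' and use that approach. If they are very large and crash your computer, then I can't help, sorry. - Greg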

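This answer does not give the element-wise maximum of two matrices, and it is not even correct for the maximum of a single matrix: what if the maximum of .data is negative? - Fred Foo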
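The objection about negative values can be checked with a small case. This sketch uses made-up values and assumes a SciPy version in which sparse matrices provide a max() method: when every stored entry is negative, the largest stored value is not the matrix maximum, because the implicit zeros are larger.

```python
import numpy as np
from scipy import sparse

# A 2x2 matrix whose only stored entries are negative (illustrative values).
I = np.array([0, 1])
J = np.array([0, 1])
V = np.array([-4, -5])
A = sparse.coo_matrix((V, (I, J)), shape=(2, 2))

stored_max = A.data.max()  # largest stored entry only
true_max = A.max()         # also accounts for the implicit zeros
print(stored_max, true_max)
```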

0

In current SciPy, you can use the object method maximum():

mM = mA.maximum(mB)
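A short usage sketch, with made-up matrices, that compares the sparse result against the dense np.maximum reference. The names mA and mB follow the line above; the values are illustrative and include negative entries and implicit zeros on purpose.

```python
import numpy as np
from scipy import sparse

mA = sparse.csr_matrix(np.array([[1, 0, -2],
                                 [0, 3, 0]]))
mB = sparse.csr_matrix(np.array([[0, -1, -5],
                                 [4, 0, 0]]))

# Sparse element-wise maximum; the result is itself sparse.
mM = mA.maximum(mB)

# Dense reference, only feasible here because the matrices are tiny.
reference = np.maximum(mA.toarray(), mB.toarray())
print(np.array_equal(mM.toarray(), reference))
```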

- Royi

- Maarten · Accepted Answer

This works well:

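import numpy as np

def maximum(A, B):
    # A - B is negative exactly where B holds the larger value
    BisBigger = A - B
    # turn the stored differences into a 0/1 mask: 1 where B > A
    BisBigger.data = np.where(BisBigger.data < 0, 1, 0)
    return A - A.multiply(BisBigger) + B.multiply(BisBigger)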
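A quick check of this masking approach against the dense result, as a sketch. The function is restated so the snippet runs on its own; the random 5x5 integer matrices and the seed are arbitrary choices for illustration, and they include negative values so that the implicit zeros matter.

```python
import numpy as np
from scipy import sparse


def maximum(A, B):
    # Same masking idea as above: the mask is 1 where A < B, 0 elsewhere.
    b_is_bigger = A - B
    b_is_bigger.data = np.where(b_is_bigger.data < 0, 1, 0)
    return A - A.multiply(b_is_bigger) + B.multiply(b_is_bigger)


rng = np.random.default_rng(0)
dense_a = rng.integers(-9, 10, size=(5, 5))
dense_b = rng.integers(-9, 10, size=(5, 5))
A = sparse.csr_matrix(dense_a)
B = sparse.csr_matrix(dense_b)

M = maximum(A, B)
matches = np.array_equal(M.toarray(), np.maximum(dense_a, dense_b))
print(matches)
```

Only the stored entries of A - B are touched, so the mask is never denser than the union of the two sparsity patterns.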