Vectorizing numpy bincount

Question

8

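I have a 2D numpy array A, and I want to apply np.bincount() to each column of A to produce another 2D array B made up of the bincounts of each column of the original matrix A.

My problem is that np.bincount() is a function that takes a 1D array-like. It is not an array method like B = A.max(axis=1), for example.

Is there a more pythonic / numpythic way to generate this B array than an ugly for-loop?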

import numpy as np

states = 4
rows = 8
cols = 4

A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))

for x in range(A.shape[1]):
    B[:,x] =  np.bincount(A[:,x])

- user3556757

4 Answers

2

I would suggest using np.apply_along_axis, which lets you apply a 1D method (in this case np.bincount) to the 1D slices of a higher-dimensional array:

import numpy as np

states = 4
rows = 8
cols = 4

A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))

B = np.apply_along_axis(np.bincount, axis=0, arr=A)

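You have to be careful, though. This (as well as the for-loop you suggested) only works if the output of np.bincount has the right shape. If the maximum state is missing from one or more columns of A, np.bincount returns a shorter array for those columns, the shapes no longer match, and the code fails with a ValueError.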
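A minimal sketch of that pitfall and one possible way around it, assuming the number of states is known in advance. The small array below is made up for illustration. It relies on np.apply_along_axis forwarding extra keyword arguments to the wrapped function, so minlength reaches np.bincount and every column yields the same number of bins:

```python
import numpy as np

states = 4
# Illustrative data: column 0 contains the maximum state 3,
# column 1 only goes up to 2.
A = np.array([[0, 1],
              [3, 2],
              [1, 0]])

# Without minlength the per-column results have lengths 4 and 3,
# so they cannot be assembled into one 2D array.
try:
    np.apply_along_axis(np.bincount, 0, A)
    raised = False
except ValueError:
    raised = True

# Extra keyword arguments are passed through to np.bincount.
B = np.apply_along_axis(np.bincount, 0, A, minlength=states)
print(raised)
print(B)
```

This still loops over the columns in Python, so it addresses the shape problem only, not performance.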

- jotasi

1

Note that apply_along_axis is just syntactic sugar for a for-loop and has the same performance characteristics. - Eelco Hoogendoorn

2

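This solution uses the numpy_indexed package (disclaimer: I am its author). It is fully vectorized, so it involves no Python loops behind the scenes. Also, there are no restrictions on the input; not every column needs to contain the same set of unique values.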

import numpy as np
import numpy_indexed as npi
rowidx, colidx = np.indices(A.shape)
(bin, col), B = npi.count_table(A.flatten(), colidx.flatten())

This gives an alternative (sparse) representation of the same result, which may be more appropriate if the B array does indeed contain many zeros.

(bin, col), count = npi.count((A.flatten(), colidx.flatten()))
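For readers without numpy_indexed installed, the same sparse idea can be sketched in plain NumPy. This is a swapped-in technique based on np.unique, not how numpy_indexed works internally, and the array and variable names are illustrative assumptions:

```python
import numpy as np

# Illustrative data; any 2D array of non-negative integers works.
A = np.array([[0, 1],
              [3, 1],
              [0, 2]])

# Column index of every element, same shape as A.
colidx = np.indices(A.shape)[1]

# One (value, column) pair per element; count the distinct pairs.
pairs = np.stack([A.ravel(), colidx.ravel()], axis=1)
uniq, count = np.unique(pairs, axis=0, return_counts=True)
bins, cols = uniq[:, 0], uniq[:, 1]

# Optional: scatter the sparse counts back into a dense table.
B = np.zeros((A.max() + 1, A.shape[1]), dtype=int)
B[bins, cols] = count
print(bins, cols, count)
print(B)
```

Only the (value, column) pairs that actually occur are stored, which is what makes this form attractive when the dense table would be mostly zeros.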

Note that apply_along_axis is just syntactic sugar for a for-loop and has the same performance characteristics.

- Eelco Hoogendoorn

1

Another possibility:

import numpy as np


def bincount_columns(x, minlength=None):
    nbins = x.max() + 1
    if minlength is not None:
        nbins = max(nbins, minlength)
    ncols = x.shape[1]
    count = np.zeros((nbins, ncols), dtype=int)
    colidx = np.arange(ncols)[None, :]
    np.add.at(count, (x, colidx), 1)
    return count

For example,

In [110]: x
Out[110]: 
array([[4, 2, 2, 3],
       [4, 3, 4, 4],
       [4, 3, 4, 4],
       [0, 2, 4, 0],
       [4, 1, 2, 1],
       [4, 2, 4, 3]])

In [111]: bincount_columns(x)
Out[111]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2]])

In [112]: bincount_columns(x, minlength=7)
Out[112]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
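The function above leans on np.add.at rather than a plain indexed `+=`. A small sketch of why that matters, using a made-up array: with fancy indexing, `+=` applies the increment only once per distinct index, while np.add.at accumulates every occurrence.

```python
import numpy as np

# Illustrative data: column 0 is all ones, column 1 has two zeros.
x = np.array([[1, 0],
              [1, 2],
              [1, 0]])
colidx = np.arange(x.shape[1])[None, :]

# Buffered fancy-index update: repeated (row, col) targets collapse,
# so each cell that is hit ends up at 1 regardless of repeats.
buffered = np.zeros((3, 2), dtype=int)
buffered[x, colidx] += 1

# Unbuffered update: every occurrence is counted.
unbuffered = np.zeros((3, 2), dtype=int)
np.add.at(unbuffered, (x, colidx), 1)

print(buffered)
print(unbuffered)
```

Only the unbuffered result matches the per-column bincounts, here [0, 3, 0] for column 0 and [2, 0, 1] for column 1.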

- Warren Weckesser

- Divakar · Accepted Answer

Using the same idea as in this post, here is a vectorized approach:

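m = A.shape[1]                 # number of columns
n = A.max() + 1                # number of bins per column
A1 = A + (n*np.arange(m))      # shift column j into its own range [j*n, (j+1)*n)
out = np.bincount(A1.ravel(), minlength=n*m).reshape(m, -1).T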
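To see why the offset trick works, here is a small worked sketch on made-up data, with a per-column loop as a cross-check. It assumes A holds non-negative integers, as np.bincount requires. The names offsets, flat_counts and expected are introduced here for illustration only:

```python
import numpy as np

# Illustrative data with values in {0, 1, 2}.
A = np.array([[0, 2],
              [1, 2],
              [1, 0]])

m = A.shape[1]                  # 2 columns
n = A.max() + 1                 # 3 bins per column

# Column j is shifted by j*n, so its values land in [j*n, (j+1)*n)
# and never collide with values from another column.
offsets = n * np.arange(m)      # [0, 3]
A1 = A + offsets                # [[0, 5], [1, 5], [1, 3]]

# A single bincount over all shifted values, then one block of n bins
# per column, transposed so that columns of `out` match columns of A.
flat_counts = np.bincount(A1.ravel(), minlength=n * m)
out = flat_counts.reshape(m, n).T

# Cross-check against the straightforward per-column loop.
expected = np.stack(
    [np.bincount(A[:, j], minlength=n) for j in range(m)], axis=1
)
print(out)
print(np.array_equal(out, expected))
```

Because n is taken from the global maximum, every column gets the same number of bins, so this approach does not run into the shape mismatch that the loop-based versions can hit.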