通过求和来聚合Numpy数组

4

我有一个形状为(4320,8640)的Numpy数组。我想要一个形状为(2160,4320)的数组。

你会注意到新数组的每个单元格都映射到旧数组中的一个2x2单元格集。我希望新数组中单元格的值是旧数组中这个块中值的总和。

我可以通过以下方式实现:

import numpy

#Generate an example array
arr = numpy.random.randint(10,size=(4320,8640))

#Perform the transformation
arrtrans = numpy.array([ [ arr[y][x]+arr[y+1][x]+arr[y][x+1]+arr[y+1][x+1] for x in range(0,8640,2)] for y in range(0,4320,2)])

但这样做很慢,而且有点丑陋。

是否有一种使用Numpy(或可互操作的软件包)来完成此操作的方法?

3个回答

8

当窗口恰好适合数组时,使用numpy进行重塑以增加维度并使用np.sum折叠额外的维度是一种标准的方法:

>>> a = np.random.rand(4320,8640)
>>> a.shape
(4320, 8640)
>>> a_small = a.reshape(2160, 2, 4320, 2).sum(axis=(1, 3))
>>> a_small.shape
(2160, 4320)
>>> np.allclose(a_small[100, 203], a[200:202, 406:408].sum())
True

非常酷,这与使用as_strided操纵步幅具有相同的性能。您可以使用np.einsum('ijkl -> ik', a.reshape(2160, 2, 4320, 2))获得更快的速度。 - wwii
np.einsum的优势通常来自于求和和乘积操作的向量化(SIMD)。我认为最新版本的numpy在np.sum中也具有相同的特性。 - Jaime

1

我不确定是否有您想要的包,但这段代码将计算速度更快。

>>> arrtrans2 = arr[::2, ::2] + arr[::2, 1::2] + arr[1::2, ::2] + arr[1::2, 1::2]
>>> numpy.allclose(arrtrans, arrtrans2)
True

::21::2分别由0, 2, 4, ...1, 3, 5, ...翻译。


1
你正在操作原始数组的滑动窗口。关于滑动窗口、numpy和python,SO上有很多问题和答案。通过操纵数组的步幅,这个过程可以大大加快。这是一个通用函数,它将返回具有或不具有重叠的数组的(x,y)窗口。使用这个步幅技巧似乎只比@mskimm的解决方案慢一点点。这是一个很好的工具,值得拥有。这个函数不是我的 - 它是在Efficient Overlapping Windows with Numpy中发现的。
import numpy as np
from numpy.lib.stride_tricks import as_strided as ast
from itertools import product

def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple, 
    even for one-dimensional shapes.

    Parameters
        shape - an int, or a tuple of ints

    Returns
        a shape tuple

    from http://www.johnvinyard.com/blog/?p=268
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')


def sliding_window(a,ws,ss = None,flatten = True):
    '''
    Return a sliding window over a in any number of dimensions

    Parameters:
        a  - an n-dimensional numpy array
        ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size 
             of each dimension of the window
        ss - an int (a is 1D) or tuple (a is 2D or greater) representing the 
             amount to slide the window in each dimension. If not specified, it
             defaults to ws.
        flatten - if True, all slices are flattened, otherwise, there is an 
                  extra dimension for each dimension of the input.

    Returns
        an array containing each n-dimensional window from a

    from http://www.johnvinyard.com/blog/?p=268
    '''

    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)

    # convert ws, ss, and a.shape to numpy arrays so that we can do math in every 
    # dimension at once.
    ws = np.array(ws)
    ss = np.array(ss)
    shape = np.array(a.shape)


    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shape),len(ws),len(ss)]
    if 1 != len(set(ls)):
        error_string = 'a.shape, ws and ss must all have the same length. They were{}'
        raise ValueError(error_string.format(str(ls)))

    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shape):
        error_string = 'ws cannot be larger than a in any dimension. a.shape was {} and ws was {}'
        raise ValueError(error_string.format(str(a.shape),str(ws)))

    # how many slices will there be in each dimension?
    newshape = norm_shape(((shape - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    strided = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return strided

    # Collapse strided so that it has one more dimension than the window.  I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    firstdim = (np.product(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1
    dim = filter(lambda i : i != 1,dim)
    return strided.reshape(dim)

使用方法:

# 2x2 windows with NO overlap
b = sliding_window(arr, (2,2), flatten = False)
c = b.sum((1,2))

使用numpy.einsum可以获得约24%的性能提升。

c = np.einsum('ijkl -> ij', b)

一个SO问答示例 如何高效地像Matlab的blkproc(blockproc)函数一样分块处理numpy数组,选择的答案会对你有帮助。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接