Resize with averaging or rebin a numpy 2d array

40

I am trying to reimplement an IDL function in Python:

http://star.pst.qub.ac.uk/idl/REBIN.html

It downsizes a 2d array by an integer factor by averaging.

For example:

>>> a=np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

I would like to resize it to (2,3) by taking the mean of the relevant samples. The expected output would be:

>>> b = rebin(a, (2, 3))
>>> b
array([[  3.5,   5.5,  7.5],
       [ 15.5, 17.5,  19.5]])

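That is, b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]), and so on.

I believe I should reshape to a 4-dimensional array and then take the mean over the correct slices, but I could not figure out the algorithm. Would you have any hints?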


1
Just found out this is a duplicate of https://dev59.com/hm445IYBdhLWcg3w7unD, but I did not find it earlier when using the search function on Stack Overflow. - Andrea Zonca
5 Answers

45

Here is an example based on the answer you linked (for clarity):

>>> import numpy as np
>>> a = np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape((2,a.shape[0]//2,3,-1)).mean(axis=3).mean(1)
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

As a function:

def rebin(a, shape):
    sh = shape[0],a.shape[0]//shape[0],shape[1],a.shape[1]//shape[1]
    return a.reshape(sh).mean(-1).mean(1)
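
To make the 4-D reshape a little more concrete, here is a minimal sketch (not part of the original answer) that adds a divisibility check and averages both block axes in one call. The name `rebin_checked` is only illustrative. After the reshape, index `[i, r, j, c]` of the 4-D view corresponds to `a[i*fr + r, j*fc + c]`, so axes 1 and 3 run over the samples inside each block.

```python
import numpy as np


def rebin_checked(a, shape):
    """Block-average a 2-D array down to `shape` (hypothetical helper)."""
    a = np.asarray(a)
    rows, cols = shape
    if a.shape[0] % rows or a.shape[1] % cols:
        raise ValueError(
            "new shape {} must divide old shape {}".format(shape, a.shape))
    # Axes: (block row, row inside block, block column, column inside block)
    blocks = a.reshape(rows, a.shape[0] // rows, cols, a.shape[1] // cols)
    # Average over the two "inside block" axes at once
    return blocks.mean(axis=(1, 3))


a = np.arange(24).reshape(4, 6)
blocks = a.reshape(2, 2, 3, 2)
# The block at (0, 1) of the 4-D view is exactly a[:2, 2:4]
print(np.array_equal(blocks[0, :, 1, :], a[:2, 2:4]))
print(rebin_checked(a, (2, 3)))
```

Without the check, a shape that does not divide evenly would only surface as a reshape error, which is harder to read.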

1
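Thanks, I have created a Gist on GitHub with an implementation of this function in case somebody else needs it: https://gist.github.com/1348792. I also suggested on `numpy-discussion` that it be added to `numpy`, but the answer was negative. - Andrea Zonca
Did they give a reason for the negative answer? - K.-Michael Aye
I think this link is the relevant discussion. It does not look very negative, just a lack of time or not enough interest. - user707650
Keep in mind that averaging data that contains NaN will return NaN. So if you want the mean ignoring any NaN values, you need to use nanmean(). Still a great answer. - timbo
Here is a generalization for rebinning N-dimensional arrays: https://dev59.com/jZffa4cB1Zd3GeqP9o5M#73078468 - divenex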
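
Following up on the NaN remark above, a NaN-aware variant might look like the sketch below. The name `rebin_nanmean` is made up for illustration, and it assumes both output dimensions divide the input dimensions. It reduces both block axes in a single `np.nanmean` call, because chaining two `nanmean` calls would average row means that were computed from different numbers of valid samples.

```python
import numpy as np


def rebin_nanmean(a, shape):
    """Block-average a 2-D array, ignoring NaN samples (illustrative sketch)."""
    a = np.asarray(a, dtype=float)
    sh = shape[0], a.shape[0] // shape[0], shape[1], a.shape[1] // shape[1]
    # One reduction over both within-block axes so every valid sample
    # in a block carries the same weight
    return np.nanmean(a.reshape(sh), axis=(1, 3))


a = np.arange(24, dtype=float).reshape(4, 6)
a[0, 0] = np.nan
out = rebin_nanmean(a, (2, 3))
print(out[0, 0])  # mean of the three remaining samples 1, 6, 7
print(out[0, 1])  # this block has no NaN, so it matches the plain mean
```

A block made up entirely of NaN still yields NaN (NumPy also emits a RuntimeWarning in that case).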

18

J.F. Sebastian has a great answer for 2D binning. Here is a version of his "rebin" function that works for N dimensions:

def bin_ndarray(ndarray, new_shape, operation='sum'):
    """
    Bins an ndarray in all axes based on the target shape, by summing or
        averaging.

    Number of output dimensions must match number of input dimensions and 
        new axes must divide old ones.

    Example
    -------
    >>> m = np.arange(0,100,1).reshape((10,10))
    >>> n = bin_ndarray(m, new_shape=(5,5), operation='sum')
    >>> print(n)

    [[ 22  30  38  46  54]
     [102 110 118 126 134]
     [182 190 198 206 214]
     [262 270 278 286 294]
     [342 350 358 366 374]]

    """
    operation = operation.lower()
    if operation not in ('sum', 'mean'):
        raise ValueError("Operation not supported.")
    if ndarray.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(ndarray.shape,
                                                           new_shape))
    compression_pairs = [(d, c//d) for d,c in zip(new_shape,
                                                  ndarray.shape)]
    flattened = [l for p in compression_pairs for l in p]
    ndarray = ndarray.reshape(flattened)
    for i in range(len(new_shape)):
        op = getattr(ndarray, operation)
        ndarray = op(-1*(i+1))
    return ndarray
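
As a possible variation on the function above (a sketch, not the answerer's code), the loop over axes can be replaced by one reduction over all within-block axes, since NumPy reductions accept a tuple of axes. The name `bin_ndarray_once` and the `reducer` parameter are assumptions introduced for this example.

```python
import numpy as np


def bin_ndarray_once(arr, new_shape, reducer=np.mean):
    """Bin an N-d array to new_shape with a single reduction (sketch)."""
    arr = np.asarray(arr)
    if arr.ndim != len(new_shape):
        raise ValueError(
            "Shape mismatch: {} -> {}".format(arr.shape, new_shape))
    if any(old % new for old, new in zip(arr.shape, new_shape)):
        raise ValueError("new axes must divide old ones")
    # Interleave (new size, block size) for every axis: (d0, f0, d1, f1, ...)
    split_shape = [n for new, old in zip(new_shape, arr.shape)
                   for n in (new, old // new)]
    # The block-size axes sit at the odd positions 1, 3, 5, ...
    block_axes = tuple(range(1, 2 * arr.ndim, 2))
    return reducer(arr.reshape(split_shape), axis=block_axes)


m = np.arange(100).reshape(10, 10)
print(bin_ndarray_once(m, (5, 5), reducer=np.sum))

cube = np.arange(64).reshape(4, 4, 4)
print(bin_ndarray_once(cube, (2, 2, 2)))
```

Passing `np.sum` reproduces the docstring example above, and the default `np.mean` gives the block average.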

7

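Here is a way of doing what you are asking with matrix multiplication, which does not require the new array dimensions to divide the old ones.

First we generate a row compressor matrix and a column compressor matrix (I am sure there is a cleaner way of doing this, maybe even using numpy operations alone):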

def get_row_compressor(old_dimension, new_dimension):
    dim_compressor = np.zeros((new_dimension, old_dimension))
    bin_size = float(old_dimension) / new_dimension
    next_bin_break = bin_size
    which_row = 0
    which_column = 0
    while which_row < dim_compressor.shape[0] and which_column < dim_compressor.shape[1]:
        if round(next_bin_break - which_column, 10) >= 1:
            dim_compressor[which_row, which_column] = 1
            which_column += 1
        elif next_bin_break == which_column:

            which_row += 1
            next_bin_break += bin_size
        else:
            partial_credit = next_bin_break - which_column
            dim_compressor[which_row, which_column] = partial_credit
            which_row += 1
            dim_compressor[which_row, which_column] = 1 - partial_credit
            which_column += 1
            next_bin_break += bin_size
    dim_compressor /= bin_size
    return dim_compressor


def get_column_compressor(old_dimension, new_dimension):
    return get_row_compressor(old_dimension, new_dimension).transpose()

...so, for instance, get_row_compressor(5, 3) gives you:

[[ 0.6  0.4  0.   0.   0. ]
 [ 0.   0.2  0.6  0.2  0. ]
 [ 0.   0.   0.   0.4  0.6]]

and get_column_compressor(3, 2) gives you:

[[ 0.66666667  0.        ]
 [ 0.33333333  0.33333333]
 [ 0.          0.66666667]]

Then simply premultiply by the row compressor and postmultiply by the column compressor to get the compressed matrix:

def compress_and_average(array, new_shape):
    # Note: new shape should be smaller in both dimensions than old shape.
    # Plain ndarrays with @ are used here because np.mat was removed in NumPy 2.0.
    return (get_row_compressor(array.shape[0], new_shape[0])
            @ np.asarray(array)
            @ get_column_compressor(array.shape[1], new_shape[1]))

Using this technique,

compress_and_average(np.array([[50, 7, 2, 0, 1],
                               [0, 0, 2, 8, 4],
                               [4, 1, 1, 0, 0]]), (2, 3))

yields:

[[ 21.86666667   2.66666667   2.26666667]
 [  1.86666667   1.46666667   1.86666667]]
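
The answer mentions that a more concise, numpy-only construction probably exists. One possible sketch, under the assumption that each weight is simply the overlap between an old unit-width cell and a new bin of width old/new, is shown below. The names `overlap_compressor` and `compress_overlap` are hypothetical and not part of the answer.

```python
import numpy as np


def overlap_compressor(old_dimension, new_dimension):
    """Row compressor built from interval overlaps (illustrative sketch)."""
    bin_size = old_dimension / new_dimension
    edges = np.arange(new_dimension + 1) * bin_size   # new bin boundaries
    cells = np.arange(old_dimension)                  # old cell j spans [j, j+1]
    lo = np.maximum(edges[:-1, None], cells[None, :])
    hi = np.minimum(edges[1:, None], cells[None, :] + 1)
    # Negative "overlap" means the intervals are disjoint, so clip to zero
    return np.clip(hi - lo, 0, None) / bin_size


def compress_overlap(array, new_shape):
    rows = overlap_compressor(array.shape[0], new_shape[0])
    cols = overlap_compressor(array.shape[1], new_shape[1])
    return rows @ np.asarray(array) @ cols.T


print(overlap_compressor(5, 3))
print(compress_overlap(np.array([[50, 7, 2, 0, 1],
                                 [0, 0, 2, 8, 4],
                                 [4, 1, 1, 0, 0]]), (2, 3)))
```

Because the weights come from clipped interval overlaps, this version needs no rounding threshold and no exact floating-point equality test, and each row of the compressor sums to 1.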

1
This is great. It even works when the new shape is not a multiple of the original shape (which is a problem with the other solutions). - ru111

3
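I was trying to downsample a raster: take a raster of roughly 6000 by 2000 and turn it into an arbitrarily sized smaller raster that properly averages the values over the previous bin sizes. I found a solution using SciPy, but I could not get SciPy installed on the shared hosting service I was using, so I wrote this function instead. There is probably a better way to do this that does not involve looping over the rows and columns, but this seems to work.
The nice part is that the old number of rows and columns does not have to be divisible by the new number of rows and columns.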
def resize_array(a, new_rows, new_cols): 
    '''
    This function takes an 2D numpy array a and produces a smaller array 
    of size new_rows, new_cols. new_rows and new_cols must be less than 
    or equal to the number of rows and columns in a.
    '''
    rows = len(a)
    cols = len(a[0])
    yscale = float(rows) / new_rows 
    xscale = float(cols) / new_cols

    # first average across the cols to shorten rows    
    new_a = np.zeros((rows, new_cols)) 
    for j in range(new_cols):
        # get the indices of the original array we are going to average across
        the_x_range = (j*xscale, (j+1)*xscale)
        firstx = int(the_x_range[0])
        lastx = int(the_x_range[1])
        # figure out the portion of the first and last index that overlap
        # with the new index, and thus the portion of those cells that 
        # we need to include in our average
        x0_scale = 1 - (the_x_range[0]-int(the_x_range[0]))
        xEnd_scale =  (the_x_range[1]-int(the_x_range[1]))
        # scale_line is a 1d array that corresponds to the portion of each old
        # index in the_x_range that should be included in the new average
        scale_line = np.ones((lastx-firstx+1))
        scale_line[0] = x0_scale
        scale_line[-1] = xEnd_scale
        # Make sure you don't screw up and include an index that is too large
        # for the array. This isn't great, as there could be some floating
        # point errors that mess up this comparison.
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lastx = lastx - 1
        # Now it's linear algebra time. Take the dot product of a slice of
        # the original array and the scale_line
        new_a[:,j] = np.dot(a[:,firstx:lastx+1], scale_line)/scale_line.sum()

    # Then average across the rows to shorten the cols. Same method as above.
    # It is probably possible to simplify this code, as this is more or less
    # the same procedure as the block of code above, but transposed.
    # Here I'm reusing the variable a. Sorry if that's confusing.
    a = np.zeros((new_rows, new_cols))
    for i in range(new_rows):
        the_y_range = (i*yscale, (i+1)*yscale)
        firsty = int(the_y_range[0])
        lasty = int(the_y_range[1])
        y0_scale = 1 - (the_y_range[0]-int(the_y_range[0]))
        yEnd_scale =  (the_y_range[1]-int(the_y_range[1]))
        scale_line = np.ones((lasty-firsty+1))
        scale_line[0] = y0_scale
        scale_line[-1] = yEnd_scale
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lasty = lasty - 1
        a[i, :] = np.dot(scale_line, new_a[firsty:lasty+1, :]) / scale_line.sum()

    return a 

Does not always work, for example: resize_array(np.random.uniform(size=(12961, 1)), 50, 1) (raises an error) - Ferus

1

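I ran into trouble with MarcTheSpark's answer: it works well in most cases, but it fails for some specific output shapes. I had to change the precision passed to round() in the first condition of get_row_compressor.

If I had enough reputation, I would have just left a comment on that answer.

I also added a snippet for rebinning 1D arrays.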

def get_row_compressor(old_dimension, new_dimension):
    dim_compressor = np.zeros((new_dimension, old_dimension))
    bin_size = float(old_dimension) / new_dimension
    next_bin_break = bin_size
    which_row = 0
    which_column = 0
    while (
        which_row < (dim_compressor.shape[0]) and which_column < (dim_compressor.shape[1])
    ):
        if round(next_bin_break - which_column, 1) >= 1:
            dim_compressor[which_row, which_column] = 1
            which_column += 1
        elif next_bin_break == which_column:

            which_row += 1
            next_bin_break += bin_size
        else:
            partial_credit = next_bin_break - which_column
            dim_compressor[which_row, which_column] = partial_credit
            which_row += 1
            dim_compressor[which_row, which_column] = 1 - partial_credit
            which_column += 1
            next_bin_break += bin_size
    dim_compressor /= bin_size
    return dim_compressor


def get_column_compressor(old_dimension, new_dimension):
    return get_row_compressor(old_dimension, new_dimension).transpose()


def rebin(array, new_shape):
    # Note: new shape should be smaller in both dimensions than old shape.
    # Plain ndarrays with @ are used here because np.mat was removed in NumPy 2.0.
    return (
        get_row_compressor(array.shape[0], new_shape[0])
        @ np.asarray(array)
        @ get_column_compressor(array.shape[1], new_shape[1])
    )

def rebin_1d(array, new_len):
    array_t = array.reshape((1, len(array)))
    array_rebinned = rebin(array_t, (1, new_len))
    return np.squeeze(np.asarray(array_rebinned))
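
For the 1D case, a different technique that avoids building a compressor matrix altogether is to interpolate the cumulative sum at the new bin edges and take differences. This is only a sketch with a made-up name, `rebin_1d_cumsum`. It assumes the same area-weighted averaging as above and that new_len is not larger than the input length.

```python
import numpy as np


def rebin_1d_cumsum(array, new_len):
    """Area-weighted 1-D rebinning via the cumulative sum (illustrative sketch)."""
    x = np.asarray(array, dtype=float)
    old_len = len(x)
    # csum[k] is the sum of the first k samples, so csum is piecewise linear
    # in position when every sample is treated as a unit-width cell
    csum = np.concatenate(([0.0], np.cumsum(x)))
    edges = np.linspace(0, old_len, new_len + 1)
    totals = np.diff(np.interp(edges, np.arange(old_len + 1), csum))
    # Divide each bin total by the bin width to get the mean
    return totals / (old_len / new_len)


print(rebin_1d_cumsum([1, 2, 3, 4, 5], 3))
print(rebin_1d_cumsum(np.random.uniform(size=12961), 50).shape)
```

Since the bin edges come from np.linspace and the partial cells are handled by linear interpolation, lengths that do not divide evenly need no special casing.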
