2D Convolution Using Python and NumPy

Question


22

I am trying to perform a 2D convolution in Python using numpy.

I have a 2D array as follows, with kernel H_r for the rows and H_c for the columns.

data = np.zeros((nr, nc), dtype=np.float32)

#fill array with some data here then convolve

for r in range(nr):
    data[r,:] = np.convolve(data[r,:], H_r, 'same')

for c in range(nc):
    data[:,c] = np.convolve(data[:,c], H_c, 'same')

data = data.astype(np.uint8);

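It does not produce the output I expect. The code looks fine to me, so I think the problem is the cast from float32 to 8-bit. What is the best way to do this?

Thanks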

- mikip

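In what way is the output different from what you expect? - Justin Peel

Hi, it is not the same result that Matlab produces. - mikip

How are you doing the cast in Matlab? Is the difference a matter of rounding versus truncation? - Justin Peel

Better 2D convolution code is provided in this follow-up Q&A: https://dev59.com/O1YItIcB2Jgan1znmdA- (not an exact duplicate). - Cris Luengo

11 Answers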

7

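Edit [Jan. 2019]

@Tashus's comment is correct, so @dudemeister's answer is probably more on the mark. The function he suggests is also more efficient, since it avoids the number of operations a direct 2D convolution would require.

Possible problem

I think you are performing two 1D convolutions, the first per column and the second per row, and replacing the results of the first with the results of the second.

Note that numpy.convolve with the 'same' argument returns an array of the same shape as the largest one provided, so by the time the first convolution is done you have already filled the entire data array.

A good way to visualize your arrays during these steps is to use Hinton diagrams, so that you can check which elements already have a value.

Possible solution

You can try to add the results of the two convolutions (use data[:,c] += .. instead of data[:,c] = in the second for loop), if your convolution matrix is the result of combining the one-dimensional H_r and H_c matrices like so: [illustration missing from the source]

Another option is to use scipy.signal.convolve2d with a 2D convolution array, which is probably what you wanted to do in the first place.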
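One possible reading of the "add the two results" suggestion, as a minimal sketch: by linearity, the sum of a row-wise and a column-wise 1D convolution equals a 2D convolution with a cross-shaped kernel. The kernels, the array sizes and the helper conv2d_same below are assumptions made for illustration, not part of the original answer. Both passes read the untouched input; an in-place += on data that the first loop has already overwritten would not give this sum.

```python
import numpy as np

def conv2d_same(data, kernel):
    """Direct zero-padded 2D convolution with output the same shape as data.
    Hypothetical reference helper; assumes odd kernel height and width."""
    kh, kw = kernel.shape
    padded = np.pad(data, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode='constant')
    flipped = kernel[::-1, ::-1]  # a true convolution flips the kernel
    out = np.empty(data.shape, dtype=float)
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

rng = np.random.default_rng(0)
data = rng.random((6, 7))
H_r = np.array([1.0, 2.0, 1.0])    # hypothetical row kernel
H_c = np.array([-1.0, 0.0, 1.0])   # hypothetical column kernel

# Both passes read the ORIGINAL data, so neither sees the other's output.
rows = np.array([np.convolve(row, H_r, 'same') for row in data])
cols = np.array([np.convolve(col, H_c, 'same') for col in data.T]).T
summed = rows + cols

# Equivalent 2D kernel: H_r along the centre row, H_c down the centre column.
cross = np.zeros((3, 3))
cross[1, :] += H_r
cross[:, 1] += H_c

print(np.allclose(summed, conv2d_same(data, cross)))
```

The cross-shaped kernel has zeros in its corners, which is the difference from the separable case discussed in the comments.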

- berna1111

1

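It is not "replacing the first result with the second"; it is convolving each row with the horizontal kernel and then convolving each column of those results with the vertical kernel. This is a special mode of conv in MATLAB. - Tashus

You're right, in the second loop every array element already holds the result of the first convolution - the equivalent H2d may have non-empty elements in the corners, which is probably better... I realize this is used in image blur filters to avoid the large number of operations that a direct 2D convolution requires. So @dudemeister's answer is probably the right way to go. - berna1111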
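The point made in these comments can be checked numerically. The sketch below assumes two small hypothetical kernels and uses 'full' mode so that no cropping is involved. It shows that a row pass followed by a column pass on the intermediate result matches a direct 2D convolution with the outer-product kernel np.outer(H_c, H_r).

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((5, 6))
H_r = np.array([1.0, 2.0, 1.0])    # hypothetical horizontal kernel
H_c = np.array([1.0, 0.0, -1.0])   # hypothetical vertical kernel

# Two passes, the second one reading the output of the first.
tmp = np.array([np.convolve(row, H_r, 'full') for row in data])
separable = np.array([np.convolve(col, H_c, 'full') for col in tmp.T]).T

# Reference: direct 'full' 2D convolution with the outer-product kernel.
K = np.outer(H_c, H_r)
kh, kw = K.shape
direct = np.zeros((data.shape[0] + kh - 1, data.shape[1] + kw - 1))
for u in range(kh):
    for v in range(kw):
        # each kernel tap adds a shifted, scaled copy of the input
        direct[u:u + data.shape[0], v:v + data.shape[1]] += K[u, v] * data

print(np.allclose(separable, direct))
```

For a k x k kernel the two passes cost roughly 2k multiplications per pixel instead of k*k, which is the saving mentioned for blur filters.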

5

Since you already have your kernel separated, you should simply use scipy's sepfir2d function:

from scipy.signal import sepfir2d
convolved = sepfir2d(data, H_r, H_c)

On the other hand, the code you have there looks fine…

- dudemeister

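Hi Dudemaster, I think the problem is that I am converting the output to 8-bit with the following command: data = np.array(data,dtype=np.int8). Is that OK? - mikip

@mikip Are the numbers within the range -128 to 127 before you convert them to 8-bit? If not, that will change your output drastically. - Justin Peel

It really depends on how the convolution is implemented and on your kernel. It may be worth converting both your kernel and your data to float, or at least to int32 values. Note that any decent 8-bit convolution algorithm should use (at least) 16-bit temporaries, because the summation during the convolution may overflow 8-bit values, depending on the kernel. - dudemeister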
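A small sketch of the overflow described in the last comment, using made-up pixel values: summing three uint8 neighbours in uint8 arithmetic wraps around modulo 256, while the same sum in a wider integer type does not.

```python
import numpy as np

pixels = np.array([200, 200, 200, 200], dtype=np.uint8)

# 3-tap box sum accumulated in uint8: 600 does not fit, so it wraps modulo 256.
narrow = pixels[:-2] + pixels[1:-1] + pixels[2:]

# Same sum with a wider accumulator.
wide = pixels.astype(np.int32)
correct = wide[:-2] + wide[1:-1] + wide[2:]

print(narrow)   # [88 88]
print(correct)  # [600 600]
```

Converting the data to a wider or floating-point type before convolving, and narrowing only at the very end, avoids this.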

3

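I looked at many implementations and found none that suited my purpose, which should be really simple. So here is a very simple implementation using for loops.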

import numpy as np

def convolution2d(image, kernel, stride, padding):
    image = np.pad(image, [(padding, padding), (padding, padding)], mode='constant', constant_values=0)

    kernel_height, kernel_width = kernel.shape
    padded_height, padded_width = image.shape

    output_height = (padded_height - kernel_height) // stride + 1
    output_width = (padded_width - kernel_width) // stride + 1

    new_image = np.zeros((output_height, output_width)).astype(np.float32)

    for y in range(0, output_height):
        for x in range(0, output_width):
            new_image[y][x] = np.sum(image[y * stride:y * stride + kernel_height, x * stride:x * stride + kernel_width] * kernel).astype(np.float32)
    return new_image
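The output size computed in the function above follows the usual formula (input + 2 * padding - kernel) // stride + 1 along each axis. A minimal sketch of just that arithmetic, with a hypothetical helper name:

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    """Number of window positions along one axis (hypothetical helper)."""
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(28, 5))                       # 24
print(conv_output_size(28, 3, stride=2, padding=1))  # 14
print(conv_output_size(7, 3, stride=2))              # 3
```

Note that the loop above multiplies each window by the kernel without flipping it, so strictly speaking it computes a cross-correlation; for symmetric kernels the two coincide.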

- curiouscupcake

2

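This may not be the most optimized solution, but it is about 10 times faster than the one proposed by @omotto, and it uses only basic numpy functions (such as reshape, expand_dims, tile...) and no 'for' loops: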

import numpy as np

def gen_idx_conv1d(in_size, ker_size):
    """
    Generates a list of indices. These indices correspond to the indices
    of a 1D input tensor on which we would like to apply a 1D convolution.

    For instance, with a 1D input array of size 5 and a kernel of size 3, the
    1D convolution product will successively look at elements of indices [0,1,2],
    [1,2,3] and [2,3,4] in the input array. In this case, the function gen_idx_conv1d(5,3)
    outputs the following array: array([0,1,2,1,2,3,2,3,4]).

    args:
        in_size: (type: int) size of the input 1d array.
        ker_size: (type: int) kernel size.

    return:
        idx_list: (type: np.array) list of the successive indices of the 1D input array
        access to the 1D convolution algorithm.

    example:
        >>> gen_idx_conv1d(in_size=5, ker_size=3)
        array([0, 1, 2, 1, 2, 3, 2, 3, 4])
    """
    f = lambda dim1, dim2, axis: np.reshape(np.tile(np.expand_dims(np.arange(dim1),axis),dim2),-1)
    out_size = in_size-ker_size+1
    return f(ker_size, out_size, 0)+f(out_size, ker_size, 1)

def repeat_idx_2d(idx_list, nbof_rep, axis):
    """
    Repeats an array of indices (idx_list) a number of times (nbof_rep) "along" an axis
    (axis). This function helps to browse through a 2d array of size
    (len(idx_list),nbof_rep).

    args:
        idx_list: (type: np.array or list) a 1D array of indices.
        nbof_rep: (type: int) number of repetition.
        axis: (type: int) axis "along" which the repetition will be applied.

    return:
        idx_list: (type: np.array) a 1D array of indices of size len(idx_list)*nbof_rep.

    example:
        >>> a = np.array([0, 1, 2])
        >>> repeat_idx_2d(a, 3, 0) # repeats array 'a' 3 times along 'axis' 0
        array([0, 0, 0, 1, 1, 1, 2, 2, 2])

        >>> repeat_idx_2d(a, 3, 1) # repeats array 'a' 3 times along 'axis' 1
        array([0, 1, 2, 0, 1, 2, 0, 1, 2])

        >>> b = np.reshape(np.arange(3*4), (3,4))
        >>> b[repeat_idx_2d(np.arange(3), 4, 0), repeat_idx_2d(np.arange(4), 3, 1)]
        array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
    """
    assert axis in [0,1], "Axis should be equal to 0 or 1."
    tile_axis = (nbof_rep,1) if axis else (1,nbof_rep)
    return np.reshape(np.tile(np.expand_dims(idx_list, 1),tile_axis),-1)

def conv2d(im, ker):
    """
    Performs a 'valid' 2D convolution on an image. The input image may be
    a 2D or a 3D array.

    The output image first two dimensions will be reduced depending on the 
    convolution size. 

    The kernel may be a 2D or 3D array. If 2D, it will be applied on every
    channel of the input image. If 3D, its last dimension must match the
    image one.

    args:
        im: (type: np.array) image (2D or 3D).
        ker: (type: np.array) convolution kernel (2D or 3D).

    returns:
        im: (type: np.array) convolved image.

    example:
        >>> im = np.reshape(np.arange(10*10*3),(10,10,3))/(10*10*3) # 3D image
        >>> ker = np.array([[0,1,0],[-1,0,1],[0,-1,0]]) # 2D kernel
        >>> conv2d(im, ker) # 3D array of shape (8,8,3)
    """
    if len(im.shape)==2: # if the image is a 2D array, it is reshaped by expanding the last dimension
        im = np.expand_dims(im,-1)

    im_x, im_y, im_w = im.shape

    if len(ker.shape)==2: # if the kernel is a 2D array, it is reshaped so it will be applied to all of the image channels
        ker = np.tile(np.expand_dims(ker,-1),[1,1,im_w]) # the same kernel will be applied to all of the channels 

    assert ker.shape[-1]==im.shape[-1], "Kernel and image last dimension must match."

    ker_x = ker.shape[0]
    ker_y = ker.shape[1]

    # shape of the output image
    out_x = im_x - ker_x + 1 
    out_y = im_y - ker_y + 1

    # reshapes the image to (out_x, ker_x, out_y, ker_y, im_w)
    idx_list_x = gen_idx_conv1d(im_x, ker_x) # computes the indices of a 1D conv (cf. gen_idx_conv1d doc)
    idx_list_y = gen_idx_conv1d(im_y, ker_y)

    idx_reshaped_x = repeat_idx_2d(idx_list_x, len(idx_list_y), 0) # repeats the previous indices to be used in 2D (cf. repeat_idx_2d doc)
    idx_reshaped_y = repeat_idx_2d(idx_list_y, len(idx_list_x), 1)

    im_reshaped = np.reshape(im[idx_reshaped_x, idx_reshaped_y, :], [out_x, ker_x, out_y, ker_y, im_w]) # reshapes

    # reshapes the 2D kernel
    ker = np.reshape(ker,[1, ker_x, 1, ker_y, im_w])

    # applies the kernel to the image and reduces the dimension back to the one of original input image
    return np.squeeze(np.sum(im_reshaped*ker, axis=(1,3)))

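I tried to add plenty of comments to explain the method, but the overall idea is to reshape the 3D input image into a 5D one of shape (output image height, kernel height, output image width, kernel width, output image channels) and then to apply the kernel directly with a basic array multiplication. Of course, this method uses more memory during execution (the size of the image is multiplied by kernel height * kernel width), but it is faster.

To perform this reshaping step, I "over-used" the indexing methods of numpy arrays, especially the possibility of passing numpy arrays as indices into another numpy array.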
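The indexing trick can be illustrated on a plain 2D image. The sketch below is not the answer's code: it assumes a 2D input and kernel, uses a hypothetical function name, and arranges the patch axes as (out_h, out_w, kh, kw) rather than the 5D layout described above. Broadcasting two index arrays gathers every window in one indexing operation.

```python
import numpy as np

def conv2d_valid_indexed(im, ker):
    """'Valid' sliding-window product-sum (no kernel flip) for 2D arrays."""
    kh, kw = ker.shape
    out_h = im.shape[0] - kh + 1
    out_w = im.shape[1] - kw + 1
    # rows[i, u] = i + u and cols[j, v] = j + v
    rows = np.arange(out_h)[:, None] + np.arange(kh)[None, :]
    cols = np.arange(out_w)[:, None] + np.arange(kw)[None, :]
    # Broadcasting the index arrays gives patches[i, j, u, v] = im[i+u, j+v].
    patches = im[rows[:, None, :, None], cols[None, :, None, :]]
    return np.sum(patches * ker, axis=(2, 3))

im = np.arange(20, dtype=float).reshape(4, 5)
ker = np.array([[0.0, 1.0], [2.0, 3.0]])
out = conv2d_valid_indexed(im, ker)
print(out.shape)  # (3, 4)
```

The patches array holds kh * kw copies of almost every pixel, which is the memory cost mentioned above.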

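This method could also be used to re-code the 2D convolution product of Pytorch or Tensorflow using basic math functions, but I have no doubt that it would be slower than the existing nn.conv2d operator...

I really enjoyed coding this method using only the basic numpy tools.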

- Guillaume Mougeot

1

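That method is fast! And the idea is neat. Basically, every pixel gets its own copy of the kernel, which is multiplied by the surrounding pixels and summed. So the weights need to add up correctly, such that a box blur can be [[28,29,28],[28,29,28],[28,29,28]], because unlike in other routines they need to sum exactly in order to preserve brightness. - Tatarize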

1

I wrote a convolve_stride that uses numpy.lib.stride_tricks.as_strided. In addition, it supports strides and dilation, and it is compatible with tensors of order > 2.

import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv_view(X, F_s, dr, std):
    X_s = np.array(X.shape)
    F_s = np.array(F_s)
    dr = np.array(dr)
    Fd_s = (F_s - 1) * dr + 1
    if np.any(Fd_s > X_s):
        raise ValueError('(Dilated) filter size must be smaller than X')
    std = np.array(std)
    X_ss = np.array(X.strides)
    Xn_s = (X_s - Fd_s) // std + 1
    Xv_s = np.append(Xn_s, F_s)
    Xv_ss = np.tile(X_ss, 2) * np.append(std, dr)
    return as_strided(X, Xv_s, Xv_ss, writeable=False)

def convolve_stride(X, F, dr=None, std=None):
    if dr is None:
        dr = np.ones(X.ndim, dtype=int)
    if std is None:
        std = np.ones(X.ndim, dtype=int)
    if not (X.ndim == F.ndim == len(dr) == len(std)):
        raise ValueError('X.ndim, F.ndim, len(dr), len(std) must be the same')
    Xv = conv_view(X, F.shape, dr, std)
    return np.tensordot(Xv, F, axes=X.ndim)

%timeit -n 100 -r 10 convolve_stride(A, F)
#31.2 ms ± 1.31 ms per loop (mean ± std. dev. of 10 runs, 100 loops each)
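For the plain 2D case without stride or dilation, a similar strided view can be obtained with numpy.lib.stride_tricks.sliding_window_view, which is a different helper from the as_strided call used above and requires NumPy 1.20 or later. A minimal sketch, with a hypothetical function name and example data:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def correlate2d_valid(X, F):
    """'Valid' sliding-window product-sum of a 2D array with a 2D filter."""
    windows = sliding_window_view(X, F.shape)   # (H-kh+1, W-kw+1, kh, kw), a view
    return np.tensordot(windows, F, axes=2)     # contract the two window axes

X = np.arange(16, dtype=float).reshape(4, 4)
F = np.ones((3, 3)) / 9.0                       # 3x3 box filter
print(correlate2d_valid(X, F))
```

The view itself copies no data; only the tensordot step materializes intermediate arrays.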

- lovetl2002

1

A super simple and fast convolution using only basic numpy:

import numpy as np

def conv2d(image, kernel):
    # apply kernel to image, return image of the same shape
    # assume both image and kernel are 2D arrays
    # kernel = np.flipud(np.fliplr(kernel))  # optionally flip the kernel
    k = kernel.shape[0]
    width = k//2
    # place the image inside a frame to compensate for the kernel overlap
    a = framed(image, width)
    b = np.zeros(image.shape)  # fill the output array with zeros; do not use np.empty()
    # shift the image around each pixel, multiply by the corresponding kernel value and accumulate the results
    for p, dp, r, dr in [(i, i + image.shape[0], j, j + image.shape[1]) for i in range(k) for j in range(k)]:
        b += a[p:dp, r:dr] * kernel[p, r]
    # or just write two nested for loops if you prefer
    # np.clip(b, 0, 255, out=b)  # optionally clip values exceeding the limits
    return b

def framed(image, width):
    a = np.zeros((image.shape[0]+2*width, image.shape[1]+2*width))
    a[width:width + image.shape[0], width:width + image.shape[1]] = image  # also valid when width == 0
    # alternatively fill the frame with ones or copy border pixels
    return a

Run it:

Image.fromarray(conv2d(image, kernel).astype('uint8'))

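Instead of sliding the kernel along the image and computing the transform pixel by pixel, create a series of shifted versions of the image, one for each element of the kernel, and apply the corresponding kernel value to each shifted version.

This is probably the fastest you can get using only basic numpy; the speed is already comparable to the C implementation of scipy convolve2d and better than fftconvolve. The idea is similar to @Tatarize's. This example works for one color component only; for RGB, repeat it for each color component (or modify the algorithm accordingly).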

- isCzech

1

One of the most obvious ways is to hard-code the kernel.

img = img.convert('L')
a = np.array(img)
out = np.zeros([a.shape[0]-2, a.shape[1]-2], dtype='float')
out += a[:-2, :-2]
out += a[1:-1, :-2]
out += a[2:, :-2]
out += a[:-2, 1:-1]
out += a[1:-1,1:-1]
out += a[2:, 1:-1]
out += a[:-2, 2:]
out += a[1:-1, 2:]
out += a[2:, 2:]
out /= 9.0
out = out.astype('uint8')
img = Image.fromarray(out)

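This example fully unrolls a 3x3 box blur. You can multiply the values at different positions and divide by a different amount. But if you really want the quickest and dirtiest method, this is it. I think it is about 5 times faster than Guillaume Mougeot's method, which in turn is about 10 times faster than the others.

If you are doing something like a Gaussian blur and need to multiply some of the terms, you may lose some of that advantage.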

- Tatarize

0

Generally speaking, "Convolution 2D" is a misnomer. Under the hood, what is actually being performed is a correlation of two matrices, not a convolution.
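The difference is just a flip of the kernel, which a 1D sketch with made-up values can show: NumPy's convolve equals correlate applied to the reversed kernel, and the two only agree directly when the kernel is symmetric.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
k = np.array([0.0, 1.0, 0.5])   # deliberately asymmetric

conv = np.convolve(a, k, 'full')
corr = np.correlate(a, k, 'full')
corr_flipped = np.correlate(a, k[::-1], 'full')

print(conv)
print(corr)
print(np.allclose(conv, corr_flipped))
```

In 2D the same relation holds with the kernel flipped along both axes, for example kernel[::-1, ::-1].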

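pad == 'same' returns an output of the same size as the input.

It can also handle non-square images. To perform correlation (convolution, in deep learning parlance) on a batch of 2D matrices, one can iterate over all the channels and compute the correlation of each channel slice with the corresponding filter slice.

For example: if the image has size (28,28,3) and the filter has size (5,5,3), take each of the 3 slices from the image channels, perform the cross-correlation using the custom function below, and stack the resulting matrices along the corresponding dimension of the output.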

def get_cross_corr_2d(W, X, pad='valid'):
    # W: 2D filter, X: 2D input; requires `import numpy as np`
    if pad == 'same':
        # zero-pad so the output has the same shape as X (odd filter sizes)
        pr = (W.shape[0] - 1) // 2
        pc = (W.shape[1] - 1) // 2
        conv_2d = np.zeros((X.shape[0], X.shape[1]))
        X_pad = np.zeros((X.shape[0] + 2*pr, X.shape[1] + 2*pc))
        X_pad[pr:pr+X.shape[0], pc:pc+X.shape[1]] = X
        for r in range(conv_2d.shape[0]):
            for c in range(conv_2d.shape[1]):
                # element-wise product; np.inner would pair up rows instead
                conv_2d[r,c] = np.sum(np.multiply(W, X_pad[r:r+W.shape[0], c:c+W.shape[1]]))
        return conv_2d

    else:
        # note: padding by (filter size - 1) on every side gives an output of
        # shape X.shape + W.shape - 1, which is usually called 'full'
        pr = W.shape[0] - 1
        pc = W.shape[1] - 1
        conv_2d = np.zeros((X.shape[0] - W.shape[0] + 2*pr + 1,
                            X.shape[1] - W.shape[1] + 2*pc + 1))
        X_pad = np.zeros((X.shape[0] + 2*pr, X.shape[1] + 2*pc))
        X_pad[pr:pr+X.shape[0], pc:pc+X.shape[1]] = X
        for r in range(conv_2d.shape[0]):
            for c in range(conv_2d.shape[1]):
                conv_2d[r,c] = np.sum(np.multiply(W, X_pad[r:r+W.shape[0], c:c+W.shape[1]]))
        return conv_2d

- Black_Hat

0

Try rounding first, then converting to uint8:

data = data.round().astype(np.uint8);
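A slightly more defensive variant, sketched with made-up values: besides rounding, it clamps to the 0..255 range before converting, since values outside that range do not convert to uint8 in a well-defined way. This mirrors MATLAB's uint8 conversion, which rounds to the nearest integer and saturates.

```python
import numpy as np

data = np.array([0.6, 3.999, 254.7, 300.2], dtype=np.float32)

# Plain astype drops the fractional part for in-range values.
print(data[:3].astype(np.uint8))                 # [  0   3 254]

# Round to nearest, then clamp to the uint8 range before converting.
safe = np.clip(np.rint(data), 0, 255).astype(np.uint8)
print(safe)                                      # [  1   4 255 255]
```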

- Ruslan Grokhovetsky


- omotto · Accepted Answer

Maybe this is not the most optimized solution, but this is an implementation I wrote a while ago in Python using the numpy library:

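import numpy as np

def convolution2d(image, kernel, bias):
    # 'valid' sliding-window product-sum; the kernel is not flipped
    m, n = kernel.shape
    if m != n:
        # previously a non-square kernel fell through to an undefined new_image
        raise ValueError("kernel must be square")
    y, x = image.shape
    y = y - m + 1
    x = x - m + 1
    new_image = np.zeros((y,x))
    for i in range(y):
        for j in range(x):
            new_image[i][j] = np.sum(image[i:i+m, j:j+m]*kernel) + bias
    return new_image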

I hope this code helps others with the same doubt.

Regards.