在numpy图像中高效处理像素和邻域

Question

在numpy图像中高效处理像素和邻域

4

我有一个场景的深度图像。我遍历该图像并计算检测窗口下深度的平均变化量。检测窗口的大小根据当前位置周围像素的平均深度而改变。我累积平均变化来生成一个简单的响应图像。

在我的机器上，对于512x52的图像，大部分时间都花费在for循环中，大约40秒左右。我希望能加快速度。是否有更有效/更快的遍历图像的方法？是否有更好的Pythonic/Numpy/Scipy方法来访问每个像素？还是我应该去学习Cython？

编辑：通过使用scipy.misc.imread()而不是skimage.io.imread()，我将运行时间缩短到约18秒。不确定两者之间的区别，我将尝试进行调查。

下面是代码的简化版本：

import matplotlib.pylab as plt
import numpy as np
from skimage.io import imread
from skimage.transform import integral_image, integrate
import time

def intersect(a, b):
    '''Determine the intersection of two rectangles'''
    rect = (0,0,0,0)
    r0 = max(a[0],b[0])
    c0 = max(a[1],b[1])
    r1 = min(a[2],b[2])
    c1 = min(a[3],b[3])
    # Do we have a valid intersection?
    if r1 > r0 and  c1 > c0: 
         rect = (r0,c0,r1,c1)
    return rect

# Setup data
depth_src = imread("test.jpg", as_grey=True)
depth_intg = integral_image(depth_src)   # integrate to find sum depth in region
depth_pts = integral_image(depth_src > 0)  # integrate to find num points which have depth
boundary = (0,0,depth_src.shape[0]-1,depth_src.shape[1]-1) # rectangle to intersect with

# Image to accumulate response
out_img = np.zeros(depth_src.shape)

# Average dimensions of bbox/detection window per unit length of depth
model = (0.602,2.044)  # width, height

start_time = time.time()
for (r,c), junk in np.ndenumerate(depth_src):
    # Find points around current pixel      
    r0, c0, r1, c1 = intersect((r-1, c-1, r+1, c+1), boundary)

    # Calculate average of depth of points around current pixel
    scale =  integrate(depth_intg, r0, c0, r1, c1) * 255 / 9.0 

    # Based on average depth, create the detection window
    r0 = r - (model[0] * scale/2)
    c0 = c - (model[1] * scale/2)
    r1 = r + (model[0] * scale/2)
    c1 = c + (model[1] * scale/2)

    # Used scale optimised detection window to extract features
    r0, c0, r1, c1 = intersect((r0,c0,r1,c1), boundary)
    depth_count = integrate(depth_pts,r0,c0,r1,c1)
    if depth_count:
         depth_sum = integrate(depth_intg,r0,c0,r1,c1)
         avg_change = depth_sum / depth_count
         # Accumulate response
         out_img[r0:r1,c0:c1] += avg_change
print time.time() - start_time, " seconds"

plt.imshow(out_img)
plt.gray()
plt.show()

- Michael

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Alex I · Accepted Answer

迈克尔，有趣的问题。你遇到的主要性能问题似乎是图像中每个像素都有两个计算它的 integrate() 函数，一个大小为 3x3，另一个大小不确定。以这种方式计算单个积分非常低效，无论使用什么 numpy 函数，这是一个算法问题，而不是实现问题。考虑一个大小为 NN 的图像。您可以仅使用约 4*NN 操作计算该图像中任何大小为 KK 的所有积分，而不是（如人们可能天真地期望的）NNKK。方法是首先在每行中计算窗口 K 上的滑动总和图像，然后在结果中的每列上滑动总和。将每个滑动总和更新为移动到下一个像素仅需要添加当前窗口中最新的像素并减去前一个窗口中最旧的像素，因此每个像素仅需要两次操作，无论窗口大小如何。我们必须这样做两次（对于行和列），因此每个像素需要 4 次操作。

我不确定 numpy 中是否有内置的滑动窗口总和，但是这个答案提供了一些使用步幅技巧的方法：https://dev59.com/CGnWa4cB1Zd3GeqPxidV#12713297。您当然可以通过一次循环列和一次循环行（使用切片提取行/列）来完成相同的操作。

示例：

# img is a 2D ndarray
# K is the size of sums to calculate using sliding window
row_sums = numpy.zeros_like(img)
for i in range( img.shape[0] ):
    if i > K:
        row_sums[i,:] = row_sums[i-1,:] - img[i-K-1,:] + img[i,:]
    elif i > 1:
        row_sums[i,:] = row_sums[i-1,:] + img[i,:]
    else: # i == 0
        row_sums[i,:] = img[i,:]

col_sums = numpy.zeros_like(img)
for j in range( img.shape[1] ):
    if j > K:
        col_sums[:,j] = col_sums[:,j-1] - row_sums[:,j-K-1] + row_sums[:,j]
    elif j > 1:
        col_sums[:,j] = col_sums[:,j-1] + row_sums[:,j]
    else: # j == 0
        col_sums[:,j] = row_sums[:,j]

# here col_sums[i,j] should be equal to numpy.sum(img[i-K:i, j-K:j]) if i >=K and j >= K
# first K rows and columns in col_sums contain partial sums and can be ignored

你如何将其应用于你的情况？我认为你可能想要预先计算3x3（平均深度）和几个更大尺寸的积分，然后使用3x3的值来选择一个更大的检测窗口（假设我理解你的算法意图）。您需要的更大尺寸范围可能是有限的，或者人工限制它仍然可以很好地工作，只需选择最近的尺寸即可。使用滑动求和一次计算所有积分要高效得多，因此我几乎可以确定，即使在特定像素处永远不会使用某些尺寸，对于许多尺寸进行计算也是值得的，特别是如果其中一些尺寸很大。

附言：这是一个小补充，但您可能希望避免为每个像素调用intersect()：（a）仅处理距离边缘比最大积分大小更远的像素，或者（b）在所有侧面上添加最大积分大小的图像边缘，填充边缘要么是零，要么是nans，或者（c）（最佳方法）使用切片自动处理此问题：超出ndarray边界的切片索引会自动限制到边界，当然，负索引会被包装。

编辑：添加滑动窗口求和示例