NumPy: Filling the values of a large array inside given bounding-box coordinates

Question


Tags: python, arrays, numpy, performance, numpy-ndarray

4

I have a very large 3-dimensional array.

large = np.zeros((2000, 1500, 700))

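In practice, large is an image, but with 700 values at each coordinate. I also have 400 bounding boxes. The boxes do not have a fixed shape. I store the lower and upper coordinates of each box as tuples, as follows.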

boxes_y = [(y_lower0, y_upper0), (y_lower1, y_upper1), ..., (y_lower399, y_upper399)]
boxes_x = [(x_lower0, x_upper0), (x_lower1, x_upper1), ..., (x_lower399, x_upper399)]

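Then, for each box, I want to fill the corresponding region of the large array with a vector of size 700. Specifically, I have an embeddings array with one row per box.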

embeddings = np.random.rand(400, 700)  # In the real case these are not random; only the shape matters here

What I want to do is:

for i in range(400):
   large[boxes_y[i][0]: boxes_y[i][1], boxes_x[i][0]: boxes_x[i][1]] = embeddings[i]

This works, but it is too slow for an array as big as large. I am looking for a way to vectorize this computation.

- Shadovx

2

Is switching to a smaller data type, such as np.uint8, np.int16, or at least np.float32, an option? - dankal444

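Yes, I can switch to np.float32. Thanks... Apart from that, I do not think there is a way to vectorize this. The biggest obstacle, I guess, is that the bounding boxes do not have a fixed shape. I think SciPy has some methods for image labeling and bounding boxes, but I have not looked into them yet. - Shadovx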

1

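You could try first determining which bounding box each pixel falls into -> a (2000, 1500) array, and then use it to vectorize the whole process. I am afraid this might not help the speed much, but it is still worth a try. - dankal444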

1 Answer


- Jérôme Richard · Accepted Answer

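One big issue is that the input is huge (about 15.6 GiB). Another is that, in the worst case, it is traversed up to 400 times (resulting in up to 6240 GiB written to RAM), because overlapping regions are written several times. A better solution is to iterate over the first two dimensions (the ones of the "image") and find which bounding box must be copied to each pixel, as proposed by @dankal444. This is similar to what Z-buffer-based algorithms do in computer graphics.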
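
To make the per-pixel idea concrete, here is a minimal pure-NumPy sketch. It is not the accepted implementation (the Numba version further down is), and the function name fill_with_index_map is hypothetical. It first records, in a small 2D array, the index of the last box covering each pixel, then writes each covered pixel of the big array exactly once, one row at a time so that temporaries stay small.

```python
import numpy as np

def fill_with_index_map(large, embeddings, boxes_y, boxes_x):
    """Fill `large` in place, writing every covered pixel exactly once."""
    n, m, _ = large.shape
    # -1 marks pixels that are not covered by any box.
    box_ids = np.full((n, m), -1, dtype=np.int16)
    # Cheap pass over the small 2D map: later boxes overwrite earlier ones,
    # which reproduces the result of the original loop on overlaps.
    for i, ((y0, y1), (x0, x1)) in enumerate(zip(boxes_y, boxes_x)):
        box_ids[y0:y1, x0:x1] = i
    # Single pass over the big array, row by row.
    for y in range(n):
        row = box_ids[y]
        covered = row >= 0
        large[y, covered] = embeddings[row[covered]]
    return box_ids
```

The Python-level loop over rows is still there, so this is likely slower than a compiled version; its purpose is only to show that each pixel of the large array is touched once, regardless of how many boxes overlap.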

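Based on that, an even better solution is to use a scanline-rendering algorithm. In your case it is much simpler than the classical algorithm, which would be overkill here, because you are working with bounding boxes and not complex polygons. For each scanline (2000 of them here), you can quickly filter the bounding boxes that intersect the scanline, then iterate over the filtered boxes and overwrite the box index stored for each pixel they cover. This operation can be done in parallel using Numba. It is very fast because the computation is mainly performed in the CPU cache.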

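The final step is to perform the actual data writes based on the indices computed above (still in parallel using Numba). This operation is still memory-bound, but the output array is written only once (at most 15.6 GiB written to RAM in the worst case, or 7.8 GiB for float32 items). This should take only a few seconds on most machines. If that is not fast enough, you can try a dedicated GPU, since GPU RAM is typically about an order of magnitude faster than main RAM.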
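
The memory figures quoted above can be reproduced with a few lines of arithmetic. This is only a sanity-check sketch; it assumes the shape from the question and NumPy's default float64 dtype for np.zeros.

```python
import numpy as np

shape = (2000, 1500, 700)   # shape of `large` in the question
n_boxes = 400
GiB = 2**30

n_items = shape[0] * shape[1] * shape[2]
size_f64 = n_items * np.dtype(np.float64).itemsize / GiB
size_f32 = n_items * np.dtype(np.float32).itemsize / GiB
# Worst case of the naive loop: every box covers the whole image.
worst_case = n_boxes * size_f64

print(f"float64 array: {size_f64:.1f} GiB")
print(f"float32 array: {size_f32:.1f} GiB")
print(f"naive worst case: {worst_case:.0f} GiB")
```

The worst-case value comes out slightly above 6240 GiB because the 6240 figure multiplies the rounded 15.6 GiB by 400.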

Here is the implementation:

import numba as nb
import numpy as np

# Assume the last dimension of `large` and `embeddings` is contiguous in memory
@nb.njit('void(float32[:,:,::1], float32[:,::1], int_[:,::1], int_[:,::1])', parallel=True)
def fastFill(large, embeddings, boxes_y, boxes_x):
    n, m, l = large.shape
    boxCount = embeddings.shape[0]
    assert embeddings.shape == (boxCount, l)
    assert boxes_y.shape == (boxCount, 2)
    assert boxes_x.shape == (boxCount, 2)
    imageBoxIds = np.full((n, m), -1, dtype=np.int16)
    for y in nb.prange(n):
        # Filtering -- A sort is not required since the number of bounding-box is small
        boxIds = np.where((boxes_y[:,0] <= y) & (y < boxes_y[:,1]))[0]
        for k in boxIds:
            lower, upper = boxes_x[k]
            imageBoxIds[y, lower:upper] = k
    # Actual filling
    for y in nb.prange(n):
        for x in range(m):
            boxId = imageBoxIds[y, x]
            if boxId >= 0:
                large[y, x, :] = embeddings[boxId]
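
Because the signature above is explicit, the arguments have to match it: float32 C-contiguous arrays for large and embeddings, and C-contiguous integer arrays of shape (boxCount, 2) for the boxes. The sketch below converts inputs stored as lists of tuples, as in the question. The small values are hypothetical, and using np.int_ to match Numba's int_ is my understanding of how the two types correspond, not something stated in the answer.

```python
import numpy as np

# Hypothetical small inputs in the question's format (lists of tuples).
boxes_y_list = [(0, 3), (2, 5)]
boxes_x_list = [(1, 4), (0, 2)]
embeddings64 = np.random.rand(2, 7)   # float64 by default

# Convert to the dtypes and memory layout the explicit signature expects.
# Assumption: np.int_ corresponds to Numba's `int_` on this platform.
boxes_y = np.ascontiguousarray(boxes_y_list, dtype=np.int_)
boxes_x = np.ascontiguousarray(boxes_x_list, dtype=np.int_)
embeddings = np.ascontiguousarray(embeddings64, dtype=np.float32)
large = np.zeros((6, 5, 7), dtype=np.float32)
```

With arrays prepared this way, the call would be fastFill(large, embeddings, boxes_y, boxes_x).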

Here is the benchmark:

large = np.zeros((1000, 750, 700), dtype=np.float32)  # 8 times smaller in memory
boxes_y = np.cumsum(np.random.randint(0, large.shape[0]//2, size=(400, 2)), axis=1)
boxes_x = np.cumsum(np.random.randint(0, large.shape[1]//2, size=(400, 2)), axis=1)
embeddings = np.random.rand(400, 700).astype(np.float32)

# Called many times
for i in range(400):
   large[boxes_y[i][0]:boxes_y[i][1], boxes_x[i][0]:boxes_x[i][1]] = embeddings[i]

# Called many times
fastFill(large, embeddings, boxes_y, boxes_x)
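
Before comparing timings, it is worth checking that an optimized fill produces the same array as the initial loop. The helper below is a hypothetical sketch (the names reference_fill and matches_reference are mine): it generates small random boxes the same way as the benchmark, runs both versions on fresh arrays, and compares the results exactly.

```python
import numpy as np

def reference_fill(large, embeddings, boxes_y, boxes_x):
    """The straightforward loop from the question."""
    for i in range(len(embeddings)):
        large[boxes_y[i][0]:boxes_y[i][1], boxes_x[i][0]:boxes_x[i][1]] = embeddings[i]

def matches_reference(fill_fn, shape=(40, 30, 8), box_count=12, seed=0):
    """Return True if `fill_fn` gives the same result as the reference loop."""
    rng = np.random.default_rng(seed)
    boxes_y = np.cumsum(rng.integers(0, shape[0] // 2, size=(box_count, 2)), axis=1)
    boxes_x = np.cumsum(rng.integers(0, shape[1] // 2, size=(box_count, 2)), axis=1)
    embeddings = rng.random((box_count, shape[2])).astype(np.float32)
    expected = np.zeros(shape, dtype=np.float32)
    actual = np.zeros(shape, dtype=np.float32)
    reference_fill(expected, embeddings, boxes_y, boxes_x)
    fill_fn(actual, embeddings, boxes_y, boxes_x)
    # Values are copied, not computed, so exact equality is expected.
    return bool(np.array_equal(expected, actual))
```

Calling matches_reference(fastFill) should then return True, provided the generated box arrays have the integer dtype that the compiled signature expects on your platform.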

Here are the results on my machine:

Initial code:        2.71 s
Numba (sequential):  0.13 s
Numba (parallel):    0.12 s   (about 22 times faster than the initial code)

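Note that the first run is slower because of virtual zero-mapped memory: the pages of an array created with np.zeros are only physically allocated when they are first written. Even in that case, the Numba version is still about 10 times faster.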
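
The first-run effect can be observed by timing two consecutive full writes to a freshly created array. This is only an illustrative sketch with a small hypothetical shape; whether a difference is visible depends on the operating system, the allocator, and the array size, so no particular timings should be expected.

```python
import time
import numpy as np

def time_fill(arr, value):
    """Time one full write to `arr`."""
    start = time.perf_counter()
    arr[...] = value
    return time.perf_counter() - start

# Small hypothetical shape so the sketch runs quickly.
large = np.zeros((200, 150, 70), dtype=np.float32)
first = time_fill(large, 1.0)    # may include page-fault cost on first touch
second = time_fill(large, 2.0)   # pages are already physically mapped
print(f"first write: {first:.4f} s, second write: {second:.4f} s")
```

For a fair benchmark, one option is to touch the array once (for example with large.fill(0)) before timing either version.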