How to efficiently find the bounding box of a collection of points?

Question

8

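I have a set of points stored in an array. I need to find the bounds of those points, i.e. the rectangle that contains all of them. I know how to solve this in plain Python.

I would like to know whether there is a better way to solve the problem than the naive max/min over the array, or a built-in method.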

points = [[1, 3], [2, 4], [4, 1], [3, 3], [1, 6]]
b = bounds(points) # the function I am looking for
# now b = [[1, 1], [4, 6]]

- ryanafrish7

2

How would you solve it in Python? We could try to improve on that. For example, how about np.min(points, 0) and np.max(points, 0)? - Divakar

1

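Unless your data points already have some ordering, you cannot do better than O(n). So you might as well use the naive min and max approach. - wim

@Divakar That was very helpful. - ryanafrish7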

3 Answers

5

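You cannot do better than O(n), because you must traverse all the points to determine the max and min for x and y.

You could, however, reduce the constant factor by traversing the list only once. It is unclear whether that gives a better execution time, and if it does, it would only be for large collections of points.

[EDIT]: In fact it does not; the "naive" approach turns out to be the faster of the two.

Here is the "naive" approach (it is among the fastest):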

def bounding_box_naive(points):
    """returns a list containing the bottom left and the top right 
    points in the sequence
    Here, we use min and max four times over the collection of points
    """
    bot_left_x = min(point[0] for point in points)
    bot_left_y = min(point[1] for point in points)
    top_right_x = max(point[0] for point in points)
    top_right_y = max(point[1] for point in points)

    return [(bot_left_x, bot_left_y), (top_right_x, top_right_y)]

And the (maybe?) less naive one:

def bounding_box(points):
    """returns a list containing the bottom left and the top right 
    points in the sequence
    Here, we traverse the collection of points only once, 
    to find the min and max for x and y
    """
    bot_left_x, bot_left_y = float('inf'), float('inf')
    top_right_x, top_right_y = float('-inf'), float('-inf')
    for x, y in points:
        bot_left_x = min(bot_left_x, x)
        bot_left_y = min(bot_left_y, y)
        top_right_x = max(top_right_x, x)
        top_right_y = max(top_right_y, y)

    return [(bot_left_x, bot_left_y), (top_right_x, top_right_y)]

Profiling results:

import random
points = [(random.randrange(-1000, 1000), random.randrange(-1000, 1000))  for _ in range(1000000)]

%timeit bounding_box_naive(points)
%timeit bounding_box(points)

Size = 1,000 points

1000 loops, best of 3: 573 µs per loop
1000 loops, best of 3: 1.46 ms per loop

Size = 10,000 points

100 loops, best of 3: 5.7 ms per loop
100 loops, best of 3: 14.7 ms per loop

Size = 100,000 points

10 loops, best of 3: 66.8 ms per loop
10 loops, best of 3: 141 ms per loop

Size = 1,000,000 points

1 loop, best of 3: 664 ms per loop
1 loop, best of 3: 1.47 s per loop

Clearly, the first, "naive" approach is the faster one, by a factor of roughly 2 to 2.6 in the timings above.

- Reblochon Masque

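+1, but I'm curious how an inline ternary expression would perform compared with the two-element `min` call, or, for the greater-than/less-than case, just an "if: (update assignment)". - jedwards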

2

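4 loops with 1 comparison inside each, versus 1 loop with 4 comparisons inside. I think this is just "moving the work around". If you really want speed, you should consider a numba JIT or something similar. - wim

Hehe, that was my guess too, but after your comment I had to re-measure it. Thanks for the nudge @wim. (Results posted above) - Reblochon Masque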
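
To make the variant jedwards asks about concrete, here is a minimal sketch of the single-pass loop with plain `if` updates in place of the two-argument `min`/`max` calls. The name `bounding_box_if` is made up for this illustration, and no timings are claimed for it; it would have to be measured the same way as the functions above.

```python
def bounding_box_if(points):
    """Single pass over points, updating each bound with a plain comparison.

    Hypothetical variant for illustration; not one of the functions timed above.
    """
    min_x = min_y = float('inf')
    max_x = max_y = float('-inf')
    for x, y in points:
        # Separate ifs (not elif): the first point must update both bounds.
        if x < min_x:
            min_x = x
        if x > max_x:
            max_x = x
        if y < min_y:
            min_y = y
        if y > max_y:
            max_y = y
    return [(min_x, min_y), (max_x, max_y)]


points = [[1, 3], [2, 4], [4, 1], [3, 3], [1, 6]]
print(bounding_box_if(points))  # [(1, 1), (4, 6)]
```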

0

With numpy the bounding box can be extracted faster still, especially if there are other benefits to having your points converted to an array.

import numpy as np


def bounding_box_numpy(points: np.ndarray):
    """
    Find min/max from an N-collection of coordinate pairs, shape = (N, 2), using 
    numpy's min/max along the collection-axis 
    """
    return [*points.min(axis=0), *points.max(axis=0)]


import random
points = [(random.randrange(-1000, 1000), random.randrange(-1000, 1000))  for _ in range(1000000)]
numpy_points = np.array(points)  # see the comment in the end *)
print(numpy_points.shape)  # prints (1000000, 2)

Then (see @Reblochon Masque's earlier answer https://dev59.com/1aXja4cB1Zd3GeqPP1hA#46335659)

%timeit bounding_box_naive(points)
%timeit bounding_box(points)
%timeit bounding_box_numpy(numpy_points)

returns the profiling results

136 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
274 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
20.7 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

*) To be fair, converting the list of point pairs into a numpy array takes a few hundred milliseconds.
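
For the return shape asked for in the question, a small wrapper along these lines might do. It is only a sketch: the name `bounds` is borrowed from the question, `np.asarray` is used so that an input that is already an array is not copied, and `.tolist()` converts the results back to plain Python numbers.

```python
import numpy as np


def bounds(points):
    """Return [[min_x, min_y], [max_x, max_y]] for an (N, 2) collection."""
    arr = np.asarray(points)  # no copy if points is already an ndarray
    return [arr.min(axis=0).tolist(), arr.max(axis=0).tolist()]


points = [[1, 3], [2, 4], [4, 1], [3, 3], [1, 6]]
print(bounds(points))  # [[1, 1], [4, 6]]
```

If the points start out as a Python list, the conversion cost mentioned in the footnote above is paid inside `np.asarray` on every call.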

- mjkvaak

- cdlane · Accepted Answer

My approach to improving performance is to push things down to the C level wherever possible:

def bounding_box(points):
    x_coordinates, y_coordinates = zip(*points)

    return [(min(x_coordinates), min(y_coordinates)), (max(x_coordinates), max(y_coordinates))]

By my (crude) measure, this runs about 1.5 times faster than @ReblochonMasque's bounding_box_naive(). And it is clearly more elegant. ;-)
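
To repeat the comparison outside IPython, a harness along the following lines could be used. It is a sketch only: `bbox_naive`, `bbox_single_pass` and `bbox_zip` are renamed copies of the functions from the answers (two of which are both called `bounding_box`), the point count and repeat count are arbitrary, and timings differ from machine to machine, so no expected numbers are shown.

```python
import random
import timeit


def bbox_naive(points):
    # Four generator passes, one per bound.
    return [(min(p[0] for p in points), min(p[1] for p in points)),
            (max(p[0] for p in points), max(p[1] for p in points))]


def bbox_single_pass(points):
    # One Python-level loop with four min/max calls per point.
    min_x = min_y = float('inf')
    max_x = max_y = float('-inf')
    for x, y in points:
        min_x, min_y = min(min_x, x), min(min_y, y)
        max_x, max_y = max(max_x, x), max(max_y, y)
    return [(min_x, min_y), (max_x, max_y)]


def bbox_zip(points):
    # Transpose once with zip, then let the built-in min/max do the scanning.
    xs, ys = zip(*points)
    return [(min(xs), min(ys)), (max(xs), max(ys))]


points = [(random.randrange(-1000, 1000), random.randrange(-1000, 1000))
          for _ in range(100_000)]

# All three variants should agree before their timings are compared.
assert bbox_naive(points) == bbox_single_pass(points) == bbox_zip(points)

for fn in (bbox_naive, bbox_single_pass, bbox_zip):
    seconds = timeit.timeit(lambda: fn(points), number=3)
    print(f"{fn.__name__}: {seconds / 3 * 1000:.1f} ms per call")
```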