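I have a set of points stored in an array. I need to find the bounds of these points, i.e. the rectangle that contains all of them. I know how to solve this in plain Python.
I would like to know whether there is a better solution than the naive max, min approach or the built-in methods.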
points = [[1, 3], [2, 4], [4, 1], [3, 3], [1, 6]]
b = bounds(points) # the function I am looking for
# now b = [[1, 1], [4, 6]]
My approach to getting better performance is to push things down to the C level whenever possible:
def bounding_box(points):
    x_coordinates, y_coordinates = zip(*points)
    return [(min(x_coordinates), min(y_coordinates)), (max(x_coordinates), max(y_coordinates))]
By my (crude) measurement, this runs about 1.5 times faster than @ReblochonMasque's bounding_box_naive(), and it is clearly more elegant. ;-)
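As a quick sanity check, the zip-based version can be run on the sample points from the question. This is only a minimal sketch; the name `bounding_box_zip` is a hypothetical one picked here so it does not clash with the other `bounding_box` in this thread.

```python
def bounding_box_zip(points):
    # zip(*points) transposes the list of pairs into two tuples:
    # one holding every x, one holding every y
    x_coordinates, y_coordinates = zip(*points)
    return [(min(x_coordinates), min(y_coordinates)),
            (max(x_coordinates), max(y_coordinates))]


points = [[1, 3], [2, 4], [4, 1], [3, 3], [1, 6]]
b = bounding_box_zip(points)
print(b)  # [(1, 1), (4, 6)]
```

Note that the result holds tuples rather than the nested lists shown in the question, and that an empty `points` list raises a `ValueError` at the unpacking step, so callers may want to guard against that case.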
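You cannot do better than O(n), because you must traverse all the points to determine the max and min of x and y.
However, you could reduce the constant factor by traversing the list only once; it is unclear, though, whether that gives a better execution time, and if it did, it would be for large collections of points.
[EDIT]: in fact it does not; the "naive" approach is the most efficient.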
def bounding_box_naive(points):
    """returns a list containing the bottom left and the top right
    points in the sequence
    Here, we use min and max four times over the collection of points
    """
    bot_left_x = min(point[0] for point in points)
    bot_left_y = min(point[1] for point in points)
    top_right_x = max(point[0] for point in points)
    top_right_y = max(point[1] for point in points)
    return [(bot_left_x, bot_left_y), (top_right_x, top_right_y)]

def bounding_box(points):
    """returns a list containing the bottom left and the top right
    points in the sequence
    Here, we traverse the collection of points only once,
    to find the min and max for x and y
    """
    bot_left_x, bot_left_y = float('inf'), float('inf')
    top_right_x, top_right_y = float('-inf'), float('-inf')
    for x, y in points:
        bot_left_x = min(bot_left_x, x)
        bot_left_y = min(bot_left_y, y)
        top_right_x = max(top_right_x, x)
        top_right_y = max(top_right_y, y)
    return [(bot_left_x, bot_left_y), (top_right_x, top_right_y)]
import random
points = [(random.randrange(-1000, 1000), random.randrange(-1000, 1000)) for _ in range(1000000)]
%timeit bounding_box_naive(points)
%timeit bounding_box(points)
1000 loops, best of 3: 573 µs per loop
1000 loops, best of 3: 1.46 ms per loop
100 loops, best of 3: 5.7 ms per loop
100 loops, best of 3: 14.7 ms per loop
10 loops, best of 3: 66.8 ms per loop
10 loops, best of 3: 141 ms per loop
1 loop, best of 3: 664 ms per loop
1 loop, best of 3: 1.47 s per loop
Clearly, the first approach (bounding_box_naive), which turns out to be "not so naive", is faster by a factor of 2.5 - 3.
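The `%timeit` lines above only work inside IPython. For anyone who wants to repeat the comparison in a plain script, here is a rough sketch using the standard `timeit` module. The collection size (10,000 points), the repeat count, and the seed are arbitrary assumptions, and the absolute numbers will differ from machine to machine.

```python
import random
import timeit


def bounding_box_naive(points):
    # Four separate passes, each one running inside the built-in min/max
    return [(min(p[0] for p in points), min(p[1] for p in points)),
            (max(p[0] for p in points), max(p[1] for p in points))]


def bounding_box(points):
    # One explicit Python-level pass that tracks all four extremes
    min_x = min_y = float('inf')
    max_x = max_y = float('-inf')
    for x, y in points:
        min_x = min(min_x, x)
        min_y = min(min_y, y)
        max_x = max(max_x, x)
        max_y = max(max_y, y)
    return [(min_x, min_y), (max_x, max_y)]


random.seed(0)  # arbitrary seed, only to make the run repeatable
points = [(random.randrange(-1000, 1000), random.randrange(-1000, 1000))
          for _ in range(10_000)]

# Both versions must agree before their timings are worth comparing
results_match = bounding_box_naive(points) == bounding_box(points)

runs = 10
for func in (bounding_box_naive, bounding_box):
    seconds = timeit.timeit(lambda: func(points), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```

The single-pass version does less traversal but makes four Python-level function calls per point, which is a plausible reason it loses to four passes that stay inside the built-ins.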
With numpy you can extract the bounding box even faster, particularly if having your points converted to an array brings additional benefits.
import numpy as np

def bounding_box_numpy(points: np.ndarray):
    """
    Find min/max from an N-collection of coordinate pairs, shape = (N, 2), using
    numpy's min/max along the collection-axis
    """
    # Returns a flat list: [min_x, min_y, max_x, max_y]
    return [*points.min(axis=0), *points.max(axis=0)]
import random
points = [(random.randrange(-1000, 1000), random.randrange(-1000, 1000)) for _ in range(1000000)]
numpy_points = np.array(points) # see the comment in the end *)
print(numpy_points.shape) # prints (1000000, 2)
Then (see @Reblochon Masque's earlier answer https://dev59.com/1aXja4cB1Zd3GeqPP1hA#46335659)
%timeit bounding_box_naive(points)
%timeit bounding_box(points)
%timeit bounding_box_numpy(numpy_points)
returns the following profiling results
136 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
274 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
20.7 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
*) To be fair, converting the list of point pairs to a numpy array takes several hundred milliseconds.
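One way to sidestep that conversion cost, assuming the points can be produced or loaded as an array in the first place, is to never build the Python list at all. The sketch below generates the random test data directly with numpy; the seed and the variable names `mins`, `maxs` and `bbox` are illustrative choices, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# integers(low, high, size) excludes `high`, like random.randrange does
numpy_points = rng.integers(-1000, 1000, size=(1_000_000, 2))

# Reduce along axis 0 (the collection axis) to get per-column extremes
mins = numpy_points.min(axis=0)
maxs = numpy_points.max(axis=0)

# Same flat layout as bounding_box_numpy: [min_x, min_y, max_x, max_y]
bbox = [*mins, *maxs]
print(bbox)
```

If the data already lives in a list of tuples, the conversion still has to be paid once, so the numpy route is most attractive when the array gets reused for other work.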
How about np.min(points,0) and np.max(points,0)? - Divakar
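For reference, a minimal sketch of what that suggestion looks like on the sample points from the question. `np.min` and `np.max` accept any array-like input, so a plain list of pairs works; the names `lower` and `upper` are just illustrative.

```python
import numpy as np

points = [[1, 3], [2, 4], [4, 1], [3, 3], [1, 6]]

# The second positional argument is the axis: 0 reduces over the points,
# leaving one value per coordinate column
lower = np.min(points, 0)
upper = np.max(points, 0)

print(lower.tolist(), upper.tolist())  # [1, 1] [4, 6]
```

When `points` is a plain list, each call has to convert it to an array first, so for large inputs it is probably cheaper to convert once and call `.min(axis=0)` and `.max(axis=0)` on the resulting array.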