How can I find the unique (x, y) points in a numpy array more quickly, i.e. remove duplicates? For example:
points = numpy.random.randint(0, 5, (10,2))
I thought of converting the points to complex numbers and then checking for uniqueness, but that seems rather convoluted:
b = numpy.unique(points[:,0] + 1j * points[:,1])
points = numpy.column_stack((b.real, b.imag))
numpy.array(list(set(tuple(p) for p in points)))
If you need a fast solution for most cases, perhaps this recipe will help:
http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
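A set-based approach does not keep the rows in their original order. If order matters, one possible sketch is to de-duplicate through dictionary keys, which keep insertion order in Python 3.7 and later. The helper name unique_rows_keep_order is hypothetical and is not taken from the linked recipe.

```python
import numpy as np

def unique_rows_keep_order(points):
    """Return the distinct rows of a 2-D array in first-seen order."""
    # dict keys keep insertion order (guaranteed since Python 3.7),
    # so each row is kept the first time it appears.
    seen = dict.fromkeys(map(tuple, points.tolist()))
    return np.array(list(seen), dtype=points.dtype).reshape(-1, points.shape[1])

points = np.array([[3, 4], [0, 1], [3, 0], [0, 1], [4, 4]])
print(unique_rows_keep_order(points))
```

This runs at Python speed, so it trades some performance for preserving the order of first appearance.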
set is an unordered collection, so the original ordering is lost. - wim
import numpy as np
np.random.seed(1)
points = np.random.randint(0, 5, (10,2))
print(points)
print(len(points))
[[3 4]
 [0 1]
 [3 0]
 [0 1]
 [4 4]
 [1 2]
 [4 2]
 [4 3]
 [4 2]
 [4 2]]
10
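# View each (x, y) pair of 32-bit ints as a single 8-byte complex number;
# astype('i4') is needed where the default integer is 64-bit.
cpoints = points.astype('i4').view('c8')
cpoints = np.unique(cpoints)
points = cpoints.view('i4').reshape((-1,2))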
print(points)
print(len(points))
[[0 1]
 [1 2]
 [3 0]
 [3 4]
 [4 2]
 [4 3]
 [4 4]]
7
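As an alternative to reinterpreting the bytes, NumPy 1.13 and later accept an axis argument in np.unique, which treats each row as one item. The following is a minimal sketch under that version assumption, not the method used in the answer above; the array literal simply repeats the ten points printed earlier.

```python
import numpy as np

# The ten example points shown above, written out explicitly.
points = np.array([[3, 4], [0, 1], [3, 0], [0, 1], [4, 4],
                   [1, 2], [4, 2], [4, 3], [4, 2], [4, 2]])

# axis=0 compares whole rows; the result is sorted lexicographically.
unique_points = np.unique(points, axis=0)
print(unique_points)
print(len(unique_points))  # 7
```

This form does not depend on the integer width of the array, so it should also work where the default integer type is 64-bit.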
# test.py
import numpy as np
np.random.seed(1)
N=10000
# 32-bit ints so that each row can be viewed as one 'c8' value
points = np.random.randint(0, 5, (N,2)).astype('i4')

def using_unique():
    cpoints = points.view('c8')
    cpoints = np.unique(cpoints)
    return cpoints.view('i4').reshape((-1,2))

def using_set():
    return np.vstack([np.array(u) for u in set([tuple(p) for p in points])])
% python -mtimeit -s'import test' 'test.using_set()'
100 loops, best of 3: 18.3 msec per loop
% python -mtimeit -s'import test' 'test.using_unique()'
10 loops, best of 3: 40.6 msec per loop
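The shell commands above assume the functions live in a module named test.py. For readers who prefer to time things inside a single script, here is a rough sketch using the standard timeit module. The function using_unique_axis is an assumption added for comparison (it relies on the axis argument of np.unique, available from NumPy 1.13), and the numbers it prints will vary by machine and NumPy version.

```python
import timeit
import numpy as np

np.random.seed(1)
points = np.random.randint(0, 5, (10000, 2))

def using_set():
    # De-duplicate through a Python set of tuples.
    return np.array(list({tuple(p) for p in points.tolist()}))

def using_unique_axis():
    # Hypothetical comparison candidate: row-wise np.unique.
    return np.unique(points, axis=0)

for fn in (using_set, using_unique_axis):
    # Best of 3 repeats, 10 calls each; report the time per call in ms.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```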
Is numpy.vstack([numpy.array(u) for u in set([tuple(p) for p in points])]) not fast enough? - wim