It has been six years, but this question helped me, so I ran a speed comparison of the answers given by Divakar, Benjamin, Marcelo Cantos, and Curtis Patrick.
import numpy as np
vals = np.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])

def rows_uniq_elems1(a):
    idx = a.argsort(1)
    a_sorted = a[np.arange(idx.shape[0])[:,None], idx]
    return a[(a_sorted[:,1:] != a_sorted[:,:-1]).all(-1)]

def rows_uniq_elems2(a):
    b = (a[:,0] == a[:,1]) | (a[:,1] == a[:,2]) | (a[:,0] == a[:,2])
    return np.delete(a, np.where(b), axis=0)

def rows_uniq_elems3(a):
    return np.array([v for v in a if len(set(v)) == len(v)])

def rows_uniq_elems4(a):
    return np.array([v for v in a if len(np.unique(v)) == len(v)])
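As a quick sanity check, all four functions should agree on `vals`, keeping only the rows whose three elements are pairwise distinct. A minimal sketch reusing the set-based logic of `rows_uniq_elems3`:

```python
import numpy as np

vals = np.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])

# Keep only the rows with no repeated element (same logic as
# rows_uniq_elems3): [7,8,7], [2,2,1] and [0,0,0] are dropped.
result = np.array([v for v in vals if len(set(v)) == len(v)])
print(result)
```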
Results:
%timeit rows_uniq_elems1(vals)
10000 loops, best of 3: 67.9 µs per loop
%timeit rows_uniq_elems2(vals)
10000 loops, best of 3: 156 µs per loop
%timeit rows_uniq_elems3(vals)
1000 loops, best of 3: 59.5 µs per loop
%timeit rows_uniq_elems4(vals)
10000 loops, best of 3: 268 µs per loop
It looks like using set beats numpy.unique. In my case I needed to do this over a much larger array:
bigvals = np.random.randint(0,10,3000).reshape([1000,3])
%timeit rows_uniq_elems1(bigvals)
10000 loops, best of 3: 276 µs per loop
%timeit rows_uniq_elems2(bigvals)
10000 loops, best of 3: 192 µs per loop
%timeit rows_uniq_elems3(bigvals)
10000 loops, best of 3: 6.5 ms per loop
%timeit rows_uniq_elems4(bigvals)
10000 loops, best of 3: 35.7 ms per loop
The methods without list comprehensions are much faster. However, the column count is hard-coded, and it is difficult to extend them beyond three columns, so in my case the list comprehension with set is the best answer.
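If the pairwise-comparison approach is still preferred, it can be generalized to any number of columns by OR-ing an equality test over every pair of columns. `rows_uniq_elems_anycols` below is a hypothetical sketch of that idea, not part of the original benchmark:

```python
import numpy as np
from itertools import combinations

def rows_uniq_elems_anycols(a):
    # Mark a row if ANY pair of its columns holds equal values,
    # then keep the unmarked rows.
    dup = np.zeros(a.shape[0], dtype=bool)
    for i, j in combinations(range(a.shape[1]), 2):
        dup |= a[:, i] == a[:, j]
    return a[~dup]

vals = np.array([[1,2,3],[4,5,6],[7,8,7],[0,4,5],[2,2,1],[0,0,0],[5,4,3]])
print(rows_uniq_elems_anycols(vals))
```

The Python loop runs over column pairs rather than rows, so the work per row stays vectorized; for very wide arrays the sort-based `rows_uniq_elems1` is likely the better fit.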
Edited, because I had mixed up the rows and columns in bigvals.
…much faster than the O(mn) of the for loop. Constants matter when the options are compiled versus interpreted code. - Jim Pivarski

This solution … - Divakar