Here is a Numpythonic approach using broadcasting:
In [83]: A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
Out[83]:
array([[1, 1, 2],
       [1, 1, 3]])
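To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps. The sample arrays `A` and `B` are assumptions (this answer does not show them at this point), chosen so that the result matches the output above; the intermediate names `diff`, `differs` and `keep` are only illustrative.

```python
import numpy as np

# Assumed sample data; not shown in this answer.
A = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
              [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# B[:, None] has shape (7, 1, 3). Subtracting it from A (shape (4, 3))
# broadcasts to (7, 4, 3): every row of B against every row of A.
diff = A - B[:, None]

# A nonzero entry along the last axis means the two rows differ somewhere.
differs = np.any(diff, axis=2)   # shape (7, 4)

# Keep the rows of A that differ from every row of B.
keep = np.all(differs, axis=0)   # shape (4,)

result = A[keep]
print(result)
```

Note that the intermediate array holds `len(A) * len(B) * n_columns` elements, so memory use grows quickly with the input sizes.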
Here is a timing comparison with the other answers:
In [90]: def cal_diff(A, B):
   ....:     A_rows = A.view([('', A.dtype)] * A.shape[1])
   ....:     B_rows = B.view([('', B.dtype)] * B.shape[1])
   ....:     return np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1])
   ....:
In [93]: %timeit cal_diff(A, B)
10000 loops, best of 3: 54.1 µs per loop
In [94]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100000 loops, best of 3: 9.41 µs per loop
In [97]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100000 loops, best of 3: 7.41 µs per loop
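If you want something faster, look for ways to reduce the number of comparisons. In this case (ignoring the order of items within a row), you can reduce each row to a single number and compare those numbers, for example by summing the squares of the items. Keep in mind that such a number is not guaranteed to be unique: two different rows can share the same sum of squares, so this shortcut is only safe when your data rules out such collisions.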
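The sum-of-squares idea can be sketched as follows. The helper name `sum_sq_key` and the sample arrays are assumptions made for illustration, and `np.isin` is used here in place of `np.in1d` (for 1-D inputs the two behave the same, and `np.in1d` is deprecated in recent NumPy releases). The last lines show one pair of rows that collide under this key.

```python
import numpy as np

def sum_sq_key(rows):
    """Collapse each row to one integer: the sum of its squared items."""
    return np.power(rows, 2).sum(axis=1)

# Assumed sample data; not shown in this answer.
A = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
              [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# One comparison per row instead of one per element.
mask = ~np.isin(sum_sq_key(A), sum_sq_key(B))
result = A[mask]
print(result)

# Caveat: different rows can map to the same key.
collision = sum_sq_key(np.array([[1, 2, 2], [0, 0, 3]]))
print(collision)  # both rows give 1 + 4 + 4 == 0 + 0 + 9 == 9
```

If collisions are possible in your data, a row of `A` that merely shares a key with some row of `B` would be dropped by mistake, so an exact method is the safer choice.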
Here is a benchmark against Divakar's in1d approach:
In [144]: def in1d_approach(A,B):
   .....:     dims = np.maximum(B.max(0),A.max(0))+1
   .....:     return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
   .....:                     np.ravel_multi_index(B.T,dims))]
   .....:
In [146]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 23.8 µs per loop
In [145]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 20.2 µs per loop
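For comparison, here is a rough sketch of why the `ravel_multi_index` variant is exact while the sum-of-squares key is not: it treats each row as the coordinates of a cell in a grid and converts them to that cell's flat index, which is unique per row. The sample arrays and the names `A_ids` and `B_ids` are assumptions for illustration, and `np.isin` again stands in for `np.in1d`. This only works for non-negative integer data.

```python
import numpy as np

# Assumed sample data; not shown in this answer.
A = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
              [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# Grid size per column: one more than the largest value seen in A or B.
dims = np.maximum(B.max(0), A.max(0)) + 1   # here: [2, 2, 5]

# Each row (a, b, c) becomes the flat index a*(2*5) + b*5 + c.
A_ids = np.ravel_multi_index(A.T, dims)
B_ids = np.ravel_multi_index(B.T, dims)

result = A[~np.isin(A_ids, B_ids)]
print(A_ids)
print(result)
```

Because the flat index grows with the product of `dims`, very large values or many columns can overflow the integer type, which is worth checking before using this on real data.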
You can also use np.diff to get an order-independent result:
In [194]: B=np.array([[0, 0, 0,], [1, 0, 2,], [1, 0, 3,], [1, 0, 4,], [1, 1, 0,], [1, 1, 1,], [1, 1, 4,], [4, 1, 1]])
In [195]: A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
Out[195]:
array([[1, 1, 2],
       [1, 1, 3]])
In [196]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 30.7 µs per loop
Benchmark with Divakar's setup:
In [198]: B = np.random.randint(0,9,(1000,3))
In [199]: A = np.random.randint(0,9,(100,3))
In [200]: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
In [201]: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
In [202]: A[A_idx] = B[B_idx]
In [203]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 137 µs per loop
In [204]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 112 µs per loop
In [205]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 115 µs per loop
Timings with larger arrays (Divakar's solution is slightly faster):
In [231]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
1000 loops, best of 3: 1.01 ms per loop
In [232]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
1000 loops, best of 3: 880 µs per loop
In [233]: %timeit in1d_approach(A, B)
1000 loops, best of 3: 807 µs per loop
[i for i in A for j in B if i==j]
- JRodDynamite