Approach #1
Here's a vectorized solution for arrays of numbers that leverages broadcasting -
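import numpy as np
def binary_repr_ar(A, W):
    # Test each of the W bits (most significant first) of every number; view the bool mask as uint8
    p = (((A[:,None] & (1 << np.arange(W-1,-1,-1)))!=0)).view('u1')
    # Turn every 0/1 into a one-byte string, then fuse each row of W bytes into a single string
    return p.astype('S1').view('S'+str(W)).ravel()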
Sample run -
In [67]: A
Out[67]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [68]: binary_repr_ar(A,32)
Out[68]:
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
       '00000000000000000000000000000100',
       '00000000000000000000000000000101',
       '00000000000000000000000000000110',
       '00000000000000000000000000000111',
       '00000000000000000000000000001000',
       '00000000000000000000000000001001'], dtype='|S32')
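To make the broadcasting step easier to follow, here is a minimal sketch on a tiny input. The names `weights` and `mask` and the sample values are only for illustration; it assumes non-negative integers and a width `W` that fits the array's integer dtype.

```python
import numpy as np

A = np.array([1, 2, 5])   # small illustrative input
W = 4                     # number of bits to extract

# Bit weights from most to least significant: [8, 4, 2, 1]
weights = 1 << np.arange(W - 1, -1, -1)

# A[:, None] has shape (3, 1) and weights has shape (4,), so the
# bitwise AND broadcasts to a (3, 4) array, one row per number.
mask = (A[:, None] & weights) != 0

print(mask.astype(np.uint8))
# [[0 0 0 1]
#  [0 0 1 0]
#  [0 1 0 1]]
```

Each row of the mask is the bit pattern of one input number; the functions in this answer only differ in how they turn that mask into strings.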
Approach #2
Another vectorized approach, this time using array assignment -
def binary_repr_ar_v2(A, W):
    mask = (((A[:,None] & (1 << np.arange(W-1,-1,-1)))!=0))
    # Start with every character set to ASCII '0' (48)
    out = np.full((len(A),W),48, dtype=np.uint8)
    # Set ASCII '1' (49) wherever the bit is on
    out[mask] = 49
    return out.view('S'+str(W)).ravel()
Alternatively, use the mask directly to get the string array -
def binary_repr_ar_v3(A, W):
    mask = (((A[:,None] & (1 << np.arange(W-1,-1,-1)))!=0))
    # bool + 48 gives the ASCII codes 48 ('0') or 49 ('1') directly
    return (mask+np.array([48],dtype=np.uint8)).view('S'+str(W)).ravel()
Note that the final output is a view into one of the intermediate arrays. So, if you need it to have its own memory space, simply append .copy().
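The view-versus-copy point can be checked with a short sketch. It repeats `binary_repr_ar_v2` so that it runs on its own; the variable names `res` and `own` are only for illustration.

```python
import numpy as np

def binary_repr_ar_v2(A, W):
    mask = (A[:, None] & (1 << np.arange(W - 1, -1, -1))) != 0
    out = np.full((len(A), W), 48, dtype=np.uint8)  # ASCII '0'
    out[mask] = 49                                  # ASCII '1'
    return out.view('S' + str(W)).ravel()

A = np.array([1, 2, 3])
res = binary_repr_ar_v2(A, 8)
own = res.copy()

# The returned array borrows its memory from the intermediate `out`
print(res.base is not None, res.flags.owndata)   # True False
# After .copy() the array owns its data
print(own.base is None, own.flags.owndata)       # True True
```

In most cases the view is harmless, since the intermediate array is not reachable from anywhere else; the copy mainly matters if you want the result to own its buffer.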
Timings on a large input array -
In [49]: np.random.seed(0)
    ...: A = np.random.randint(1,1000,(100000))
    ...: W = 32
In [50]: %timeit binary_repr_ar(A, W)
    ...: %timeit binary_repr_ar_v2(A, W)
    ...: %timeit binary_repr_ar_v3(A, W)
1 loop, best of 3: 854 ms per loop
100 loops, best of 3: 14.5 ms per loop
100 loops, best of 3: 7.33 ms per loop
Timings for the other posted solutions -
In [22]: %timeit [np.binary_repr(i, width=32) for i in A]
10 loops, best of 3: 97.2 ms per loop
In [23]: %timeit np.frompyfunc(np.binary_repr,2,1)(A,32).astype('U32')
10 loops, best of 3: 80 ms per loop
In [24]: %timeit np.vectorize(np.binary_repr)(A, 32)
10 loops, best of 3: 69.8 ms per loop
With @Paul Panzer's solution -
In [5]: %timeit bin_rep(A,32)
548 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit bin_rep(A,31)
2.2 ms ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
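Since the timings compare several implementations, it may be worth confirming that they agree. Below is a rough sketch that checks the mask-based version against a per-element `np.binary_repr` loop. It assumes non-negative inputs, and it converts the byte strings to unicode with `.astype('U...')` because on Python 3 the vectorized functions return bytes while `np.binary_repr` returns `str`.

```python
import numpy as np

def binary_repr_ar_v3(A, W):
    mask = (A[:, None] & (1 << np.arange(W - 1, -1, -1))) != 0
    return (mask + np.array([48], dtype=np.uint8)).view('S' + str(W)).ravel()

rng = np.random.RandomState(0)
A = rng.randint(1, 1000, 1000)   # smaller than the timing input, for a quick check
W = 32

# Bytes -> unicode so the two results are directly comparable
vectorized = binary_repr_ar_v3(A, W).astype('U' + str(W))
reference = np.array([np.binary_repr(i, width=W) for i in A])

print(np.array_equal(vectorized, reference))   # True
```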
np.binary_repr only works on a single value, not on array-like objects. - Willem Van Onsem