Using binary_repr in NumPy on an array of numbers, or alternative ways - Python

Question

5

With the following code I am trying to convert an array of numbers to binary, but I get an error.

import numpy as np

lis=np.array([1,2,3,4,5,6,7,8,9])
a=np.binary_repr(lis,width=32)

Running the program produces the following error:

```
Traceback (most recent call last):
  File "", line 4, in
    a=np.binary_repr(lis,width=32)
  File "C:\Users.......", in binary_repr
    if num == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

Is there a way to fix this?

- Zewo

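np.binary_repr only works on a single value, not on array-like objects. - Willem Van Onsem

I know, but is there a way to make it work on an array? - Zewo

Or is there any Python code that converts a whole array or list to 32-bit binary? - Zewo

Are you looking for string output, or something else? - Divakar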

5 Answers

3

As the documentation for binary_repr states:

num : int

Only an integer decimal number can be used.
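To make the scalar-only behaviour concrete, here is a minimal sketch: a single integer works, while a multi-element array is rejected. The exact exception type is an assumption that depends on the NumPy version (the question shows a ValueError; newer releases may raise a TypeError instead), so the sketch catches both.

```python
import numpy as np

# A single integer is fine.
single = np.binary_repr(5, width=32)

# A multi-element array is rejected. The exception type varies between
# NumPy versions (assumption: ValueError or TypeError), so catch both.
try:
    np.binary_repr(np.array([1, 2, 3]), width=32)
    failed = False
except (ValueError, TypeError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```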

However, you can vectorize the operation like this:

np.vectorize(np.binary_repr)(lis, 32)

This then gives us:

>>> np.vectorize(np.binary_repr)(lis, 32)
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
       '00000000000000000000000000000100',
       '00000000000000000000000000000101',
       '00000000000000000000000000000110',
       '00000000000000000000000000000111',
       '00000000000000000000000000001000',
       '00000000000000000000000000001001'], dtype='<U32')

binary_repr_vector = np.vectorize(np.binary_repr)
binary_repr_vector(lis, 32)

Storing the vectorized function in a variable first gives, of course, the same result:

>>> binary_repr_vector = np.vectorize(np.binary_repr)
>>> binary_repr_vector(lis, 32)
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
       '00000000000000000000000000000100',
       '00000000000000000000000000000101',
       '00000000000000000000000000000110',
       '00000000000000000000000000000111',
       '00000000000000000000000000001000',
       '00000000000000000000000000001001'], dtype='<U32')
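Keep in mind that np.vectorize is a convenience wrapper and still calls the Python function once per element. A small sketch, reusing the names from above, that compares it with an explicit loop (the `looped` variable is only for illustration):

```python
import numpy as np

lis = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Vectorized wrapper around the scalar function.
binary_repr_vector = np.vectorize(np.binary_repr)
vectorized = binary_repr_vector(lis, 32)

# Equivalent explicit loop; int(n) turns each NumPy scalar into a Python int.
looped = np.array([np.binary_repr(int(n), width=32) for n in lis])

print((vectorized == looped).all())
```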

- Willem Van Onsem

3

Approach #1

A vectorized solution for an array of numbers, leveraging broadcasting -

def binary_repr_ar(A, W):
    p = (((A[:,None] & (1 << np.arange(W-1,-1,-1)))!=0)).view('u1')
    return p.astype('S1').view('S'+str(W)).ravel()

Sample run -

In [67]: A
Out[67]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [68]: binary_repr_ar(A,32)
Out[68]: 
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
       '00000000000000000000000000000100',
       '00000000000000000000000000000101',
       '00000000000000000000000000000110',
       '00000000000000000000000000000111',
       '00000000000000000000000000001000',
       '00000000000000000000000000001001'], dtype='|S32')
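To see what the broadcasting step does, here is a small sketch with a width of 4 instead of 32. The names `weights` and `bits` are only illustrative: each number is AND-ed against every power of two, most significant first, which yields one row of bits per number.

```python
import numpy as np

A = np.array([1, 2, 5])
W = 4

# Powers of two from the most significant bit down: [8, 4, 2, 1].
weights = 1 << np.arange(W - 1, -1, -1)

# A[:, None] has shape (3, 1) and weights has shape (4,), so the result
# broadcasts to (3, 4): one boolean per (number, bit position) pair.
bits = (A[:, None] & weights) != 0

print(bits.astype(int))
```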

Approach #2

Another vectorized one, using array assignment -

def binary_repr_ar_v2(A, W):
    mask = (((A[:,None] & (1 << np.arange(W-1,-1,-1)))!=0))
    out = np.full((len(A),W),48, dtype=np.uint8)
    out[mask] = 49
    return out.view('S'+str(W)).ravel()

Or, use the mask directly to get the string array -

def binary_repr_ar_v3(A, W):
    mask = (((A[:,None] & (1 << np.arange(W-1,-1,-1)))!=0))
    return (mask+np.array([48],dtype=np.uint8)).view('S'+str(W)).ravel()

Note that the final output is a view of one of the intermediate arrays. So if you need it to have its own memory, simply append .copy().
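A quick sketch of what "view" means here, using binary_repr_ar_v2 from above on a tiny input. The `base` attribute is set while the result still refers to another array's buffer and is None after .copy().

```python
import numpy as np

def binary_repr_ar_v2(A, W):
    mask = (A[:, None] & (1 << np.arange(W - 1, -1, -1))) != 0
    out = np.full((len(A), W), 48, dtype=np.uint8)  # 48 is ASCII '0'
    out[mask] = 49                                  # 49 is ASCII '1'
    return out.view('S' + str(W)).ravel()

res = binary_repr_ar_v2(np.array([1, 2, 3]), 8)
print(res.base is not None)      # still backed by the intermediate buffer

independent = res.copy()
print(independent.base is None)  # now owns its own memory
```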

Timings on a large input array:

In [49]: np.random.seed(0)
    ...: A = np.random.randint(1,1000,(100000))
    ...: W = 32

In [50]: %timeit binary_repr_ar(A, W)
    ...: %timeit binary_repr_ar_v2(A, W)
    ...: %timeit binary_repr_ar_v3(A, W)
1 loop, best of 3: 854 ms per loop
100 loops, best of 3: 14.5 ms per loop
100 loops, best of 3: 7.33 ms per loop

With the other posted solutions -

In [22]: %timeit [np.binary_repr(i, width=32) for i in A]
10 loops, best of 3: 97.2 ms per loop

In [23]: %timeit np.frompyfunc(np.binary_repr,2,1)(A,32).astype('U32')
10 loops, best of 3: 80 ms per loop

In [24]: %timeit np.vectorize(np.binary_repr)(A, 32)
10 loops, best of 3: 69.8 ms per loop

With @Paul Panzer's solution -

In [5]: %timeit bin_rep(A,32)
548 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit bin_rep(A,31)
2.2 ms ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

- Divakar

Thanks for providing the timings. - Paul Panzer

2

Here is a fast method using np.unpackbits:

(np.unpackbits(lis.astype('>u4').view(np.uint8))+ord('0')).view('S32')
# array([b'00000000000000000000000000000001',
#        b'00000000000000000000000000000010',
#        b'00000000000000000000000000000011',
#        b'00000000000000000000000000000100',
#        b'00000000000000000000000000000101',
#        b'00000000000000000000000000000110',
#        b'00000000000000000000000000000111',
#        b'00000000000000000000000000001000',
#        b'00000000000000000000000000001001'], dtype='|S32')
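The one-liner packs several steps together. As a sketch, here it is broken into named intermediates (the variable names are only for illustration), followed by an optional conversion from bytes to str:

```python
import numpy as np

lis = np.array([1, 2, 3])

# Big-endian uint32, so the most significant byte comes first in memory.
big_endian = lis.astype('>u4')

# Reinterpret each 4-byte number as 4 separate uint8 values.
raw_bytes = big_endian.view(np.uint8)

# Expand every byte into 8 bits (values 0 or 1), most significant bit first.
bits = np.unpackbits(raw_bytes)

# Shift 0/1 to the ASCII codes of '0'/'1' and reinterpret 32 bytes as one string.
strings = (bits + ord('0')).view('S32')

# Optional: convert the byte strings to regular str elements.
decoded = strings.astype('U32')
print(decoded)
```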

More general:

def bin_rep(A,n):
    if n in (8,16,32,64):
        return (np.unpackbits(A.astype(f'>u{n>>3}').view(np.uint8))+ord('0')).view(f'S{n}')
    nb = max((n-1).bit_length()-3,0)
    return (np.unpackbits(A.astype(f'>u{1<<nb}')[...,None].view(np.uint8),axis=1)[...,-n:]+ord('0')).ravel().view(f'S{n}')

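Note: special-casing n = 8, 16, 32, 64 is well worth it, since for these widths it gives a speedup of several times.

Also note that this method only works up to 2^64; larger integers need a different approach.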
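For integers beyond that range, one possible fallback is plain Python, whose ints have arbitrary precision. This is only a sketch, not part of the answer above: `wide_binary` is a hypothetical helper, it is a per-element loop (so much slower than the unpackbits route), and it assumes non-negative inputs.

```python
def wide_binary(values, width):
    """Zero-padded binary strings for non-negative Python ints of any size."""
    return [format(v, f'0{width}b') for v in values]

big = wide_binary([2**64, 2**70 + 1], 72)
for s in big:
    print(s)
```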

- Paul Panzer

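Does it work for generic window lengths? Is the range of values limited? - Divakar

@Divakar Well, that needs some work ;-) And it obviously gets tricky beyond uint64. - Paul Panzer

Looks promising, and clearly a clever one, that aside. - Divakar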

1

@Divakar I've added a general version. I didn't try to handle integers >= 2^64, though. It is much slower for non-aligned sizes, but still reasonably fast. - Paul Panzer

1

In [193]: alist = [1,2,3,4,5,6,7,8,9]

np.vectorize is convenient, but not fast:

In [194]: np.vectorize(np.binary_repr)(alist, 32)                                                            
Out[194]: 
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
        ....
       '00000000000000000000000000001001'], dtype='<U32')
In [195]: timeit np.vectorize(np.binary_repr)(alist, 32)                                                     
71.8 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

A plain list comprehension is better:

In [196]: [np.binary_repr(i, width=32) for i in alist]                                                       
Out[196]: 
['00000000000000000000000000000001',
 '00000000000000000000000000000010',
 '00000000000000000000000000000011',
...
 '00000000000000000000000000001001']
In [197]: timeit [np.binary_repr(i, width=32) for i in alist]                                                
11.5 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Another iterator:

In [200]: timeit np.frompyfunc(np.binary_repr,2,1)(alist,32).astype('U32')                                   
30.1 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

- hpaulj

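That's true for small arrays. But for larger ones, say lis = np.arange(1000), the list comprehension typically takes about twice as long, and frompyfunc about 10% longer. That's not surprising: NumPy is not meant for small amounts of data and only pays off when data is processed in bulk. - Willem Van Onsem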
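To check this on your own machine, here is a rough timing sketch using the standard timeit module instead of IPython's %timeit. The `candidates` dictionary and the repeat counts are arbitrary choices for illustration, and the numbers will vary with hardware and NumPy version.

```python
import timeit
import numpy as np

lis = np.arange(1000)

# The three element-wise variants discussed above.
candidates = {
    'list comprehension': lambda: [np.binary_repr(i, width=32) for i in lis],
    'np.vectorize': lambda: np.vectorize(np.binary_repr)(lis, 32),
    'np.frompyfunc': lambda: np.frompyfunc(np.binary_repr, 2, 1)(lis, 32).astype('U32'),
}

# Best of 3 repeats, 10 calls each, reported as seconds per call.
timings = {
    name: min(timeit.repeat(fn, number=10, repeat=3)) / 10
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1e6:.1f} µs per call')
```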

Content provided by Stack Overflow.

- taras · Accepted Answer

You can use np.vectorize to solve this.

>>> import numpy as np
>>> lis = np.array([1,2,3,4,5,6,7,8,9])
>>> binary_repr_vec = np.vectorize(np.binary_repr)
>>> binary_repr_vec(lis, width=32)
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
       '00000000000000000000000000000100',
       '00000000000000000000000000000101',
       '00000000000000000000000000000110',
       '00000000000000000000000000000111',
       '00000000000000000000000000001000',
       '00000000000000000000000000001001'], dtype='<U32')
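As a sanity check, the strings can be parsed back with Python's built-in int(s, 2). A small sketch, where `round_trip` is just an illustrative name:

```python
import numpy as np

lis = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
binary_repr_vec = np.vectorize(np.binary_repr)
strings = binary_repr_vec(lis, width=32)

# Parse each 32-character string back into an integer (base 2).
round_trip = np.array([int(s, 2) for s in strings])

print((round_trip == lis).all())
```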