Creating a NumPy 2D array from a buffer?

Question

5

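I have a memory map that contains a 2D array, and I would like to make a NumPy array from it. Ideally I want to avoid copying, since the arrays involved can be large.

My code looks like this: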

import mmap
import numpy as np

n_bytes = 10000
tagname = "Some Tag from external System"
# Windows-only signature: the third positional argument is the tag name.
mm = mmap.mmap(-1, n_bytes, tagname)
offsets = [0, 5000]

columns = []
for offset in offsets:
    # Type and count vary in the real code; they are made up for this dummy
    # example, but I know the count and type of every column.
    np_type = np.dtype('f4')
    column_data = np.frombuffer(mm, np_type, count=500, offset=offset)
    columns.append(column_data)

# this line seems to copy the data, which I would like to avoid
data = np.array(columns).T

- Christian Sauer

Have you tried reading the whole file as one big 1D array and then reshaping it to 2D? - kennytm

Do you know the size of your final array in advance? - Julien

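@kennytm Each column can have a different data type (e.g. the first block is floats, the second is ints), and that cannot be expressed with the single-buffer approach. - Christian Sauer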

@Julien Bernu Yes, I know the number of columns, rows and bytes. - Christian Sauer

2 Answers

1

I haven't used frombuffer much, but I think np.array works with those arrays the same way it does with conventionally constructed ones.

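Each column_data array has its own data buffer - the region of the mmap you assigned it. But np.array(columns) reads the values from each array in the list and constructs a new array from them, with its own data buffer.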

I like to use x.__array_interface__ to look at the data buffer location (and to see other key attributes). Compare that dictionary for each element of columns and for data.

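You can construct a 2d array from a mmap if it is one contiguous block. Just make the 1d frombuffer array and reshape it. Even transpose will continue to use that buffer (with F order). Slices and views also use it.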
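A minimal sketch of the reshape-and-transpose approach, assuming all columns share one dtype and sit back to back in the map. The anonymous mmap and the names n_rows, n_cols, flat and table are made up for illustration; the real code would use the externally tagged map instead.

```python
import mmap
import numpy as np

n_rows, n_cols = 4, 3
itemsize = np.dtype('f4').itemsize

# Anonymous map as a stand-in for the real, externally tagged one.
mm = mmap.mmap(-1, n_rows * n_cols * itemsize)

# 1D view over the whole map: no data is copied.
flat = np.frombuffer(mm, dtype='f4')

# Each column occupies one contiguous block, so reshape to
# (n_cols, n_rows) and transpose to get one column per block.
table = flat.reshape(n_cols, n_rows).T

# Writing through the 1D view is visible through the 2D view.
flat[:] = np.arange(flat.size)

same_buffer = bool(np.shares_memory(flat, table))
first_column = table[:, 0].tolist()
print(same_buffer, table.shape, first_column)
```

The transposed result is F-contiguous, which matches the column-block layout of the buffer, so no copy is needed to present it as rows by columns.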

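But unless you are really careful you will quickly get copies that put the data elsewhere. Simply writing data1 = data+1 makes a new array, and so does advanced indexing such as data[[1,3,5],:]. The same goes for any concatenation.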
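One way to check which operations keep using the original buffer is np.shares_memory. The sketch below uses a bytearray as a stand-in for the mmap; the variable names are illustrative only.

```python
import numpy as np

buf = bytearray(24)  # writable stand-in for the mmap
data = np.frombuffer(buf, dtype=np.uint8).reshape(6, 4)

candidates = {
    'slice': data[1:4, :],                       # basic slicing
    'transpose': data.T,                         # transpose
    'arithmetic': data + 1,                      # elementwise arithmetic
    'advanced indexing': data[[1, 3, 5], :],     # index with a list
    'concatenate': np.concatenate([data, data]), # joining arrays
}

# True means the result still reads from buf; False means it was copied.
results = {name: bool(np.shares_memory(data, arr))
           for name, arr in candidates.items()}

for name, shared in results.items():
    print(f"{name}: {'view' if shared else 'copy'}")
```

Only the slice and the transpose report a view; the other three own new buffers.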

Two arrays from bytestring buffers:

In [534]: x=np.frombuffer(b'abcdef',np.uint8)
In [535]: y=np.frombuffer(b'ghijkl',np.uint8)

Make a new array by joining them:

In [536]: z=np.array((x,y))

In [538]: x.__array_interface__
Out[538]: 
{'data': (3013090040, True),
 'descr': [('', '|u1')],
 'shape': (6,),
 'strides': None,
 'typestr': '|u1',
 'version': 3}
In [539]: y.__array_interface__['data']
Out[539]: (3013089608, True)
In [540]: z.__array_interface__['data']
Out[540]: (180817384, False)

The data buffer locations for x, y and z are entirely different.

But the data location does not change when x is reshaped:

In [541]: x.reshape(2,3).__array_interface__['data']
Out[541]: (3013090040, True)

Nor does it change for the 2d transpose:

In [542]: x.reshape(2,3).T.__array_interface__
Out[542]: 
{'data': (3013090040, True),
 'descr': [('', '|u1')],
 'shape': (3, 2),
 'strides': (1, 3),
 'typestr': '|u1',
 'version': 3}

Same data, different view.

In [544]: x
Out[544]: array([ 97,  98,  99, 100, 101, 102], dtype=uint8)
In [545]: x.reshape(2,3).T
Out[545]: 
array([[ 97, 100],
       [ 98, 101],
       [ 99, 102]], dtype=uint8)
In [546]: x.reshape(2,3).T.view('S1')
Out[546]: 
array([[b'a', b'd'],
       [b'b', b'e'],
       [b'c', b'f']], 
      dtype='|S1')

- hpaulj

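Thanks for the good answer! Do you know how I can use the frombuffer approach when the column size varies? E.g. my first block contains f4 but the second contains f8 - do I need to do some reshaping after constructing the 2D array? - Christian Sauer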

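Structured arrays allow different dtypes in their fields. But in such an array the f4 elements sit next to the f8 elements, record by record, not as separate columns (a block of f4, then a block of f8). I don't know of a way to mix columns and dtypes. - hpaulj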
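To make the two layouts concrete, here is a hedged sketch. The first part maps a structured dtype onto interleaved records, as described in the comment above. The second part is not from the thread: it is one possible workaround for column blocks, keeping a separate zero-copy 1D view per column in a dict instead of a single 2D array. The field names 'a' and 'b', the bytearray buffers and n_rows are assumptions for illustration.

```python
import numpy as np

n_rows = 4

# Layout 1: interleaved records (f4, f8, f4, f8, ...).
# A packed structured dtype maps directly onto this layout.
rec_dtype = np.dtype([('a', 'f4'), ('b', 'f8')])  # itemsize 12
rec_buf = bytearray(n_rows * rec_dtype.itemsize)
records = np.frombuffer(rec_buf, dtype=rec_dtype)
records['a'][:] = 1.5  # field access is a view into rec_buf

# Layout 2: column blocks (all f4 values, then all f8 values).
# A single 2D array cannot hold two dtypes, so keep one 1D view
# per column; each view still points into the shared buffer.
f4_bytes = n_rows * np.dtype('f4').itemsize
f8_bytes = n_rows * np.dtype('f8').itemsize
block_buf = bytearray(f4_bytes + f8_bytes)
columns = {
    'a': np.frombuffer(block_buf, dtype='f4', count=n_rows, offset=0),
    'b': np.frombuffer(block_buf, dtype='f8', count=n_rows, offset=f4_bytes),
}
columns['b'][:] = 2.0  # writes land in block_buf, not in a copy

raw = np.frombuffer(block_buf, dtype=np.uint8)
shares = {name: bool(np.shares_memory(raw, col))
          for name, col in columns.items()}
print(records['a'].tolist(), columns['b'].tolist(), shares)
```

The dict of views gives up 2D indexing across columns, but each column keeps its own dtype and nothing is copied.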

- Itay Gal · Accepted Answer

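The answer is quite simple, assuming you have a byte array and you know its dimensions. Say you have the raw RGB data of an image (24 bits per pixel) in a buffer (named 'buff'), with dimensions 1024x768.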

import numpy

# read the buffer into a 1D byte array (no copy is made)
arr = numpy.frombuffer(buff, dtype=numpy.uint8)
# now shape the array as you please: (rows, columns, channels)
arr.shape = (768, 1024, 3)
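A self-contained variant of the same idea, with a zero-filled bytes object standing in for the real image data (the contents of buff here are an assumption). It uses reshape, which returns a new view rather than modifying arr in place, and raises ValueError if the buffer size does not match the requested shape.

```python
import numpy as np

height, width, channels = 768, 1024, 3

# Placeholder for the real raw RGB data.
buff = bytes(height * width * channels)

flat = np.frombuffer(buff, dtype=np.uint8)
image = flat.reshape(height, width, channels)

# An array built on an immutable bytes object is read-only;
# a bytearray or mmap would give a writable array instead.
print(image.shape, image.flags.writeable, np.shares_memory(flat, image))
```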