Resizing numpy.memmap arrays

Question

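I'm working with a bunch of large numpy arrays, and since they recently started taking up too much memory, I want to replace them with numpy.memmap instances. The problem is that every now and then I need to resize the arrays, and I'd prefer to do it in place. That works fine with ordinary arrays, but on memmaps it fails with a complaint that the array does not own its data, and disabling the reference check (refcheck=False) doesn't help.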

import numpy as np

a = np.arange(10)
a.resize(20)
a
>>> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# bla.bin already exists and holds ten zero-valued ints
a = np.memmap('bla.bin', dtype=int)
a
>>> memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

a.resize(20, refcheck=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-f1546111a7a1> in <module>()
----> 1 a.resize(20, refcheck=False)

ValueError: cannot resize this array: it does not own its data

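Resizing the underlying mmap buffer works without any problem. The trouble is getting those changes reflected in the array object. I've seen this workaround, but unfortunately it doesn't resize the array in place. There is also some numpy documentation about resizing mmaps, but it clearly doesn't work, at least not in version 1.8.0. Any other ideas on how to get around the built-in resize checks?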

- Michael
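A minimal sketch of why the call fails, assuming a reasonably recent NumPy: a freshly created memmap borrows its buffer from an mmap object instead of owning it, and ndarray.resize refuses to reallocate memory the array does not own, regardless of refcheck. The temporary path and the use of np.int64 (to pin the item size at 8 bytes) are illustrative choices, not part of the question.

```python
import os
import tempfile

import numpy as np

# Illustrative temporary file; np.int64 pins the item size to 8 bytes.
path = os.path.join(tempfile.mkdtemp(), "bla.bin")
a = np.memmap(path, mode="w+", dtype=np.int64, shape=(10,))

# The memory belongs to the mmap object behind the array, not to the array.
owns_data = bool(a.flags["OWNDATA"])
print("OWNDATA:", owns_data)  # OWNDATA: False

# ndarray.resize checks ownership, so refcheck=False does not help here.
resize_failed = False
try:
    a.resize(20, refcheck=False)
except ValueError as exc:
    resize_failed = True
    print("resize failed:", exc)
```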

I feel like I must be missing something... this code runs fine on my machine. Can you run it? Isn't this what you want to do? http://codepad.org/eEWmYBHZ - three_pineapples

@three_pineapples He wants to change the total size of the array; your code only reshapes it. - ali_m

@ali_m Oh, I see. I didn't get that from the question, but as I said, I figured I was missing something! Thanks for clarifying. - three_pineapples

I've now filed a bug report for this: https://github.com/numpy/numpy/issues/4198 - Michael

Did you ever settle on a good way to do this? - dpoiesz

Sorry, @dpoiesz, no. - Michael

2 Answers

If I understand correctly, this does essentially what @wwwslinger's second solution does, but without having to specify the size of the new memmap by hand (in bytes).

In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))

In [2]: a[3] = 7

In [3]: a
Out[3]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])

In [4]: a.flush()

# this will append to the original file as much as is necessary to satisfy
# the new shape requirement, given the specified dtype
In [5]: new_a = np.memmap('bla.bin', mode='r+', dtype=int, shape=(20,))

In [6]: new_a
Out[6]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [7]: a[-1] = 10

In [8]: a
Out[8]: memmap([ 0,  0,  0,  7,  0,  0,  0,  0,  0, 10])

In [9]: a.flush()

In [11]: new_a
Out[11]: 
memmap([ 0,  0,  0,  7,  0,  0,  0,  0,  0, 10,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0])
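The same reopen-with-a-larger-shape idea can be wrapped in a small helper. This is a sketch, not part of the answer: grow_memmap is a hypothetical name, it assumes a 1-D memmap mapped at offset 0, and it relies on np.memmap in 'r+' mode padding a file that is shorter than the requested shape, as the session above shows. The old object stays usable but keeps its old length.

```python
import os
import tempfile

import numpy as np


def grow_memmap(mm, new_len):
    """Hypothetical helper: map the same file again with a longer 1-D shape.

    Assumes mm is 1-D and was mapped at offset 0.
    """
    if new_len < mm.shape[0]:
        raise ValueError("grow_memmap only grows; shrinking needs the file truncated")
    mm.flush()  # push pending writes to the file before remapping it
    # In 'r+' mode np.memmap pads the file with zero bytes if it is too short.
    return np.memmap(mm.filename, mode="r+", dtype=mm.dtype, shape=(new_len,))


path = os.path.join(tempfile.mkdtemp(), "bla.bin")
a = np.memmap(path, mode="w+", dtype=np.int64, shape=(10,))
a[3] = 7
new_a = grow_memmap(a, 20)
print(new_a.tolist())          # 7 at index 3, zeros elsewhere
print(os.path.getsize(path))   # 160 (20 items * 8 bytes)
```

Because the item size comes from mm.dtype, the caller never has to compute the byte count.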

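This approach works well when the new array needs to be larger than the old one, but I don't think it can automatically truncate the memory-mapped file when the new array is smaller.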

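Resizing the underlying buffer by hand (as in @wwwslinger's answer) does seem to let you truncate the file, but it does not reduce the size of the array.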

For example:

# this creates a memory mapped file of 10 * 8 = 80 bytes
In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))

In [2]: a[:] = range(1, 11)

In [3]: a.flush()

In [4]: a
Out[4]: memmap([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

# now truncate the file to 40 bytes
In [5]: a.base.resize(5*8)

In [6]: a.flush()

# the array still has the same shape, but the truncated part is all zeros
In [7]: a
Out[7]: memmap([1, 2, 3, 4, 5, 0, 0, 0, 0, 0])

In [8]: b = np.memmap('bla.bin', mode='r+', dtype=int, shape=(5,))

# you still need to create a new np.memmap to change the size of the array
In [9]: b
Out[9]: memmap([1, 2, 3, 4, 5])

- ali_m
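For the shrinking case, one option that avoids resizing a.base underneath a live array (the array keeps a raw pointer into the mapping, so changing the mapping behind its back is fragile) is to drop the old memmap, truncate the file through an ordinary file handle, and map it again. A sketch under those assumptions; shrink_memmap_file is a hypothetical helper for a 1-D array at offset 0.

```python
import os
import tempfile

import numpy as np


def shrink_memmap_file(path, dtype, new_len):
    """Hypothetical helper: cut the file down to new_len items and remap it.

    Call it only after every memmap over `path` has been flushed and deleted.
    """
    dtype = np.dtype(dtype)
    with open(path, "r+b") as f:
        f.truncate(new_len * dtype.itemsize)  # new size in bytes
    return np.memmap(path, mode="r+", dtype=dtype, shape=(new_len,))


path = os.path.join(tempfile.mkdtemp(), "bla.bin")
a = np.memmap(path, mode="w+", dtype=np.int64, shape=(10,))
a[:] = np.arange(1, 11)
a.flush()
del a  # release the old mapping before the file shrinks under it

b = shrink_memmap_file(path, np.int64, 5)
print(b.tolist())             # [1, 2, 3, 4, 5]
print(os.path.getsize(path))  # 40
```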

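That's similar to the workaround I posted earlier. I'd prefer an in-place solution, since it would spare me from wrapping the object any further. In any case, this may be what I end up having to settle for. - Michael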

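@Michael If you haven't already, it would be worth reporting this to the numpy maintainers. At the very least, the docstring of the np.memmap class should be updated to reflect the fact that memory-mapped arrays currently can't be resized in place. - ali_m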

I haven't yet, but since there doesn't seem to be an easy solution, I'll go ahead and do it. - Michael
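If wrapping the array turns out to be unavoidable, the wrapper can stay thin. The sketch below is hypothetical (ResizableMemmap is not a NumPy class): it keeps the path and dtype, and resize() flushes, drops the old mapping, truncates the file when shrinking, and maps it again, leaving growth to np.memmap's own zero padding in 'r+' mode. It assumes a 1-D array at offset 0 and a positive new length, and any views taken from the previous .array are not updated, so they must not be used after a resize.

```python
import os
import tempfile

import numpy as np


class ResizableMemmap:
    """Hypothetical thin wrapper around a 1-D, offset-0 memmap that can be resized."""

    def __init__(self, path, dtype, length):
        self.path = path
        self.dtype = np.dtype(dtype)
        # 'w+' creates (or overwrites) the file with `length` zero items.
        self.array = np.memmap(path, mode="w+", dtype=self.dtype, shape=(length,))

    def resize(self, new_len):
        """Change the length of the backing file and map it again (new_len > 0)."""
        old_len = self.array.shape[0]
        self.array.flush()
        self.array = None  # drop the old mapping before touching the file
        if new_len < old_len:
            with open(self.path, "r+b") as f:
                f.truncate(new_len * self.dtype.itemsize)
        # When growing, np.memmap in 'r+' mode pads the file with zero bytes.
        self.array = np.memmap(self.path, mode="r+", dtype=self.dtype, shape=(new_len,))
        return self.array


path = os.path.join(tempfile.mkdtemp(), "data.bin")
r = ResizableMemmap(path, np.int64, 10)
r.array[:] = np.arange(10)
r.resize(20)
r.array[15] = 99
r.resize(12)                  # index 15 is cut off again
print(r.array.tolist())       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0]
```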

- wwwslinger · Accepted Answer

The problem is that the array is created with its OWNDATA flag set to False. You can change that by requiring OWNDATA to be True when you create the array:

>>> import numpy as np
>>> a = np.require(np.memmap('bla.bin', dtype=int), requirements=['O'])
>>> a.shape
(10,)
>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> a.resize(20, refcheck=False)
>>> a.shape
(20,)

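The only caveat is that np.require may have to create a new array and copy the data into it to satisfy the requirement. With a memmap, which never owns its data, the copy always happens, so the resized array lives in ordinary memory rather than in the mapped file.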
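A sketch that checks what np.require actually returns here, assuming a recent NumPy and an illustrative temporary file (not part of the answer): the result no longer shares memory with the mapping, so the resize succeeds, but the file keeps its old size and later writes do not reach it.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "bla.bin")
mm = np.memmap(path, mode="w+", dtype=np.int64, shape=(10,))
mm.flush()

# mm does not own its data, so np.require has to make a copy.
a = np.require(mm, requirements=["O"])
print(bool(a.flags["OWNDATA"]))   # True
print(np.shares_memory(a, mm))    # False: a is an in-memory copy

a.resize(20, refcheck=False)      # succeeds, but only the copy grows
a[3] = 7
print(os.path.getsize(path))      # 80: the file still holds 10 items
print(int(mm[3]))                 # 0: the write never reached the mapping
```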

Edit to address saving:

If you want to save the resized array to disk, you can save it as a .npy file and then open that file as a numpy.memmap whenever you need to reopen it and use it as a memmap:

>>> a[1] = 9
>>> np.save('bla.npy',a)
>>> b = np.lib.format.open_memmap('bla.npy', dtype=int, mode='r+')
>>> b
memmap([0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
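A variation on the .npy route, sketched under the assumption of 1-D arrays: numpy.lib.format.open_memmap can also create the resized file directly in 'w+' mode, so the data is copied from one mapping to the other instead of building the enlarged array in memory first. resize_npy is a hypothetical helper name, and it handles shrinking as well as growing.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def resize_npy(src, dst, new_len):
    """Hypothetical helper: write a copy of 1-D .npy file `src` with length new_len to `dst`."""
    old = open_memmap(src, mode="r")
    # 'w+' writes a fresh .npy header and a zero-filled data section.
    new = open_memmap(dst, mode="w+", dtype=old.dtype, shape=(new_len,))
    n = min(old.shape[0], new_len)
    new[:n] = old[:n]  # copy mapping-to-mapping; items past n stay zero
    new.flush()
    return new


d = tempfile.mkdtemp()
src = os.path.join(d, "bla.npy")
dst = os.path.join(d, "bla_resized.npy")
np.save(src, np.arange(10, dtype=np.int64))

b = resize_npy(src, dst, 20)
print(b.shape)                      # (20,)
print(np.load(dst)[:12].tolist())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0]
```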

Edit: an alternative approach:

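You can get close to what you want by resizing the underlying mmap (a.base or a._mmap, whose size is counted in bytes, like a uint8 buffer) and then "reloading" the memmap: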

>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> a[3] = 7
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
>>> a.flush()
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
>>> a.base.resize(20 * a.itemsize)  # new length in bytes
>>> a.flush()
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])