Can a boolean numpy array be saved on disk with 1 bit per element, with memmap support?

Question

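Is it possible to save a numpy array in boolean format on disk so that it takes only 1 bit per element? This answer suggests using packbits and unpackbits, but from the documentation it seems that this approach may not support memory mapping. Is there a way to store 1-bit arrays on disk with memmap support?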

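Reason for requiring memmap: I'm training a neural network on a database of full HD (1920x1080) images, but in every iteration I randomly crop a 256x256 patch. Since reading the full image is time-consuming, I use memmap to read only the required patch. Now I want to use binary masks along with my images, hence this requirement.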
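For context, the memmap-based random crop described above might look roughly like the sketch below. It assumes (hypothetically) that each image is stored as a headerless raw uint8 file of shape (1080, 1920, 3); the function name `random_patch` and the file name are illustrative only.

```python
import os
import tempfile

import numpy as np

# Assumed storage layout (hypothetical): each image is a raw uint8 file with
# shape (1080, 1920, 3), i.e. rows x columns x RGB, and no header.
HEIGHT, WIDTH, CHANNELS = 1080, 1920, 3
PATCH = 256


def random_patch(path, rng):
    """Memory-map a raw image file and copy out one random PATCH x PATCH crop."""
    image = np.memmap(path, mode="r", dtype=np.uint8,
                      shape=(HEIGHT, WIDTH, CHANNELS))
    top = int(rng.integers(0, HEIGHT - PATCH + 1))
    left = int(rng.integers(0, WIDTH - PATCH + 1))
    # Slicing the memmap does not load the whole file; np.array() copies
    # just the cropped region into ordinary memory.
    patch = np.array(image[top:top + PATCH, left:left + PATCH])
    return patch, top, left


# Usage: write a synthetic image to a temporary file, then crop from it.
rng = np.random.default_rng(0)
full = rng.integers(0, 256, size=(HEIGHT, WIDTH, CHANNELS), dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), "image_0000.raw")  # hypothetical file name
full.tofile(path)
patch, top, left = random_patch(path, rng)
print(patch.shape)  # (256, 256, 3)
```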

- Nagabhushan S N

1 Answer

Content provided by Stack Overflow.

- ken · Accepted Answer

NumPy does not support arrays with 1 bit per element, and I doubt memmap has such a feature. However, there is a simple workaround using packbits.
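As a quick illustration of what packbits buys you, here is a minimal round trip on a random mask of the same size as in the question. NumPy stores a bool array with 1 byte per element; packing along axis 0 stores 8 rows per byte, an 8x reduction.

```python
import numpy as np

# A random boolean mask, 1 byte per element as NumPy stores bool arrays.
rng = np.random.default_rng(0)
mask = rng.random((1920, 1080)) < 0.5

# np.packbits stores 8 booleans per uint8 byte (most significant bit first by default).
packed = np.packbits(mask, axis=0)
# np.unpackbits reverses this; it returns uint8 0/1 values, so cast back to bool.
restored = np.unpackbits(packed, axis=0).astype(bool)

print(mask.nbytes, packed.shape, packed.nbytes)  # 2073600 (240, 1080) 259200
print(np.array_equal(mask, restored))            # True
```

Here 1920 is a multiple of 8, so the round trip is exact. For other sizes, packbits pads the last byte with zeros, and `np.unpackbits(..., count=n)` can trim the result back to `n` elements along the axis.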

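Since your case does not need random access to individual bits, you can memory-map the packed data as an ordinary 1-byte-per-element (uint8) array, read only the byte rows that cover your patch, and unpack those.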
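To make the byte layout concrete: with `np.packbits(..., axis=0)` and the default `bitorder='big'`, mask pixel `(row, col)` lives in byte `packed[row // 8, col]`, at bit `7 - row % 8` counted from the least significant bit. The hypothetical helper below reads a single pixel this way; it touches only one byte, which is why a plain uint8 memmap is enough.

```python
import numpy as np


def mask_pixel(packed, row, col):
    """Read one mask value from an array packed with np.packbits(..., axis=0).

    Hypothetical helper. With the default bitorder='big', pixel row `row` is
    stored in byte row row // 8, at bit 7 - row % 8 (counting from the LSB).
    Only a single byte of `packed` is accessed, so a memmap works too.
    """
    byte = packed[row // 8, col]
    return bool((byte >> (7 - row % 8)) & 1)


rng = np.random.default_rng(0)
mask = rng.integers(0, 2, size=(1920, 1080), dtype=np.uint8)
packed = np.packbits(mask, axis=0)
print(all(mask_pixel(packed, r, c) == bool(mask[r, c])
          for r, c in [(0, 0), (555, 777), (1919, 1079), (8, 3)]))  # True
```

In practice you would not read pixels one at a time; the point is only that whole bytes are addressable, so reading a block of byte rows and unpacking it is cheap.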

```python
import numpy as np

# A binary mask represented as a 1-byte-per-element array.
full_size_mask = np.random.randint(0, 2, size=[1920, 1080], dtype=np.uint8)

# Pack the mask vertically: each byte holds 8 vertically adjacent pixels.
packed_mask = np.packbits(full_size_mask, axis=0)

# Save as a memmap-compatible file.
buffer = np.memmap("./temp.bin", mode='w+',
                   dtype=packed_mask.dtype, shape=packed_mask.shape)
buffer[:] = packed_mask
buffer.flush()
del buffer

# Open as a memmap file.
packed_mask = np.memmap("./temp.bin", mode='r',
                        dtype=packed_mask.dtype, shape=packed_mask.shape)

# Rect where you want to crop.
top = 555
left = 777
width = 256
height = 256

# Read the packed byte rows that contain the rect
# (ceiling division, so no extra byte row is read when the bottom is aligned).
packed_top = top // 8
packed_bottom = (top + height + 7) // 8
packed_patch = packed_mask[packed_top:packed_bottom, left:left + width]

# Unpack and crop the actual area.
patch_top = top - packed_top * 8
patch_mask = np.unpackbits(packed_patch, axis=0)[patch_top:patch_top + height]

# Check that the mask is cropped from the correct area.
print(np.all(patch_mask == full_size_mask[top:top + height, left:left + width]))
```
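The crop logic above can be wrapped in a small reusable function and checked against direct slicing for many random crops, including the bottom-right corner. This is a sketch; `read_mask_patch` and the file name are hypothetical. It computes `packed_bottom` with ceiling division, which reads one fewer byte row than `(top + height) // 8 + 1` when `top + height` is a multiple of 8 and is otherwise identical.

```python
import os
import tempfile

import numpy as np


def read_mask_patch(packed, top, left, height, width):
    """Crop a (height, width) patch from a mask packed with np.packbits(axis=0).

    Hypothetical helper. `packed` can be an np.memmap; only the byte rows
    covering rows top..top+height-1 (and columns left..left+width-1) are sliced.
    """
    packed_top = top // 8
    packed_bottom = (top + height + 7) // 8      # ceiling division
    packed_patch = packed[packed_top:packed_bottom, left:left + width]
    offset = top - packed_top * 8                # bits to skip in the first byte row
    return np.unpackbits(packed_patch, axis=0)[offset:offset + height]


# Usage: store a packed random mask in a raw file, memory-map it, then compare
# many random crops (plus the bottom-right corner) with direct slicing.
rng = np.random.default_rng(0)
full = rng.integers(0, 2, size=(1920, 1080), dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), "mask.bin")   # hypothetical file name
packed = np.packbits(full, axis=0)
packed.tofile(path)
mapped = np.memmap(path, mode="r", dtype=np.uint8, shape=packed.shape)

crops = [(int(rng.integers(0, 1920 - 256 + 1)), int(rng.integers(0, 1080 - 256 + 1)))
         for _ in range(100)] + [(1920 - 256, 1080 - 256)]
ok = all(np.array_equal(read_mask_patch(mapped, t, l, 256, 256),
                        full[t:t + 256, l:l + 256]) for t, l in crops)
print(ok)  # True
```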

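Note that this solution may read extra bits (and most likely will): at most 7 bits at each end. In your case that is up to 7 × 2 × 256 bits, but that is only about 5% of the patch, so I think it is negligible.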
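The 7 × 2 × 256 figure is an upper bound. The hypothetical helper below counts exactly how many extra mask rows per column are read, assuming the byte rows from `top // 8` up to the ceiling of `(top + height) / 8` are loaded. For a height that is a multiple of 8, such as 256, the two partial bytes appear to always add up to either 0 or exactly 8 extra rows.

```python
def extra_rows(top, height):
    """Mask rows read beyond the requested ones, per column (hypothetical helper).

    Assumes byte rows top // 8 up to ceil((top + height) / 8) are read.
    """
    packed_top = top // 8
    packed_bottom = (top + height + 7) // 8
    return (packed_bottom - packed_top) * 8 - height


# Worst case over every alignment of `top` for a 256-row patch.
worst = max(extra_rows(top, 256) for top in range(8))
print(worst, worst * 256, worst * 256 / (256 * 256))  # 8 2048 0.03125
```

So for 256x256 patches the overhead is at most 2048 bits, about 3% of the patch; for arbitrary heights it never exceeds 7 + 7 = 14 rows.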

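By the way, this is not an answer to your question, but when you are dealing with binary masks such as labels for image segmentation, zip compression can drastically reduce the file size. It could shrink to less than 8 KB per image (not per patch). You might want to consider this option as well.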
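A minimal sketch of the compression option using `np.savez_compressed`, which writes a deflate-compressed ZIP archive. The mask here is a hypothetical single filled disc, standing in for a segmentation label with large uniform regions; the actual size you get depends entirely on the mask content. Loading decompresses the whole array, so this trades memmap's patch-level reads for smaller files.

```python
import io

import numpy as np

# Hypothetical segmentation-style mask: one filled disc on a 1920x1080 grid.
rows, cols = np.ogrid[:1920, :1080]
mask = (rows - 960) ** 2 + (cols - 540) ** 2 < 400 ** 2   # bool, 1 byte per element

# np.savez_compressed writes a ZIP archive with deflate compression.
buf = io.BytesIO()
np.savez_compressed(buf, mask=mask)
compressed_size = len(buf.getvalue())

# Loading decompresses the whole array; there is no patch-level access.
buf.seek(0)
with np.load(buf) as archive:
    restored = archive["mask"]

print(mask.nbytes, compressed_size, np.array_equal(mask, restored))
```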