Why does PyTables write faster than h5py?

Question

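I noticed that writing .h5 files with the h5py library is much slower than with the pytables library. Why is that? This happens even when the shape of the array is known in advance. I also use the same chunk size and no compression filter.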

Here is the script:

import h5py
import tables
import numpy as np
from time import time

dim1, dim2 = 64, 1527416

# append columns
print("PYTABLES: append columns")
print("=" * 32)
f = tables.open_file("/tmp/test.h5", "w")
a = f.create_earray(f.root, "time_data", tables.Float32Atom(), shape=(0, dim1))
t1 = time()
zeros = np.zeros((1, dim1), dtype="float32")
for i in range(dim2):
    a.append(zeros)
tcre = round(time() - t1, 3)
thcre = round(dim1 * dim2 * 4 / (tcre * 1024 * 1024), 1)
print("Time to append %d columns: %s sec (%s MB/s)" % (i+1, tcre, thcre))
print("=" * 32)
chunkshape = a.chunkshape
f.close()

print("H5PY: append columns")
print("=" * 32)
f = h5py.File(name="/tmp/test.h5",mode='w')
a = f.create_dataset(name='time_data',shape=(0, dim1),
                     maxshape=(None,dim1),dtype='f',chunks=chunkshape)
t1 = time()
zeros = np.zeros((1, dim1), dtype="float32")
samplesWritten = 0
for i in range(dim2):
    a.resize((samplesWritten+1, dim1))
    a[samplesWritten:(samplesWritten+1),:] = zeros
    samplesWritten += 1
tcre = round(time() - t1, 3)
thcre = round(dim1 * dim2 * 4 / (tcre * 1024 * 1024), 1)
print("Time to append %d columns: %s sec (%s MB/s)" % (i+1, tcre, thcre))
print("=" * 32)
f.close()

Results on my machine:

PYTABLES: append columns
================================
Time to append 1527416 columns: 22.679 sec (16.4 MB/s)
================================
H5PY: append columns
================================
Time to append 1527416 columns: 158.894 sec (2.3 MB/s)
================================
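The MB/s figures above come from the script's formula `dim1 * dim2 * 4 / (tcre * 1024 * 1024)`: 4 bytes per float32, divided by 2**20, so "MB" here is really MiB. A small sketch that recomputes them from the reported timings and also divides each timing by the 1,527,416 loop iterations to get the average cost of one append (the helper names are illustrative, not from the original script):

```python
# Recompute the reported throughput from the raw timings in the question.
DIM1, DIM2 = 64, 1527416   # values per row, number of appended rows
BYTES_PER_VALUE = 4        # float32


def throughput_mib_s(seconds: float) -> float:
    """Same formula as the script: total bytes / (seconds * 2**20)."""
    return DIM1 * DIM2 * BYTES_PER_VALUE / (seconds * 1024 * 1024)


def per_call_us(seconds: float) -> float:
    """Average wall time of one loop iteration (one append, or resize + write), in microseconds."""
    return seconds / DIM2 * 1e6


timings = {"PyTables": 22.679, "h5py": 158.894}
for name, t in timings.items():
    print(f"{name}: {throughput_mib_s(t):.1f} MB/s, {per_call_us(t):.1f} us per append")
```

Only about 373 MiB is written in total, so an average of roughly 15 µs per PyTables append versus roughly 104 µs per h5py resize + write suggests the loop is dominated by per-call overhead rather than by disk bandwidth.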

If I flush after every iteration of the for loop, for example:

for i in range(dim2):
    a.append(zeros)
    f.flush()

I get:

PYTABLES: append columns
================================
Time to append 1527416 columns: 67.481 sec (5.5 MB/s)
================================
H5PY: append columns
================================
Time to append 1527416 columns: 193.644 sec (1.9 MB/s)
================================

- adku1173

This is most likely caused by very high library overhead. If you write blocks of (256, 64) at a time, you should be able to reach >150 MB/s. - max9111
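max9111's suggestion amounts to amortizing the per-call overhead: collect rows in an in-memory buffer and touch the HDF5 dataset only once per block. A minimal h5py sketch, assuming a block of 256 rows as in the comment; `BlockAppender` and its methods are hypothetical helpers, not part of h5py:

```python
import os
import tempfile

import h5py
import numpy as np


class BlockAppender:
    """Hypothetical helper (not part of h5py): buffer rows in memory and
    append them to a resizable 2-D dataset one block at a time."""

    def __init__(self, dset: h5py.Dataset, block_rows: int = 256):
        self.dset = dset
        self.buf = np.empty((block_rows, dset.shape[1]), dtype=dset.dtype)
        self.fill = 0  # number of valid rows currently held in the buffer

    def append(self, row: np.ndarray) -> None:
        """Copy one row into the buffer; write the block once it is full."""
        self.buf[self.fill] = row
        self.fill += 1
        if self.fill == len(self.buf):
            self.flush()

    def flush(self) -> None:
        """Write any buffered rows with a single resize + slice assignment."""
        if self.fill == 0:
            return
        n = self.dset.shape[0]
        self.dset.resize(n + self.fill, axis=0)
        self.dset[n:n + self.fill] = self.buf[:self.fill]
        self.fill = 0


if __name__ == "__main__":
    dim1, n_rows = 64, 10_000
    path = os.path.join(tempfile.mkdtemp(), "block_append.h5")
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("time_data", shape=(0, dim1), maxshape=(None, dim1),
                                dtype="f4", chunks=(256, dim1))
        appender = BlockAppender(dset, block_rows=256)
        row = np.ones(dim1, dtype="f4")
        for _ in range(n_rows):
            appender.append(row)
        appender.flush()  # write the final, partially filled block
        print(dset.shape)  # (10000, 64)
```

The same idea applies on the PyTables side: pass a `(256, 64)` array to `EArray.append` instead of a `(1, 64)` one.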

1 Answer

- kcw78 · Accepted Answer

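This is an interesting comparison of PyTables and h5py write performance. I normally use them to read HDF5 files (usually a few reads of large datasets), so I had not noticed this difference. My thinking matches @max9111's: performance should improve as the number of write operations goes down and the size of each write goes up. To test that, I reworked your code to write the N rows of data in fewer loops. (The code is at the end.)
The results were surprising (to me). Key findings:
1. The total time to write all the data is a linear function of the number of loops (for both PyTables and h5py).
2. The performance gap between PyTables and h5py narrows only slightly as the size of each dataset I/O increases.
3. PyTables is 5.4x faster when writing 1 row at a time (1,527,416 writes) and 3.5x faster when writing 88 rows at a time (17,357 writes).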

Below is a chart comparing the performance.
The data in the table corresponds to the chart.

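Also, I noticed that your code comment says "append columns", but you are extending the first dimension (the rows of the HDF5 table/dataset). I rewrote your code to test the performance of extending the second dimension (adding columns to the HDF5 file) and found very similar performance.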
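Growing the second dimension in h5py means declaring `maxshape=(rows, None)` and resizing along `axis=1`. A small sketch of that variant, appending columns in blocks; the shape, block size, chunk shape and file location are illustrative assumptions, not the answer's actual column test:

```python
import os
import tempfile

import h5py
import numpy as np


def append_columns(path: str, rdim: int = 64, block_cols: int = 256, n_blocks: int = 8):
    """Append n_blocks blocks of block_cols columns to an initially (rdim, 0) dataset.
    Returns the final dataset shape."""
    vals = np.ones((rdim, block_cols), dtype="f4")
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("time_data", shape=(rdim, 0), maxshape=(rdim, None),
                                dtype="f4", chunks=(rdim, block_cols))
        for _ in range(n_blocks):
            n = dset.shape[1]
            dset.resize(n + block_cols, axis=1)  # grow the column axis only
            dset[:, n:n + block_cols] = vals
        return dset.shape


if __name__ == "__main__":
    print(append_columns(os.path.join(tempfile.mkdtemp(), "colapp_test_h5.h5")))
```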

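At first I thought the I/O bottleneck was caused by resizing the dataset. So I rewrote the example to size the array for all rows up front. This did not improve performance (and it significantly degraded h5py performance). That is very surprising, and I am not sure what to make of it.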
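For reference, the pre-sizing variant described here looks roughly like the sketch below: the dataset is created at its final shape, so the loop only does slice assignments and never calls `resize`. This is an assumed reconstruction for illustration (variable names borrowed from the script below, `chunks=True` lets h5py pick a chunk shape), not the answer's exact test code:

```python
import os
import tempfile
from time import perf_counter

import h5py
import numpy as np


def write_presized(path: str, cdim: int = 64, block_size: int = 4, row_loops: int = 1000) -> float:
    """Create the dataset at its final size, then fill it block by block.
    Returns the wall time of the write loop only, in seconds."""
    vals = np.ones((block_size, cdim), dtype="f4")
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("time_data", shape=(block_size * row_loops, cdim),
                                dtype="f4", chunks=True)
        t0 = perf_counter()
        for i in range(row_loops):
            start = i * block_size
            dset[start:start + block_size] = vals  # no resize inside the loop
        return perf_counter() - t0


if __name__ == "__main__":
    print(write_presized(os.path.join(tempfile.mkdtemp(), "presized.h5")))
```

As in the question's script, file creation and closing are kept outside the timed loop.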

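Here is my example. It uses 3 variables to size the array (as data is added):

cdim: number of columns (fixed)
row_loops: number of write loops
block_size: size of the data block written in each loop
row_loops*block_size = total number of rows written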
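Because `row_loops * block_size` has to equal the 1,527,416 rows of the original test, every block size in the comparison must divide that number. A quick stdlib check over the three configurations mentioned in the answer (1, 4 and 88 rows per write); `row_loops_for` is a hypothetical helper:

```python
TOTAL_ROWS = 1527416  # rows written in the original test
CDIM = 64             # columns (float32 values per row)


def row_loops_for(block_size: int) -> int:
    """Number of write calls needed to write TOTAL_ROWS rows in blocks of block_size."""
    loops, rem = divmod(TOTAL_ROWS, block_size)
    if rem:
        raise ValueError(f"{block_size} does not divide {TOTAL_ROWS}")
    return loops


for block_size in (1, 4, 88):
    loops = row_loops_for(block_size)
    nbytes = block_size * CDIM * 4  # bytes per write call
    print(f"block_size={block_size:3d} -> row_loops={loops:7d}, {nbytes} bytes per write")
```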

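I also made a small change to the appended values, changing them from 0 to 1 (to verify that the data was written), and moved their creation to the top (so it is not counted in the timed loop).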

My code is below:

import h5py
import tables
import numpy as np
from time import time

cdim, block_size, row_loops = 64, 4, 381854 
vals = np.ones((block_size, cdim), dtype="float32")

# append rows
print("PYTABLES: append rows: %d blocks with: %d rows" % (row_loops, block_size))
print("=" * 32)
f = tables.open_file("rowapp_test_tb.h5", "w")
a = f.create_earray(f.root, "time_data", atom=tables.Float32Atom(), shape=(0, cdim))
t1 = time()
for i in range(row_loops):
    a.append(vals)
tcre = round(time() - t1, 3)
thcre = round(cdim * block_size * row_loops * 4 / (tcre * 1024 * 1024), 1)
print("Time to append %d rows: %s sec (%s MB/s)" % (block_size * row_loops, tcre, thcre))
print("=" * 32)
chunkshape = a.chunkshape
f.close()

print("H5PY: append rows %d blocks with: %d rows" % (row_loops, block_size))
print("=" * 32)
f = h5py.File(name="rowapp_test_h5.h5",mode='w')
a = f.create_dataset(name='time_data',shape=(0, cdim),
                     maxshape=(block_size*row_loops,cdim),
                     dtype='f',chunks=chunkshape)
t1 = time()
samplesWritten = 0
for i in range(row_loops):
    a.resize(((i+1)*block_size, cdim))
    a[samplesWritten:samplesWritten+block_size] = vals
    samplesWritten += block_size
tcre = round(time() - t1, 3)
thcre = round(cdim * block_size * row_loops * 4 / (tcre * 1024 * 1024), 1)
print("Time to append %d rows: %s sec (%s MB/s)" % (block_size * row_loops, tcre, thcre))
print("=" * 32)
f.close()
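To look at the loop-count trend on a smaller scale, the h5py half of the script above can be wrapped in a function and run for several block sizes. A sketch, assuming a reduced total row count so it finishes quickly and a fixed `(256, cdim)` chunk shape instead of the PyTables-derived one; absolute numbers will depend on the machine and the h5py/HDF5 versions:

```python
import os
import tempfile
from time import perf_counter

import h5py
import numpy as np


def h5py_append_time(block_size: int, row_loops: int, cdim: int = 64) -> float:
    """Time row_loops resize + write appends of block_size rows (same pattern as the script above)."""
    vals = np.ones((block_size, cdim), dtype="f4")
    path = os.path.join(tempfile.mkdtemp(), f"bench_{block_size}.h5")
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("time_data", shape=(0, cdim), maxshape=(None, cdim),
                                dtype="f4", chunks=(256, cdim))
        t0 = perf_counter()
        for i in range(row_loops):
            dset.resize((i + 1) * block_size, axis=0)
            dset[i * block_size:(i + 1) * block_size] = vals
        return perf_counter() - t0


if __name__ == "__main__":
    total_rows = 88 * 256  # 22,528 rows, divisible by every block size below
    mib = total_rows * 64 * 4 / 2**20
    for bs in (1, 4, 88, 256):
        loops = total_rows // bs
        t = h5py_append_time(bs, loops)
        print(f"block_size={bs:3d}  writes={loops:6d}  {t:7.3f} s  {mib / t:7.1f} MB/s")
```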