Reading an HDF5 dataset with pandas

Question

I am trying to open an HDF5 file that has no groups using pandas:

import pandas as pd
foo = pd.read_hdf('foo.hdf5')

But I get an error:

TypeError: cannot create a storer if the object is not existing nor a value are passed

I tried to solve this by passing a key:

foo = pd.read_hdf('foo.hdf5','key')

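This works if the key is a group, but this file has no groups; it has several datasets at the top level of the HDF structure. That is, the file that works is structured as group -> dataset, while the file that does not work is structured as just dataset. Both open fine with h5py, in which case I would use: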

import h5py
f = h5py.File('foo.hdf5','r')

and

dset = f['dataset']

How can I read the dataset in pandas?

- hsnee

What happens if you try the following: df = pd.read_hdf('foo.hdf5', 'dataset')? - MaxU - stand with Ukraine

Possibly related: Pandas can't read hdf5 file created with h5py - unutbu

1 Answer

- MaxU - stand with Ukraine · Accepted Answer

I think you may be confused by the differing terminology - in pandas' HDF store, the key is a full path, i.e. Group + DataSet_name...
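As a minimal sketch of what "full path" means here, the snippet below writes two frames into a temporary file and lists the keys pandas reports for them. The file name, the sample frame, and the variable names are assumptions made for illustration, and writing the file requires the optional PyTables (`tables`) package.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample frame and file name, used only for illustration.
df = pd.DataFrame({"a": range(3), "b": list("xyz")})
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# One object at the top level, one nested inside groups.
with pd.HDFStore(path, mode="w") as store:
    store.put("dataset1", df, format="table")
    store.put("/group1/sub_group1/dataset2", df, format="table")
    keys = sorted(store.keys())

print(keys)

# Each key is the full path of a stored object and can be passed to read_hdf.
frames = {key: pd.read_hdf(path, key) for key in keys}
```

Calling `store.keys()` on an unfamiliar file is a quick way to find out which keys pandas itself can see before calling `read_hdf`.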

Example:

In [67]: store = pd.HDFStore(r'D:\temp\.data\hdf\test.h5')

In [68]: store.append('dataset1', df)

In [69]: store.append('/group1/sub_group1/dataset2', df)

In [70]: store.groups
Out[70]:
<bound method HDFStore.groups of <class 'pandas.io.pytables.HDFStore'>
File path: D:\temp\.data\hdf\test.h5
/dataset1                              frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])
/group1/sub_group1/dataset2            frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])>

In [71]: store.items
Out[71]:
<bound method HDFStore.items of <class 'pandas.io.pytables.HDFStore'>
File path: D:\temp\.data\hdf\test.h5
/dataset1                              frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])
/group1/sub_group1/dataset2            frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])>

In [72]: store.close()

In [73]: x = pd.read_hdf(r'D:\temp\.data\hdf\test.h5', 'dataset1')

In [74]: x.shape
Out[74]: (9, 2)

In [75]: x = pd.read_hdf(r'D:\temp\.data\hdf\test.h5', '/group1/sub_group1/dataset2')

In [76]: x.shape
Out[76]: (9, 2)
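
If the file was written with h5py rather than with pandas, `read_hdf` may still fail on it, as the related question linked in the comments suggests. One possible workaround, sketched here under the assumption that the dataset is a plain 2-D numeric array, is to read it with h5py and wrap the result in a DataFrame. The stand-in file contents and the column names `c0`, `c1`, `c2` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "foo.hdf5")

# Build a stand-in file with a dataset at the top level (no groups),
# mimicking the layout described in the question.
with h5py.File(path, "w") as f:
    f.create_dataset("dataset", data=np.arange(12.0).reshape(4, 3))

# Load the whole dataset into a NumPy array while the file is open.
with h5py.File(path, "r") as f:
    values = f["dataset"][:]

# Column names are made up here; a raw h5py dataset carries no pandas labels.
df = pd.DataFrame(values, columns=["c0", "c1", "c2"])
print(df.shape)
```

This loses nothing for simple arrays, but compound dtypes or attributes stored alongside the dataset would need extra handling.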