How to read HDF5 files in Python

Question

134

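I am trying to read data from an HDF5 file in Python. I can open the file with h5py, but I cannot figure out how to access the data inside it.

My code

import h5py
import numpy as np
f1 = h5py.File(file_name, 'r+')

This works and the file is read. But how do I access the data inside the file object f1?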

- Sameer Damir

3

If the file holds a saved Keras model, you probably want to load it with Keras rather than any other way. - Josiah Yoder

2

What is the difference between an hdf5 file and an hdf file? I have some hdf files (they are several image bands), but I cannot figure out how to open them. - mikey

df = pandas.read_hdf("fileName.hdf5") -> this loads the data into a pandas DataFrame that you can work with. - Tanmoy

13 Answers

39

Read the file

import h5py

f = h5py.File(file_name, mode)

Study the structure of the file by printing which HDF5 groups are present

for key in f.keys():
    print(key) #Names of the root level object names in HDF5 file - can be groups or datasets.
    print(type(f[key])) # get the object type: usually group or dataset

Extract the data

#Get the HDF5 group; key needs to be a group name from above
group = f[key]

#Checkout what keys are inside that group.
for key in group.keys():
    print(key)

# This assumes group[some_key_inside_the_group] is a dataset, 
# and returns a np.array:
data = group[some_key_inside_the_group][()]
#Do whatever you want with data

#After you are done
f.close()
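
The loop above only lists the root level. Here is a minimal sketch of walking the whole hierarchy with visititems, assuming a small demo file created on the spot. The file name demo.h5, the dataset names and the describe helper are made up for illustration:

```python
import os
import tempfile

import h5py
import numpy as np


def describe(name, obj):
    """Return a one-line description of an HDF5 object (hypothetical helper)."""
    if isinstance(obj, h5py.Dataset):
        return f"{name}: dataset, shape={obj.shape}, dtype={obj.dtype}"
    return f"{name}: group"


# Build a small demo file so the example is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    # Intermediate groups ("images") are created automatically.
    f.create_dataset("images/train", data=np.zeros((4, 2), dtype="f4"))
    f.create_dataset("labels", data=np.arange(4))

lines = []
with h5py.File(demo_path, "r") as f:
    # visititems calls the function with (name, object) for every group and
    # dataset below the root; a None return value continues the walk.
    f.visititems(lambda name, obj: lines.append(describe(name, obj)))

print("\n".join(lines))
```

Checking isinstance(obj, h5py.Dataset) avoids guessing whether a key is a group or a dataset before indexing it with [()].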

- Daksh

5

To see the exact structure with all variables in use: data.visit(print). - Hitesh

Just a heads-up: the f in h5py.File(...) should be capitalized. - dannykim

1

@dannykim Done. - Daksh

2

Important: you need to call data.close() at the end. - anilbey

1

It should be (horrible new syntax): data = group[some_key_inside_the_group][()]. - Bersan

28

You can use Pandas.

import pandas as pd
pd.read_hdf(filename,key)

- Danny

8

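You should not rely on the Pandas implementation unless you are storing DataFrames. read_hdf relies on a specific structure of the HDF file; also, there is no pd.write_hdf, so it only works in one direction. See this article. - Max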

5

Pandas does have a write function. See pd.DataFrame.to_hdf. - Eric Taw

9

Here is a simple function I just wrote that reads an .hdf5 file produced by the save_weights function in Keras and returns a dict with layer names and weights:

import h5py

def read_hdf5(path):
    weights = {}
    keys = []
    with h5py.File(path, 'r') as f:  # open file
        f.visit(keys.append)  # append all keys to list
        for key in keys:
            if ':' in key:  # contains data if ':' in key
                print(f[key].name)
                # .value was removed in h5py 3.0; [()] reads the whole dataset
                weights[f[key].name] = f[key][()]
    return weights
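
The ':' test above relies on how Keras names its weight datasets. As a rough alternative sketch, not part of the original answer, datasets can be picked out by type instead. The function name read_all_datasets, the file name weights_demo.h5 and the layer names are assumptions chosen for the demo:

```python
import os
import tempfile

import h5py
import numpy as np


def read_all_datasets(path):
    """Return {full_name: numpy array} for every dataset in the file."""
    arrays = {}

    def collect(name, obj):
        # Groups are skipped; only datasets carry array data.
        if isinstance(obj, h5py.Dataset):
            arrays[obj.name] = obj[()]

    with h5py.File(path, "r") as f:
        f.visititems(collect)
    return arrays


# Demo file that mimics a Keras-style layout (names are made up).
demo_path = os.path.join(tempfile.mkdtemp(), "weights_demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("dense/kernel:0", data=np.ones((3, 2)))
    f.create_dataset("dense/bias:0", data=np.zeros(2))

result = read_all_datasets(demo_path)
for name, array in result.items():
    print(name, array.shape)
```

Because the arrays are read with [()] inside the with block, they stay usable after the file is closed.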

I have not tested it thoroughly, but it does the job for me.

- Attila

This function seems to show everything in the .h5 file. Thanks. - minTwin

7

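To read the contents of an .hdf5 file as an array, you can do the following. Note that np.fromfile reads the file's raw bytes, HDF5 headers and metadata included, so the result is not the stored dataset; use h5py for that.

import numpy as np
myarray = np.fromfile('file.hdf5', dtype=float)
print(myarray)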

- Raza

6

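Use the code below to read the data and convert it to numpy arrays.

import h5py
import numpy as np
f1 = h5py.File('data_1.h5', 'r')
print(list(f1.keys()))
X1 = f1['x']
y1 = f1['y']
df1 = np.array(X1[()])  # .value was removed in h5py 3.0
dfy1 = np.array(y1[()])
print(df1.shape)
print(dfy1.shape)
f1.close()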

Preferred way to read dataset values into a numpy array:

import h5py
# use Python file context manager:
with h5py.File('data_1.h5', 'r') as f1:
    print(list(f1.keys()))  # print list of root level objects
    # following assumes 'x' and 'y' are dataset objects
    ds_x1 = f1['x']  # returns h5py dataset object for 'x'
    ds_y1 = f1['y']  # returns h5py dataset object for 'y'
    arr_x1 = f1['x'][()]  # returns np.array for 'x'
    arr_y1 = f1['y'][()]  # returns np.array for 'y'
    arr_x1 = ds_x1[()]  # uses dataset object to get np.array for 'x'
    arr_y1 = ds_y1[()]  # uses dataset object to get np.array for 'y'
    print (arr_x1.shape)
    print (arr_y1.shape)
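
Indexing with [()] loads the entire dataset into memory. For large files it can help to slice the dataset object instead, so that only the requested part is read from disk. A small sketch, assuming a demo file named slices_demo.h5 with a made-up dataset 'x':

```python
import os
import tempfile

import h5py
import numpy as np

# Create a demo file with a 100 x 10 integer dataset.
demo_path = os.path.join(tempfile.mkdtemp(), "slices_demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("x", data=np.arange(1000).reshape(100, 10))

with h5py.File(demo_path, "r") as f:
    ds = f["x"]            # dataset object, nothing is read yet
    shape = ds.shape       # metadata is available without loading the data
    first_rows = ds[:5]    # reads only rows 0 to 4 into a numpy array
    one_column = ds[:, 3]  # reads only column 3

print(shape, first_rows.shape, one_column.shape)
```

The slices are ordinary numpy arrays and remain valid after the file is closed, while the dataset object ds does not.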

- ashish bansal

1

Do not forget to close the file, otherwise it may get corrupted. - anilbey

Thanks. This is probably the best way to open .hdf5 data files. - Farzad Amirjavid

2

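If you have named datasets in the hdf file, you can use the following code to read them and convert them to numpy arrays:

import h5py
import numpy as np
file = h5py.File('filename.h5', 'r')

xdata = file.get('xdata')
xdata = np.array(xdata)

If your file is in a different directory, you can add the path in front of 'filename.h5'.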
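
One thing to watch with get: like dict.get, it returns None when the name does not exist, and np.array(None) does not raise but yields a zero-dimensional object array. A hedged sketch of checking for the name first, using a demo file get_demo.h5 and a deliberately missing name 'ydata' (both made up for illustration):

```python
import os
import tempfile

import h5py
import numpy as np

# Demo file containing a single dataset called 'xdata'.
demo_path = os.path.join(tempfile.mkdtemp(), "get_demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("xdata", data=np.linspace(0.0, 1.0, 5))

with h5py.File(demo_path, "r") as f:
    missing = f.get("ydata")  # no such dataset, so this is None
    has_x = "xdata" in f      # membership test works like a dict
    # Read the values only if the dataset is really there.
    xdata = f["xdata"][()] if has_x else None

print(missing, has_x, xdata)
```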

- Machzx

2

from keras.models import load_model 

h= load_model('FILE_NAME.h5')

- Judice

2

This is how we load a saved NN model in Keras. I think the question is more general and not related to Keras. - Upul Bandara

3

When all you have is a hammer, everything looks like a nail :-). - Upul Bandara

0

Use this; it works fine for me.


import h5py

def read_hdf5():
    weights = {}
    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)
        for key in keys:
            if ':' in key:
                print(f[key].name)
                weights[f[key].name] = f[key][()]
    return weights

print(read_hdf5())

If you are using h5py<='2.9.0', you can use the following instead (the .value attribute was removed in h5py 3.0):


import h5py

def read_hdf5():
    weights = {}
    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)
        for key in keys:
            if ':' in key:
                print(f[key].name)
                weights[f[key].name] = f[key].value
    return weights

print(read_hdf5())

- Zaeem Asghar

0

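What you need to do is create a dataset. If you look at the quickstart guide, it shows that you need to use the file object to create a dataset. So, call f.create_dataset and then you can read the data. This is explained in detail in the docs.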

- Games Brainiac

- Martin Thoma · Accepted Answer

Read HDF5

import h5py
filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # Print all root level object names (aka keys) 
    # these can be group or dataset names 
    print("Keys: %s" % f.keys())
    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]

    # get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key])) 

    # If a_group_key is a group name, 
    # this gets the object names in the group and returns as a list
    data = list(f[a_group_key])

    # If a_group_key is a dataset name, 
    # this gets the dataset values and returns as a list
    data = list(f[a_group_key])
    # preferred methods to get dataset values:
    ds_obj = f[a_group_key]      # returns as a h5py dataset object
    ds_arr = f[a_group_key][()]  # returns as a numpy array

Write HDF5

import h5py

# Create random data
import numpy as np
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)
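
Beyond a single flat dataset, HDF5 files can hold groups, per-dataset attributes and compressed data. The following is only a sketch of a write/read round trip under assumed names: the group experiment_1, the dataset measurements and the attribute units are invented for the example:

```python
import os
import tempfile

import h5py
import numpy as np

# Reproducible random data, same shape as in the example above.
rng = np.random.default_rng(0)
data_matrix = rng.uniform(-1, 1, size=(10, 3))

demo_path = os.path.join(tempfile.mkdtemp(), "roundtrip_demo.h5")

# Write: one group, one gzip-compressed dataset, one attribute.
with h5py.File(demo_path, "w") as f:
    grp = f.create_group("experiment_1")
    dset = grp.create_dataset(
        "measurements", data=data_matrix, compression="gzip"
    )
    dset.attrs["units"] = "volts"

# Read everything back using the full path to the dataset.
with h5py.File(demo_path, "r") as f:
    dset = f["experiment_1/measurements"]
    units = dset.attrs["units"]
    restored = dset[()]

print(units, restored.shape, np.array_equal(restored, data_matrix))
```

gzip is lossless, so the restored array should match the original exactly; whether it pays off in file size depends on the data.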

See the h5py docs for more information.

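Alternatives

JSON: nice for writing human-readable data; very commonly used (read & write)
CSV: super simple format (read & write)
pickle: a Python serialization format (read & write)
MessagePack (Python package): more compact representation (read & write)
HDF5 (Python package): nice for matrices (read & write)
XML: exists too *sigh* (read & write)

For your application, the following might be important:

Support by other programming languages
Reading / writing performance
Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python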