How to read a MAT v7.3 file in Python?

Question

I want to read the mat file provided at ufldl.stanford.edu/housenumbers. The train.tar.gz archive there contains a mat file named digitStruct.mat.

When I use scipy.io to read the mat file, it gives me the warning: "Please use HDF reader for matlab v7.3 files".
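For context, v7.3 MAT-files are HDF5 containers, which is why scipy.io points to an HDF reader. Below is a minimal sketch of checking for that before choosing a reader. It assumes the usual layout in which MATLAB writes a 512-byte text header ahead of the HDF5 signature, and the helper name looks_like_v73 is made up for this illustration:

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_v73(path):
    """Heuristic: return True if the file at `path` appears to be HDF5-based."""
    with open(path, "rb") as fh:
        head = fh.read(520)
    # MATLAB v7.3 puts a 512-byte text header first, so the HDF5 signature
    # is expected at offset 512; a plain HDF5 file carries it at offset 0.
    return head[512:520] == HDF5_SIGNATURE or head[:8] == HDF5_SIGNATURE
```

If the check returns True, opening the file with h5py.File(path, 'r') is the natural next step; otherwise scipy.io.loadmat should be able to handle it.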

The original MATLAB script looks like this:

load digitStruct.mat
for i = 1:length(digitStruct)
    im = imread([digitStruct(i).name]);
    for j = 1:length(digitStruct(i).bbox)
        [height, width] = size(im);
        aa = max(digitStruct(i).bbox(j).top+1,1);
        bb = min(digitStruct(i).bbox(j).top+digitStruct(i).bbox(j).height, height);
        cc = max(digitStruct(i).bbox(j).left+1,1);
        dd = min(digitStruct(i).bbox(j).left+digitStruct(i).bbox(j).width, width);

        imshow(im(aa:bb, cc:dd, :));
        fprintf('%d\n',digitStruct(i).bbox(j).label );
        pause;
    end
end

As shown above, the mat file has a key named "digitStruct", and inside "digitStruct" there are the keys "name" and "bbox". I use the h5py API to read the file.

import h5py
f = h5py.File('train.mat', 'r')  # open read-only
print(len(f['digitStruct']['name']), len(f['digitStruct']['bbox']))

I can read the arrays, but when I loop over them, how do I read each item?

for i in f['digitStruct']['name']:
    print(i)  # only prints the HDF5 object reference
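The references printed by this loop can be followed by indexing the file object with them. Below is a minimal sketch, assuming each entry of 'name' is a one-element row holding a reference to a MATLAB char array stored as a column of uint16 character codes, and that no name is empty. The helper names deref_string and read_names are hypothetical:

```python
def deref_string(f, ref):
    """Follow an HDF5 object reference and decode the char array it points to."""
    # Each row of the referenced dataset holds one character code.
    return "".join(chr(int(row[0])) for row in f[ref])


def read_names(f):
    """Return the image file names stored under digitStruct/name."""
    return [deref_string(f, row[0]) for row in f["digitStruct"]["name"]]


# Intended use with an open file, e.g. f = h5py.File('train/digitStruct.mat', 'r'):
#     names = read_names(f)
```

The functions only index and iterate, so they can be tried on small stand-in objects before being pointed at the real file.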

- user824624

Hey, did you find a solution? I am running into the same problem. Thanks - Ramesh Kumar

2 Answers

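import pickle
import h5py

f = h5py.File('train/digitStruct.mat', 'r')

# One list per bounding-box field; each entry holds the values for one image.
metadata = {}
metadata['height'] = []
metadata['label'] = []
metadata['left'] = []
metadata['top'] = []
metadata['width'] = []

def print_attrs(name, obj):
    # Called by visititems() once per dataset in a bbox group;
    # name is the field name ('height', 'label', ...).
    vals = []
    if obj.shape[0] == 1:
        # A single digit: the value is stored directly in the dataset.
        vals.append(int(obj[0][0]))
    else:
        # Several digits: each row holds a reference to a 1x1 dataset.
        for k in range(obj.shape[0]):
            vals.append(int(f[obj[k][0]][0][0]))
    metadata[name].append(vals)

for item in f['/digitStruct/bbox']:
    f[item[0]].visititems(print_attrs)

with open('train_metadata.pickle', 'wb') as pf:
    pickle.dump(metadata, pf, pickle.HIGHEST_PROTOCOL)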

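I adapted this from https://discussions.udacity.com/t/how-to-deal-with-mat-files/160657/3. To be honest, I do not understand exactly what visititems() does. The hierarchy and the level of abstraction in HDF5 files are too much for me.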
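For what it is worth, visititems() walks a group recursively and calls the supplied function with each member's name (relative to the group) and the object itself, stopping early if the function returns anything other than None. The pure-Python sketch below imitates that behaviour on nested dicts. It is a toy model for illustration, not h5py's implementation; walk_items is a hypothetical name and the numbers are made up:

```python
def walk_items(group, func, prefix=""):
    """Toy stand-in for Group.visititems: nested dicts play the role of groups."""
    for key, obj in group.items():
        name = prefix + key
        result = func(name, obj)
        if result is not None:
            return result  # a non-None return value stops the walk
        if isinstance(obj, dict):
            result = walk_items(obj, func, name + "/")
            if result is not None:
                return result
    return None


# One fake bbox "group" holding five "datasets" (values invented for the demo).
bbox = {"height": [[30.0]], "label": [[5.0]], "left": [[43.0]],
        "top": [[7.0]], "width": [[19.0]]}
collected = {}


def record(name, obj):
    # Same idea as the callback in the answer: file each value under its field name.
    collected.setdefault(name, []).append(int(obj[0][0]))


walk_items(bbox, record)
print(collected)
```

In the answer's code each bbox group contains five datasets, so the callback runs five times per image, once for each of height, label, left, top and width.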

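This metadata is a dictionary. The value under each key is a nested list with 33402 items, one for each of the sequentially named png files. Each item is itself a list of length 1 to 6. I counted how many images have each number of digits (1 through 6): 5137, 18130, 8691, 1434, 9 and 1.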
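A tally like the one described here could be computed along the following lines. This is only a sketch: digit_length_counts is a hypothetical helper, and the sample dictionary is a tiny made-up stand-in for the real metadata:

```python
from collections import Counter


def digit_length_counts(metadata):
    """Count how many images contain 1, 2, 3, ... digits."""
    return Counter(len(labels) for labels in metadata["label"])


# Made-up sample with the same shape as the metadata dictionary.
sample = {"label": [[1, 9], [2, 3], [5], [2, 1, 10]]}
counts = digit_length_counts(sample)
for n_digits in sorted(counts):
    print(n_digits, counts[n_digits])
```

Any of the five keys would give the same tally, since every field stores one value per digit.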

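To my surprise, the pickle file is only 9 MB, more than 20 times smaller than the mat file. I guess HDF5 files sacrifice storage space for the sake of their hierarchical structure.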

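Note: in order to crop the images, I converted the values to integers. The train_metadata.pickle file is now only 2 MB, about 100 times smaller than the mat file.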
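With integer values, the clamping done in the MATLAB script from the question can be reproduced in Python. Below is a minimal sketch, assuming the image is indexed as rows by columns from zero; crop_bounds is a hypothetical helper name. MATLAB's 1-based inclusive range aa:bb corresponds to the 0-based half-open slice from aa-1 to bb:

```python
def crop_bounds(top, left, height, width, img_height, img_width):
    """Clamp one bounding box to the image and return 0-based slice bounds."""
    top, left, height, width = int(top), int(left), int(height), int(width)
    row_start = max(top, 0)                   # MATLAB: aa = max(top + 1, 1)
    row_stop = min(top + height, img_height)  # MATLAB: bb = min(top + height, H)
    col_start = max(left, 0)                  # MATLAB: cc = max(left + 1, 1)
    col_stop = min(left + width, img_width)   # MATLAB: dd = min(left + width, W)
    return row_start, row_stop, col_start, col_stop
```

For an image held as a NumPy array im, the crop would then be im[row_start:row_stop, col_start:col_stop].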

- Yuchao Jiang

- Franck Dernoncourt · Accepted Answer

Writing in MATLAB:

test = {'Hello', 'world!'; 'Good', 'morning'; 'See', 'you!'};
save('data.mat', 'test', '-v7.3') % v7.3 so that it is readable by h5py

Reading in Python (works for any number of rows or columns, but assumes that each cell is a string):

import h5py
import numpy as np

data = []
with h5py.File("data.mat", "r") as f:
    for column in f['test']:
        row_data = []
        for row_number in range(len(column)):
            # Each cell is a reference to a uint16 array of character codes.
            codes = f[column[row_number]][:].flatten()
            row_data.append(''.join(chr(c) for c in codes))
        data.append(row_data)

print(data)
print(np.transpose(data))

Output:

[['Hello', 'Good', 'See'], ['world!', 'morning', 'you!']]

[['Hello' 'world!']
 ['Good' 'morning']
 ['See' 'you!']]
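The same loop can be wrapped in a small function that returns the rows in MATLAB's orientation without NumPy. This is a sketch under the same assumption as above (every cell holds a non-empty string); read_cell_strings is a hypothetical name:

```python
def read_cell_strings(f, name):
    """Return a MATLAB cell array of strings as a list of rows."""
    columns = [
        ["".join(chr(int(code[0])) for code in f[ref]) for ref in column]
        for column in f[name]
    ]
    # The file stores the cell array transposed, so flip it back.
    return [list(row) for row in zip(*columns)]


# Intended use with an open file, e.g. f = h5py.File("data.mat", "r"):
#     rows = read_cell_strings(f, "test")
```

Because the function only indexes and iterates, it can be exercised with plain dictionaries and lists before trying it on a real file.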