Reading a complex Matlab struct .mat file with Python

5
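I know about the version issue with mat files: different versions need different loading modules in Python, namely scipy.io and h5py. I have also searched through many similar questions, such as "scipy.io.loadmat nested structures (i.e. dictionaries)" and "How to preserve Matlab struct when accessing in python?". But they all fail when it comes to more complicated mat files. The structure of my anno_bbox.mat file is as follows:
The first two levels: anno_bbox, bbox_test
Inside size: size
Inside hoi: hoi
Inside hoi's bboxhuman: bboxhuman
When I use spio.loadmat('anno_bbox.mat', struct_as_record=False, squeeze_me=True), I only get the first level of information as a dictionary.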
>>> anno_bbox.keys()
dict_keys(['__header__', '__version__', '__globals__', 'bbox_test', 
'bbox_train', 'list_action'])
>>> bbox_test = anno_bbox['bbox_test']
>>> bbox_test.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'keys'
>>> bbox_test
array([<scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8660ab128>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8660ab2b0>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8660ab710>,
   ...,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8622ec4a8>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8622ecb00>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8622f1198>], dtype=object)

I don't know what to do next. This is too complicated for me. The file is available at anno_bbox.mat (8.7 MB).

2 Answers

6
Working from the file you shared:

Loading with:

from scipy import io

data = io.loadmat('../Downloads/anno_bbox.mat')

I get:
In [96]: data['bbox_test'].dtype
Out[96]: dtype([('filename', 'O'), ('size', 'O'), ('hoi', 'O')])
In [97]: data['bbox_test'].shape
Out[97]: (1, 9658)

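I can assign bbox_test = data['bbox_test']. This variable holds 9658 records, each with three fields, and every field has object dtype.

So there is a filename (a string embedded in a 1-element array):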

In [101]: data['bbox_test'][0,0]['filename']
Out[101]: array(['HICO_test2015_00000001.jpg'], dtype='<U26')

size has 3 fields, each holding a single number embedded in an array (MATLAB matrices are 2d, so the shape is (1,1)):

In [102]: data['bbox_test'][0,0]['size']
Out[102]: 
array([[(array([[640]], dtype=uint16), array([[427]], dtype=uint16), array([[3]], dtype=uint8))]],
      dtype=[('width', 'O'), ('height', 'O'), ('depth', 'O')])
In [112]: data['bbox_test'][0,0]['size'][0,0].item()
Out[112]: 
(array([[640]], dtype=uint16),
 array([[427]], dtype=uint16),
 array([[3]], dtype=uint8))
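As a minimal sketch of how those (1,1) wrappers can be peeled off, the snippet below rebuilds a stand-in for the size struct with plain NumPy (so it runs without the .mat file) and converts it to a dict of Python ints. The helper name unwrap_struct is hypothetical, and it assumes every field holds a single integer, as size does here.

```python
import numpy as np

# Stand-in for data['bbox_test'][0,0]['size'] as loaded without squeeze_me:
# a (1,1) structured array whose object fields each hold a (1,1) numeric array.
size_dtype = [("width", "O"), ("height", "O"), ("depth", "O")]
size = np.empty((1, 1), dtype=size_dtype)
size[0, 0] = (
    np.array([[640]], dtype=np.uint16),
    np.array([[427]], dtype=np.uint16),
    np.array([[3]], dtype=np.uint8),
)


def unwrap_struct(struct_11):
    """Turn a (1,1) struct of (1,1) integer arrays into a dict of ints.

    Hypothetical helper: it assumes each field holds exactly one integer.
    """
    rec = struct_11[0, 0]  # the single record inside the (1,1) array
    return {name: int(rec[name][0, 0]) for name in struct_11.dtype.names}


dims = unwrap_struct(size)
print(dims)  # {'width': 640, 'height': 427, 'depth': 3}
```

With the real file the same call would be unwrap_struct(data['bbox_test'][0,0]['size']).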

hoi is more complicated:

In [103]: data['bbox_test'][0,0]['hoi']
Out[103]: 
array([[(array([[246]], dtype=uint8), array([[(array([[320]], dtype=uint16), array([[359]], dtype=uint16), array([[306]], dtype=uint16), array([[349]], dtype=uint16)),...
      dtype=[('id', 'O'), ('bboxhuman', 'O'), ('bboxobject', 'O'), ('connection', 'O'), ('invis', 'O')])


In [126]: data['bbox_test'][0,1]['hoi']['id']
Out[126]: 
array([[array([[132]], dtype=uint8), array([[140]], dtype=uint8),
        array([[144]], dtype=uint8)]], dtype=object)
In [130]: data['bbox_test'][0,1]['hoi']['bboxhuman'][0,0]
Out[130]: 
array([[(array([[226]], dtype=uint8), array([[340]], dtype=uint16), array([[18]], dtype=uint8), array([[210]], dtype=uint8))]],
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')])

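So the data you show in the MATLAB structure is all there, stored as a nested structure of arrays (often 2d with shape (1,1)) that have either object dtype or multiple fields.

If I go back and load with squeeze_me=True, I get something simpler: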

In [133]: data['bbox_test'][1]['hoi']['bboxhuman']
Out[133]: 
array([array((226, 340, 18, 210),
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')]),
       array((230, 356, 19, 212),
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')]),
       array((234, 342, 13, 202),
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')])],
      dtype=object)
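Once the boxes look like this, they can be stacked into one ordinary integer array. The sketch below is only an illustration: it rebuilds the three boxes shown above by hand instead of loading the file, and boxes_to_array is a hypothetical helper name. It keeps the field order of the file (x1, x2, y1, y2), and it uses np.atleast_1d because squeeze_me collapses a single box to a bare 0-d array.

```python
import numpy as np

FIELDS = ("x1", "x2", "y1", "y2")  # field order as stored in the file
box_dtype = [(name, "O") for name in FIELDS]

# Stand-in for data['bbox_test'][1]['hoi']['bboxhuman'] loaded with squeeze_me=True:
# an object array whose elements are 0-d structured arrays.
raw = [(226, 340, 18, 210), (230, 356, 19, 212), (234, 342, 13, 202)]
bboxhuman = np.empty(len(raw), dtype=object)
for i, values in enumerate(raw):
    bboxhuman[i] = np.array(values, dtype=box_dtype)


def boxes_to_array(boxes):
    """Stack struct-like boxes into an (n, 4) int64 array (hypothetical helper)."""
    rows = []
    for box in np.atleast_1d(boxes):  # also accepts a single squeezed box
        # np.asarray(...).item() unwraps both 0-d object arrays and plain scalars
        rows.append([int(np.asarray(box[name]).item()) for name in FIELDS])
    return np.array(rows, dtype=np.int64)


human_boxes = boxes_to_array(bboxhuman)
print(human_boxes.shape)  # (3, 4)
```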

With struct_as_record=False added, I get:
In [136]: data['bbox_test'][1]
Out[136]: <scipy.io.matlab.mio5_params.mat_struct at 0x7f90841e9748>

Looking at the attributes of this rec, I see that the fields can be accessed by attribute name:

In [137]: rec = data['bbox_test'][1]
In [138]: rec.filename
Out[138]: 'HICO_test2015_00000002.jpg'
In [139]: rec.size
Out[139]: <scipy.io.matlab.mio5_params.mat_struct at 0x7f90841e9b38>

In [141]: rec.size.width
Out[141]: 640
In [142]: rec.hoi
Out[142]: 
array([<scipy.io.matlab.mio5_params.mat_struct object at 0x7f90841e9be0>,
       <scipy.io.matlab.mio5_params.mat_struct object at 0x7f90841e9e10>,
       <scipy.io.matlab.mio5_params.mat_struct object at 0x7f90841ee0b8>],
      dtype=object)

In [145]: rec.hoi[1].bboxhuman
Out[145]: <scipy.io.matlab.mio5_params.mat_struct at 0x7f90841e9f98>
In [146]: rec.hoi[1].bboxhuman.x1
Out[146]: 230

In [147]: vars(rec.hoi[1].bboxhuman)
Out[147]: 
{'_fieldnames': ['x1', 'x2', 'y1', 'y2'],
 'x1': 230,
 'x2': 356,
 'y1': 19,
 'y2': 212}

And so on.
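Putting the attribute access together, here is a rough sketch of walking every record and every hoi entry. To stay runnable without anno_bbox.mat it first writes a tiny one-image stand-in with scipy.io.savemat; the demo dict and the name iter_human_boxes are made up for illustration. It assumes each hoi entry has a non-empty bboxhuman, which may not hold for the real file.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical one-image stand-in for anno_bbox.mat (nested dicts become structs).
demo = {
    "filename": "HICO_test2015_00000002.jpg",
    "hoi": {"id": 140, "bboxhuman": {"x1": 230, "x2": 356, "y1": 19, "y2": 212}},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.mat")
    savemat(path, {"bbox_test": demo})
    data = loadmat(path, struct_as_record=False, squeeze_me=True)


def iter_human_boxes(bbox):
    """Yield (filename, hoi id, (x1, x2, y1, y2)) for every hoi of every record."""
    # squeeze_me turns a one-element struct array into a bare mat_struct,
    # so np.atleast_1d is used to make both cases iterable.
    for rec in np.atleast_1d(bbox):
        for hoi in np.atleast_1d(rec.hoi):
            b = hoi.bboxhuman
            yield rec.filename, int(hoi.id), (int(b.x1), int(b.x2), int(b.y1), int(b.y2))


rows = list(iter_human_boxes(data["bbox_test"]))
print(rows)
```

With the real file, the same generator would be called on data['bbox_test'] loaded with struct_as_record=False and squeeze_me=True.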


3

I adapted this answer: https://dev59.com/e2w05IYBdhLWcg3w0VGa#29126361

import numpy as np
from scipy.io import loadmat, matlab
def load_mat(filename):
    """
    This function should be called instead of direct scipy.io.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function check keys to cure all entries
    which are still mat-objects
    """

    def _check_vars(d):
        """
        Checks if entries in dictionary are mat-objects. If yes
        todict is called to change them to nested dictionaries
        """
        for key in d:
            if isinstance(d[key], matlab.mio5_params.mat_struct):
                d[key] = _todict(d[key])
            elif isinstance(d[key], np.ndarray):
                d[key] = _toarray(d[key])
        return d

    def _todict(matobj):
        """
        A recursive function which constructs from matobjects nested dictionaries
        """
        d = {}
        for strg in matobj._fieldnames:
            elem = matobj.__dict__[strg]
            if isinstance(elem, matlab.mio5_params.mat_struct):
                d[strg] = _todict(elem)
            elif isinstance(elem, np.ndarray):
                d[strg] = _toarray(elem)
            else:
                d[strg] = elem
        return d

    def _toarray(ndarray):
        """
        A recursive function which constructs ndarray from cellarrays
        (which are loaded as numpy ndarrays), recursing into the elements
        if they contain matobjects.
        """
        if ndarray.dtype != 'float64':
            elem_list = []
            for sub_elem in ndarray:
                if isinstance(sub_elem, matlab.mio5_params.mat_struct):
                    elem_list.append(_todict(sub_elem))
                elif isinstance(sub_elem, np.ndarray):
                    elem_list.append(_toarray(sub_elem))
                else:
                    elem_list.append(sub_elem)
            return np.array(elem_list)
        else:
            return ndarray

    data = loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_vars(data)

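The changes make it work when there are matrices or cell arrays of structs, by iterating over the variables, and they improve speed by not iterating over matrices that contain no structs (here, arrays whose dtype is float64).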
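One caveat with np.array(elem_list) in _toarray: on NumPy 1.24 and later, building an array from sub-elements of differing shapes raises a ValueError unless dtype=object is given, and even then some shape combinations fail to broadcast. A possible workaround, sketched here with a hypothetical helper name pack_objects, is to fill a pre-allocated object array one element at a time.

```python
import numpy as np


def pack_objects(items):
    """Store arbitrary items in a 1-d object array without shape inference.

    Hypothetical replacement for np.array(elem_list): each item is assigned
    to its own slot, so ragged sub-arrays and dicts are kept as they are.
    """
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out


# Sub-elements with different shapes, plus a dict as _todict would return.
ragged = [np.zeros((2, 3)), np.zeros((2, 4)), {"x1": 230}]
packed = pack_objects(ragged)
print(packed.shape)  # (3,)
```

Replacing `return np.array(elem_list)` with `return pack_objects(elem_list)` changes the result for uniform numeric lists (they stay object arrays instead of becoming numeric arrays), so this is a trade-off rather than a drop-in fix.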


Content provided by Stack Overflow.