Is it possible to load a Matlab table in Python using scipy.io.loadmat?
What I'm doing:
In Matlab:
tab = table((1:500)')
save('tab.mat', 'tab')
In Python:
import scipy.io
mat = scipy.io.loadmat('m:/tab.mat')
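But I cannot access the table tab in Python using mat['tab'].

The loadmat function cannot load MATLAB tables. A small workaround is to save the table as a .csv file and then read it with pandas.
In MATLAB:
writetable(table_name, file_name)
In Python:
import pandas as pd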
df = pd.read_csv(file_name)
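df will contain the contents of table_name.

Based on Jochen's answer, I propose a different variant that works well for me. I wrote a Matlab script that prepares the m-file automatically (see my GitLab repository, which includes examples). It does the following:
In Matlab, for class table:
Same as Jochen's example, but the data is bound together in a single variable, which makes loading multiple variables easier. The field names 'table' and 'columns' are required by the Python part.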
YourVariableName = struct('table', struct(TableYouWantToLoad), 'columns', {struct(TableYouWantToLoad).varDim.labels})
save('YourFileName', 'YourVariableName')
Alternatively, if you need to handle the old dataset type:
YourVariableName = struct('table', struct(DatasetYouWantToLoad), 'columns', {get(DatasetYouWantToLoad,'VarNames')})
save('YourFileName', 'YourVariableName')
In Python:
import pandas as pd
import scipy.io as sio

def load_table_from_struct(table_structure) -> pd.DataFrame:
    # get prepared data structure
    data = table_structure[0, 0]['table']['data']
    # get prepared column names
    data_cols = [name[0] for name in table_structure[0, 0]['columns'][0]]
    # create dict out of original table
    table_dict = {}
    for colidx, colname in enumerate(data_cols):
        table_dict[colname] = [val[0] for val in data[0, 0][0, colidx]]
    return pd.DataFrame(table_dict)

mdata = sio.loadmat('YourFileName')
mtable = load_table_from_struct(mdata['YourVariableName'])
The function itself has nothing to do with loading the file; it is basically a minimized version of Jochen's code, so please upvote his post.
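
The [0, 0] indexing used here comes from how scipy.io.loadmat represents MATLAB structs: by default each struct arrives as a 1x1 NumPy structured array. The following is only a minimal sketch of that behaviour, not part of the original answer. It builds a stand-in struct with scipy.io.savemat instead of a real MATLAB table, and the names 's', 'data' and 'label' are made up for illustration.

```python
import io

import numpy as np
import scipy.io as sio

# Build a small MAT-file in memory; the nested dict is stored as a MATLAB struct.
# 's', 'data' and 'label' are hypothetical names chosen for this sketch.
buf = io.BytesIO()
sio.savemat(buf, {'s': {'data': np.array([1.0, 2.0, 3.0]), 'label': 'demo'}})

# Default loading: the struct comes back as a 1x1 structured array,
# so a field is reached with [0, 0] followed by the field name.
buf.seek(0)
raw = sio.loadmat(buf)['s']
struct_shape = raw.shape
field_names = raw.dtype.names
values = raw[0, 0]['data']  # still 2-D, as MATLAB arrays always are

# Optional: squeeze_me and struct_as_record=False give attribute-style access
# and drop the singleton dimensions.
buf.seek(0)
obj = sio.loadmat(buf, squeeze_me=True, struct_as_record=False)['s']
squeezed_values = obj.data

print(struct_shape, field_names, values.shape, squeezed_values.shape)
```

Whether the squeezed form is more convenient depends on the data; the answers in this thread all use the default representation.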
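
I looked into this for a project I'm working on. As a workaround, you could try the following.
In MATLAB, first convert the @table object into a struct and retrieve the column names using: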
table_struct = struct(table_object);
table_columns = table_struct.varDim.labels;
save table_as_struct table_struct table_columns;
Then, in Python, you can try the following code:
import numpy
import pandas as pd
import scipy.io

# function to load table variable from MAT-file
def loadtablefrommat(matfilename, tablevarname, columnnamesvarname):
    """
    Read a struct-ified table variable (and column names) from a MAT-file
    and return a pandas.DataFrame object.
    """
    # load file
    mat = scipy.io.loadmat(matfilename)
    # get table (struct) variable and the column names
    tvar = mat.get(tablevarname)
    data_desc = mat.get(columnnamesvarname)
    types = tvar.dtype
    fieldnames = types.names
    # extract data (from table struct)
    data = None
    for idx in range(len(fieldnames)):
        if fieldnames[idx] == 'data':
            data = tvar[0][0][idx]
            break
    if data is None:
        raise ValueError("no 'data' field found in " + tablevarname)
    # get number of columns and rows
    numcols = data.shape[1]
    numrows = data[0, 0].shape[0]
    # and get column headers as a list (array)
    data_cols = []
    for idx in range(numcols):
        data_cols.append(data_desc[0, idx][0])
    # create dict out of original table
    table_dict = {}
    for colidx in range(numcols):
        rowvals = []
        for rowidx in range(numrows):
            rowval = data[0, colidx][rowidx][0]
            # unwrap non-empty nested arrays (e.g. strings stored in cells)
            if isinstance(rowval, numpy.ndarray) and rowval.size > 0:
                rowvals.append(rowval[0])
            else:
                rowvals.append(rowval)
        table_dict[data_cols[colidx]] = rowvals
    return pd.DataFrame(table_dict)
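
As others have mentioned, this is currently not possible because Matlab has not documented this file format. People are trying to reverse-engineer the format, but this is a work in progress.
A workaround is to write the table to CSV and load it using Python. Entries in the table can be variable-length arrays, which are split across numbered columns. I wrote a short function that loads both scalars and arrays from this CSV file.
To write the table to CSV in Matlab:
writetable(table_name, filename)
To read the CSV file in Python: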
import pandas

def load_matlab_csv(filename):
    """Read CSV written by Matlab writetable into DataFrames

    Each entry in the table can be a scalar or a variable length array.
    If it is a variable length array, then Matlab generates a set of
    columns, long enough to hold the longest array. These columns have
    the variable name with an index appended.

    This function infers which entries are scalars and which are arrays.
    Arrays are grouped together and sorted by their index.

    Returns: scalar_df, array_df
        scalar_df : DataFrame of scalar values from the table
        array_df : DataFrame with MultiIndex on columns
            The first level is the array name
            The second level is the index within that array
    """
    # Read the CSV file
    tdf = pandas.read_csv(filename)
    cols = list(tdf.columns)

    # Figure out which columns correspond to scalars and which to arrays
    scalar_cols = []  # scalar column names
    arr_cols = []  # array column names, without index
    arrname2idxs = {}  # dict of array column name to list of integer indices
    arrname2colnames = {}  # dict of array column name to list of full names

    # Iterate over columns
    for col in cols:
        # If the name ends in "_" plus digits, it's probably from an array
        arr_name, sep, idx_str = col.rpartition('_')
        if sep and idx_str.isdecimal():
            # Array col: store the inferred array name and index
            arr_idx = int(idx_str)
            if arr_name in arrname2idxs:
                arrname2idxs[arr_name].append(arr_idx)
                arrname2colnames[arr_name].append(col)
            else:
                arrname2idxs[arr_name] = [arr_idx]
                arrname2colnames[arr_name] = [col]
                arr_cols.append(arr_name)
        else:
            # Scalar col
            scalar_cols.append(col)

    # Extract all scalar columns
    scalar_df = tdf[scalar_cols]

    # Extract each set of array columns into its own dataframe
    array_df_d = {}
    for arrname in arr_cols:
        adf = tdf[arrname2colnames[arrname]].copy()
        adf.columns = arrname2idxs[arrname]
        array_df_d[arrname] = adf.sort_index(axis=1)

    # Concatenate array dataframes (concat raises on an empty dict)
    if array_df_d:
        array_df = pandas.concat(array_df_d, axis=1)
    else:
        array_df = pandas.DataFrame(index=tdf.index)
    return scalar_df, array_df

scalar_df, array_df = load_matlab_csv(filename)
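
To make the column-grouping idea concrete, here is a small self-contained sketch on an in-memory CSV. It is a simplified variant, not the function above: it uses a regular expression and returns plain NumPy arrays. The CSV text is made up, and the `<name>_<index>` column naming is an assumption about how writetable splits an array-valued variable.

```python
import io
import re

import pandas as pd

# Made-up CSV resembling writetable output: 'id' is a scalar column and
# 'v' is assumed to have been a 3-element array split into v_1 .. v_3.
csv_text = "id,v_1,v_2,v_3\n1,0.1,0.2,0.3\n2,0.4,0.5,0.6\n"
tdf = pd.read_csv(io.StringIO(csv_text))

# Heuristic: a column named <name>_<digits> is one element of array <name>.
pattern = re.compile(r'^(?P<name>.+)_(?P<idx>\d+)$')
scalar_cols = []
array_cols = {}  # array name -> list of (index, full column name)
for col in tdf.columns:
    match = pattern.match(col)
    if match:
        pair = (int(match.group('idx')), col)
        array_cols.setdefault(match.group('name'), []).append(pair)
    else:
        scalar_cols.append(col)

scalar_df = tdf[scalar_cols]
# One 2-D array per array variable, with columns ordered by their index.
arrays = {
    name: tdf[[col for _, col in sorted(pairs)]].to_numpy()
    for name, pairs in array_cols.items()
}

print(scalar_cols, {name: arr.shape for name, arr in arrays.items()})
```

The heuristic misclassifies a scalar variable whose own name ends in an underscore and digits, so it is worth checking the resulting column lists against the original table.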
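
What type of Python variable is mat? Does it contain any data (and not just allocated fields)? Or is loadmat unable to handle the table format altogether? - Schorsch
What does scipy.io.whosmat('m:/tab.mat') return? (The idea is from here.) - Schorsch
Does 'Read .mat files in Python' work for table? - Schorsch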
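
The whosmat check suggested in the comments can be sketched as follows. This is an illustration only: it inspects a small file written with scipy.io.savemat (the variable name 'x' is made up), not the tab.mat from the question, so it shows the shape of the result rather than what a MATLAB table would report.

```python
import io

import numpy as np
import scipy.io as sio

# Write a tiny MAT-file to memory; 'x' is a hypothetical variable name.
buf = io.BytesIO()
sio.savemat(buf, {'x': np.arange(5.0)})

# whosmat lists the variables in the file without loading their data.
# Each entry is a (name, shape, MATLAB class) tuple.
buf.seek(0)
contents = sio.whosmat(buf)
for name, shape, matlab_class in contents:
    print(name, shape, matlab_class)
```

Running the same call on the real file path shows which variables scipy can see and how it classifies them, which helps tell an unsupported type apart from a wrong variable name.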