Can the number of columns be variable in Python's genfromtxt() function?

Question

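I have a .txt file whose rows have different lengths. Each row is a series of points representing one trajectory. Because the trajectories differ in length, so do the rows; that is, the number of columns differs from row to row.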

As far as I know, NumPy's genfromtxt() function requires every row to have the same number of columns.

>>> import numpy as np
>>> 
>>> data=np.genfromtxt('deer_1995.txt', skip_header=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\numpy\lib\npyio.py", line 1638, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #4 (got 2352 columns instead of 1824)
    Line #5 (got 2182 columns instead of 1824)
    Line #6 (got 1412 columns instead of 1824)
    Line #7 (got 1650 columns instead of 1824)
    Line #8 (got 1688 columns instead of 1824)
    Line #9 (got 1500 columns instead of 1824)
    Line #10 (got 1208 columns instead of 1824)

It can also fill in missing values with the help of filling_values, but I think that would cause unnecessary trouble, which I would like to avoid.

So is there a good (Pythonic) way to simply import the dataset without filling in the "missing values"?

- Sibbs Gambling

1 Answer

- lucasg · Accepted Answer

numpy.genfromtxt cannot handle rows of variable length, because NumPy arrays and matrices must have a fixed number of rows and columns.
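
To illustrate the constraint, here is a minimal sketch that is not part of the original answer. The sample values and the names `rows` and `trajectories` are made up for illustration. Rows of unequal length cannot form one rectangular 2-D array, but each row can be kept as its own 1-D array inside an ordinary Python list:

```python
import numpy as np

# Hypothetical ragged data: three rows with 2, 3 and 1 values
rows = [[0.613, 5.919], [0.642, 7.809, 5.919], [0.648]]

# One independent 1-D float array per row; no padding required
trajectories = [np.asarray(row, dtype=float) for row in rows]

lengths = [t.shape[0] for t in trajectories]
print(lengths)  # [2, 3, 1]
```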

You need to parse the data manually. For example:

Data (CSV-based):

0.613 ;  5.919 
0.615 ;  5.349
0.615 ;  5.413
0.617 ;  6.674
0.617 ;  6.616
0.63 ;   7.418
0.642 ;  7.809 ; 5.919
0.648 ;  8.04
0.673 ;  8.789
0.695 ;  9.45
0.712 ;  9.825
0.734 ;  10.265
0.748 ;  10.516
0.764 ;  10.782
0.775 ;  10.979
0.783 ;  11.1
0.808 ;  11.479
0.849 ;  11.951
0.899 ;  12.295
0.951 ;  12.537
0.972 ;  12.675
1.038 ;  12.937
1.098 ;  13.173
1.162 ;  13.464
1.228 ;  13.789
1.294 ;  14.126
1.363 ;  14.518
1.441 ;  14.969
1.545 ;  15.538
1.64 ;   16.071
1.765 ;  16.7
1.904 ;  17.484
2.027 ;  18.36
2.123 ;  19.235
2.149 ;  19.655
2.172 ;  20.096
2.198 ;  20.528
2.221 ;  20.945
2.265 ;  21.352
2.312 ;  21.76
2.365 ;  22.228
2.401 ;  22.836
2.477 ;  23.804

Parser:

import csv

data = []
with open('i.csv', 'r', newline='') as datafile:
    # Split each line on the ';' separator used in the data above
    datareader = csv.reader(datafile, delimiter=';')
    for row in datareader:
        if not row:
            continue  # skip blank lines
        # Cast every element to float; float() ignores surrounding whitespace
        data.append([float(elem) for elem in row])

print(data)
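
The same idea can be adapted to a file like the one in the question. The following is a minimal sketch, not part of the original answer: the helper name `read_ragged` is hypothetical, and it assumes the values are separated by whitespace and that the first `skip_header` lines should be dropped, mirroring the `skip_header=2` argument used in the question.

```python
import io

def read_ragged(lines, skip_header=0):
    """Parse whitespace-separated floats, one variable-length row per line."""
    rows = []
    for index, line in enumerate(lines):
        if index < skip_header:
            continue  # drop header lines
        fields = line.split()  # split on any run of whitespace
        if fields:  # ignore blank lines
            rows.append([float(field) for field in fields])
    return rows

# Small in-memory stand-in for a real file (made-up values)
sample = io.StringIO("header 1\nheader 2\n1.0 2.0 3.0\n4.0\n\n5.0 6.0\n")
result = read_ragged(sample, skip_header=2)
print(result)  # [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
```

With a real file, an open file object could be passed in place of `sample`, since it also yields one line at a time. The result is a plain list of lists, so no row needs padding.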