I am trying to read a text file with the pandas read_csv function in Python. My text file looks like this (all values are numbers):
35 61 7 1 0 # with leading white spaces
0 1 1 1 1 1 # with leading white spaces
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
1(01-02),2(02-03),3(03-04) # this line causes 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3'
My Python code is as follows:
import pandas as pd
df = pd.read_csv('example.txt', header=None)
df
The output is as follows:
CParserError: 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
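For context, the error most likely arises because read_csv splits on commas by default: the first four lines contain no comma and therefore parse as one field each, while line 5 contains two commas and parses as three. The sketch below illustrates this with the standard csv module rather than pandas itself; the four leading spaces on the first two lines are an assumption, since the exact indentation of the file is not shown.

```python
import csv
import io

# Assumed reconstruction of example.txt; the amount of leading
# whitespace on the first two lines is a guess for illustration.
sample = (
    "    35 61 7 1 0\n"
    "    0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233 2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)

# Count the comma-separated fields on each line.
field_counts = [len(row) for row in csv.reader(io.StringIO(sample))]
print(field_counts)  # [1, 1, 1, 1, 3]
```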
Before dealing with the leading whitespace, I first need to get past the "Error tokenizing data." problem, so I changed my code as follows:
import pandas as pd
df = pd.read_csv('example.txt', header=None, error_bad_lines=False)
df
I get the data with leading whitespace as I wanted, but the data on line 5 has disappeared. The output is as follows:
b'Skipping line 5: expected 1 fields, saw 3\n
35 61 7 1 0 # with leading white spaces as intended
0 1 1 1 1 1 # with leading white spaces as intended
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
# 5th line disappeared (not my intention).
So I tried changing the code as follows to get the 5th line.
import pandas as pd
df = pd.read_csv('example.txt', header=None, sep=':::', engine='python')
df
I successfully got the data on line 5, but the leading whitespace on lines 1 and 2 has disappeared, as shown below:
35 61 7 1 0 # without leading white spaces(not my intention)
0 1 1 1 1 1 # without leading white spaces(not my intention)
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
1(01-02),2(02-03),3(03-04) # I successfully got this line as intended.
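I have seen some posts about preserving leading whitespace in strings, but I could not find a case that covers preserving leading whitespace for numbers. Thank you for your help.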
df.dtypes - maybe you are converting the column to integers, which of course have no concept of whitespace. - John Zwinck
dtype=object, and show your code more clearly. - Bharath M Shetty
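One possible way to keep every line intact, including leading whitespace and the comma-containing line, is to bypass the CSV tokenizer and load each line as a plain string. This is only a sketch under that assumption, not necessarily what the commenters had in mind; the helper name read_lines_verbatim is hypothetical, and the column is stored with dtype=object so nothing is converted to integers.

```python
import pandas as pd


def read_lines_verbatim(path):
    """Return a one-column DataFrame with each line kept as an unmodified string.

    Hypothetical helper for illustration: only the trailing line break is
    removed, so leading whitespace and commas survive.
    """
    with open(path, encoding="utf-8") as fh:
        lines = [line.rstrip("\r\n") for line in fh]
    # Column label 0 mirrors what read_csv(..., header=None) would produce.
    return pd.DataFrame({0: lines}, dtype=object)
```

Calling read_lines_verbatim('example.txt') should then give one row per line of the file. Because the values are strings, any numeric work would need an explicit split and conversion afterwards.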