Pandas error when reading a CSV file with the `sep` and `comment` arguments

Question


python csv pandas


Situation

I need to create a pandas DataFrame from a CSV file with the following characteristics:

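The file uses either commas or whitespace as the delimiter, and I don't know in advance which one a given file will use.
The top of the file may contain one or more comment lines starting with #.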

Problem

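I tried to solve this with the pd.read_csv method, passing sep=None and comment='#'. As I understand it, sep=None tells pandas to detect the delimiter character automatically, and comment='#' tells pandas that all lines starting with # are comment lines and should be ignored.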
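The pandas documentation states that with sep=None the Python parsing engine detects the separator from the first valid row using Python's built-in csv.Sniffer. The sketch below calls csv.Sniffer directly on single lines shaped like the ones in this file; it illustrates only the detection step and is not pandas' own code path.

```python
import csv

# With sep=None, pandas hands the first valid row to csv.Sniffer.
# Calling the sniffer directly shows what it guesses for single lines.
sniffer = csv.Sniffer()

comma_header = 'col1,col2,col3\n'
space_header = 'col1 col2 col3\n'
comment_line = '# Data generated on 04 May 2017\n'

print(repr(sniffer.sniff(comma_header).delimiter))  # ','
print(repr(sniffer.sniff(space_header).delimiter))  # ' '

# The comment line contains spaces but no commas, so sniffing it would
# guess a space: comment lines have to be skipped before sniffing.
print(repr(sniffer.sniff(comment_line).delimiter))  # ' '
```

When a line is ambiguous, csv.Sniffer falls back to a preference order that starts with the comma, which is why the header with commas is detected as comma-separated even though letters and digits also repeat in it.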

Each of these arguments works fine on its own. However, when I use them together I get the error TypeError: expected string or bytes-like object. The following code example demonstrates this:

from io import StringIO
import pandas as pd

# Simulated data file contents
tabular_data = (
    '# Data generated on 04 May 2017\n'
    'col1,col2,col3\n'
    '5.9,7.8,3.2\n'
    '7.1,0.4,8.1\n'
    '9.4,5.4,1.9\n'
)

# This works
df1 = pd.read_csv(StringIO(tabular_data), sep=None)
print(df1)

# This also works
df2 = pd.read_csv(StringIO(tabular_data), comment='#')
print(df2)

# This will give an error
df3 = pd.read_csv(StringIO(tabular_data), sep=None, comment='#')
print(df3)
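Since each argument works on its own, one possible workaround (a sketch, not part of the original post) is to apply them in two steps: drop the comment lines yourself, then let sep=None sniff the first real row. The helper name `read_commented_csv` is hypothetical, and it assumes comment lines start with `#` (after optional leading whitespace) while no data row does.

```python
from io import StringIO

import pandas as pd


def read_commented_csv(text: str, comment: str = '#') -> pd.DataFrame:
    """Hypothetical helper: drop comment lines, then sniff the separator.

    Assumes comment lines start with `comment` (possibly after leading
    whitespace) and that no data row does.
    """
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.lstrip().startswith(comment)
    ]
    # sep=None is handled by the python engine, which sniffs the
    # delimiter from the first remaining line (the header here).
    return pd.read_csv(StringIO(''.join(kept)), sep=None, engine='python')


tabular_data = (
    '# Data generated on 04 May 2017\n'
    'col1,col2,col3\n'
    '5.9,7.8,3.2\n'
    '7.1,0.4,8.1\n'
    '9.4,5.4,1.9\n'
)

df = read_commented_csv(tabular_data)
print(df)
```

Because csv.Sniffer settles on a single delimiter character, a file whose columns are separated by runs of several spaces would still misparse with this approach; the regex separator in the accepted answer handles that case.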

Unfortunately, I don't understand what causes this error. Can anyone help me solve it?

- Xukrao

1 answer


- MaxU - stand with Ukraine · Accepted Answer

Try this:

In [186]: df = pd.read_csv(StringIO(tabular_data), sep=r'(?:,|\s+)',
                           comment='#', engine='python')

In [187]: df
Out[187]:
   col1  col2  col3
0   5.9   7.8   3.2
1   7.1   0.4   8.1
2   9.4   5.4   1.9

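'(?:,|\s+)' is a regular expression that matches either a comma or any run of one or more consecutive spaces/tabs. Regex separators are handled only by the Python parsing engine, which is why engine='python' is passed explicitly.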
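To see what the separator does on individual lines, the sketch below applies the same pattern with re.split, then runs the answer's read_csv call on a made-up whitespace-separated variant of the data with irregular runs of spaces and a tab. The variable names `SEP` and `space_data` are illustrative only.

```python
import re
from io import StringIO

import pandas as pd

# The separator regex from the answer: a comma, or a run of whitespace.
SEP = r'(?:,|\s+)'

# re.split shows how single lines are cut into fields.
print(re.split(SEP, '5.9,7.8,3.2'))     # ['5.9', '7.8', '3.2']
print(re.split(SEP, '5.9   7.8\t3.2'))  # ['5.9', '7.8', '3.2']

# Caveat: a comma followed by a space produces an empty field between them.
print(re.split(SEP, '5.9, 7.8'))        # ['5.9', '', '7.8']

# Made-up whitespace-separated sample with a leading comment line.
space_data = (
    '# Data generated on 04 May 2017\n'
    'col1  col2 col3\n'
    '5.9 7.8\t3.2\n'
    '7.1   0.4 8.1\n'
)

# Same call as in the answer; the comment line is dropped and the
# mixed whitespace runs are treated as single separators.
df = pd.read_csv(StringIO(space_data), sep=SEP, comment='#', engine='python')
print(df)
```

The empty-field caveat only matters for files that mix a comma with following spaces; the answer's pattern targets files that use one style or the other, as described in the question.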