What is the Python equivalent of R's read.table function?

Question


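I want to move some of my processing from R to Python. In R, I use read.table() to read very messy CSV files, and it automatically splits the records into the correct fields. For example: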

391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>

<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>
","windows-7 printer hp"

is correctly split into 4 columns. A single record can span multiple lines, and commas appear everywhere. In R, all I need to do is:

read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)

Is there a tool in Python that can do the same thing? Thanks!

- mchangun

See also: https://dev59.com/aGEh5IYBdhLWcg3wRRqh - PatrickT

2 Answers


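The pandas module also provides many functions and data structures similar to R's, including read_csv. The advantage here is that the data is read in as a pandas DataFrame, which is easier to manipulate than standard Python lists or dicts (especially if you are already used to R). Here is an example: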

>>> from pandas import read_csv
>>> ugly = read_csv("ugly.csv",header=None)
>>> ugly
        0                                              1  \
0  391788  HP Deskjet 3050 scanner always seems to break   

                                                   2                     3  
0  <p>I'm running a Windows 7 64 blah blah blah.....  windows-7 printer hp
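The R call in the question also passes nrows=chunksize. As a minimal sketch, assuming a reasonably recent pandas, read_csv takes an nrows argument to cap the number of records read and a chunksize argument to iterate over the input in pieces. The in-memory sample below is made up for illustration and stands in for ugly.csv; it is not the asker's data.

```python
import io

import pandas as pd

# Hypothetical sample standing in for ugly.csv: the second record's
# third field spans several lines and contains a comma.
sample = (
    '1,"First title","<p>one, two</p>","tag-a tag-b"\n'
    '2,"Second title","<p>line one</p>\n'
    '\n'
    '<p>line two, with a comma</p>\n'
    '","tag-c"\n'
    '3,"Third title","<p>three</p>","tag-d"\n'
)

# nrows caps the number of parsed records, not physical lines.
first_two = pd.read_csv(io.StringIO(sample), header=None, nrows=2)

# chunksize yields DataFrames holding at most that many records each.
chunks = list(pd.read_csv(io.StringIO(sample), header=None, chunksize=2))

print(first_two.shape)
print([len(chunk) for chunk in chunks])
```

Reading from a real file works the same way: pass the file path in place of the io.StringIO object.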

- David


- VIKASH JAISWAL · Accepted Answer

You can use the csv module.

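from csv import reader

# newline="" lets the csv module handle line breaks inside quoted fields
with open("C:/text.txt", "r", newline="") as f:
    csv_reader = reader(f, quotechar="\"")
    for row in csv_reader:
        print(row)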

['391788', 'HP Deskjet 3050 scanner always seems to break', "<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>\n\n<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>\n", 'windows-7 printer hp']

The resulting row has length 4.
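To mimic the nrows=chunksize part of the R call with only the standard library, one option is to pull a fixed number of records from the reader with itertools.islice. This is a rough sketch, not part of the original answer: read_chunk is a hypothetical helper name and the sample data is invented for illustration.

```python
import csv
import io
from itertools import islice


def read_chunk(rows, chunksize):
    """Return up to `chunksize` parsed records from a csv.reader iterator."""
    return list(islice(rows, chunksize))


# Hypothetical sample: the second record's third field spans several lines.
sample = (
    '1,"First title","<p>one, two</p>","tag-a tag-b"\n'
    '2,"Second title","<p>line one</p>\n'
    '\n'
    '<p>line two, with a comma</p>\n'
    '","tag-c"\n'
    '3,"Third title","<p>three</p>","tag-d"\n'
)

# newline="" keeps embedded line breaks untouched for the csv module.
rows = csv.reader(io.StringIO(sample, newline=""), quotechar='"')

first = read_chunk(rows, 2)  # first two records
rest = read_chunk(rows, 2)   # whatever is left (one record here)

print([len(record) for record in first])
print(len(rest))
```

Because islice consumes the same reader each time, successive calls continue where the previous chunk stopped, and an empty list signals the end of the input.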