Reading the last few rows with the Pandas read_csv function

Question

3

I have a file that keeps growing, with content like this:

https|webmail.mahindracomviva.com|application/vnd.ms-sync.wbxml|158|POST|203.101.110.171
https|webmail.mahindracomviva.com||0|POST|203.101.110.171
https|webmail.mahindracomviva.com||0|POST|203.101.110.171
https|www.googleapis.com|application/x-protobuf|246|POST|74.125.200.95
https|webmail.mahindracomviva.com|application/vnd.ms-sync.wbxml|140|POST|203.101.110.171
https|webmail.mahindracomviva.com|application/x-protobuf|52|POST|203.101.110.171
https|www.googleapis.com|application/x-protobuf|502|POST|74.125.200.95
https|www.googleapis.com|application/x-protobuf|40|POST|74.125.200.95

I want to read only the last 50 lines with Pandas.

- itsaruns

Does anything in this question/answer help? (link) - summea

4

Which operating system are you using? On *nix systems you can first create a smaller file with tail -n 50 long_file.csv > short_file.csv and then read that. - lev
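For systems without tail, the same idea can be sketched in Python. This is a minimal sketch, not part of the original comment: the function name read_last_lines and the UTF-8 encoding are assumptions, and the "|" separator and missing header row are taken from the sample data in the question. The file is still read line by line from the start, but at most n lines are kept in memory.

```python
from collections import deque
from io import StringIO

import pandas as pd


def read_last_lines(path, n=50, sep="|"):
    """Return the last n lines of a delimited text file as a DataFrame.

    Hypothetical helper for illustration; raises pandas.errors.EmptyDataError
    if the file is empty.
    """
    with open(path, "r", encoding="utf-8") as handle:
        # A deque with maxlen drops the oldest line each time a new one is
        # appended, so at most n lines are held in memory at any time.
        last_lines = deque(handle, maxlen=n)
    # header=None because the sample log in the question has no header row.
    return pd.read_csv(StringIO("".join(last_lines)), sep=sep, header=None)
```

Calling read_last_lines("long_file.csv", n=50) would then return the 50 lines that were in the file at the moment it was read; lines appended afterwards are not included.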

Please improve the question. How do you read the "last 50 lines" of a file that keeps growing? The last line has not arrived yet. - krethika

2 Answers

-1

Try the pandas tail() function, as follows:

import pandas as pd

filename = "your_file"
last_rows = 3
# Note: this loads the whole file into memory before taking the tail
data = pd.read_csv(filename, header=None, sep="|")
print(data.tail(last_rows))

- 0x4ndy

- Adil · Accepted Answer

You need to follow these steps:

First, find the number of rows in the CSV file without loading the whole file into RAM. To do that, pass chunksize to read_csv() so the file is read in pieces.

import pandas as pd

datalength = 0  # total number of data rows (the header row is not counted)
for data in pd.read_csv('YourFile.csv', encoding='ISO-8859-1', chunksize=1000):
    datalength += len(data)  # only one chunk of up to 1000 rows is in memory at a time

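Second, subtract the number of rows you want to read (300 here) from the total to get the number of rows to skip.

rowsdiff = datalength - 300  # number of leading data rows to skip
# Row 0 is the header, so skip rows 1..rowsdiff; if rowsdiff <= 0 nothing is skipped
df = pd.read_csv('YourFile.csv', encoding='ISO-8859-1', skiprows=range(1, rowsdiff + 1), nrows=300)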

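With this approach only the last few rows end up in the DataFrame, and the whole CSV file is never held in memory at once, although the file is still scanned twice.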
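The two steps above can be combined into one helper. The following is a minimal sketch under stated assumptions: the function name read_last_rows and its parameters are hypothetical, and the file is assumed to have a header row on its first line, as in the answer's own code. Extra keyword arguments such as encoding are passed through to read_csv unchanged.

```python
import pandas as pd


def read_last_rows(path, n=300, chunksize=1000, **read_csv_kwargs):
    """Read the last n data rows of a CSV file that has a header row.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Pass 1: count the data rows chunk by chunk so memory use stays bounded.
    total_rows = 0
    for chunk in pd.read_csv(path, chunksize=chunksize, **read_csv_kwargs):
        total_rows += len(chunk)

    # Pass 2: skip file lines 1..rows_to_skip; line 0 is the header and is kept.
    rows_to_skip = max(total_rows - n, 0)
    return pd.read_csv(
        path,
        skiprows=range(1, rows_to_skip + 1),
        **read_csv_kwargs,
    )
```

For example, read_last_rows('YourFile.csv', n=300, encoding='ISO-8859-1') mirrors the call in the answer. If the file grows between the two passes, the result will contain more than n rows, because the skip count is based on the first pass.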