Reading specific lines with Python's readline() function

Question

4

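When using readline() in Python, is it possible to specify which line to read? When I run the following code, I get lines 1, 2 and 3, but I want to read lines 2, 6 and 10.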

def print_a_line(line, f):
    print f.readline(line)

current_file = open("file.txt")

for i in range(1, 12):
    if(i%4==2):
        print_a_line(i, current_file)

- The Nightman

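If you know the exact byte length of each line, you can use current_file.seek. In most cases you don't, though, and you have to scan the file to find where each line ends (looking for \n characters). As ZdaR pointed out, you are better off reading through the file anyway, which amounts to scanning it. - user707650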

1

If you have a 10 GB (text?) file with \n line endings and need to access lines at arbitrary positions, you may want to rethink your storage model. - user707650

4 Answers

3

You can use the consume recipe from the itertools documentation, which is one of the fastest ways to skip over multiple lines.

from itertools import islice
from collections import deque

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

with open("in.txt") as f:
    l = []
    sm = 0
    for i in (2, 6, 10):
        i -= sm
        consume(f, i-1)
        l.append(next(f, ""))
        sm += i

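We just need to subtract the number of lines already consumed so that each i still lines up with its line number. You can put the code in a function and yield each line: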

def get_lines(fle,*args):
    with open(fle) as f:
        l, consumed = [], 0
        for i in args:
            i -= consumed
            consume(f, i-1)
            yield next(f, "")
            consumed += i

To use it, just pass the file name and the line numbers (in ascending order):

test.txt:

Output:

In [4]: list(get_lines("test.txt",2, 6, 10))
Out[4]: ['2\n', '6\n', '10\n']
In [5]: list(get_lines("stderr.txt",3, 5, 12))
Out[5]: ['3\n', '5\n', '12']

If you only need a single line, you can also use linecache:

import linecache

linecache.getline("test.txt",10)
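
For several lines, the same call can be repeated, as in the minimal sketch below. The helper name get_lines_cached is hypothetical. Two things are worth knowing: linecache.getline returns an empty string for a line number past the end of the file instead of raising, and linecache reads and caches the whole file in memory, so it suits small or medium files better than very large ones.

```python
import linecache

def get_lines_cached(path, line_numbers):
    """Return the requested 1-based lines of `path` (hypothetical helper)."""
    # linecache loads the whole file on first access and serves later
    # lookups from its in-memory cache.
    return [linecache.getline(path, n) for n in line_numbers]
```

Calling linecache.clearcache() afterwards releases the cached file contents.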

- Padraic Cunningham

2

with open('file.txt', 'r') as f:
    next(f)
    for line in f:
        print(line.rstrip('\n'))
        for skip in range(3):
            try:
                next(f)
            except StopIteration:
                break

File:

Result:

2
6
10

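This works fine in a script or a function, but if you want to hide the skipped lines in an interactive shell, you need to assign the next(f) calls to a temporary variable.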
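
The same every-fourth-line pattern can also be sketched with itertools.islice, which accepts start, stop and step arguments over any iterator, including a file object. The function name every_nth_line and its parameters are assumptions made for illustration and are not part of the answer above.

```python
from itertools import islice

def every_nth_line(path, first=2, step=4):
    """Yield line `first`, then every `step`-th line after it (1-based)."""
    with open(path) as f:
        # islice counts from 0, so line number `first` sits at index first - 1.
        for line in islice(f, first - 1, None, step):
            yield line.rstrip('\n')
```

Because nothing is evaluated as a bare expression, this version prints nothing extra in an interactive shell.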

- TigerhawkT3

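I get an indentation error when I use the with open('file.txt', 'r') as f: code. - The Nightman

If several independent solutions all give you an IndentationError, then the problem is in the code that precedes our solutions, and you need to fix its indentation. - TigerhawkT3

So even if I create an empty file and put only that first line of code in it, I still get the same error. - The Nightman

There is no way that the first line of a simple with construct produces an IndentationError. If you are seeing that, something is probably wrong with your environment. - TigerhawkT3

Actually I figured it out: I needed the next(f) after the first line for it not to raise an error. - The Nightman

If Python expects an indented block, you should give it one. This is pretty basic: it is the reason pass exists. - TigerhawkT3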

1

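Reading a file always starts at the first character. The reader knows nothing about the content, so it does not know where lines begin and end. readline simply reads until it hits a newline character. This holds for any language, not just Python. If you want the nth line, skip the first n-1 lines: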

def my_readline(file_path, n):
    with open(file_path, "r") as file_handle:
        for _ in range(1, n):
            file_handle.readline()
        return file_handle.readline()

Note that this solution opens the file on every function call, which can seriously hurt your program's performance.
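
One way to soften that cost, sketched here under the assumption that all wanted line numbers are known up front, is to open the file once and collect every requested line in a single pass. The name read_lines and its return shape are hypothetical.

```python
def read_lines(file_path, line_numbers):
    """Return {line_number: text} for the requested 1-based lines."""
    wanted = set(line_numbers)
    found = {}
    if not wanted:
        return found
    last = max(wanted)
    with open(file_path, "r") as file_handle:
        for number, line in enumerate(file_handle, start=1):
            if number in wanted:
                found[number] = line
            if number >= last:
                break  # nothing further is needed
    return found
```

Line numbers beyond the end of the file are simply absent from the returned dictionary.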

- hajtos

- davidism · Accepted Answer

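No, you can't use readline that way. Instead, skip over the lines you don't want. You have to read through the file because you cannot know ahead of time where to seek to reach a given line (unless newlines fall at some regular offset). You can use enumerate to keep track of which line you are on, so you read the file only once and can stop after the last line you care about.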

with open('my_file') as f:
    for i, line in enumerate(f, start=1):
        if i > 10:
            break  # past the last line we want
        if i % 4 == 2:  # lines 2, 6, 10
            print(i, line.rstrip('\n'))

If you know the byte length of each line, you can seek straight to the position of a given line instead of iterating over every line.

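line_len = 20  # bytes per line, including the trailing newline

with open('my_file', 'rb') as f:
    for n in (2, 6, 10):
        # line n (1-based) starts at byte offset (n - 1) * line_len
        f.seek((n - 1) * line_len)
        print(f.read(line_len).decode())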
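
When lines vary in length, a similar effect can be had by scanning the file once to record where each line starts and seeking to those offsets afterwards. The following is only a sketch: the names build_line_index and read_line_at are hypothetical, and the file is assumed to be UTF-8 encoded with \n line endings.

```python
def build_line_index(path):
    """Scan the file once and return the byte offset at which each line starts."""
    offsets = []
    position = 0
    with open(path, 'rb') as f:
        for raw_line in f:
            offsets.append(position)
            position += len(raw_line)  # raw_line includes its newline
    return offsets

def read_line_at(path, offsets, n):
    """Return line n (1-based) by seeking straight to its recorded offset."""
    with open(path, 'rb') as f:
        f.seek(offsets[n - 1])
        return f.readline().decode()  # assumes UTF-8 content
```

The index costs one full read of the file, so it only pays off when many lookups follow and the file does not change in between.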