Is it safe to mix readline() and line iterators in Python file processing?

Question

18

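Is it safe to use readline() together with for line in file, and are the two guaranteed to use the same file position?

Usually I want to disregard the first line (the header), so I do this: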

FI = open("myfile.txt")
FI.readline()             # disregard the first line
for line in FI:
    my_process(line)
FI.close()

Is this safe, i.e., is the same file-position variable guaranteed to be used while iterating over the lines?

- highBandWidth

3 Answers

4

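This works out well in the long run. It ignores the fact that you're processing a file, and works with any sequence. Also, having the explicit iterator object (rdr) hanging around allows you to skip lines inside the body of the for loop without messing anything up.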

with open("myfile.txt", "r") as source:
    rdr = iter(source)
    heading = next(rdr)    # consume the header line
    for line in rdr:
        process(line)
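
To illustrate the point about skipping lines inside the loop body, here is a minimal Python 3 sketch. The "SKIP" marker rule, the function name and the sample data are all made up for the example; only the explicit-iterator pattern comes from the answer above.

```python
import io


def process_skipping(lines):
    """Drop the header, then drop the line that follows any 'SKIP' marker.

    The 'SKIP' rule is hypothetical; it only shows that next() can be called
    inside the loop body because the loop and the body share one iterator.
    """
    rdr = iter(lines)
    next(rdr, None)                 # discard the header (no error if empty)
    kept = []
    for line in rdr:
        if line.strip() == "SKIP":
            next(rdr, None)         # consume and discard the following line
            continue
        kept.append(line.rstrip("\n"))
    return kept


sample = io.StringIO("header\na\nSKIP\nb\nc\n")
result = process_skipping(sample)
print(result)
```

Because only the iterator protocol is used, the same function should work on a list of strings or any other iterable of lines.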

- S.Lott

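This is great! "Having the explicit iterator object (rdr) hanging around allows you to skip lines inside the body of the for loop without messing anything up." - gleb.pitsevich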

2

It is safe, provided the mechanism is kept under control.

=============================

.

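Iterating after a readline() call is no problem.

Calling readline() after an iteration, however, does cause a problem.

I created a file named 'rara.txt' containing the following text (each line is 5 bytes long because of the '\r\n' line endings under Windows).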

1AA
2BB
3CC
4DD
5EE
6FF
7GG
8HH
9II
10j
11k
12l
13m
14n
15o

I ran:

FI  = open("rara.txt",'rb')
lineR = FI.readline()
print repr(lineR)+'   len=='+str(len(lineR))+\
      '  FI.tell() after FI.readline() : ',FI.tell(),'\n'

cnt = 0
for line in FI:
    cnt += 1
    print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
          "  FI.tell() after 'line in FI' : ",FI.tell()
    if cnt==4:
        break
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'


lineR = FI.readline()
print repr(lineR)+'   len=='+str(len(lineR))+\
      '  FI.tell() after FI.readline() : ',FI.tell()
lineR = FI.readline()
print repr(lineR)+'   len=='+str(len(lineR))+\
      '  FI.tell() after FI.readline() : ',FI.tell(),'\n'

for line in FI:
    print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
          "  FI.tell() after 'line in FI' : ",FI.tell()
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'

The result:

'1AA\r\n'   len==5  FI.tell() after FI.readline() :  5 

cnt==1   '2BB\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==2   '3CC\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==3   '4DD\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==4   '5EE\r\n'   len==5  FI.tell() after 'line in FI' :  75

FI.tell() after iteration 'for line in FI' :  75 


Traceback (most recent call last):
  File "E:\Python\NNN codes\esssssai.py", line 16, in <module>
    lineR = FI.readline()
ValueError: Mixing iteration and read methods would lose data
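
The traceback above comes from Python 2. As a point of comparison, here is a small sketch of what I would expect under Python 3 (CPython) with a file opened in text mode: readline() after a broken-off iteration simply continues, while tell() inside the loop is what gets refused, with an OSError. The temporary file and the variable names are stand-ins for the example, not part of the experiment above.

```python
import os
import tempfile

# Throwaway text file standing in for rara.txt.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("1AA\n2BB\n3CC\n")
    path = tmp.name

tell_error = None
try:
    with open(path, "r") as f:
        for line in f:
            try:
                f.tell()              # text-mode tell() during iteration
            except OSError as exc:
                tell_error = exc      # refused while next() is in use
            break
        after_break = f.readline()    # carries on from the next line
finally:
    os.remove(path)

print(repr(tell_error), repr(after_break))
```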

.

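Strangely, there is a phenomenon: if the "cursor" is refreshed by seeking to the position reported by tell(), readline() can be used again after an iteration (I don't know what mechanism lies behind this update of the "cursor"):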

FI  = open("rara.txt",'rb')
lineR = FI.readline()
print repr(lineR)+'   len=='+str(len(lineR))+\
      '  FI.tell() after FI.readline() : ',FI.tell(),'\n'

cnt = 0
for line in FI:
    cnt += 1
    print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
          "  FI.tell() after 'line in FI' : ",FI.tell()
    if cnt==4:
        pos = FI.tell()
        break
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'

FI.seek(pos)

lineR = FI.readline()
print repr(lineR)+'   len=='+str(len(lineR))+\
      '  FI.tell() after FI.readline() : ',FI.tell()
lineR = FI.readline()
print repr(lineR)+'   len=='+str(len(lineR))+\
      '  FI.tell() after FI.readline() : ',FI.tell(),'\n'

for line in FI:
    print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
          "  FI.tell() after 'line in FI' : ",FI.tell()
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'

Result:

'1AA\r\n'   len==5  FI.tell() after FI.readline() :  5 

cnt==1   '2BB\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==2   '3CC\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==3   '4DD\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==4   '5EE\r\n'   len==5  FI.tell() after 'line in FI' :  75

FI.tell() after iteration 'for line in FI' :  75 

''   len==0  FI.tell() after FI.readline() :  75
''   len==0  FI.tell() after FI.readline() :  75 


FI.tell() after iteration 'for line in FI' :  75

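In any case, we notice that although the algorithm reads only 4 lines during the iteration (thanks to the counter cnt), the cursor has already moved to the end of the file as soon as the iteration starts: the rest of this small file, from the current position onward, has been read in one go.

Consequently, pos = FI.tell() just before the break does not give the position after the 4 lines read, but the position of the end of the file.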

.

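If we want to use readline() again from the exact position reached after the 4 lines read during the iteration, we have to track that position ourselves: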

FI  = open("rara.txt",'rb')
lineR = FI.readline()
print repr(lineR)+'   len=='+str(len(lineR))+\
      '  FI.tell() after FI.readline() : ',FI.tell(),'\n'

cnt = 0
pos = FI.tell()
for line in FI:
    cnt += 1
    pos += len(line)
    print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
          "  FI.tell() after 'line in FI' : ",FI.tell()
    if cnt==4:
        break
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell()
print "    pos   after iteration 'for line in FI' : ",pos,'\n'

FI.seek(pos)

lineR = FI.readline()
print repr(lineR)+'   len=='+str(len(lineR))+\
      '  FI.tell() after FI.readline() : ',FI.tell()
lineR = FI.readline()
print repr(lineR)+'   len=='+str(len(lineR))+\
      '  FI.tell() after FI.readline() : ',FI.tell(),'\n'

cnt = 0
for line in FI:
    cnt += 1
    print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
          "  FI.tell() after 'line in FI' : ",FI.tell()
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'

Result:

'1AA\r\n'   len==5  FI.tell() after FI.readline() :  5 

cnt==1   '2BB\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==2   '3CC\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==3   '4DD\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==4   '5EE\r\n'   len==5  FI.tell() after 'line in FI' :  75

FI.tell() after iteration 'for line in FI' :  75
    pos   after iteration 'for line in FI' :  25 

'6FF\r\n'   len==5  FI.tell() after FI.readline() :  30
'7GG\r\n'   len==5  FI.tell() after FI.readline() :  35 

cnt==1   '8HH\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==2   '9II\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==3   '10j\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==4   '11k\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==5   '12l\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==6   '13m\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==7   '14n\r\n'   len==5  FI.tell() after 'line in FI' :  75
cnt==8   '15o\r\n'   len==5  FI.tell() after 'line in FI' :  75

FI.tell() after iteration 'for line in FI' :  75
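
The same bookkeeping can be written more compactly. This is a minimal Python 3 sketch, assuming a throwaway temporary file with 5-byte lines in place of rara.txt; the file contents and variable names are invented for the example.

```python
import os
import tempfile

# Nine 5-byte lines with Windows-style endings: b"1XX\r\n" ... b"9XX\r\n".
data = "".join("%dXX\r\n" % i for i in range(1, 10)).encode("ascii")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    with open(path, "rb") as f:
        first = f.readline()
        pos = len(first)              # bytes consumed so far
        for count, line in enumerate(f, start=1):
            pos += len(line)          # count the bytes ourselves
            if count == 4:
                break
        f.seek(pos)                   # absolute seek drops any read-ahead
        resumed = f.readline()
finally:
    os.remove(path)

print(pos, resumed)
```

Counting bytes by hand only works because the file is opened in binary mode, so len(line) matches the number of bytes actually consumed.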

.

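All these manipulations are possible only if the file is opened in binary mode, because I am on Windows, which uses '\r\n' as the line ending even when something like 'abcdef\n' was written in 'w' mode,

while Python in 'r' mode converts every '\r\n' to '\n'.

It is a real mess, and to keep control of it all the file must be opened in 'rb' mode if we want to do precise manipulations.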
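
To make the difference between the two modes concrete, here is a short Python 3 sketch. It assumes a temporary file written with explicit '\r\n' endings; in Python 3 the default text mode translates '\r\n' to '\n' on reading regardless of platform, so the line lengths stop matching byte offsets.

```python
import os
import tempfile

# Write raw bytes so the file really contains '\r\n' on every platform.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc\r\ndef\r\n")
    path = tmp.name

try:
    with open(path, "rb") as f:
        binary_line = f.readline()    # bytes, line ending left untouched
    with open(path, "r", encoding="ascii") as f:
        text_line = f.readline()      # str, '\r\n' translated to '\n'
finally:
    os.remove(path)

print(repr(binary_line), len(binary_line))
print(repr(text_line), len(text_line))
```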

.

You know what? I love playing these games with file positions.

- eyquem


- Simon Whitaker · Accepted Answer

16

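No, it isn't safe:

As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right.

You could use next() to skip the first line here. You should also test for StopIteration, which is raised if the file is empty.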

with open('myfile.txt') as f:
    try:
        header = next(f)    # skip the header line
    except StopIteration:
        print("File is empty")
    for line in f:
        pass  # do stuff with line
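
A variant of the same idea, as a sketch: the built-in next() accepts a default value, which avoids the try/except for the empty-file case. The helper name lines_after_header and the in-memory io.StringIO streams are stand-ins for the example, not part of the answer's code.

```python
import io


def lines_after_header(stream):
    """Return (header, remaining_lines); header is None for an empty stream."""
    header = next(stream, None)    # the default replaces StopIteration
    if header is None:
        return None, []
    return header, list(stream)


header, body = lines_after_header(io.StringIO("name,age\nann,30\nbob,25\n"))
empty_header, empty_body = lines_after_header(io.StringIO(""))
print(repr(header), body)
print(empty_header, empty_body)
```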

- Simon Whitaker

1

It's better to use the next function. - SilentGhost

4

Because .next has been removed in Python 3. - SilentGhost

Edited (better late than never ;-)) - Simon Whitaker

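@Simon Whitaker @highBandWidth Simon, your cursory answer spreads a wrong idea. First, your quotation is incomplete: after the part you quote, the documentation goes on: "(...) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer." In other words, with some precise control and understanding, the different file-reading methods can be mixed. Second, you evidently did not run any tests to reach a mature understanding of these processes. - eyquem

@Simon Whitaker @highBandWidth Sorry, Simon, but I think your absolute statement "No, it isn't safe" is incorrect. The two other answers also say that it is safe under certain conditions. It is a pity to see that people prefer short answers like yours to long ones like mine, which try to build a real and reasoned understanding of Python. You should correct the general thrust of your answer, just as you corrected the minor point about .next(). - eyquem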

1

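@eyquem - I was answering the original question: "So I do this: <code snippet> - is it safe?" For that code snippet, the answer is no. You're right that you can use seek() to reposition the file pointer, but you overlook the fact that not all file objects are seekable (you can't seek on STDIN, for example). I'm happy to agree to disagree on this one. :) - Simon Whitaker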
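
For readers who want to see the seekability point in code, here is a minimal Python 3 sketch. It uses an os.pipe() pair as a stand-in for STDIN, and skip_header is a made-up helper that relies only on iteration, so it never needs seek() or readline().

```python
import itertools
import os


def skip_header(lines, count=1):
    """Yield the lines after the first `count`, using only iteration."""
    return itertools.islice(lines, count, None)


# A pipe behaves like STDIN here: it can be read forward but not repositioned.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "w") as writer:
    writer.write("header\nrow1\nrow2\n")

with os.fdopen(read_fd, "r") as reader:
    can_seek = reader.seekable()      # expected to be False for a pipe
    rows = list(skip_header(reader))

print(can_seek, rows)
```

Checking seekable() before relying on seek() is one way to decide between the two approaches discussed in this thread.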