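The following code creates a function named Which_Line_for_Position(pos) that returns the number of the line containing position pos, that is, the number of the line in which the character located at position pos in the file lies.
The function accepts any position as its argument, independently of the current position of the file pointer and of the history of pointer movements before the call.
So, with this function, you are not limited to determining the number of the current line during an uninterrupted iteration over the lines, as is the case with Greg Hewgill's solution.

    with open(filepath, 'rb') as f:
        # Map the end offset of each line to its 0-based line number.
        GIVE_NO_FOR_END = {}
        end = 0
        for i, line in enumerate(f):
            end += len(line)
            GIVE_NO_FOR_END[end] = i
        # If the file ends with a newline, offset end starts one more (empty) line.
        if line.endswith(b'\n'):
            GIVE_NO_FOR_END[end+1] = i+1
        end_positions = sorted(GIVE_NO_FOR_END)

    def Which_Line_for_Position(pos,
                                dic = GIVE_NO_FOR_END,
                                keys = end_positions,
                                kmax = end_positions[-1]):
        # The first end offset greater than pos belongs to the line holding pos.
        return dic[next(k for k in keys if pos < k)] if pos < kmax else None
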
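The generator expression in Which_Line_for_Position() walks through the sorted end positions one by one, so each call takes time proportional to the number of lines. As a sketch of an alternative (a swapped-in technique, not the code of this answer), the standard bisect module can do the same lookup by binary search. The names build_end_positions and line_for_position are made up for this illustration, and the sketch works on an in-memory bytes object to stay self-contained.

```python
from bisect import bisect_right

def build_end_positions(lines):
    """Return the cumulative end offset of each line, in line order."""
    ends = []
    end = 0
    for line in lines:
        end += len(line)
        ends.append(end)
    # A trailing newline means offset end starts one more, empty, line.
    if lines and lines[-1].endswith(b'\n'):
        ends.append(end + 1)
    return ends

def line_for_position(pos, ends):
    """Return the 0-based line number holding offset pos, or None if out of range."""
    if not ends or pos < 0 or pos >= ends[-1]:
        return None
    # Index of the first end offset strictly greater than pos;
    # because ends is in line order, that index is the line number.
    return bisect_right(ends, pos)

data = b'first line\nsecond\nthird'
ends = build_end_positions(data.splitlines(True))
for pos in (0, 10, 11, 22, 23):
    print('pos=%-4s line %s' % (pos, line_for_position(pos, ends)))
```

Since the list index doubles as the line number, the dictionary is no longer needed in this variant.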
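The same solution can be written with the fileinput module:

    import fileinput

    GIVE_NO_FOR_END = {}
    end = 0
    # mode must be passed by keyword: the second positional parameter is inplace.
    for line in fileinput.input(filepath, mode='rb'):
        end += len(line)
        # filelineno() counts from 1; subtract 1 to match enumerate().
        lineno = fileinput.filelineno() - 1
        GIVE_NO_FOR_END[end] = lineno
    # If the file ends with a newline, offset end starts one more (empty) line.
    if line.endswith(b'\n'):
        GIVE_NO_FOR_END[end+1] = lineno + 1
    fileinput.close()
    end_positions = sorted(GIVE_NO_FOR_END)

    def Which_Line_for_Position(pos,
                                dic = GIVE_NO_FOR_END,
                                keys = end_positions,
                                kmax = end_positions[-1]):
        return dic[next(k for k in keys if pos < k)] if pos < kmax else None

But this solution has some drawbacks:
- it requires importing the fileinput module
- as I first wrote it, with the call fileinput.input(filepath,'rb'), it erased the whole content of the file!! That is in fact the normal behavior of fileinput.input(): its second positional parameter is inplace, not mode, so a true value there switches on in-place editing, which redirects standard output into the file. Passing mode='rb' by keyword avoids this.
- it seems that the file is read in full before any iteration starts. If so, a very large file could exceed the capacity of the RAM. I am not sure of this point: I tried to test it with a 1.5 GB file, but it took too long, so I dropped the test for the moment. If this point is right, it is one more argument for the other solution, the one using enumerate().
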
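On the question of the erased file: in the signature of fileinput.input(), the parameter that follows files is inplace, and mode comes later, so fileinput.input(filepath,'rb') asks for in-place editing rather than for binary reading. The sketch below runs on a throwaway temporary file (its name and content are made up for the demonstration) and checks that the file is left intact when mode='rb' is passed by keyword.

```python
import fileinput
import os
import tempfile

# Throwaway file, created only for this demonstration.
original = b'alpha\nbeta\ngamma\n'
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(original)

# mode is given by keyword, so inplace keeps its default value of False.
line_numbers = []
total = 0
for line in fileinput.input(path, mode='rb'):
    total += len(line)                             # lines are bytes objects
    line_numbers.append(fileinput.filelineno())    # counts from 1
fileinput.close()

# Read the file back to check that nothing was overwritten.
with open(path, 'rb') as f:
    content_after = f.read()
os.remove(path)

print(line_numbers, total, content_after == original)
```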
Example:

    text = '''Harold Acton (1904–1994)
    Gilbert Adair (born 1944)
    Helen Adam (1909–1993)
    Arthur Henry Adams (1872–1936)
    Robert Adamson (1852–1902)
    Fleur Adcock (born 1934)
    Joseph Addison (1672–1719)
    Mark Akenside (1721–1770)
    James Alexander Allan (1889–1956)
    Leslie Holdsworthy Allen (1879–1964)
    William Allingham (1824/28-1889)
    Kingsley Amis (1922–1995)
    Ethel Anderson (1883–1958)
    Bruce Andrews (born 1948)
    Maya Angelou (born 1928)
    Rae Armantrout (born 1947)
    Simon Armitage (born 1963)
    Matthew Arnold (1822–1888)
    John Ashbery (born 1927)
    Thomas Ashe (1836–1889)
    Thea Astley (1925–2004)
    Edwin Atherstone (1788–1872)'''

    # Stand-in for the file: a list of lines that keep their line endings.
    f = text.splitlines(True)

    GIVE_NO_FOR_END = {}
    end = 0
    for i, line in enumerate(f):
        end += len(line)
        GIVE_NO_FOR_END[end] = i
    if line.endswith('\n'):
        GIVE_NO_FOR_END[end+1] = i+1
    end_positions = sorted(GIVE_NO_FOR_END)

    print('\n'.join('line %-3s ending at position %s' % (str(GIVE_NO_FOR_END[end]), str(end))
                    for end in end_positions))

    def Which_Line_for_Position(pos,
                                dic = GIVE_NO_FOR_END,
                                keys = end_positions,
                                kmax = end_positions[-1]):
        return dic[next(k for k in keys if pos < k)] if pos < kmax else None

    print()
    for x in (2,450,320,104,105,599,600):
        print('pos=%-6s line %s' % (x, Which_Line_for_Position(x)))

Result:

    line 0   ending at position 25
    line 1   ending at position 51
    line 2   ending at position 74
    line 3   ending at position 105
    line 4   ending at position 132
    line 5   ending at position 157
    line 6   ending at position 184
    line 7   ending at position 210
    line 8   ending at position 244
    line 9   ending at position 281
    line 10  ending at position 314
    line 11  ending at position 340
    line 12  ending at position 367
    line 13  ending at position 393
    line 14  ending at position 418
    line 15  ending at position 445
    line 16  ending at position 472
    line 17  ending at position 499
    line 18  ending at position 524
    line 19  ending at position 548
    line 20  ending at position 572
    line 21  ending at position 600

    pos=2      line 0
    pos=450    line 16
    pos=320    line 11
    pos=104    line 3
    pos=105    line 4
    pos=599    line 21
    pos=600    line None

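Then, with the function Which_Line_for_Position(), it is easy to get the number of the current line: just pass f.tell() as the argument to the function.
But a warning: when you use f.tell() and move the file pointer around in the file, the file absolutely must be opened in binary mode: 'rb' or 'rb+' or 'ab' or ....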
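
As a minimal sketch of that use, assuming a small temporary file made up for the illustration and a compact stand-in for Which_Line_for_Position() (called which_line here, a hypothetical name), the following opens the file in binary mode, moves the pointer with readline() and seek(), and asks for the line number at f.tell() each time.

```python
import os
import tempfile

# Throwaway file, created only for this illustration.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'one\ntwo\nthree\nfour')

# Build the table of end offsets, with 0-based line numbers.
with open(path, 'rb') as f:
    ends = {}
    end = 0
    for i, line in enumerate(f):
        end += len(line)
        ends[end] = i
    if line.endswith(b'\n'):
        ends[end + 1] = i + 1
keys = sorted(ends)

def which_line(pos):
    # The first end offset greater than pos identifies the line holding pos.
    return ends[next(k for k in keys if pos < k)] if pos < keys[-1] else None

seen = []
with open(path, 'rb') as f:              # binary mode, as the warning requires
    seen.append(which_line(f.tell()))    # pointer at offset 0
    f.readline()                         # pointer now just after b'one\n'
    seen.append(which_line(f.tell()))
    f.seek(10)                           # jump into b'three\n'
    seen.append(which_line(f.tell()))
    f.seek(0, os.SEEK_END)               # end of file: no character there
    seen.append(which_line(f.tell()))
os.remove(path)

print(seen)
```

The last lookup returns None because the position at the very end of a file that does not end with a newline holds no character.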
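open call. You may also want to provide wrappers for any other functions you use (close, for example), but they should be fairly minor pass-through functions. - paxdiablo
The built-in fileinput module seems to work seamlessly: fp = fileinput.input("myfile.txt"); fp.readline(); fp.lineno(). - Mike T
__iter__ = lambda self: iter(self.f) - saeedgnu
for line in f: - saeedgnu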