Python - How do I open a file at a specified byte offset?

Question

16

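I'm writing a program that will periodically parse an Apache log file to record its visitors, bandwidth usage, and so on.

The problem is that I don't want to open the log and re-parse data I have already parsed. For example: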

line1
line2
line3

If I parse that file, I'll save all the lines and then save that offset. That way, when I parse it again, I get:

line1
line2
line3 - The log will open from this point
line4
line5

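The second time round, I'll get line4 and line5. Hopefully that makes sense...

What I need to know is, how do I accomplish this? Python has the seek() function to specify the offset... So do I just get the file size of the log (in bytes) after parsing it, then use that as the offset (in seek()) the second time I parse it?

I can't seem to think of a way to code this. >.<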

- dave

8 Answers

4

log = open('myfile.log')
pos = open('pos.dat', 'w')
print(log.readline())
pos.write(str(log.tell()))  # save the offset reached in the log
log.close()
pos.close()

log = open('myfile.log')
pos = open('pos.dat')
log.seek(int(pos.readline()))  # resume where the previous run stopped
print(log.readline())

Of course you shouldn't use it like that - you should wrap the operations in functions such as save_position(myfile) and load_position(myfile), but the functionality is all there.

- Wayne Werner
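
A minimal sketch of the wrapper functions suggested above, assuming Python 3. The names save_position and load_position come from the answer, but the two-argument signatures, the read_new_lines helper, and the choice of binary mode are assumptions made for illustration:

```python
import os

def save_position(log_file, pos_path):
    """Write the current offset of an open log file to pos_path."""
    with open(pos_path, 'w') as pos_file:
        pos_file.write(str(log_file.tell()))

def load_position(log_file, pos_path):
    """Seek an open log file to the saved offset (start of file if none is saved)."""
    if os.path.exists(pos_path):
        with open(pos_path) as pos_file:
            log_file.seek(int(pos_file.read().strip() or 0))

def read_new_lines(log_path, pos_path):
    """Return the lines appended since the previous call (hypothetical helper)."""
    # Binary mode, so the saved offset is a plain byte count.
    with open(log_path, 'rb') as log_file:
        load_position(log_file, pos_path)
        lines = log_file.readlines()
        save_position(log_file, pos_path)
    return lines
```

Calling read_new_lines('myfile.log', 'pos.dat') on each run should then hand back only the lines added since the last run, as bytes objects.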

1

If your log files fit easily in memory (that is, you have a reasonable rotation policy), you can easily do something like:

log_lines = open('logfile','r').readlines()
last_line = get_last_lineprocessed() #From some persistent storage
last_line = parse_log(log_lines[last_line:])
store_last_lineprocessed(last_line)

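If you cannot do that, you can use something like the approach in "Get last n lines of a file with Python, similar to tail" (see the accepted answer about using seek and tell, in case you need them).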

- Vinko Vrsalovic

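The logs are for virtual hosts, so there is currently no log rotation. I guess I should look into setting it up... which would make your solution very useful. Cheers. - dave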

0

Simple but not recommended :)

last_line_processed = get_last_line_processed()
with open('file.log') as log:
    for record_number, record in enumerate(log):
        if record_number >= last_line_processed:
            parse_log(record)

- systempuntoout

0

Here is code that demonstrates using your length suggestion and the tell method:

beginning="""line1
line2
line3"""

end="""- The log will open from this point
line4
line5"""

openfile= open('log.txt','w')
openfile.write(beginning)
endstarts=openfile.tell()
openfile.close()

open('log.txt','a').write(end)
print(open('log.txt').read())

print("\nAgain:")
end2 = open('log.txt','r')
end2.seek(len(beginning))

print(end2.read())  ## wrong by two too little because of magic newlines in Windows
end2.seek(endstarts)

print("\nOk in Windows also")
print(end2.read())
end2.close()

- Tony Veijalainen
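
To see where the "wrong by two" in the code above comes from, here is a small sketch, assuming Python 3. The helper name length_vs_offset is hypothetical. It forces Windows-style line endings with the newline argument of open() so the effect shows up on any platform:

```python
import os

def length_vs_offset(path, text):
    """Write text with "\\r\\n" line endings; return (len(text), tell(), size on disk)."""
    # newline='\r\n' translates every "\n" written into "\r\n", as Windows text mode does.
    with open(path, 'w', newline='\r\n') as f:
        f.write(text)
        offset = f.tell()
    return len(text), offset, os.path.getsize(path)
```

With the three-line text from the answer, the string is 17 characters long but occupies 19 bytes on disk, one extra byte per newline. That is why an offset taken from tell() is safer than one computed with len().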

0

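Here is an efficient and safe snippet that saves the offset it has read up to in a parallel file. It is basically logtail in Python.

import os

with open(filename) as log_fd:
    offset_filename = os.path.join(OFFSET_ROOT_DIR, filename)
    if not os.path.exists(offset_filename):
        offset_dir = os.path.dirname(offset_filename)
        if not os.path.isdir(offset_dir):
            os.makedirs(offset_dir)
        with open(offset_filename, 'w') as offset_fd:
            offset_fd.write(str(0))
    with open(offset_filename, 'r+') as offset_fd:
        # An empty offset file counts as offset 0.
        log_fd.seek(int(offset_fd.readline() or 0))
        new_logrows_handler(log_fd.readlines())
        offset_fd.seek(0)
        offset_fd.write(str(log_fd.tell()))
        offset_fd.truncate()  # drop leftover digits from a longer old value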

- Peter Lundberg

0

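If you parse your log line by line, you could just save the line number of the last line parsed. Next time, you just start reading from the right line.

Seeking is more useful when you need to be at a very specific position in the file.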

- Guillaume Lebourgeois
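
A rough sketch of the line-number approach, assuming Python 3. The function name parse_from_line and the handle_line callback are hypothetical, and storing the returned count between runs is left to the caller:

```python
from itertools import islice

def parse_from_line(path, first_unparsed, handle_line):
    """Call handle_line on each line from index first_unparsed on; return the new line count."""
    count = first_unparsed
    with open(path) as log:
        # islice still reads the skipped lines, it just does not pass them on.
        for line in islice(log, first_unparsed, None):
            handle_line(line)
            count += 1
    return count
```

The skipped lines are still read from disk on every run, which is the main cost compared with seeking straight to a byte offset.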

-1

Note that you can seek() in Python from the end of the file:

f.seek(-3, os.SEEK_END)

puts the read position 3 lines from EOF.

That said, why not use diff, either from the shell or with difflib?

- user106514

7

Actually, that puts the read position 3 characters from EOF, not 3 lines. - Duncan
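
A small sketch that makes the point of this comment concrete. The helper name last_bytes is hypothetical. The file is opened in binary mode because Python 3 text-mode files reject nonzero end-relative seeks:

```python
import os

def last_bytes(path, n):
    """Return the final n bytes of the file (the whole file if it is shorter)."""
    size = os.path.getsize(path)
    with open(path, 'rb') as f:
        # Seeking further back than the start of the file would raise an error.
        f.seek(-min(n, size), os.SEEK_END)
        return f.read()
```

For a file ending in "line3\n", last_bytes(path, 3) gives b'e3\n': three bytes, not three lines.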

- luc · Accepted Answer

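Using the seek and tell methods of the file class, you can manage the position in the file. See https://docs.python.org/2/tutorial/inputoutput.html

The tell method will tell you where to seek to the next time you open the file.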
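
One way this could look in practice, as a sketch assuming Python 3. The function name read_from_offset is hypothetical, and both the reset-on-shrink rule and the skipping of an unfinished last line are assumptions, not part of the answer:

```python
import os

def read_from_offset(path, offset=0):
    """Return (complete_lines, new_offset) for the data after `offset` in `path`."""
    if os.path.getsize(path) < offset:
        offset = 0  # assumption: a smaller file means it was truncated or rotated
    lines = []
    with open(path, 'rb') as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b'\n'):
                break  # end of file, or a line that is still being written
            lines.append(line)
            offset = log.tell()  # only advance past complete lines
    return lines, offset
```

The caller keeps the returned offset somewhere persistent and passes it back on the next run, so each line is parsed once.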