Is there a way to close the file that PdfFileReader opens?

Question

8

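I'm opening a lot of PDFs and want to delete them once they have been parsed, but the files stay open until the program finishes running. How do I close the PDFs I open with PyPDF2?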

Code:

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = PyPDF2.PdfFileReader(file(path, "rb"))

    #Check for number of pages, prevents out of bounds errors
    max = 0
    if pdf.numPages > 3:
        max = 3
    else:
        max = (pdf.numPages - 1)

    # Iterate pages
    for i in range(0, max): 
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    #pdf.close()
    return content

- SPYBUG96

3 Answers

2

When you do this:

pdf = PyPDF2.PdfFileReader(file(path, "rb"))

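you are passing a reference to a file handle, but you keep no name for it, so you have no control over when the file is closed.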
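To make the difference concrete, here is a minimal sketch using only the standard library. `StreamHolder` is a hypothetical stand-in for any reader object that keeps the stream it receives (it is not part of PyPDF2), and an empty temporary file plays the role of the PDF.

```python
import os
import tempfile


class StreamHolder:
    """Hypothetical stand-in for a reader that keeps the stream it receives."""

    def __init__(self, stream):
        self.stream = stream


# Create an empty temporary file to play the role of the PDF.
fd, path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)

# Anonymous handle: no variable in this scope refers to the file object.
holder = StreamHolder(open(path, "rb"))
still_open = not holder.stream.closed  # stays open until someone closes it

# Named handle: the caller decides exactly when the file is closed.
f = open(path, "rb")
named = StreamHolder(f)
f.close()
closed_by_caller = named.stream.closed

# Clean up: close the anonymous handle through the holder, then delete the file.
holder.stream.close()
os.remove(path)
print(still_open, closed_by_caller)
```

With the anonymous handle, the only route to the file object is through whatever attribute the receiving object happens to expose. With a named handle, the caller keeps that control.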

Instead of passing the handle anonymously, you should create a context with it.

I'd write:

import os
import PyPDF2

def getPDFContent(path):
    content = ""
    with open(path, "rb") as f:
        pdf = PyPDF2.PdfFileReader(f)
        # Check for number of pages, prevents out of bounds errors
        # ... do your processing here, appending the extracted text to content
        # Collapse whitespace
        content = " ".join(content.replace(u"\xa0", " ").strip().split())
    # now the file is closed by exiting the block, you can delete it
    os.remove(path)
    # and return the contents
    return content
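The open, process, close, delete sequence can also be factored into a small helper. This is only a sketch: `process_and_delete` and its `extract` parameter are hypothetical names, and the `extract` callable stands in for the PyPDF2 work (building the reader from the file object and returning the text). The usage below runs it on a throwaway temporary file instead of a real PDF.

```python
import os
import tempfile


def process_and_delete(path, extract):
    """Open path, run extract(file_obj) while it is open, then delete the file.

    `extract` is a hypothetical caller-supplied function; with PyPDF2 it would
    build the reader from file_obj and return the extracted text.
    """
    with open(path, "rb") as f:
        result = extract(f)
    # The with block has exited, so the handle is closed and removal is safe.
    os.remove(path)
    return result


# Usage with a throwaway file and a trivial extract function.
fd, tmp_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"hello  world")

text = process_and_delete(tmp_path, lambda f: " ".join(f.read().decode().split()))
print(text)
```

If `extract` raises, the file is still closed by the `with` block but is not deleted, which is usually what you want when parsing fails.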

- Jean-François Fabre

2

Yes. You are passing a stream to PdfFileReader, and you can close that stream yourself. The with statement is the nicer way to have that done for you:

import PyPDF2

def getPDFContent(path):
    with open(path, "rb") as f:
        content = ""
        # Load PDF into PyPDF2
        pdf = PyPDF2.PdfFileReader(f)

        # Read at most the first 3 pages; min() prevents out of bounds errors
        # (the original numPages - 1 skipped the last page of short documents)
        page_count = min(pdf.numPages, 3)

        # Iterate pages
        for i in range(page_count):
            # Extract text from page and add to content
            content += pdf.getPage(i).extractText() + "\n"
        # Collapse whitespace
        content = " ".join(content.replace(u"\xa0", " ").strip().split())
        return content

- de1

- Him · Accepted Answer

Just open and close the file yourself:

f = open(path, "rb")
pdf = PyPDF2.PdfFileReader(f)
f.close()

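PyPDF2 calls .read() on the stream you pass in, right in the constructor, so after the initial object construction you can discard the file. One caveat: the PyPDF2 documentation describes the constructor as reading the cross-reference tables into memory, so if a later page access fails with an error about a closed file, keep the file open until you have finished extracting.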

A context manager also works:

with open(path, "rb") as f:
    pdf = PyPDF2.PdfFileReader(f)
do_other_stuff_with_pdf(pdf)
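If the reader turns out to need its stream after construction (worth checking against your PyPDF2 version), one possible workaround is to copy the file into memory first, so the on-disk file can be closed and deleted right away. The sketch below uses only the standard library. `load_into_memory` is a hypothetical helper name, and the temporary file contains placeholder bytes, not a valid PDF.

```python
import io
import os
import tempfile


def load_into_memory(path):
    """Return an in-memory copy of the file; the on-disk handle is closed on return."""
    with open(path, "rb") as f:
        return io.BytesIO(f.read())


# Throwaway file with placeholder bytes (not a valid PDF).
fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as out:
    out.write(b"%PDF-1.4 placeholder bytes")

buffer = load_into_memory(tmp_path)
os.remove(tmp_path)  # safe: no open handle to the file remains

# The buffer still supports seek/read after the file is gone, so a reader built
# on it (for example PyPDF2.PdfFileReader(buffer)) could keep fetching data.
buffer.seek(0)
header = buffer.read(8)
print(header)
```

The trade-off is memory: the whole file lives in RAM for as long as the buffer is referenced, which is fine for typical documents but worth reconsidering for very large ones.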