How to reverse the order of the pages in a PDF file using pyPdf?

3
I have a PDF file named "myFile.pdf". How can I use pyPdf to reverse the order of its pages?

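@Tom: What I had tried so far was quite similar to nosklo's answer, and we both hit the same error (I/O operation on closed file). The cause, of course, was that we were still writing "output_pdf" after the input file had been closed. I just fixed his solution. - snakile
4 Answers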
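A plausible mechanism for the "I/O operation on closed file" error (user85461's comment on the first answer points the same way): the reader hands out page objects that still depend on the open input file, and the writer only pulls their data when write() runs. The sketch below imitates that with a hypothetical LazyReader class built on io.BytesIO. It illustrates the idea only; it is not pyPdf's actual implementation.

```python
import io


class LazyReader:
    """Hypothetical stand-in for a PDF reader that defers reading page data."""

    def __init__(self, stream):
        self.stream = stream

    def get_page(self, offset, size):
        # Return a loader instead of the bytes: nothing is read yet.
        def load():
            self.stream.seek(offset)
            return self.stream.read(size)
        return load


def copy_pages(close_early):
    source = io.BytesIO(b"page0page1")
    reader = LazyReader(source)
    # Queue the two "pages" in reverse order, like addPage() in the answers.
    pending = [reader.get_page(5, 5), reader.get_page(0, 5)]
    if close_early:
        source.close()  # like leaving the `with open(...)` block too soon
    # The "writer" only now pulls the page data out of the source stream.
    return b"".join(load() for load in pending)


print(copy_pages(close_early=False))  # b'page1page0'
try:
    copy_pages(close_early=True)
except ValueError as exc:
    print("failed:", exc)  # reading from the closed stream raises ValueError
```

This is why the working answers keep the output `with open(...)` block nested inside the input one.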

6
from pyPdf import PdfFileWriter, PdfFileReader
output_pdf = PdfFileWriter()

with open(r'input.pdf', 'rb') as readfile:
    input_pdf = PdfFileReader(readfile)
    total_pages = input_pdf.getNumPages()
    for page in xrange(total_pages - 1, -1, -1):
        output_pdf.addPage(input_pdf.getPage(page))
    with open(r'output.pdf', "wb") as writefile:
        output_pdf.write(writefile)

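Thank you, this post helped me track down the cause of the "I/O operation on closed file" error: you have to keep the PdfFileReader's file open until the PdfFileWriter has finished writing. (A bit counterintuitive to me.) Presumably this is because, for performance reasons, the writer doesn't actually read data from the reader's pages until it needs it. - user85461
This obviously works, but wouldn't range(total_pages, 0, -1) make more sense? That way, at any point in the iteration you could refer to a page by its page number without confusion. - Sammeeey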
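A note on that suggestion: getPage() takes a zero-based index, so iterating over range(total_pages, 0, -1) yields one-based page numbers that must be shifted by one before they are passed to getPage(). The plain-Python sketch below, using a hypothetical page count of 4, shows that both loops visit the same pages in the same order.

```python
total_pages = 4  # hypothetical page count

# Original answer: iterate over zero-based indices directly.
indices = list(range(total_pages - 1, -1, -1))

# Suggested variant: iterate over one-based page numbers, then subtract 1
# before calling getPage(number - 1).
page_numbers = list(range(total_pages, 0, -1))
indices_from_numbers = [number - 1 for number in page_numbers]

print(indices)               # [3, 2, 1, 0]
print(page_numbers)          # [4, 3, 2, 1]
print(indices_from_numbers)  # [3, 2, 1, 0]

# Both match reversed(range(total_pages)).
assert indices == indices_from_numbers == list(reversed(range(total_pages)))
```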

4

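Thanks for sharing the suggestions. I used them and made a few edits so that choosing the input file and saving the output happen through a graphical dialog. I'm new to all of this, so my changes may not be efficient or clean, but it works for me, so I thought I'd share.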

from PyPDF2 import PdfFileWriter, PdfFileReader
import tkinter as tk
from tkinter import filedialog
import ntpath
import os


output_pdf = PdfFileWriter()

# return the directory part of the given file path
def path_leaf(path):
    head, tail = ntpath.split(path)
    return head

# graphical file selection
def grab_file_path():
    file_dialog_window = tk.Tk()
    file_dialog_window.withdraw()  # hides the tk.Tk() root window
    # use dialog to select file
    grabbed_file_path = filedialog.askopenfilename()
    return grabbed_file_path


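# file to be reversed
filePath = grab_file_path()
# askopenfilename returns an empty string if the dialog is cancelled
if not filePath:
    raise SystemExit("No input file selected")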

# open file and read
with open(filePath, 'rb') as readfile:
    input_pdf = PdfFileReader(readfile)

    # reverse order one page at time
    for page in reversed(input_pdf.pages):
        output_pdf.addPage(page)

    # graphical way to choose where to save, starting in the input file's directory
    dirOfFileToBeSaved = path_leaf(filePath)
    locationOfFileToBeSaved = filedialog.asksaveasfilename(
        initialdir=dirOfFileToBeSaved,
        initialfile='name of reversed file.pdf',
        title="Select or type file name and location",
        filetypes=[("pdf files", "*.pdf")])
    # asksaveasfilename returns an empty string if the dialog is cancelled
    if not locationOfFileToBeSaved:
        raise SystemExit("No output file selected")
    # write the file created (while the input file is still open)
    with open(locationOfFileToBeSaved, "wb") as writefile:
        output_pdf.write(writefile)

# open the file when done (os.startfile exists only on Windows)
os.startfile(locationOfFileToBeSaved)

3
As of January 2019, pyPdf has not been updated in a long time, and in my testing it does not work on Python 3.6 (at least), and very likely not on Python 3 at all:
In [1]: import pyPdf
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-bba5a42e9137> in <module>
----> 1 import pyPdf

c:\temp\envminecart\lib\site-packages\pyPdf\__init__.py in <module>
----> 1 from pdf import PdfFileReader, PdfFileWriter
      2 __all__ = ["pdf"]

ModuleNotFoundError: No module named 'pdf'

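(Changing the import to the explicit relative form, from .pdf import PdfFileReader, PdfFileWriter, fixes this specific problem, but other SyntaxErrors due to Python 2 syntax then pop up.)
Fortunately, its successor project PyPDF2 works fine on Python 3.6 (at least). The core user-facing API appears to have been deliberately kept compatible with pyPdf, so nosklo's answer works on modern Python after changing the import to PyPDF2, changing xrange to range, and running pip install PyPDF2: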
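The ModuleNotFoundError comes from Python 3 dropping implicit relative imports: inside a package, from pdf import ... now looks for a top-level module named pdf, while a module sitting next to __init__.py has to be imported as from .pdf import .... The sketch below builds a throwaway package in a temporary directory to show both forms; the package name demo_pkg and its contents are hypothetical and only mimic pyPdf's layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(init_source):
    """Create a hypothetical package 'demo_pkg' with a sibling module 'pdf'
    and report whether importing it succeeds under Python 3 rules."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "pdf.py").write_text("class PdfFileReader:\n    pass\n")
        (pkg / "__init__.py").write_text(init_source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make the new files visible to finders
        try:
            importlib.import_module("demo_pkg")
            return "ok"
        except ModuleNotFoundError as exc:
            return f"failed: {exc}"
        finally:
            # Clean up so the next call imports a fresh copy.
            sys.path.remove(tmp)
            for name in ("demo_pkg", "demo_pkg.pdf"):
                sys.modules.pop(name, None)


print(try_import("from pdf import PdfFileReader\n"))   # implicit relative: fails
print(try_import("from .pdf import PdfFileReader\n"))  # explicit relative: ok
```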
from PyPDF2 import PdfFileWriter, PdfFileReader
output_pdf = PdfFileWriter()

with open(r'input.pdf', 'rb') as readfile:
    input_pdf = PdfFileReader(readfile)
    total_pages = input_pdf.getNumPages()
    for page in range(total_pages - 1, -1, -1):
        output_pdf.addPage(input_pdf.getPage(page))
    with open(r'output.pdf', "wb") as writefile:
        output_pdf.write(writefile)

I'd suggest a more Pythonic approach that iterates over the pages directly with reversed():
from PyPDF2 import PdfFileWriter, PdfFileReader

output_pdf = PdfFileWriter()

with open('input.pdf', 'rb') as readfile:
    input_pdf = PdfFileReader(readfile)

    for page in reversed(input_pdf.pages):
        output_pdf.addPage(page)

    with open('output.pdf', "wb") as writefile:
        output_pdf.write(writefile)

I don't know whether this .pages collection was available in the original pyPdf, but arguably that doesn't really matter at this point.
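reversed() does not need a real list: it accepts any object that defines __reversed__, or both __len__ and integer __getitem__. That is presumably why reversed(input_pdf.pages) works on PyPDF2's page collection (an assumption based on it supporting len() and indexing), and pages are still fetched one at a time. The hypothetical LazyPages class below demonstrates this by recording each access. If .pages is missing in the original pyPdf, reversed(range(input_pdf.getNumPages())) combined with getPage() gives the same order.

```python
class LazyPages:
    """Hypothetical page collection that 'loads' a page only when indexed."""

    def __init__(self, count):
        self.count = count
        self.loaded = []  # records the order in which pages were fetched

    def __len__(self):
        return self.count

    def __getitem__(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)
        self.loaded.append(index)
        return f"page {index}"


pages = LazyPages(3)
# No __reversed__ is defined, so reversed() falls back to __len__/__getitem__.
reversed_pages = list(reversed(pages))
print(reversed_pages)  # ['page 2', 'page 1', 'page 0']
print(pages.loaded)    # [2, 1, 0]
```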


0
Working as of 2023, using pypdf:
from pypdf import PdfWriter, PdfReader
output_pdf = PdfWriter()

with open(r'input.pdf', 'rb') as readfile:
    input_pdf = PdfReader(readfile)
    total_pages = len(input_pdf.pages)
    for page in range(total_pages - 1, -1, -1):
        output_pdf.add_page(input_pdf.pages[page])
    with open(r'output.pdf', "wb") as writefile:
        output_pdf.write(writefile)

Page content provided by Stack Overflow.