How can I fill out a PDF form with Python?

17

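How can I fill in a PDF file that contains a form and then "flatten" it?

I currently use pdftk, but it does not handle international characters correctly.

Is there a Python library, or an example, for filling in a PDF form and rendering the result as a non-editable PDF file?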


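I don't fully understand what you mean, but ReportLab is a widely used Python library for generating PDF files. - Antonis Christofides
See this answer: https://dev59.com/-HI-5IYBdhLWcg3wYXH8 - Aakash Anuj
@AntonisChristofides, the OP is looking for a simple way to "flatten" (that is, merge) form fields with database content. I assume he has the content to print in a Python dictionary and a PDF form that is already prepared. Either way, using ReportLab as you suggest seems like the right choice to me. - Mathieu Marques
4 Answers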

19

The following is taken from the pypdf documentation (added many years after the question was asked):

from pypdf import PdfReader, PdfWriter

reader = PdfReader("form.pdf")
writer = PdfWriter()

page = reader.pages[0]
fields = reader.get_fields()

writer.add_page(page)

writer.update_page_form_field_values(
    writer.pages[0], {"fieldname": "some filled in text"}
)

# Write the filled-in form to filled-out.pdf
with open("filled-out.pdf", "wb") as output_stream:
    writer.write(output_stream)
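The snippet above fills only the first page. For a form that spans several pages, a minimal sketch along the same lines is shown below. It assumes a recent pypdf release; fill_all_pages and its parameter names are hypothetical and not part of pypdf, and the writer is passed in so the function itself needs no import.

```python
def fill_all_pages(writer, values):
    """Fill form fields on every page of a pypdf PdfWriter.

    `writer` is expected to be a pypdf.PdfWriter that already holds the
    pages of the form; `values` maps field names to the text to fill in.
    Returns the number of pages that were updated.
    """
    # Ask viewers to regenerate field appearances so the new values are
    # drawn when the file is opened (sets /NeedAppearances on the form).
    writer.set_need_appearances_writer()

    updated = 0
    for page in writer.pages:
        # Pages without annotations carry no form widgets; skip them.
        if "/Annots" not in page:
            continue
        writer.update_page_form_field_values(page, values)
        updated += 1
    return updated
```

With pypdf this might be used by creating a PdfWriter, calling writer.append(reader) to copy the form, calling fill_all_pages(writer, {"fieldname": "some filled in text"}), and then writing the file as shown above.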

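Related question: https://dev59.com/Dabja4cB1Zd3GeqPlMF0 - Wtower
This solution worked very well for me. PyPDF2 was updated recently and appears to be an active project. - Trevor Sullivan
3
I am the maintainer of PyPDF2 and pypdf. I have moved PyPDF2 back into pypdf. From now on, only pypdf will receive new features and bug fixes. - Martin Thoma
@MartinThoma I noticed that after filling in the form I have to open the PDF and save it again for the form values to really be saved. Otherwise it stays unsaved, and the changes are not visible when I reopen or copy the file. Is this a bug? Is there a workaround? - weasel
It sounds like a bug, but I don't fully understand it either. Are you using the latest version of pypdf? If so, please report it in the bug tracker. If not, please upgrade. We made some improvements to forms last week. - Martin Thoma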

11

Try the fillpdf library, which makes this process very simple (pip install fillpdf, plus the poppler dependency: conda install -c conda-forge poppler).

Basic usage:

from fillpdf import fillpdfs

fillpdfs.get_form_fields("blank.pdf")

# returns a dictionary of fields
# Set the values in the returned dictionary and save it to a variable
# For radio boxes ('Off' = not filled, 'Yes' = filled)

data_dict = {
    'Text2': 'Name',
    'Text4': 'LastName',
    'box': 'Yes',
}

fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)

# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')

For more information, see: https://github.com/t-houssian/fillpdf

It seems to fill forms very well.

For more details, see this answer: https://dev59.com/La7la4cB1Zd3GeqPfJLw#66809578
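A mistyped key in data_dict is easy to miss, so it can help to compare it with the dictionary that get_form_fields returns before writing. The helper below is a minimal sketch of that check; unknown_field_names is a hypothetical name, not part of fillpdf, and it works on plain dictionaries.

```python
def unknown_field_names(form_fields, data_dict):
    """Return the keys of data_dict that are not fields of the form.

    form_fields: the dictionary returned by fillpdfs.get_form_fields().
    data_dict:   the values you intend to pass to write_fillable_pdf().
    """
    return sorted(name for name in data_dict if name not in form_fields)


# Example with hand-written dictionaries standing in for a real form.
form_fields = {"Text2": "", "Text4": "", "box": "Off"}
data_dict = {"Text2": "Name", "Text4": "LastName", "Box": "Yes"}

print(unknown_field_names(form_fields, data_dict))  # ['Box']
```

An empty result means every key in data_dict matches a field name in the form.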


1
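The dictionary came back empty. I could not get the PDF's fields with this code. - Celik
It uses pdfrw2 and pymupdf. I wonder whether either of them could do this directly. - Martin Thoma
2
This library worked perfectly for me! - Rycliff
1
This worked for me too; the PyPDF2 solution did not. - Timur Mingulov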

1

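According to the Adobe documentation, you don't need a dedicated library to flatten a PDF: you can set the field flags (/Ff) of each editable form field to 1, which turns on bit position 1 and makes the field read-only. I have posted a complete solution here, but it uses Django:

https://dev59.com/5bLma4cB1Zd3GeqPco_J#55301804

Adobe documentation (page 441):

https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf

Fill the fields with PyPDF2, then loop over the annotations to change the bit position: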
from io import BytesIO

# Legacy PyPDF2 API (PdfFileReader, addPage, ...); it was removed in PyPDF2 3.0
import PyPDF2
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject


# Defined before it is used below
def set_need_appearances_writer(writer):
    # basically used to ensure there are no
    # overlapping form fields, which make printing hard
    try:
        catalog = writer._root_object
        # get the AcroForm tree and add the "/NeedAppearances" attribute
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))

    return writer


# open the pdf
input_stream = open("YourPDF.pdf", "rb")
pdf_reader = PyPDF2.PdfFileReader(input_stream, strict=False)
if "/AcroForm" in pdf_reader.trailer["/Root"]:
    pdf_reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

pdf_writer = PyPDF2.PdfFileWriter()
set_need_appearances_writer(pdf_writer)
if "/AcroForm" in pdf_writer._root_object:
    # AcroForm is the form dictionary; set NeedAppearances to fix printing issues
    pdf_writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

data_dict = dict()  # this is a dict of your DB form values

pdf_writer.addPage(pdf_reader.getPage(0))
page = pdf_writer.getPage(0)
# update form fields
pdf_writer.updatePageFormFieldValues(page, data_dict)
for j in range(0, len(page['/Annots'])):
    writer_annot = page['/Annots'][j].getObject()
    for field in data_dict:
        if writer_annot.get('/T') == field:
            writer_annot.update({
                NameObject("/Ff"): NumberObject(1)    # make ReadOnly
            })
output_stream = BytesIO()
pdf_writer.write(output_stream)

# output_stream is your flattened PDF
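Writing NumberObject(1) to /Ff replaces the whole flags value, so any other flag the field carried (for example Multiline on a text field) is lost. A minimal sketch of setting only the ReadOnly bit is shown below. The bit positions follow the field-flag tables in the PDF specification linked above; the function names are hypothetical, and wiring the result back into PyPDF2 (reading the current /Ff and storing the new value) is an assumption left to the reader.

```python
READ_ONLY = 1 << 0    # bit position 1: the field cannot be edited
MULTILINE = 1 << 12   # bit position 13: text field may span several lines


def with_read_only(flags):
    """Return flags with the ReadOnly bit set and all other bits kept."""
    return int(flags) | READ_ONLY


def is_read_only(flags):
    """Return True if the ReadOnly bit is set in flags."""
    return bool(int(flags) & READ_ONLY)


print(with_read_only(0))          # 1
print(with_read_only(MULTILINE))  # 4097
print(is_read_only(MULTILINE))    # False
```

Note that a read-only field is still a form field: viewers refuse to edit it, but the field has not been merged into the page content.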

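The PDF standard document has moved. New location: https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf - bcattle
@bcattle, thanks for the update. It looks like the page number changed too, so I now reference the correct page. - ViaTech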

-1

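We can also consider handling PDFs through a web API instead of importing a package. This approach has advantages and disadvantages, but it offers another way to build the feature into an application.

One example is using the PDF.co API to fill PDF forms. Other options you could consider include the Adobe API, DocSpring, and pdfFiller. The following snippet may be useful: it shows how to fill a PDF form with a predefined JSON payload.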

import requests  # pip install requests

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co/documentation/api
API_KEY = "**************************************"

# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"

# Local path for the downloaded result (file name chosen for this example)
DESTINATION_FILE = "result.pdf"


def main(args=None):
    fillPDFForm()


def fillPDFForm():
    """Fill PDF form using PDF.co Web API"""

    # Prepare request params; requests serializes this dict to JSON
    # See documentation: https://apidocs.pdf.co
    payload = {
        "async": False,
        "encrypt": False,
        "name": "f1040-filled",
        "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-form/f1040.pdf",
        "fields": [
            {"fieldName": "topmostSubform[0].Page1[0].FilingStatus[0].c1_01[1]", "pages": "1", "text": "True"},
            {"fieldName": "topmostSubform[0].Page1[0].f1_02[0]", "pages": "1", "text": "John A."},
            {"fieldName": "topmostSubform[0].Page1[0].f1_03[0]", "pages": "1", "text": "Doe"},
            {"fieldName": "topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_04[0]", "pages": "1", "text": "123456789"},
            {"fieldName": "topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_05[0]", "pages": "1", "text": "Joan B."},
            {"fieldName": "topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_06[0]", "pages": "1", "text": "Doe"},
            {"fieldName": "topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_07[0]", "pages": "1", "text": "987654321"},
        ],
        "annotations": [
            {
                "text": "Sample Filled with PDF.co API using /pdf/edit/add. "
                        "Get fields from forms using /pdf/info/fields",
                "x": 10,
                "y": 10,
                "size": 12,
                "pages": "0-",
                "color": "FFCCCC",
                "link": "https://pdf.co",
            }
        ],
        "images": [],
    }

    # Prepare URL for 'Fill PDF' API request
    url = "{}/pdf/edit/add".format(BASE_URL)

    # Execute request and get response as JSON
    response = requests.post(url, json=payload, headers={"x-api-key": API_KEY})
    if response.status_code == 200:
        result = response.json()

        if not result["error"]:
            # Get URL of result file
            result_file_url = result["url"]
            # Download result file
            r = requests.get(result_file_url, stream=True)
            if r.status_code == 200:
                with open(DESTINATION_FILE, 'wb') as file:
                    for chunk in r:
                        file.write(chunk)
                print(f"Result file saved as \"{DESTINATION_FILE}\" file.")
            else:
                # Report the failed download, not the earlier API call
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error
            print(result["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


if __name__ == '__main__':
    main()
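The fields list in the payload can also be built from a plain dictionary instead of being typed out by hand. The sketch below assumes the same keys used in the payload above (fieldName, pages, text); build_fields is a hypothetical helper, not part of the PDF.co API.

```python
def build_fields(values, pages="1"):
    """Turn {field name: text} into the list of field entries for the payload."""
    return [
        {"fieldName": name, "pages": pages, "text": str(text)}
        for name, text in values.items()
    ]


fields = build_fields({
    "topmostSubform[0].Page1[0].f1_02[0]": "John A.",
    "topmostSubform[0].Page1[0].f1_03[0]": "Doe",
})
print(len(fields))        # 2
print(fields[0]["text"])  # John A.
```

The returned list could then be assigned to the "fields" entry of the payload before the request is sent.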

