Converting HTML to PDF in a Django site

132

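For my Django-powered site, I am looking for a simple solution to convert dynamic HTML pages to PDF.

The pages include HTML and charts from the Google Visualization API (which is JavaScript-based, but including those charts is a must).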


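The Django documentation is deep and covers a lot. Did you run into any problems using the method it suggests? [http://docs.djangoproject.com/en/dev/howto/outputting-pdf/] - monkut
1
This does not actually answer the question. That documentation is about rendering a PDF natively, not from already rendered HTML. - Josh
I guess the right thing to do is to let a browser produce the PDF, because browsers are the only tools that render HTML/CSS/JS properly. See this question: https://dev59.com/A18e5IYBdhLWcg3wnbVj - David Hofmann
This question is off-topic on SO, but on-topic at softwarerecs.SE. See "How can I convert HTML with CSS to PDF?" - Martin Thoma
Try wkhtmltopdf: https://learnbatta.com/blog/django-html-to-pdf-using-pdfkit-and-wkhtmltopdf-5/ - anjaneyulubatta505
You can follow this link: https://reza-ta.medium.com/export-html-pages-to-pdf-in-django-applications-dc5cf9af946c - Reza Torkaman Ahmadi
10 Answers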

225

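Try the solution from Reportlab.

Download it and install it as usual with python setup.py install.

You will also need to install the following modules with easy_install: xhtml2pdf, html5lib, and pypdf.

Here is a usage example:

First define this function: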

import cStringIO as StringIO
from xhtml2pdf import pisa
from django.template.loader import get_template
from django.template import Context
from django.http import HttpResponse
from cgi import escape


def render_to_pdf(template_src, context_dict):
    template = get_template(template_src)
    context = Context(context_dict)
    html  = template.render(context)
    result = StringIO.StringIO()

    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("ISO-8859-1")), result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type='application/pdf')
    return HttpResponse('We had some errors<pre>%s</pre>' % escape(html))

Then you can use it like this:

def myview(request):
    #Retrieve data or whatever you need
    return render_to_pdf(
            'mytemplate.html',
            {
                'pagesize':'A4',
                'mylist': results,
            }
        )

The template:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
        <title>My Title</title>
        <style type="text/css">
            @page {
                size: {{ pagesize }};
                margin: 1cm;
                @frame footer {
                    -pdf-frame-content: footerContent;
                    bottom: 0cm;
                    margin-left: 9cm;
                    margin-right: 9cm;
                    height: 1cm;
                }
            }
        </style>
    </head>
    <body>
        <div>
            {% for item in mylist %}
                RENDER MY CONTENT
            {% endfor %}
        </div>
        <div id="footerContent">
            {%block page_foot%}
                Page <pdf:pagenumber>
            {%endblock%}
        </div>
    </body>
</html>

9
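+1 I have been using this solution for a year and it is great. PISA can even create barcodes with a simple tag, and much more. And it is very easy to use. - arcanum
1
Thanks to this great site with precompiled 64-bit builds of common Python libraries, I was able to get things running: http://www.lfd.uci.edu/~gohlke/pythonlibs/ - Andriy Drozdyuk
5
It does not seem to run JavaScript. - dfrankow
3
Pisa is now distributed as xhtml2pdf. - fixmycode
20
In Python 3, besides changing cStringIO.StringIO to io.StringIO, we must define result as result = io.BytesIO() instead of result = StringIO. - Sebastien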
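The point of the last comment can be checked without xhtml2pdf installed. The following is a minimal sketch using only the standard library: PDF output is binary, so the result buffer has to accept bytes, which io.StringIO does not. The helper name can_hold_pdf_bytes and the sample bytes are made up for this illustration.

```python
import io

# A few bytes such as a PDF writer would emit (illustrative sample only)
SAMPLE = b"%PDF-1.4\n"


def can_hold_pdf_bytes(buffer):
    """Return True if the buffer accepts binary writes, False otherwise."""
    try:
        buffer.write(SAMPLE)
    except TypeError:
        # io.StringIO only accepts str, so writing bytes raises TypeError
        return False
    return True


text_ok = can_hold_pdf_bytes(io.StringIO())
bytes_ok = can_hold_pdf_bytes(io.BytesIO())
```

This is why the result buffer should be an io.BytesIO on Python 3, while the HTML input is encoded to bytes before being wrapped.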

15

1
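django-wkhtmltopdf helped me a lot! Also make sure to turn off all animations in your JavaScript/charting engine. - mehmet
@sam Did you read the blog post? - jithin
@jithin Thanks for the suggestion. The charts load, but not completely. I have set 'javascript-delay': 1000. Is there a better way? - Manish Ojha
@jithin Yes, I tried putting 'javascript-delay': 2000 in the code, but it throws the error returned non-zero exit status -11 - Manish Ojha
@ManishOjha I cannot comment without more information. The generated HTML file can be found in the appdata or temp folder. Open that file in a browser to check for errors. You can also try running with the ignore-errors flag added. - jithin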
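To make the options discussed in these comments concrete, here is a rough sketch of how an options dict maps onto a wkhtmltopdf command line. It assumes the wkhtmltopdf flags --javascript-delay, --no-stop-slow-scripts and --load-error-handling; the function name build_wkhtmltopdf_command and the file names are hypothetical, and nothing is executed here.

```python
def build_wkhtmltopdf_command(source, target, options):
    """Turn an options dict into a wkhtmltopdf argument list."""
    command = ["wkhtmltopdf"]
    for name, value in options.items():
        command.append("--" + name)
        # Flags without a value (None or "") are passed as bare switches
        if value not in (None, ""):
            command.append(str(value))
    command.extend([source, target])
    return command


options = {
    "javascript-delay": 2000,          # wait 2000 ms for charts to finish drawing
    "no-stop-slow-scripts": None,      # do not abort long-running scripts
    "load-error-handling": "ignore",   # keep going when a resource fails to load
}
command = build_wkhtmltopdf_command("report.html", "report.pdf", options)
```

If wkhtmltopdf is installed, the list could be passed to subprocess.run(command, check=True). A negative exit status reported by Python's subprocess module means the process was killed by that signal number, so -11 suggests wkhtmltopdf itself crashed rather than rejected the options.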

12

https://github.com/nigma/django-easy-pdf

Template:

{% extends "easy_pdf/base.html" %}

{% block content %}
    <div id="content">
        <h1>Hi there!</h1>
    </div>
{% endblock %}

View:

from easy_pdf.views import PDFTemplateView

class HelloPDFView(PDFTemplateView):
    template_name = "hello.html"

If you want to use django-easy-pdf on Python 3, check the solution suggested here.


2
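This is the easiest to implement of the options I have tried so far. For my needs (generating PDF reports from an HTML version) it simply works. Thanks! - The NetYeti
1
@alejoss You should use inline styles instead of CSS. - digz6666
This solution might not work out of the box on Django 3.0, because django-utils-six was removed but easy_pdf depends on it. - David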

11

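I just whipped this up for CBVs. It is not used in production, but it generates a PDF for me. The error reporting probably needs work, but it does the job so far.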

import StringIO
from cgi import escape
from xhtml2pdf import pisa
from django.http import HttpResponse
from django.template.response import TemplateResponse
from django.views.generic import TemplateView

class PDFTemplateResponse(TemplateResponse):

    def generate_pdf(self, retval):

        html = self.content

        result = StringIO.StringIO()
        rendering = pisa.pisaDocument(StringIO.StringIO(html.encode("ISO-8859-1")), result)

        if rendering.err:
            return HttpResponse('We had some errors<pre>%s</pre>' % escape(html))
        else:
            self.content = result.getvalue()

    def __init__(self, *args, **kwargs):
        super(PDFTemplateResponse, self).__init__(*args, mimetype='application/pdf', **kwargs)
        self.add_post_render_callback(self.generate_pdf)


class PDFTemplateView(TemplateView):
    response_class = PDFTemplateResponse

Used like this:

class MyPdfView(PDFTemplateView):
    template_name = 'things/pdf.html'

1
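This worked almost straight away for me. The only thing I had to do was replace html.encode("ISO-8859-1") with html.decode("utf-8") - vinyll
I changed the code as @vinyll suggested, and also had to add a line to the class PDFTemplateView: content_type = "application/pdf" - normic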

6

I tried the top answer in this thread and it did not work on python3.8, so I had to make the following changes (for anyone working on python3.8):

import io
from html import escape

from django.http import HttpResponse
from django.template.loader import render_to_string
from xhtml2pdf import pisa


def render_to_pdf(template_src, context_dict):
    # render the template straight to a string; no Context object is needed
    html = render_to_string(template_src, context_dict)
    result = io.BytesIO()

    # xhtml2pdf reads the encoded HTML and writes the PDF bytes into result
    pdf = pisa.pisaDocument(io.BytesIO(html.encode("utf-8")), result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type='application/pdf')
    return HttpResponse('We had some errors<pre>%s</pre>' % escape(html))

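Since cgi.escape was deprecated (and removed in Python 3.8), I had to replace cgi with html in the code, and replace StringIO with io.BytesIO(). For rendering I used render_to_string instead of converting the dict to a Context, which was raising an error.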
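As a small illustration of the cgi to html switch, here is a sketch of the error page with html.escape from the standard library. The function name error_page and the sample markup are invented for the example; only the message format comes from the answer above.

```python
from html import escape


def error_page(html):
    """Build the fallback body shown when PDF generation fails."""
    # html.escape replaces &, <, > and, by default, both quote characters
    return "We had some errors<pre>%s</pre>" % escape(html)


page = error_page('<p class="x">Tom & Jerry</p>')
```

One difference from the old cgi.escape is worth knowing: html.escape also escapes quotes unless quote=False is passed, which is harmless inside a pre element.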


3
After trying for many hours, I finally found a solution: https://github.com/vierno/django-xhtml2pdf It is a fork of https://github.com/chrisglass/django-xhtml2pdf that provides a mixin for generic class-based views. I used it like this:
    # views.py
    from django.views.generic import DetailView
    from django_xhtml2pdf.views import PdfMixin
    class GroupPDFGenerate(PdfMixin, DetailView):
        model = PeerGroupSignIn
        template_name = 'groups/pdf.html'

    # templates/groups/pdf.html
    <html>
    <head>
    <style>
    @page { your xhtml2pdf pisa PDF parameters }
    </style>
    </head>
    <body>
        <div id="header_content"> (this is defined in the style section)
            <h1>{{ peergroupsignin.this_group_title }}</h1>
            ...

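When filling in the template fields, use the name of the model you defined in the view, in all lowercase. Since this is a GCBV, you can call it as '.as_view' directly in your urls.py: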

    # urls.py (using url namespaces defined in the main urls.py file)
    url(
        regex=r"^(?P<pk>\d+)/generate_pdf/$",
        view=views.GroupPDFGenerate.as_view(),
        name="generate_pdf",
       ),
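To see what that URL pattern accepts, here is a short sketch using only the re module. The helper extract_pk is hypothetical. Django itself would pass pk to the view as a string, so the int conversion is only for illustration.

```python
import re

# Same regular expression as in the urls.py entry above
PDF_URL = re.compile(r"^(?P<pk>\d+)/generate_pdf/$")


def extract_pk(path):
    """Return the primary key captured from the path, or None if no match."""
    match = PDF_URL.match(path)
    return int(match.group("pk")) if match else None


matched = extract_pk("42/generate_pdf/")
rejected = extract_pk("abc/generate_pdf/")
no_slash = extract_pk("42/generate_pdf")
```

Only digits are accepted for pk and the trailing slash is required. On recent Django versions, where django.conf.urls.url is no longer available, the same regex can be given to django.urls.re_path.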

2
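You can use the iReport editor to define the layout, and publish the report on a Jasper Reports server. After publishing, you can call the REST API to get the results.
Here is a test of the functionality: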
from django.test import TestCase
from x_reports_jasper.models import JasperServerClient

"""
    to try integration with jasper server through rest
"""
class TestJasperServerClient(TestCase):

    # define required objects for tests
    def setUp(self):

        # load the connection to remote server
        try:

            self.j_url = "http://127.0.0.1:8080/jasperserver"
            self.j_user = "jasperadmin"
            self.j_pass = "jasperadmin"

            self.client = JasperServerClient.create_client(self.j_url,self.j_user,self.j_pass)

        except Exception:
            # on error the test cannot run because its prerequisites are not met
            raise

    # test exception when server data is invalid
    def test_login_to_invalid_address_should_raise(self):
        self.assertRaises(Exception,JasperServerClient.create_client, "http://127.0.0.1:9090/jasperserver",self.j_user,self.j_pass)

    # test execute existent report in server
    def test_get_report(self):

        r_resource_path = "/reports/<PathToPublishedReport>"
        r_format = "pdf"
        r_params = {'PARAM_TO_REPORT':"1",}

        #resource_meta = client.load_resource_metadata( rep_resource_path )

        [uuid,out_mime,out_data] = self.client.generate_report(r_resource_path,r_format,r_params)
        self.assertIsNotNone(uuid)

And here is an example of the invocation implementation:

from django.db import models
import requests
import sys
from xml.etree import ElementTree
import logging 

# module logger definition
logger = logging.getLogger(__name__)

# Create your models here.
class JasperServerClient(models.Manager):

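    def __handle_exception(self, exception_root, exception_id, exec_info):
        # re-raise as JasperServerClientError, keeping the original traceback (Python 3 syntax)
        _exc_type, _exc_value, traceback = exec_info
        raise JasperServerClientError(exception_root, exception_id).with_traceback(traceback)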

    # 01: REPORT-METADATA 
    #   get resource description to generate the report
    def __handle_report_metadata(self, rep_resourcepath):

        l_path_base_resource = "/rest/resource"
        l_path = self.j_url + l_path_base_resource
        logger.info( "metadata (begin) [path=%s%s]"  %( l_path ,rep_resourcepath) )

        resource_response = None
        try:
            resource_response = requests.get( "%s%s" %( l_path ,rep_resourcepath) , cookies = self.login_response.cookies)

        except Exception as e:
            self.__handle_exception(e, "REPORT_METADATA:CALL_ERROR", sys.exc_info())

        resource_response_dom = None
        try:
            # parse to dom and set parameters
            logger.debug( " - response [data=%s]"  %( resource_response.text) )
            resource_response_dom = ElementTree.fromstring(resource_response.text)

            datum = "" 
            for node in resource_response_dom.iter():
                datum = "%s<br />%s - %s" % (datum, node.tag, node.text)
            logger.debug( " - response [xml=%s]"  %( datum ) )

            #
            self.resource_response_payload= resource_response.text
            logger.info( "metadata (end) ")
        except Exception as e:
            logger.error( "metadata (error) [%s]" % (e))
            self.__handle_exception(e, "REPORT_METADATA:PARSE_ERROR", sys.exc_info())


    # 02: REPORT-PARAMS 
    def __add_report_params(self, metadata_text, params ):
        if(type(params) != dict):
            raise TypeError("Invalid parameters to report")
        else:
            logger.info( "add-params (begin) []" )
            #copy parameters
            l_params = {}
            for k,v in params.items():
                l_params[k]=v
            # get the payload metadata
            metadata_dom = ElementTree.fromstring(metadata_text)
            # add attributes to payload metadata
            root = metadata_dom #('report'):

            for k,v in l_params.items():
                param_dom_element = ElementTree.Element('parameter')
                param_dom_element.attrib["name"] = k
                param_dom_element.text = v
                root.append(param_dom_element)

            #
            metadata_modified_text =ElementTree.tostring(metadata_dom, encoding='utf8', method='xml')
            logger.info( "add-params (end) [payload-xml=%s]" %( metadata_modified_text )  )
            return metadata_modified_text



    # 03: REPORT-REQUEST-CALL 
    #   call to generate the report
    def __handle_report_request(self, rep_resourcepath, rep_format, rep_params):

        # add parameters
        self.resource_response_payload = self.__add_report_params(self.resource_response_payload,rep_params)

        # send report request

        l_path_base_genreport = "/rest/report"
        l_path = self.j_url + l_path_base_genreport
        logger.info( "report-request (begin) [path=%s%s]"  %( l_path ,rep_resourcepath) )

        genreport_response = None
        try:
            genreport_response = requests.put( "%s%s?RUN_OUTPUT_FORMAT=%s" %(l_path,rep_resourcepath,rep_format),data=self.resource_response_payload, cookies = self.login_response.cookies )
            logger.info( " - send-operation-result [value=%s]"  %( genreport_response.text) )
        except Exception as e:
            self.__handle_exception(e, "REPORT_REQUEST:CALL_ERROR", sys.exc_info())


        # parse the uuid of the requested report
        genreport_response_dom = None

        try:
            genreport_response_dom = ElementTree.fromstring(genreport_response.text)

            for node in genreport_response_dom.findall("uuid"):
                datum = "%s" % (node.text)

            genreport_uuid = datum      

            for node in genreport_response_dom.findall("file/[@type]"):
                datum = "%s" % (node.text)
            genreport_mime = datum

            logger.info( "report-request (end) [uuid=%s,mime=%s]"  %( genreport_uuid, genreport_mime) )

            return [genreport_uuid,genreport_mime]
        except Exception as e:
            self.__handle_exception(e, "REPORT_REQUEST:PARSE_ERROR", sys.exc_info())

    # 04: REPORT-RETRIEVE RESULTS 
    def __handle_report_reply(self, genreport_uuid ):


        l_path_base_getresult = "/rest/report"
        l_path = self.j_url + l_path_base_getresult 
        logger.info( "report-reply (begin) [uuid=%s,path=%s]"  %( genreport_uuid,l_path) )

        getresult_response = requests.get( "%s%s/%s?file=report" %(self.j_url,l_path_base_getresult,genreport_uuid),data=self.resource_response_payload, cookies = self.login_response.cookies )
        l_result_header_mime =getresult_response.headers['Content-Type']

        logger.info( "report-reply (end) [uuid=%s,mime=%s]"  %( genreport_uuid, l_result_header_mime) )
        return [l_result_header_mime, getresult_response.content]

    # public methods ---------------------------------------    

    # tries the authentication with jasperserver through rest
    def login(self, j_url, j_user,j_pass):
        self.j_url= j_url

        l_path_base_auth = "/rest/login"
        l_path = self.j_url + l_path_base_auth

        logger.info( "login (begin) [path=%s]"  %( l_path) )

        try:
            self.login_response = requests.post(l_path , params = {
                    'j_username':j_user,
                    'j_password':j_pass
                })                  

            if( requests.codes.ok != self.login_response.status_code ):
                self.login_response.raise_for_status()

            logger.info( "login (end)" )
            return True
            # see http://blog.ianbicking.org/2007/09/12/re-raising-exceptions/

        except Exception as e:
            logger.error("login (error) [e=%s]" % e )
            self.__handle_exception(e, "LOGIN:CALL_ERROR",sys.exc_info())
            #raise

    def generate_report(self, rep_resourcepath,rep_format,rep_params):
        self.__handle_report_metadata(rep_resourcepath)
        [uuid,mime] = self.__handle_report_request(rep_resourcepath, rep_format,rep_params)
        # TODO: how to handle async?
        [out_mime,out_data] = self.__handle_report_reply(uuid)
        return [uuid,out_mime,out_data]

    @staticmethod
    def create_client(j_url, j_user, j_pass):
        client = JasperServerClient()
        login_res = client.login( j_url, j_user, j_pass )
        return client


class JasperServerClientError(Exception):

    def __init__(self,exception_root,reason_id,reason_message=None):
        super(JasperServerClientError, self).__init__(str(reason_message))
        self.code = reason_id 
        self.description = str(exception_root) + " " + str(reason_message)
    def __str__(self):
        return self.code + " " + self.description

1
I got this code to generate a PDF from an HTML template:
    import os

    from weasyprint import HTML

    from django.template import Template, Context
    from django.http import HttpResponse 


    def generate_pdf(self, report_id):

            # Render HTML into memory and get the template firstly
            template_file_loc = os.path.join(os.path.dirname(__file__), os.pardir, 'templates', 'the_template_pdf_generator.html')
            template_contents = read_all_as_str(template_file_loc)
            render_template = Template(template_contents)

            #rendering_map is the dict for params in the template 
            render_definition = Context(rendering_map)
            render_output = render_template.render(render_definition)

            # Using Rendered HTML to generate PDF
            response = HttpResponse(content_type='application/pdf')
            response['Content-Disposition'] = 'attachment; filename=%s-%s-%s.pdf' % \
                                              ('topic-test','topic-test', '2018-05-04')
            # Generate PDF
            pdf_doc = HTML(string=render_output).render()
            pdf_doc.pages[0].height = pdf_doc.pages[0]._page_box.children[0].children[
                0].height  # Make PDF file as single page file 
            pdf_doc.write_pdf(response)
            return response

    def read_all_as_str(file_loc, read_method='r'):
        # Return the file's contents, or a placeholder string if it is missing
        if os.path.isfile(file_loc):
            with open(file_loc, read_method) as handler:
                return handler.read()
        else:
            return 'file not exist'

1
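  • This is for Django >= 3
  • This code converts an HTML template to a PDF file for any page, for example post/1/new1 or post/2/new2
  • The PDF file name is the last part of the URL. For example, for post/2/new2 the file name is new2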
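The code in this answer does not itself set a download name, so here is one way the "last part of the URL" rule could be expressed. This is only a sketch: the helper names pdf_filename_from_path and content_disposition, and the "document" fallback, are assumptions made for illustration.

```python
import posixpath


def pdf_filename_from_path(path):
    """Use the last URL segment as the PDF name, e.g. post/2/new2 -> new2.pdf."""
    last = posixpath.basename(path.rstrip("/"))
    # Fall back to a generic name when the path has no usable segment
    return (last or "document") + ".pdf"


def content_disposition(path):
    """Build a Content-Disposition header value for a download response."""
    # The name is quoted because the URL pattern also allows spaces and colons
    return 'attachment; filename="%s"' % pdf_filename_from_path(path)


name = pdf_filename_from_path("post/2/new2")
header = content_disposition("/pdf/1/new1/")
```

In a view, the header value would be assigned to response['Content-Disposition'] on the HttpResponse before returning it.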

First install xhtml2pdf

pip install xhtml2pdf

urls.py

from .views import generatePdf as GeneratePdf
from django.urls import re_path
urlpatterns = [
#...
re_path(r'^pdf/(?P<cid>[0-9]+)/(?P<value>[a-zA-Z0-9 :._-]+)/$', GeneratePdf, name='pdf'),
#...
]

views.py

from django.http import HttpResponse
from .utils import render_to_pdf

# pdf
def generatePdf(request, cid, value):
    # render_to_pdf already returns an HttpResponse, or None if rendering failed
    pdf = render_to_pdf('myappname/pdf/your.html', cid)
    if pdf is None:
        return HttpResponse('Error generating PDF', status=500)
    return pdf

utils.py

from io import BytesIO # A stream implementation using an in-memory bytes buffer
                       # It inherits BufferedIOBase

from django.http import HttpResponse
from django.template.loader import get_template

#pisa is a html2pdf converter using the ReportLab Toolkit,
#the HTML5lib and pyPdf.

from xhtml2pdf import pisa  
# define the render_to_pdf() function
from .models import myappname
from django.shortcuts import get_object_or_404


def render_to_pdf(template_src,cid, context_dict={}):
    template = get_template(template_src)
    node = get_object_or_404(myappname, id =cid)
    context = {'node':node}
    context_dict=context
    html  = template.render(context_dict)
    result = BytesIO()

    #This part will create the pdf.
    pdf = pisa.pisaDocument(BytesIO(html.encode("ISO-8859-1")), result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type='application/pdf')
    return None

Structure:

myappname/
      |___views.py
      |___urls.py
      |___utils.py
      |___templates/myappname/your.html

0
If your HTML template contains context data along with CSS and JS, then pdfkit is a good option.
In your code you can use it like this:
import os

from django.template.loader import get_template
import pdfkit
from django.conf import settings

context={....}
template = get_template('reports/products.html')
html_string = template.render(context)
pdfkit.from_string(html_string, os.path.join(settings.BASE_DIR, "media", 'products_report-%s.pdf'%(id)))

In your HTML you can link external or internal CSS and JS, and it will generate a PDF of the best quality.
