Time taken by ReportLab to process large PDF documents grows exponentially

Question

python django pdf-generation reportlab platypus

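I am using ReportLab to generate PDF reports; the code is below. The problem is that if a report of X pages takes time T, a report of 2X pages takes far longer than 2T. This is a serious problem because I need to generate PDF documents of up to 35,000 pages. Is there any way to fix this?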

from reportlab.platypus import TableStyle, SimpleDocTemplate, LongTable, Table
from reportlab.lib.pagesizes import letter

class JournalPDFGenerator(object):
    """
    Generates Journal PDF with ReportLab
    """

    def __init__(self, pdf_name, profile_report_id):
        self.pdf_name = pdf_name
        self.profile_report_id = profile_report_id
        # ProfileWatchReport is a Django model; its import is not shown in the question.
        self.profile_report = ProfileWatchReport.objects.get(id=self.profile_report_id)
        self.document = SimpleDocTemplate(self.pdf_name, pagesize=letter)
        self.story = []

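    def get_prepared_rows(self):
        # your_mark_details and threat_mark_details are placeholders;
        # their definitions are not shown in the question.
        row = [your_mark_details, threat_mark_details]
        yield row

    def generate_pdf(self):
        # Every row goes into one LongTable, which build() then has to lay out.
        report_table = LongTable(list(self.get_prepared_rows()))
        self.story.append(report_table)
        self.document.build(self.story)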

- yadavankit

Have you done anything to determine where in the code the bottleneck is? - Endre Both

Yes, self.document.build(self.story) takes 99% of the total time. - yadavankit
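
One way to arrive at a figure like the 99% above is to run the generation step under Python's built-in profiler. The following is only a minimal sketch: `profile_call` and `slow_stand_in` are hypothetical names introduced here, and the stand-in function takes the place of the real `generate_pdf()` call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the calls that dominate the run come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_stand_in(n):
    # Hypothetical placeholder for the real work, e.g. generator.generate_pdf().
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(slow_stand_in, 100_000)
    print(report)
```

With the real generator, the call would be something like `profile_call(generator.generate_pdf)`, and the report would show how much of the cumulative time sits under `build`.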

2 Answers

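35k pages is not a common PDF use case, so it is not entirely unexpected that things break down. Here are some ideas to explore:

It may simply be that the machine runs out of RAM with this much data, in which case a hardware upgrade might help.
You could try splitting the data into several tables instead of one big table to see whether performance improves.
Could the content be split, either temporarily (stitching the parts into one file afterwards with another tool such as GhostScript) or permanently into multiple files?
Could you handle pagination yourself (for example, if the length of the content elements is predictable)? Pagination of very large tables may be getting out of hand.
You could test data structures other than LongTable to check whether the problem is tied to that particular structure; if it is, you may find an alternative.
Finally (or first, depending on your inclination), you could look at the relevant code and/or raise the issue with the ReportLab team.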
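
The idea of splitting the data into several tables could be sketched roughly as follows. This is an untested assumption about what might help, not a confirmed fix: `chunked`, `build_story`, and the default of 500 rows per table are hypothetical choices made for illustration, and only `Table` comes from ReportLab itself.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def build_story(rows, rows_per_table=500):
    """Return a list of flowables: one small Table per chunk of rows."""
    # Imported here so that chunked() can be used without ReportLab installed.
    from reportlab.platypus import Table

    return [Table(chunk) for chunk in chunked(rows, rows_per_table)]
```

The resulting list can be passed to `SimpleDocTemplate.build()` in place of a story holding a single `LongTable`. Because each table computes its own column widths, passing an explicit `colWidths` to `Table` may be needed to keep the columns aligned from one table to the next.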

- Endre Both

- DShost · Accepted Answer

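I spent a lot of time tracking down the cause of the problem above. Instead of LongTable, you can try my BigDataTable class, which is optimized for large data sets.

Gist: BigDataTable, a faster LongTable for big data

Tested with 6,500 rows and 7 columns:

LongTable: total document build time of more than 1 hour
BigDataTable: total document build time of about 24.2 seconds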
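
To check how build time scales with input size on your own data, it may help to time the build at two sizes and estimate the growth exponent. The sketch below assumes a power-law model, t ≈ c·n^k, which is an assumption made here and not something established in this thread; `time_call` and `growth_exponent` are hypothetical helper names.

```python
import math
import time


def time_call(func, *args):
    """Return the wall-clock seconds taken by func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def growth_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


if __name__ == "__main__":
    # Made-up illustrative numbers: doubling the rows quadruples the time.
    print(growth_exponent(1000, 1.0, 2000, 4.0))
```

An exponent near 1 would indicate roughly linear scaling, and an exponent near 2 would indicate quadratic rather than truly exponential growth. Running the same measurement with each table class shows whether a replacement changes the scaling itself or only the constant factor.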