iText HtmlConverter.convertToPdf is slow when converting HTML to PDF


I need to convert HTML to PDF in real time, and I am using iText to do it.

    // Fragment of a larger method: byteArrayOutputStream, converterProperties,
    // pageNumber and pageCount are declared elsewhere.
    try (PdfDocument inputDoc = new PdfDocument(new PdfWriter(byteArrayOutputStream))) {
      String html = "<html><head><Title> My pdf</Title></head></html>";
      // Custom handlers that draw the header and footer for each page.
      inputDoc.addEventHandler(PdfDocumentEvent.START_PAGE, new PdfHeaderHandler(pageNumber,
          pageCount));
      inputDoc.addEventHandler(PdfDocumentEvent.END_PAGE, new PdfFooterHandler(pageNumber,
          pageCount));

      // Render the HTML into inputDoc; the PDF bytes end up in byteArrayOutputStream.
      HtmlConverter.convertToPdf(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)), inputDoc,
          converterProperties);
      // Re-open the generated bytes as a readable PdfDocument.
      return new PdfDocument(
          new PdfReader(new ByteArrayInputStream(byteArrayOutputStream.toByteArray())));
    } catch (IOException exception) {
      getLogger().error("Html to Pdf conversion failed for page {} of {} due to error {}", pageNumber, pageCount,
          exception.getMessage());
    }

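The problem is that when the page count is in the 30-50 range, the turnaround time is huge, around 5-10 seconds per page, even though I convert each HTML page in parallel using 20 threads at a time.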
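One way such a fan-out might be structured with futures is sketched below. `ParallelPageConverter` and its `converter` parameter are hypothetical names, not part of iText or of the code in the question; the converter stands in for whatever method wraps `HtmlConverter.convertToPdf`. The sketch defaults the pool size to the number of available cores, on the assumption that the conversion is CPU-bound; if that holds, running 20 threads on fewer cores would mostly stretch each page's latency rather than raise throughput.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical helper: converts pages in parallel and returns results in page order. */
final class ParallelPageConverter {

  private ParallelPageConverter() {}

  /** Pool size guess for CPU-bound work: one thread per available core. */
  static int defaultThreads() {
    return Math.max(1, Runtime.getRuntime().availableProcessors());
  }

  /**
   * Submits one task per HTML page and waits for all of them.
   * The converter is a placeholder for the real HTML-to-PDF call.
   */
  static List<byte[]> convertAll(
      List<String> htmlPages, Function<String, byte[]> converter, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<CompletableFuture<byte[]>> futures = new ArrayList<>();
      for (String html : htmlPages) {
        futures.add(CompletableFuture.supplyAsync(() -> converter.apply(html), pool));
      }
      List<byte[]> results = new ArrayList<>();
      for (CompletableFuture<byte[]> future : futures) {
        // join() blocks until the page is done and rethrows failures unchecked.
        results.add(future.join());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

Calling `ParallelPageConverter.convertAll(pages, html -> convert(html), ParallelPageConverter.defaultThreads())`, where `convert` is your own method, and comparing the timings against the 20-thread run would show whether the thread count itself is part of the problem.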

Here is a sample log for a run with 19 pages and 20 threads:


2020-06-19 14:54:12.730 [PdfGenerationThreadPoolExecutor-12] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 3
2020-06-19 14:54:12.731 [PdfGenerationThreadPoolExecutor-5] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 2
2020-06-19 14:54:12.736 [PdfGenerationThreadPoolExecutor-6] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 4
2020-06-19 14:54:12.754 [PdfGenerationThreadPoolExecutor-19] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 5
2020-06-19 14:54:12.793 [PdfGenerationThreadPoolExecutor-15] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 7
2020-06-19 14:54:12.793 [PdfGenerationThreadPoolExecutor-10] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 6
2020-06-19 14:54:12.798 [PdfGenerationThreadPoolExecutor-14] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 9
2020-06-19 14:54:12.798 [PdfGenerationThreadPoolExecutor-16] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 8
2020-06-19 14:54:12.798 [PdfGenerationThreadPoolExecutor-7] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 10
2020-06-19 14:54:12.802 [PdfGenerationThreadPoolExecutor-4] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 11
2020-06-19 14:54:12.805 [PdfGenerationThreadPoolExecutor-17] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 12
2020-06-19 14:54:12.807 [PdfGenerationThreadPoolExecutor-8] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 13
2020-06-19 14:54:12.808 [PdfGenerationThreadPoolExecutor-3] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 14
2020-06-19 14:54:12.811 [PdfGenerationThreadPoolExecutor-1] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 15
2020-06-19 14:54:12.813 [PdfGenerationThreadPoolExecutor-2] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 16
2020-06-19 14:54:12.815 [PdfGenerationThreadPoolExecutor-9] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 17
2020-06-19 14:54:12.817 [PdfGenerationThreadPoolExecutor-11] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 18
2020-06-19 14:54:12.819 [PdfGenerationThreadPoolExecutor-13] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 19
2020-06-19 14:54:12.820 [qtp403879268-30] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Waiting for futures to complete
2020-06-19 14:54:12.830 [PdfGenerationThreadPoolExecutor-20] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Converting PDF for page 1
2020-06-19 14:54:20.398 [PdfGenerationThreadPoolExecutor-6] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 4
2020-06-19 14:54:20.416 [PdfGenerationThreadPoolExecutor-11] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 18
2020-06-19 14:54:20.428 [PdfGenerationThreadPoolExecutor-13] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 19
2020-06-19 14:54:20.458 [PdfGenerationThreadPoolExecutor-19] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 5
2020-06-19 14:54:20.488 [PdfGenerationThreadPoolExecutor-12] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 3
2020-06-19 14:54:20.633 [PdfGenerationThreadPoolExecutor-4] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 11
2020-06-19 14:54:20.802 [PdfGenerationThreadPoolExecutor-14] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 9
2020-06-19 14:54:20.905 [PdfGenerationThreadPoolExecutor-8] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 13
2020-06-19 14:54:20.913 [PdfGenerationThreadPoolExecutor-17] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 12
2020-06-19 14:54:21.095 [PdfGenerationThreadPoolExecutor-7] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 10
2020-06-19 14:54:21.144 [PdfGenerationThreadPoolExecutor-10] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 6
2020-06-19 14:54:21.244 [PdfGenerationThreadPoolExecutor-15] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 7
2020-06-19 14:54:21.293 [PdfGenerationThreadPoolExecutor-20] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 1
2020-06-19 14:54:21.327 [PdfGenerationThreadPoolExecutor-1] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 15
2020-06-19 14:54:21.329 [PdfGenerationThreadPoolExecutor-16] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 8
2020-06-19 14:54:21.335 [PdfGenerationThreadPoolExecutor-3] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 14
2020-06-19 14:54:21.360 [PdfGenerationThreadPoolExecutor-9] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 17
2020-06-19 14:54:21.384 [PdfGenerationThreadPoolExecutor-2] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 16
2020-06-19 14:54:21.404 [PdfGenerationThreadPoolExecutor-5] INFO  c.p.s.a.s.d.s.i.PdfGenerationService - Done converting PDF for page 2

As you can see, each page takes roughly 8 seconds, and we would like to bring that down significantly.
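To check that figure rather than eyeball it, the start and end lines for each page can be paired up and subtracted. The following is a minimal sketch that assumes the exact log layout shown above; `ConversionLogStats` is a hypothetical helper name, not something from the question.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: derives per-page conversion times from the log format above. */
final class ConversionLogStats {

  private static final DateTimeFormatter TIMESTAMP =
      DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");

  // Group 1: timestamp, group 2: start or end marker, group 3: page number.
  private static final Pattern LINE = Pattern.compile(
      "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) .* - "
          + "(Done converting|Converting) PDF for page (\\d+)$");

  private ConversionLogStats() {}

  /** Returns page number -> milliseconds between its start line and its "Done" line. */
  static Map<Integer, Long> perPageMillis(List<String> lines) {
    Map<Integer, LocalDateTime> started = new HashMap<>();
    Map<Integer, Long> millis = new TreeMap<>();
    for (String line : lines) {
      Matcher m = LINE.matcher(line.trim());
      if (!m.matches()) {
        continue; // e.g. the "Waiting for futures to complete" line
      }
      LocalDateTime at = LocalDateTime.parse(m.group(1), TIMESTAMP);
      int page = Integer.parseInt(m.group(3));
      if (m.group(2).startsWith("Done")) {
        LocalDateTime start = started.get(page);
        if (start != null) {
          millis.put(page, Duration.between(start, at).toMillis());
        }
      } else {
        started.put(page, at);
      }
    }
    return millis;
  }
}
```

For the excerpt above, page 4 starts at 14:54:12.736 and finishes at 14:54:20.398, which is 7662 ms. The whole batch of 19 pages, however, runs from 14:54:12.730 to 14:54:21.404, about 8.7 seconds of wall-clock time, so per-page latency and overall throughput are quite different numbers here.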

Can anyone suggest improvements, or an alternative library that could help us?

Thanks in advance.


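Antariksh, your log shows that "Futures" are being used, but your code does not. Can you show how you call your code using "Futures"? - mkl
Ali Bdeir, are you really interested in this particular way of using HtmlConverter in parallel with Futures? - mkl
@mkl The problem is that HtmlConverter.ToPdf is very inefficient and takes nearly 10 seconds per page. Nothing more, nothing less. - Ali Bdeir
@AliBdeir: "takes nearly 10 seconds per page" - if I read the log excerpt in the question correctly, Antariksh's parallel test code generated 20 PDF pages in 10 seconds. - mkl
Please state which iText version you are using. - cyberbrain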
1 Answer


If you are willing to wrap a CLI tool from Java, you could use wkhtmltopdf. It is written in C++, so it should be faster.

Ready-made Java wrappers for it are also available.
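For the hand-rolled route, a wrapper could look roughly like the sketch below. It assumes the `wkhtmltopdf` binary is installed and on the `PATH`, and it goes through temporary files using the tool's basic `wkhtmltopdf [options] <input> <output>` form. `WkhtmltopdfRunner` is a hypothetical class name, and the timeout handling is just one reasonable choice.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper around the wkhtmltopdf command-line tool. */
final class WkhtmltopdfRunner {

  private WkhtmltopdfRunner() {}

  /** Builds the command line; assumes "wkhtmltopdf" can be resolved via PATH. */
  static List<String> command(Path htmlFile, Path pdfFile) {
    return Arrays.asList("wkhtmltopdf", "--quiet", htmlFile.toString(), pdfFile.toString());
  }

  /** Writes the HTML to a temp file, runs the tool, and returns the PDF bytes. */
  static byte[] convert(String html, long timeoutSeconds)
      throws IOException, InterruptedException {
    Path htmlFile = Files.createTempFile("page", ".html");
    Path pdfFile = Files.createTempFile("page", ".pdf");
    try {
      Files.write(htmlFile, html.getBytes(StandardCharsets.UTF_8));
      // inheritIO() avoids blocking on an unread stdout/stderr pipe.
      Process process = new ProcessBuilder(command(htmlFile, pdfFile)).inheritIO().start();
      if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
        process.destroyForcibly();
        throw new IOException("wkhtmltopdf timed out after " + timeoutSeconds + " s");
      }
      if (process.exitValue() != 0) {
        throw new IOException("wkhtmltopdf exited with code " + process.exitValue());
      }
      return Files.readAllBytes(pdfFile);
    } finally {
      Files.deleteIfExists(htmlFile);
      Files.deleteIfExists(pdfFile);
    }
  }
}
```

Each call starts a new process, so whether this actually beats an in-process iText conversion for many small pages is something to measure rather than assume.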

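Beyond that, I don't think there are many competitors in the Java world when it comes to speed.

OpenPDF is just a fork of iText, so its performance should be similar.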


Content provided by Stack Overflow.