Converting a PDF to PDF/A with the iText library

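I want to export a document as a PdfAConformanceLevel.PDF_A_1B file, but when I call document.close() I get the error below, and the resulting PDF file is unusable.
I am using the following iText versions: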
        <artifactId>itextpdf</artifactId>
        <version>5.5.9</version>

        <artifactId>itext-pdfa</artifactId>
        <version>5.5.9</version>

Stack trace:

com.itextpdf.text.pdf.PdfAConformanceException: Real number is out of range.
at com.itextpdf.text.pdf.internal.PdfA1Checker.checkPdfObject(PdfA1Checker.java:259)
at com.itextpdf.text.pdf.internal.PdfAChecker.checkPdfAConformance(PdfAChecker.java:208)
at com.itextpdf.text.pdf.internal.PdfAConformanceImp.checkPdfIsoConformance(PdfAConformanceImp.java:71)
at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3480)
at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3476)
at com.itextpdf.text.pdf.PdfObject.toPdf(PdfObject.java:174)
at com.itextpdf.text.pdf.PdfArray.toPdf(PdfArray.java:175)
at com.itextpdf.text.pdf.PdfDictionary.toPdf(PdfDictionary.java:149)
at com.itextpdf.text.pdf.PdfStream.superToPdf(PdfStream.java:278)
at com.itextpdf.text.pdf.PRStream.toPdf(PRStream.java:239)
at com.itextpdf.text.pdf.PdfIndirectObject.writeTo(PdfIndirectObject.java:158)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.write(PdfWriter.java:420)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:398)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:377)
at com.itextpdf.text.pdf.PdfWriter.addToBody(PdfWriter.java:872)
at com.itextpdf.text.pdf.PdfReaderInstance.writeAllVisited(PdfReaderInstance.java:161)
at com.itextpdf.text.pdf.PdfReaderInstance.writeAllPages(PdfReaderInstance.java:177)
at com.itextpdf.text.pdf.PdfWriter.addSharedObjectsToBody(PdfWriter.java:1380)
at com.itextpdf.text.pdf.PdfWriter.close(PdfWriter.java:1264)
at com.itextpdf.text.pdf.PdfAWriter.close(PdfAWriter.java:337)
at com.itextpdf.text.pdf.PdfDocument.close(PdfDocument.java:889)
at com.itextpdf.text.Document.close(Document.java:416)
at si.telekom.erender.ERenderImpl.mergeContentOfItems(ERenderImpl.java:2911)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.xml.ws.api.server.MethodUtil.invoke(MethodUtil.java:83)
at com.sun.xml.ws.api.server.InstanceResolver$1.invoke(InstanceResolver.java:250)
at com.sun.xml.ws.server.InvokerTube$2.invoke(InvokerTube.java:149)
at com.sun.xml.ws.server.sei.SEIInvokerTube.processRequest(SEIInvokerTube.java:88)
at com.sun.xml.ws.api.pipe.Fiber.__doRun(Fiber.java:1136)
at com.sun.xml.ws.api.pipe.Fiber._doRun(Fiber.java:1050)
at com.sun.xml.ws.api.pipe.Fiber.doRun(Fiber.java:1019)
at com.sun.xml.ws.api.pipe.Fiber.runSync(Fiber.java:877)
at com.sun.xml.ws.server.WSEndpointImpl$2.process(WSEndpointImpl.java:419)
at com.sun.xml.ws.transport.http.HttpAdapter$HttpToolkit.handle(HttpAdapter.java:868)
at com.sun.xml.ws.transport.http.HttpAdapter.handle(HttpAdapter.java:422)
at com.sun.xml.ws.transport.http.servlet.ServletAdapter.invokeAsync(ServletAdapter.java:225)
at com.sun.xml.ws.transport.http.servlet.WSServletDelegate.doGet(WSServletDelegate.java:161)
at com.sun.xml.ws.transport.http.servlet.WSServletDelegate.doPost(WSServletDelegate.java:197)
at com.sun.xml.ws.transport.http.servlet.WSServlet.doPost(WSServlet.java:81)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:603)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I am using the following code to generate the PDF file:

public byte[] mergeContentOfItems(List<MergeItem> items) throws ErenderException {
    MessageContext mc = wsCtx.getMessageContext();
    HttpServletRequest req = (HttpServletRequest) mc.get(MessageContext.SERVLET_REQUEST);
    getLogger().info("Webservice method 'mergeContentOfItems' called from IP:" + req.getRemoteAddr());
    if (items.size() < 1) {
        String errDescription = "No barcodes specified!";
        throw new ErenderException(errDescription, new ErenderExceptionBean("201", errDescription),
                new Throwable(errDescription));
    }

    com.itextpdf.text.Document document = new com.itextpdf.text.Document();
    ByteArrayOutputStream baOs = new ByteArrayOutputStream();

    PdfWriter writer = null;
    List<PdfReader> readers = new ArrayList<PdfReader>();
    int totalPages = 0;

    try {
        // Create a writer for the outputstream
        writer = PdfAWriter.getInstance(document, baOs, PdfAConformanceLevel.PDF_A_1B);
        writer.setPdfVersion(PdfWriter.PDF_VERSION_1_4);
        writer.createXmpMetadata();

        //writer = PdfWriter.getInstance(document, baOs);

        document.open();

        ICC_Profile icc = ICC_Profile
                .getInstance(Thread.currentThread().getContextClassLoader().getResourceAsStream("srgb.profile"));
        writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
        PdfContentByte cb = writer.getDirectContent(); // Holds the PDF

        for (int i = 0; i < items.size(); i++) {
            String pdfFileName = null;
            File urlTempFile = null;
            if (items.get(i).getBarcode() != null) {
                Template tmpl = TemplatesSynchronizer.getTemplateByBarcode(items.get(i).getBarcode());
                String fileName = tmpl.getName();
                pdfFileName = fileName.substring(0, fileName.indexOf(".")) + ".pdf";
                getLogger().info("\tworking on:" + items.get(i) + " fileName:" + pdfFileName);
                if (!new File(pdfFileName).exists()) {
                    String msg = String.format("Datoteka %s ne obstaja", pdfFileName);
                    throw new ErenderException("Error", new ErenderExceptionBean("109", msg, new Exception(msg)));
                }

            } else if (items.get(i).getUrl() != null) {
                // The suffix is appended verbatim, so it needs the leading dot.
                urlTempFile = File.createTempFile("myTemp", ".pdf");
                FileUtils.copyURLToFile(new URL(items.get(i).getUrl()), urlTempFile);
            }

            if (pdfFileName != null || urlTempFile != null) {
                PdfReader pdfReader = null;
                if (pdfFileName != null)
                    pdfReader = new PdfReader(pdfFileName);
                else if (urlTempFile != null)
                    pdfReader = new PdfReader(urlTempFile.getAbsolutePath());

                if (pdfReader != null) {
                    // Create Readers for the pdfs.
                    readers.add(pdfReader);
                    totalPages += pdfReader.getNumberOfPages();

                    int pageOfCurrentReaderPDF = 0;
                    while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
                        document.newPage();
                        pageOfCurrentReaderPDF++;
                        PdfImportedPage page = writer.getImportedPage(pdfReader, pageOfCurrentReaderPDF);
                        document.setPageSize(pdfReader.getPageSizeWithRotation(pageOfCurrentReaderPDF));
                        document.newPage();
                        cb.addTemplate(page, 0, 0);
                    }
                }
                if (urlTempFile != null)
                    urlTempFile.delete();
            }
        }

    } catch (Throwable ex) {
        StringWriter errorStringWriter = new StringWriter();
        PrintWriter pw = new PrintWriter(errorStringWriter);
        ex.printStackTrace(pw);
        Logger.getLogger(this.getClass()).error(errorStringWriter.getBuffer().toString());
        throw new ErenderException("Error", new ErenderExceptionBean("109", "Napaka v merge metodi.",ex), ex);

    } finally {

        if (document != null && document.isOpen())
            try {
                document.close();
            } catch (Exception ex) {
                StringWriter errorStringWriter = new StringWriter();
                PrintWriter pw = new PrintWriter(errorStringWriter);
                ex.printStackTrace(pw);
                Logger.getLogger(this.getClass()).error(errorStringWriter.getBuffer().toString());


                getLogger().error("Unable to close document.\n" + errorStringWriter);
            }

        if (writer != null && writer.isCloseStream()) {
            try {
                writer.flush();
                writer.close();
            } catch (Exception ex) {
                getLogger().error("Unable to flush or close writer");
            }
        }

        try {
            baOs.flush();
            baOs.close();
        } catch (Exception ex) {
            getLogger().error("Unable to close baOs in mergeContent method.");
        }
    }
    getLogger().info("Webservice method 'mergeContent' called from IP:" + req.getRemoteAddr() + " ended. " + totalPages
            + " merged.");
    return baOs.toByteArray();

}

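Other files are processed without error, so this seems to be specific to the input file. Here is the input PDF I am trying to convert, which reproduces the error: http://filebin.ca/2hR2xO1SNlzh/09062009073008005.pdf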


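Could you put that stack trace in a code block? That would make it easier to read. - Amedee Van Gasse
You forgot to mention that it is input-specific and that you don't get this with other files. - Amedee Van Gasse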
OK, I have added that information. - zhivko
1 Answer

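First of all, iText does not convert regular PDF documents to PDF/A documents. We have customers who use iText to do this, but their code is much more complex than yours.
The reason iText does not convert a regular PDF to PDF/A is simple: a regular PDF may lack features that PDF/A requires. For instance, you could have a PDF in which the fonts are not embedded. In that case the appropriate font programs have to be supplied. iText does not ship with any font programs, so the software that uses iText must provide them.
In your code, you merely copy content streams without checking for anything that could make the final result non-compliant with PDF/A. You should be very careful with the PDF files produced this way. They will show the blue bar saying that the file claims to be PDF/A, but that does not mean the file will validate as PDF/A when you run it through a validator.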
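Now for your problem. You want to convert a regular PDF to PDF/A-1. PDF/A-1 is based on PDF 1.4, which dates from 2001. That means you cannot use any feature introduced after 2001. In PDF 1.4 there was a limit on object numbers: an object number in a PDF could not exceed 32,767. This limit was removed in PDF 1.5.
My guess is that the problem you describe is caused by trying to create a PDF 1.4 file with more objects than PDF 1.4 allows. There are two possible reasons:
1. Your original PDF is PDF 1.5 or higher.
2. What you do with the PDF requires more objects than the maximum available.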
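The first of these two possibilities can be checked roughly before merging by looking at the version in the file header. The sketch below is not part of the original answer and does not use iText; the class and method names are made up for illustration. It only reads the `%PDF-x.y` marker near the start of the file. A PDF can also override that version with a /Version entry in its catalog, so treat the result as a hint rather than proof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports the version declared in a PDF file header.
class PdfHeaderVersion {

    // Matches the "%PDF-M.m" marker that opens a PDF file.
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");

    /** Returns the header version (for example "1.5"), or null if no header is found. */
    public static String parseHeaderVersion(byte[] firstBytes) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String head = new String(firstBytes, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? m.group(1) + "." + m.group(2) : null;
    }

    /** True if the version is newer than 1.4, the version PDF/A-1 is based on. */
    public static boolean isNewerThanPdf14(String version) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major > 1 || (major == 1 && minor > 4);
    }

    /** Reads the start of the file (a single read of at most 1024 bytes) and parses the header. */
    public static String readHeaderVersion(Path pdf) throws IOException {
        byte[] buf = new byte[1024];
        try (InputStream in = Files.newInputStream(pdf)) {
            int n = in.read(buf);
            return parseHeaderVersion(n > 0 ? Arrays.copyOf(buf, n) : new byte[0]);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfHeaderVersion <file.pdf>");
            return;
        }
        String version = readHeaderVersion(Paths.get(args[0]));
        if (version == null) {
            System.out.println("No %PDF header found");
        } else {
            System.out.println("Header version " + version
                    + (isNewerThanPdf14(version) ? " (newer than PDF 1.4)" : " (PDF 1.4 or older)"));
        }
    }
}
```

Running it against each input file before the merge shows which sources declare a version newer than 1.4. Such files are more likely to rely on features that PDF/A-1 does not allow.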
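This could be solved by producing PDF/A-2 instead of PDF/A-1, but I am sure you will soon run into other limitations (such as missing fonts, and other problems caused by creating files that claim to be PDF/A but actually are not). PdfAWriter throws an exception when you try to do something that is clearly wrong, but there is no guarantee that some subtler PDF/A requirement has not been overlooked.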

If the PDF files already exist and the application requires that only PDF/A-2 files be delivered, what would the solution be? - S_S
