Apache PDFBox and PDF/A-3

Question

pdf pdfbox pdfa

3

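Is it possible to work with PDF/A-3 documents using Apache PDFBox? (In particular, to change field values?)

The PDFBox 1.8 Cookbook says that PDF/A-1 documents can be created with pdfaid.setPart(1).

Can I apply pdfaid.setPart(3) for a PDF/A-3 document?
If not: is it possible to read in a PDF/A-3 document, change some field values and save it, without any >creation/conversion to PDF/A-3<, so that the document is still PDF/A-3?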

- hagem

1

Your question was answered correctly (and very nicely) on the PDFBox users mailing list. - Tilman Hausherr

Great, thanks! I have quoted that answer below. - hagem

2 Answers

5

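How to create a document that conforms to PDF/A-{2,3}-{B,U,A}: in this example I render a PDF page to an image (only the first page of the source is rendered) and then build a PDF/Ax-y document from that image. This works with PDFBox 2.0.x.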

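// Imports for PDFBox 2.0.x and xmpbox. The method is assumed to live in a class
// named CreatePDFA, which is referenced further down when loading the colour profile.
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Calendar;
import javax.xml.transform.TransformerException;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDMarkInfo;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;
import org.apache.pdfbox.pdmodel.graphics.color.PDOutputIntent;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.pdmodel.interactive.viewerpreferences.PDViewerPreferences;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.schema.PDFAIdentificationSchema;
import org.apache.xmpbox.type.BadFieldValueException;
import org.apache.xmpbox.xml.XmpSerializer;

public static void main(String[] args) throws IOException, TransformerException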
{

    String resultFile = "result/PDFA-x.PDF";  
    FileInputStream in = new FileInputStream("src/PDFOrigin.PDF");

    PDDocument doc = new PDDocument();
    try 
    {
        PDPage page = new PDPage();
        doc.addPage(page); 
        doc.setVersion(1.7f);

        /*             
        // A PDF/A file needs to have the font embedded if the font is used for text rendering
        // in rendering modes other than text rendering mode 3.
        //
        // This requirement includes the PDF standard fonts, so don't use their static PDFType1Font classes such as
        // PDFType1Font.HELVETICA.
        //
        // As there are many different font licenses it is up to the developer to check if the license terms for the
        // font loaded allows embedding in the PDF.

        String fontfile = "/org/apache/pdfbox/resources/ttf/ArialMT.ttf"; 
        PDFont font = PDType0Font.load(doc, new File(fontfile));           
        if (!font.isEmbedded())
        {
            throw new IllegalStateException("PDF/A compliance requires that all fonts used for"
                    + " text rendering in rendering modes other than rendering mode 3 are embedded.");
        }
      */ 

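        // try-with-resources closes the content stream and the source document, even on failure;
        // errors are no longer swallowed here but reach the outer catch block
        try (PDPageContentStream contents = new PDPageContentStream(doc, page);
             PDDocument docSource = PDDocument.load(in))
        {
            PDFRenderer pdfRenderer = new PDFRenderer(docSource);
            int numPage = 0; // only the first page of the source is rendered

            // Render the page to a bitmap at 200 DPI and embed it with lossless compression
            BufferedImage imagePage = pdfRenderer.renderImageWithDPI(numPage, 200);
            PDImageXObject pdfXOImage = LosslessFactory.createFromImage(doc, imagePage);

            // Scale the image so that it fills the whole page
            contents.drawImage(pdfXOImage, 0, 0, page.getMediaBox().getWidth(), page.getMediaBox().getHeight());
        }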

        // add XMP metadata
        XMPMetadata xmp = XMPMetadata.createXMPMetadata();
        PDDocumentCatalog catalogue = doc.getDocumentCatalog();
        Calendar cal =  Calendar.getInstance();          

        try
        {
            DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
           // dc.setTitle(file);
            dc.addCreator("My APPLICATION Creator");
            dc.addDate(cal);

            PDFAIdentificationSchema id = xmp.createAndAddPFAIdentificationSchema();
            id.setPart(3);  //value => 2|3
            id.setConformance("A"); // value => A|B|U

            XmpSerializer serializer = new XmpSerializer();
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            serializer.serialize(xmp, baos, true);

            PDMetadata metadata = new PDMetadata(doc);
            metadata.importXMPMetadata(baos.toByteArray());                
            catalogue.setMetadata(metadata);
        }
        catch(BadFieldValueException e)
        {
            throw new IllegalArgumentException(e);
        }

        // sRGB output intent
        InputStream colorProfile = CreatePDFA.class.getResourceAsStream(
                "../../../pdmodel/sRGB.icc");
        PDOutputIntent intent = new PDOutputIntent(doc, colorProfile);
        intent.setInfo("sRGB IEC61966-2.1");
        intent.setOutputCondition("sRGB IEC61966-2.1");
        intent.setOutputConditionIdentifier("sRGB IEC61966-2.1");
        intent.setRegistryName("http://www.color.org");

        catalogue.addOutputIntent(intent);  
        catalogue.setLanguage("en-US");

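        // Viewer preferences get their own dictionary; reusing the page dictionary
        // would write the DisplayDocTitle entry into the page object
        PDViewerPreferences pdViewer = new PDViewerPreferences(new COSDictionary());
        pdViewer.setDisplayDocTitle(true);
        catalogue.setViewerPreferences(pdViewer);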

        PDMarkInfo  mark = new PDMarkInfo(); // new PDMarkInfo(page.getCOSObject()); 
        PDStructureTreeRoot treeRoot = new PDStructureTreeRoot(); 
        catalogue.setMarkInfo(mark);
        catalogue.setStructureTreeRoot(treeRoot);           
        catalogue.getMarkInfo().setMarked(true);

        PDDocumentInformation info = doc.getDocumentInformation();               
        info.setCreationDate(cal);
        info.setModificationDate(cal);            
        info.setAuthor("My APPLICATION Author");
        info.setProducer("My APPLICATION Producer");
        info.setCreator("My APPLICATION Creator");
        info.setTitle("PDF title");
        info.setSubject("PDF to PDF/A{2,3}-{A,U,B}");           

        doc.save(resultFile);
    }catch (Exception e) {
        throw new IllegalArgumentException(e);
    }finally {
        // release the new document and the source file in every case
        doc.close();
        in.close();
    }
}
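
The pdfaid part and conformance values set in the code above end up in the document's XMP packet. They are only a declaration: nothing in PDFBox checks that the file really meets that level. As a small sanity check, the sketch below reads the declared values back out of an XMP packet using only the JDK's XML parser. The class name PdfaIdCheck, the method declaredLevel and the SAMPLE packet are made up for illustration; in a real program the XMP would come from the catalog's metadata stream, and a real validator is still needed to confirm conformance.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PdfaIdCheck
{
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Hypothetical sample packet, reduced to the PDF/A identification schema.
    static final String SAMPLE =
          "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
        + "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">"
        + "<pdfaid:part>3</pdfaid:part>"
        + "<pdfaid:conformance>A</pdfaid:conformance>"
        + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    // Returns the declared level such as "3A", or null if part or conformance is missing.
    static String declaredLevel(String xmp)
    {
        try
        {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document dom = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
            String part = read(dom, "part");
            String conformance = read(dom, "conformance");
            return (part == null || conformance == null) ? null : part + conformance;
        }
        catch (Exception e)
        {
            throw new IllegalArgumentException("XMP packet could not be parsed", e);
        }
    }

    // XMP allows a simple property as a child element or as an attribute of rdf:Description.
    private static String read(Document dom, String name)
    {
        NodeList elements = dom.getElementsByTagNameNS(PDFAID_NS, name);
        if (elements.getLength() > 0)
        {
            return elements.item(0).getTextContent().trim();
        }
        NodeList descriptions = dom.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++)
        {
            Element description = (Element) descriptions.item(i);
            if (description.hasAttributeNS(PDFAID_NS, name))
            {
                return description.getAttributeNS(PDFAID_NS, name).trim();
            }
        }
        return null;
    }

    public static void main(String[] args)
    {
        System.out.println(declaredLevel(SAMPLE)); // prints 3A
    }
}
```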

- Madal Africa-Guinea

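The answer might be OK (I would have to run the result through a validator to be sure), but decompressing the JPEG file is inefficient. Use JPEGFactory.createFromStream() instead, which uses the jpg file as it is. It would be good to change the code so that people who copy and paste it do not pick up that part. If you still want to decode the JPEG into a BufferedImage, a single line is enough: ImageIO.read(). Your multi-line version is either outdated or very new :-) - Tilman Hausherr

The goal here is not to decompress a JPEG :). Alternatively, you can use PDFRenderer.renderImageWithDPI(...) to produce a BufferedImage directly from a PDF page with PDFBox. Besides, the result has been validated with pdf-online. - Madal Africa-Guinea

I know that the goal is to produce a PDF file. My comment is about the image in the PDF. Using LosslessFactory with a JPEG file makes it slower (because it decompresses the JPEG and recompresses it with Flate compression) and usually produces a larger PDF file than JPEGFactory with stream input. - Tilman Hausherr

I completely agree with you. I extracted this code from a project that compresses and decompresses images in different formats. I wanted to keep it simple and easy to follow, but realized after posting that a clearer approach is to take the image (BufferedImage) of the source PDF page directly with PDFRenderer.renderImageWithDPI(numPage, dpi, ..). Thank you for letting me know when you test the validity of the resulting PDF/A3-A. - Madal Africa-Guinea

Some of these methods do not exist in the latest version of PDFBox. See the following link for how to fix that: https://www.programcreek.com/java-api-examples/?api=org.apache.pdfbox.cos.COSDocument - Buminda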
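
To illustrate the point about JPEGFactory, here is a minimal sketch, assuming PDFBox 2.0.x on the classpath. It hands existing JPEG bytes to JPEGFactory.createFromStream() so that they are stored as they are instead of being decoded and recompressed. The class name JpegPassThrough, the helper embedJpeg and the output file name are hypothetical, and the blank image generated with ImageIO only stands in for a real JPEG file.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

class JpegPassThrough
{
    // Wraps already-compressed JPEG bytes as an image XObject without re-encoding them.
    static PDImageXObject embedJpeg(PDDocument doc, byte[] jpegBytes) throws IOException
    {
        return JPEGFactory.createFromStream(doc, new ByteArrayInputStream(jpegBytes));
    }

    public static void main(String[] args) throws IOException
    {
        // Stand-in for a JPEG file read from disk: a blank 16x8 image encoded by ImageIO.
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        ImageIO.write(new BufferedImage(16, 8, BufferedImage.TYPE_INT_RGB), "jpg", jpeg);

        try (PDDocument doc = new PDDocument())
        {
            PDPage page = new PDPage();
            doc.addPage(page);
            PDImageXObject image = embedJpeg(doc, jpeg.toByteArray());

            // Draw the image scaled to the full page, as in the answer's code.
            try (PDPageContentStream contents = new PDPageContentStream(doc, page))
            {
                contents.drawImage(image, 0, 0,
                        page.getMediaBox().getWidth(), page.getMediaBox().getHeight());
            }
            doc.save("jpeg-page.pdf"); // hypothetical output name
        }
    }
}
```

This only covers the image embedding; the XMP metadata and output intent from the answer would still be needed for a PDF/A file.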

- hagem · Accepted Answer

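PDFBox supports this, but be aware that, because PDFBox is a low-level library, you have to ensure conformance yourself, i.e. there is no "Save as PDF/A-3" option. You may want to look at http://www.mustangproject.org, which uses PDFBox to support ZUGFeRD (electronic invoices); these also require PDF/A-3.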
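
For the "change some field values and save" part of the question, a minimal sketch could look like the following, assuming PDFBox 2.0.x and a document whose fields live in an AcroForm. The class name FieldUpdate, the helper setFieldValue and the command-line arguments are hypothetical. The code leaves the existing XMP metadata and output intent untouched, but it does not by itself guarantee that the saved file is still PDF/A-3 (for example, regenerated field appearances must only use embedded fonts), so the result should be run through a validator.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

class FieldUpdate
{
    // Sets one form field; returns false if the document has no AcroForm
    // or no field with the given fully qualified name.
    static boolean setFieldValue(PDDocument doc, String fieldName, String value) throws IOException
    {
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
        if (form == null)
        {
            return false;
        }
        PDField field = form.getField(fieldName);
        if (field == null)
        {
            return false;
        }
        field.setValue(value);
        return true;
    }

    public static void main(String[] args) throws IOException
    {
        // Hypothetical arguments: input file, output file, field name, new value.
        try (PDDocument doc = PDDocument.load(new File(args[0])))
        {
            if (setFieldValue(doc, args[2], args[3]))
            {
                doc.save(args[1]);
            }
            else
            {
                System.err.println("Field not found: " + args[2]);
            }
        }
    }
}
```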