PDFBox 2 cannot create a PDF/A file

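I am trying to create a PDF/A file with PDFBox 2. My code is based on the sample code here. The code runs without errors. However, when I validate the file with callas pdfPilot and veraPDF, neither of them finds any XMP metadata or PDF/A version information. In addition, the PDF version of the file is 1.4, not the 1.7 that the code sets.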
// TTF font needed for Unicode support in OCR texts
PDFont font = PDType0Font.load(document,
    PDDocument.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"), true);

// Add metadata (needed by PDF/A)
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
try {
    DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
    dc.setTitle("THE DOCUMENT TITLE");
    dc.addCreator("THE AUTHOR");

    PDFAIdentificationSchema id = xmp.createAndAddPFAIdentificationSchema();
    id.setPart(2);
    id.setConformance("B");

    XmpSerializer serializer = new XmpSerializer();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    serializer.serialize(xmp, baos, true);

    PDMetadata metadata = new PDMetadata(document);
    metadata.importXMPMetadata(baos.toByteArray());
    document.getDocumentCatalog().setMetadata(metadata);
} catch (BadFieldValueException e) {
    throw new IllegalArgumentException("", e);
}

// Set color profile (needed by PDF/A)
InputStream colorProfile = PDDocument.class.getResourceAsStream("/sRGB.icc");
PDOutputIntent intent = new PDOutputIntent(document, colorProfile);
intent.setInfo("sRGB IEC61966-2.1");
intent.setOutputCondition("sRGB IEC61966-2.1");
intent.setOutputConditionIdentifier("sRGB IEC61966-2.1");
intent.setRegistryName("http://www.color.org");
document.getDocumentCatalog().addOutputIntent(intent);

// Render all pages
for (IPage page : pages) {
    ((PdfboxPage)page).setFont(font);
    page.renderPage(this);
    document.addPage((PDPage) page.getPage());
}

document.setVersion(1.7f);
document.save(path);
document.close();

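What am I doing wrong?
Edit 1:
I can see that the PDF file contains an xpacket, and that it holds the metadata. However, it looks like PDFBox does not write this data in a form that veraPDF and pdfPilot accept as valid.
Edit 2:
It looks like PDFBox 2.0.12 generates invalid PDF/A files. I converted the PDF file with our commercial pdfPilot program (to PDF/A-1b).
PDFBox writes this into the PDF file (-> invalid in veraPDF and pdfPilot):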
<?xpacket begin="
" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
         <dc:title>
            <rdf:Alt>
               <rdf:li lang="x-default">THE DOCUMENT TITLE</rdf:li>
            </rdf:Alt>
         </dc:title>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>THE AUTHOR</rdf:li>
            </rdf:Seq>
         </dc:creator>
      </rdf:Description>
      <rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="">
         <pdfaid:part>1</pdfaid:part>
         <pdfaid:conformance>B</pdfaid:conformance>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
pdfPilot writes this into the PDF file (-> valid in veraPDF and pdfPilot):
<?xpacket begin="
" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c015 81.159809, 2016/11/11-01:42:16        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
            xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
            xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
            xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
            xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#">
         <dc:format>application/pdf</dc:format>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>AUTOR</rdf:li>
            </rdf:Seq>
         </dc:creator>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">TITEL</rdf:li>
            </rdf:Alt>
         </dc:title>
         <xmp:ModifyDate>2019-01-11T11:42:22+01:00</xmp:ModifyDate>
         <xmp:CreateDate>2019-01-11T11:42:21+01:00</xmp:CreateDate>
         <xmp:MetadataDate>2019-01-11T11:42:22+01:00</xmp:MetadataDate>
         <xmpMM:DocumentID>uuid:b60f88c2-aa89-11b2-0a00-104bbf060000</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:b61148b9-aa89-11b2-0a00-60d9faa0ff7f</xmpMM:InstanceID>
         <xmpMM:RenditionClass>default</xmpMM:RenditionClass>
         <xmpMM:VersionID>1</xmpMM:VersionID>
         <xmpMM:History>
            <rdf:Seq>
               <rdf:li rdf:parseType="Resource">
                  <stEvt:action>converted</stEvt:action>
                  <stEvt:instanceID>uuid:b60f88c3-aa89-11b2-0a00-902dfba0ff7f</stEvt:instanceID>
                  <stEvt:parameters>converted to PDF/A-1b</stEvt:parameters>
                  <stEvt:softwareAgent>pdfaPilot</stEvt:softwareAgent>
                  <stEvt:when>2019-01-11T11:42:22+01:00</stEvt:when>
               </rdf:li>
            </rdf:Seq>
         </xmpMM:History>
         <pdfaid:part>1</pdfaid:part>
         <pdfaid:conformance>B</pdfaid:conformance>
         <pdfaExtension:schemas>
            <rdf:Bag>
               <rdf:li rdf:parseType="Resource">
                  <pdfaSchema:namespaceURI>http://ns.adobe.com/xap/1.0/mm/</pdfaSchema:namespaceURI>
                  <pdfaSchema:prefix>xmpMM</pdfaSchema:prefix>
                  <pdfaSchema:schema>XMP Media Management Schema</pdfaSchema:schema>
                  <pdfaSchema:property>
                     <rdf:Seq>
                        <rdf:li rdf:parseType="Resource">
                           <pdfaProperty:category>internal</pdfaProperty:category>
                           <pdfaProperty:description>UUID based identifier for specific incarnation of a document</pdfaProperty:description>
                           <pdfaProperty:name>InstanceID</pdfaProperty:name>
                           <pdfaProperty:valueType>URI</pdfaProperty:valueType>
                        </rdf:li>
                        <rdf:li rdf:parseType="Resource">
                           <pdfaProperty:category>internal</pdfaProperty:category>
                           <pdfaProperty:description>The common identifier for all versions and renditions of a document.</pdfaProperty:description>
                           <pdfaProperty:name>OriginalDocumentID</pdfaProperty:name>
                           <pdfaProperty:valueType>URI</pdfaProperty:valueType>
                        </rdf:li>
                     </rdf:Seq>
                  </pdfaSchema:property>
               </rdf:li>
               <rdf:li rdf:parseType="Resource">
                  <pdfaSchema:namespaceURI>http://www.aiim.org/pdfa/ns/id/</pdfaSchema:namespaceURI>
                  <pdfaSchema:prefix>pdfaid</pdfaSchema:prefix>
                  <pdfaSchema:schema>PDF/A ID Schema</pdfaSchema:schema>
                  <pdfaSchema:property>
                     <rdf:Seq>
                        <rdf:li rdf:parseType="Resource">
                           <pdfaProperty:category>internal</pdfaProperty:category>
                           <pdfaProperty:description>Part of PDF/A standard</pdfaProperty:description>
                           <pdfaProperty:name>part</pdfaProperty:name>
                           <pdfaProperty:valueType>Integer</pdfaProperty:valueType>
                        </rdf:li>
                        <rdf:li rdf:parseType="Resource">
                           <pdfaProperty:category>internal</pdfaProperty:category>
                           <pdfaProperty:description>Amendment of PDF/A standard</pdfaProperty:description>
                           <pdfaProperty:name>amd</pdfaProperty:name>
                           <pdfaProperty:valueType>Text</pdfaProperty:valueType>
                        </rdf:li>
                        <rdf:li rdf:parseType="Resource">
                           <pdfaProperty:category>internal</pdfaProperty:category>
                           <pdfaProperty:description>Conformance level of PDF/A standard</pdfaProperty:description>
                           <pdfaProperty:name>conformance</pdfaProperty:name>
                           <pdfaProperty:valueType>Text</pdfaProperty:valueType>
                        </rdf:li>
                     </rdf:Seq>
                  </pdfaSchema:property>
               </rdf:li>
            </rdf:Bag>
         </pdfaExtension:schemas>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>

If I write this into the PDF file as a static string, the result is a valid PDF/A file:
String xmpData = "<?xpacket ......";
PDMetadata metadata = new PDMetadata(document);
metadata.importXMPMetadata(xmpData.getBytes());

Edit 3:
Adding the following, much shorter packet is also valid:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" >
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
            <dc:format>application/pdf</dc:format>
            <dc:creator>
            <rdf:Seq>
                <rdf:li>AUTOR</rdf:li>
            </rdf:Seq>
            </dc:creator>
            <dc:title>
            <rdf:Alt>
                <rdf:li xml:lang="x-default">TITEL</rdf:li>
            </rdf:Alt>
            </dc:title>
            <pdfaid:part>1</pdfaid:part>
            <pdfaid:conformance>B</pdfaid:conformance>
        </rdf:Description>
    </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
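Before importing a hand-written packet like this one, it can help to check that it is well-formed and that the PDF/A identification values are where a validator would look for them. The following is only a sketch using the JDK's XML parser. The class name `PdfaIdProbe` and its helper are made up for illustration and are not part of PDFBox. It only handles the element form used above, not properties written as attributes of `rdf:Description`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of PDFBox: reads pdfaid values from an XMP packet.
final class PdfaIdProbe {
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // Shortened version of the packet from Edit 3 (dc properties left out).
    static final String SAMPLE =
            "<?xpacket begin=\"\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
            + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
            + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
            + "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"" + PDFAID_NS + "\">"
            + "<pdfaid:part>1</pdfaid:part>"
            + "<pdfaid:conformance>B</pdfaid:conformance>"
            + "</rdf:Description></rdf:RDF></x:xmpmeta>"
            + "<?xpacket end=\"w\"?>";

    /** Returns the text of the first pdfaid:<localName> element, or null if there is none. */
    static String pdfaIdValue(String xmpPacket, String localName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed so that prefixes resolve to namespace URIs
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmpPacket.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(PDFAID_NS, localName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("XMP packet is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pdfaid:part        = " + pdfaIdValue(SAMPLE, "part"));
        System.out.println("pdfaid:conformance = " + pdfaIdValue(SAMPLE, "conformance"));
    }
}
```

A check like this only confirms that the packet parses and carries the two values. It does not replace a run through veraPDF or pdfPilot.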

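Please read this thread: https://www.mail-archive.com/users@pdfbox.apache.org/msg09256.html
OK, that works! It's a pity that this class offers no way to inject a factory instance. Thanks. If you want to earn some reputation, you can write this up as an answer.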
1 Answer

There is one difference. The CreatePDFA example produces this XML:
<rdf:li xml:lang="x-default">THE DOCUMENT TITLE</rdf:li> 

What you have is:

<rdf:li lang="x-default">THE DOCUMENT TITLE</rdf:li>
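
To illustrate why the missing prefix matters, here is a small sketch that parses both forms with a namespace-aware JDK parser. It is not taken from PDFBox or from the validators. The class name `XmlLangCheck` is invented for this example, and the assumption is that a validator looks for the attribute in the XML namespace, which is what `xml:lang` refers to.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class XmlLangCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns true if the root element has a "lang" attribute in the XML namespace. */
    static boolean hasXmlLang(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, xml:lang is just an opaque name
            Element root = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.hasAttributeNS(XMLConstants.XML_NS_URI, "lang");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML snippet", e);
        }
    }

    public static void main(String[] args) {
        String expected = "<rdf:li xmlns:rdf=\"" + RDF_NS
                + "\" xml:lang=\"x-default\">THE DOCUMENT TITLE</rdf:li>";
        String actual = "<rdf:li xmlns:rdf=\"" + RDF_NS
                + "\" lang=\"x-default\">THE DOCUMENT TITLE</rdf:li>";
        System.out.println("CreatePDFA form has xml:lang: " + hasXmlLang(expected));
        System.out.println("Your form has xml:lang:       " + hasXmlLang(actual));
    }
}
```

The unprefixed `lang` is an ordinary attribute in no namespace, so a namespace-aware reader does not treat it as the language qualifier.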

This reminds me of a problem we ran into a year and a half ago, which was discussed here.

So, to quote my 2017 answer:

Transformer transformer = TransformerFactory.newInstance().newTransformer();

should return an instance of the com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl class. If it does not, call
Transformer transformer =
    TransformerFactory.newInstance("com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl", null).newTransformer(); 

or set a system property:
System.setProperty("javax.xml.transform.TransformerFactory", "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl");

I can't tell (because you didn't say) how you ended up with this transformer, nor what would happen to the rest of your application if you change it.
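
To find out which transformer your application actually picks up, and whether it keeps the `xml:` prefix, a probe along these lines may help. It is a sketch under assumptions: the class name `TransformerProbe` is made up, and the DOM is built by hand here rather than by xmpbox, so it only approximates what the XMP serializer does. Run it with the same classpath as your application.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical diagnostic, not part of PDFBox or xmpbox.
final class TransformerProbe {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Serializes a single rdf:li element that carries xml:lang, using the given factory. */
    static String serializeLangElement(TransformerFactory factory) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Element li = doc.createElementNS(RDF_NS, "rdf:li");
            // The attribute is created in the XML namespace, i.e. as xml:lang.
            li.setAttributeNS(XMLConstants.XML_NS_URI, "xml:lang", "x-default");
            li.setTextContent("THE DOCUMENT TITLE");
            doc.appendChild(li);

            Transformer transformer = factory.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // Whatever the classpath and system properties select by default.
        TransformerFactory factory = TransformerFactory.newInstance();
        String xml = serializeLangElement(factory);
        System.out.println("Factory in use: " + factory.getClass().getName());
        System.out.println("Output:         " + xml);
        System.out.println("Keeps xml:lang: " + xml.contains("xml:lang=\"x-default\""));
    }
}
```

If the last line prints `false`, some other library on the classpath is probably supplying the transformer. Note that the system property changes the factory for the whole JVM, while passing the class name to `TransformerFactory.newInstance(String, ClassLoader)` only affects the code that makes that call.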
