PDFBox 2.0 RC3 -- Find and replace text

Question

java pdfbox

5

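How can I find and replace text in a PDF document using PDFBox 2.0? The old example was removed and its syntax no longer works, so I'm wondering whether this is still possible and, if so, what the best approach is. Thanks!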

- Shaun

4

That old example only worked for very simple PDF files; with more complex files it either changed nothing or (worse) damaged them. - mkl

https://github.com/chadilukito/Apache-PdfBox-2-Examples/blob/master/ReplaceText.java - Hrvoje

2 Answers

5

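I spent a lot of time trying to solve this and eventually bought an Acrobat DC subscription so that I could create fields to act as placeholders for the text to be replaced. In my case the fields hold customer information and order details, so the data is not very complex, but the document is full of business-related conditions and has a very complex layout.

After that I simply do the following, which may work for you.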

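import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

private void update() throws IOException {
    Map<String, String> map = new HashMap<>();
    map.put("fieldname", "value to update");
    File template = new File("template.pdf");
    // try-with-resources closes the document even if filling a field fails
    try (PDDocument document = PDDocument.load(template)) {
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        if (acroForm == null) {
            throw new IOException("template.pdf contains no AcroForm");
        }
        for (Map.Entry<String, String> entry : map.entrySet()) {
            // look the field up by its fully qualified name
            PDField field = acroForm.getField(entry.getKey());
            if (field != null) {
                field.setValue(entry.getValue());
                field.setReadOnly(true);
            }
        }
        document.save(new File("out.pdf"));
    }
}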

YMMV

- Tim Coy

4

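Indeed, using AcroForm fields is the way PDF fill-in is meant to be done. But you don't need Acrobat to create the fields; you can do that with PDFBox too... (though without a nice GUI). - mkl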

1

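Thanks @mkl, I did realize that fields can be created with pdfbox, but I couldn't figure out how to position them exactly where they need to be in the document. - Tim Coy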
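
On the positioning problem raised in the comment above: a field's widget is placed with a rectangle in PDF user space, which by default is measured in points (1/72 inch) from the bottom-left corner of the page, whereas layout tools usually measure from the top-left. The following is a minimal sketch of that conversion in plain Java, assuming an unrotated page whose lower-left corner is at (0, 0). `FieldPlacement` and `toPdfRect` are hypothetical names for illustration, not PDFBox API; the resulting numbers would then go into the widget rectangle, for example through PDFBox's `PDRectangle(x, y, width, height)` constructor.

```java
public class FieldPlacement {
    // PDF user space defaults to 72 units per inch; one inch is 25.4 mm.
    public static final double POINTS_PER_MM = 72.0 / 25.4;

    /**
     * Converts a box given in millimetres from the top-left page corner into
     * {x, y, width, height} in points, with y measured from the bottom edge.
     * Assumes no page rotation and a page origin at (0, 0).
     */
    public static double[] toPdfRect(double pageHeightPt, double leftMm, double topMm,
                                     double widthMm, double heightMm) {
        double x = leftMm * POINTS_PER_MM;
        double width = widthMm * POINTS_PER_MM;
        double height = heightMm * POINTS_PER_MM;
        // y is the lower edge of the box, so subtract both the top offset and the height
        double y = pageHeightPt - topMm * POINTS_PER_MM - height;
        return new double[] {x, y, width, height};
    }

    public static void main(String[] args) {
        double a4HeightPt = 297 * POINTS_PER_MM; // A4 is 297 mm tall
        // A 60 mm x 8 mm box, 20 mm from the left edge and 30 mm from the top
        double[] r = toPdfRect(a4HeightPt, 20, 30, 60, 8);
        System.out.printf("x=%.2f y=%.2f w=%.2f h=%.2f%n", r[0], r[1], r[2], r[3]);
    }
}
```

Measuring the target position in a PDF viewer or layout tool and converting it this way is usually easier than trial and error with raw point values.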

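Is there a safe way to replace text in every footer/header? If I put a field in the footer, there is only one field rather than a repeated one ("pdftk the.pdf dump_data_fields" shows just one field). - jcalfee314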

- mourphy · Accepted Answer

You can try something like this:

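import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;

public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
    if (searchString == null || searchString.isEmpty() || replacement == null || replacement.isEmpty()) {
        return document;
    }
    for (PDPage page : document.getPages()) {
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List<Object> tokens = parser.getTokens();
        // start at 1: an operator at index 0 has no preceding operand
        for (int j = 1; j < tokens.size(); j++) {
            Object next = tokens.get(j);
            if (!(next instanceof Operator)) {
                continue;
            }
            String opName = ((Operator) next).getName();
            Object operand = tokens.get(j - 1);
            // Tj and TJ are the most common text-showing operators
            // (' and " also show text but are not handled here)
            if ("Tj".equals(opName) && operand instanceof COSString) {
                // Tj takes a single operand: the string to show
                replaceIn((COSString) operand, searchString, replacement);
            } else if ("TJ".equals(opName) && operand instanceof COSArray) {
                // TJ takes an array that mixes strings with positioning numbers
                COSArray array = (COSArray) operand;
                for (int k = 0; k < array.size(); k++) {
                    COSBase element = array.getObject(k);
                    if (element instanceof COSString) {
                        replaceIn((COSString) element, searchString, replacement);
                    }
                }
            }
        }
        // now that the tokens are updated, replace the page content stream
        PDStream updatedStream = new PDStream(document);
        try (OutputStream out = updatedStream.createOutputStream()) {
            new ContentStreamWriter(out).writeTokens(tokens);
        }
        page.setContents(updatedStream);
    }
    return document;
}

// Replaces the first literal occurrence of searchString in one string operand.
// Strings without a match are left untouched so their bytes are not re-encoded.
private static void replaceIn(COSString cosString, String searchString, String replacement) {
    String text = cosString.getString();
    if (text.contains(searchString)) {
        String updated = text.replaceFirst(Pattern.quote(searchString), Matcher.quoteReplacement(replacement));
        // assumes a simple single-byte font encoding; other fonts need their own encoding
        cosString.setValue(updated.getBytes(StandardCharsets.ISO_8859_1));
    }
}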
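
Two things are worth checking before relying on code like the above. First, `String.replaceFirst` treats its first argument as a regular expression, so a search string such as `$10.00` or `a.b` only matches literally if it is quoted. Second, each string in a TJ array is examined on its own, so a phrase that the PDF producer split across several array elements (with kerning numbers in between) is never found. The sketch below models both points in plain Java with no PDFBox dependency; `TjFragmentDemo`, `replaceOnce` and `replacePerFragment` are hypothetical names, and the `List<Object>` only stands in for a TJ operand array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TjFragmentDemo {
    /** Replaces the first literal occurrence of search in text (no regex semantics). */
    public static String replaceOnce(String text, String search, String replacement) {
        int at = text.indexOf(search);
        if (at < 0 || search.isEmpty()) {
            return text;
        }
        return text.substring(0, at) + replacement + text.substring(at + search.length());
    }

    /** Applies replaceOnce to each string element separately; numbers pass through unchanged. */
    public static List<Object> replacePerFragment(List<Object> tjArray, String search, String replacement) {
        List<Object> result = new ArrayList<>();
        for (Object element : tjArray) {
            result.add(element instanceof String
                    ? replaceOnce((String) element, search, replacement)
                    : element);
        }
        return result;
    }

    public static void main(String[] args) {
        // As a regex, "." matches any character, so the first character is replaced.
        System.out.println("a.b".replaceFirst(".", "-")); // prints -.b
        System.out.println(replaceOnce("a.b", ".", "-")); // prints a-b

        // The same visible text, stored as one string or split into kerned fragments.
        List<Object> whole = Arrays.asList("Hello World");
        List<Object> split = Arrays.asList("Hel", -20, "lo Wor", 15, "ld");
        System.out.println(replacePerFragment(whole, "World", "There")); // prints [Hello There]
        System.out.println(replacePerFragment(split, "World", "There")); // prints [Hel, -20, lo Wor, 15, ld]
    }
}
```

The second case is one reason token-level replacement tends to work only on simple documents: the page still shows "Hello World", yet no single fragment contains "World".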