在iText的JavaScript动作中搜索特定字符串的PDF

Question

在iText的JavaScript动作中搜索特定字符串的PDF

6

我的目标是在PDF注释中查找给定模式的JavaScript。为此，我编写了以下代码：

public static void main(String[] args) {

        try {

            // Reads and parses a PDF document
            PdfReader reader = new PdfReader("Test.pdf");

            // For each PDF page
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {

                // Get a page a PDF page
                PdfDictionary page = reader.getPageN(i);
                // Get all the annotations of page i
                PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

                // If page does not have annotations
                if (page.getAsArray(PdfName.ANNOTS) == null) {
                    continue;
                }

                // For each annotation
                for (int j = 0; j < annotsArray.size(); ++j) {

                    // For current annotation
                    PdfDictionary curAnnot = annotsArray.getAsDict(j);

                    // check if has JS as described below
                 PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
                 // test if it is a JavaScript action
                 if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
                 // what here?
                 }


                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

据我所知，比较字符串是通过StringCompare库完成的。但问题是它只会比较两个字符串，而我想知道注释中的JavaScript操作是否以（或包含）以下字符串开头：if (this.hostContainer) { try { 那么，如何检查注释中的JavaScript是否包含上述字符串呢？编辑包含JS示例页面：pdf with JS。

- menteith

在确定该操作为 JavaScript 操作后，您可以直接检查 JS 字符串或双值。那么问题是什么？ - mkl

那么，我该如何检查JS字符串？ - menteith

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- mkl · Accepted Answer

ISO 32000-1中对JavaScript操作的定义如下：

12.6.4.16 JavaScript Actions

执行JavaScript操作时，符合规范的处理器应执行使用JavaScript编程语言编写的脚本。根据脚本的性质，文档中的各种交互式表单字段可能会更新其值或更改其视觉外观。Mozilla Development Center的客户端JavaScript参考和Adobe Acrobat API JavaScript参考（见参考书目）详细介绍了JavaScript脚本的内容和效果。表217显示了特定于此类型操作的动作字典条目。

表217 - 特定于JavaScript操作的其他条目

键类型值

S 名称 (必需)描述该字典的动作类型; 对于JavaScript操作，应为JavaScript。

JS 文本字符串或文本流 (必需)包含要执行的JavaScript脚本的文本字符串或文本流。PDFDocEncoding或Unicode编码（后者由Unicode前缀U + FEFF标识）应用于编码字符串或流的内容。

为支持在JavaScript脚本中使用参数化函数调用，PDF文档的名称字典中的JavaScript条目（请参见7.7.4，“名称字典”）可能包含将名称字符串映射到文档级别JavaScript操作的名称树。打开文档时，将执行该名称树中的所有操作，为文档中的其他脚本定义JavaScript函数。

因此，如果您有兴趣了解注释中的JavaScript操作是否以（或包含）此字符串开头：if (this.hostContainer) { try {在这种情况下。

 if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
 // what here?
 }

您可能需要先检查AnnotationAction.Get(PdfName.JS)是PdfString还是PdfStream，然后将其内容作为字符串获取，并使用通常的字符串比较方法检查它或任何调用它的函数（该函数可能在JavaScript名称树中定义）是否包含您搜索的字符串。

示例代码

我采用了您的代码，进行了清理（特别是将C#和Java混合在一起），并按照上述方式添加了代码，检查注释操作元素中的立即JavaScript代码：

Java版本

System.out.println("file.pdf - Looking for special JavaScript actions.");
// Reads and parses a PDF document
PdfReader reader = new PdfReader(resource);

// For each PDF page
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    System.out.printf("\nPage %d\n", i);
    // Get a page a PDF page
    PdfDictionary page = reader.getPageN(i);
    // Get all the annotations of page i
    PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

    // If page does not have annotations
    if (annotsArray == null)
    {
        System.out.printf("No annotations.\n", i);
        continue;
    }

    // For each annotation
    for (int j = 0; j < annotsArray.size(); ++j)
    {
        System.out.printf("Annotation %d - ", j);

        // For current annotation
        PdfDictionary curAnnot = annotsArray.getAsDict(j);

        // check if has JS as described below
        PdfDictionary annotationAction = curAnnot.getAsDict(PdfName.A);
        if (annotationAction == null)
        {
            System.out.print("no action");
        }
        // test if it is a JavaScript action
        else if (PdfName.JAVASCRIPT.equals(annotationAction.get(PdfName.S)))
        {
            PdfObject scriptObject = annotationAction.getDirectObject(PdfName.JS);
            if (scriptObject == null)
            {
                System.out.print("missing JS entry");
                continue;
            }
            final String script;
            if (scriptObject.isString())
                script = ((PdfString)scriptObject).toUnicodeString();
            else if (scriptObject.isStream())
            {
                try (   ByteArrayOutputStream baos = new ByteArrayOutputStream()    )
                {
                    ((PdfStream)scriptObject).writeContent(baos);
                    script = baos.toString("ISO-8859-1");
                }
            }
            else
            {
                System.out.println("malformed JS entry");
                continue;
            }

            if (script.contains("if (this.hostContainer) { try {"))
                System.out.print("contains test string - ");

            System.out.printf("\n---\n%s\n---", script);
            // what here?
        }
        else
        {
            System.out.print("no JavaScript action");
        }
        System.out.println();
    }
}

(Test SearchActionJavaScript, method testSearchJsActionInFile)

C# 版本

using (PdfReader reader = new PdfReader(sourcePath))
{
    Console.WriteLine("file.pdf - Looking for special JavaScript actions.");

    // For each PDF page
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        Console.Write("\nPage {0}\n", i);
        // Get a page a PDF page
        PdfDictionary page = reader.GetPageN(i);
        // Get all the annotations of page i
        PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);

        // If page does not have annotations
        if (annotsArray == null)
        {
            Console.WriteLine("No annotations.");
            continue;
        }

        // For each annotation
        for (int j = 0; j < annotsArray.Size; ++j)
        {
            Console.Write("Annotation {0} - ", j);

            // For current annotation
            PdfDictionary curAnnot = annotsArray.GetAsDict(j);

            // check if has JS as described below
            PdfDictionary annotationAction = curAnnot.GetAsDict(PdfName.A);
            if (annotationAction == null)
            {
                Console.Write("no action");
            }
            // test if it is a JavaScript action
            else if (PdfName.JAVASCRIPT.Equals(annotationAction.Get(PdfName.S)))
            {
                PdfObject scriptObject = annotationAction.GetDirectObject(PdfName.JS);
                if (scriptObject == null)
                {
                    Console.WriteLine("missing JS entry");
                    continue;
                }
                String script;
                if (scriptObject.IsString())
                    script = ((PdfString)scriptObject).ToUnicodeString();
                else if (scriptObject.IsStream())
                {
                    using (MemoryStream stream = new MemoryStream())
                    {
                        ((PdfStream)scriptObject).WriteContent(stream);
                        script = stream.ToString();
                    }
                }
                else
                {
                    Console.WriteLine("malformed JS entry");
                    continue;
                }

                if (script.Contains("if (this.hostContainer) { try {"))
                    Console.Write("contains test string - ");

                Console.Write("\n---\n{0}\n---", script);
                // what here?
            }
            else
            {
                Console.Write("no JavaScript action");
            }
            Console.WriteLine();
        }
    }
}

输出

当运行任意版本来处理你的示例文件时，会得到以下结果：

file.pdf - Looking for special JavaScript actions.

Page 1
Annotation 0 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);
} catch(e) { console.println(e); }};
---
Annotation 1 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_ix', 0]);
} catch(e) { console.println(e); }};
---
Annotation 2 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_xi', 0]);
} catch(e) { console.println(e); }};
---
Annotation 3 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_3', 0]);
} catch(e) { console.println(e); }};
---
Annotation 4 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_15', 0]);
} catch(e) { console.println(e); }};
---
Annotation 5 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_37', 0]);
} catch(e) { console.println(e); }};
---
Annotation 6 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_57', 0]);
} catch(e) { console.println(e); }};
---
Annotation 7 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_81', 0]);
} catch(e) { console.println(e); }};
---
Annotation 8 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_111', 0]);
} catch(e) { console.println(e); }};
---
Annotation 9 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_136', 0]);
} catch(e) { console.println(e); }};
---
Annotation 10 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_160', 0]);
} catch(e) { console.println(e); }};
---
Annotation 11 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_197', 0]);
} catch(e) { console.println(e); }};
---
Annotation 12 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_179', 0]);
} catch(e) { console.println(e); }};
---
Annotation 13 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_201', 0]);
} catch(e) { console.println(e); }};
---
Annotation 14 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_223', 0]);
} catch(e) { console.println(e); }};
---

Page 2
No annotations.

Page 3
No annotations.