How do I generate plain-text source-code PDF examples that work in a document viewer?


linux, pdf, command-line, pdf-generation


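I just came across this post: Adobe Forums: Simple Text String Example in specification broken., which got me interested in finding plain-text source-code PDF examples.

Through that post, I eventually found:

the web page PDF Reference and Adobe Extensions to the PDF Specification | Adobe Developer Connection, which contains:
- Document management - Portable document format - Part 1: PDF 1.7, First Edition (PDF32000_2008.pdf)

In the PDF 1.7 specification, page 699 starts "Annex H (informative) Example PDF files"; from there, I wanted to try "H.3 Simple Text String Example" (the classic "Hello World").

So I tried saving it as hello.pdf (note that copying from PDF32000_2008.pdf gives "%PDF-1. 4", that is, a space is inserted after "1.", and it must be removed):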

%PDF-1.4
1 0 obj
  << /Type /Catalog
      /Outlines 2 0 R
      /Pages 3 0 R
  >>
endobj

2 0 obj
  << /Type /Outlines
      /Count 0
  >>
endobj

3 0 obj
  << /Type /Pages
      /Kids [ 4 0 R ]
      /Count 1
  >>
endobj

4 0 obj
  << /Type /Page
      /Parent 3 0 R
      /MediaBox [ 0 0 612 792 ]
      /Contents 5 0 R
      /Resources << /ProcSet 6 0 R
      /Font << /F1 7 0 R >>
  >>
>>
endobj

5 0 obj
  << /Length 73 >>
stream
  BT
    /F1 24 Tf
    100 100 Td
    ( Hello World ) Tj
  ET
endstream
endobj

... and I try to open it with:

evince hello.pdf

However, evince cannot open it: "Unable to open document / PDF document is damaged"; it also prints:

Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table

I also checked the file with qpdf:

$ qpdf --check hello.pdf
WARNING: hello.pdf: file is damaged
WARNING: hello.pdf: can't find startxref
WARNING: hello.pdf: Attempting to reconstruct cross-reference table
hello.pdf: unable to find trailer dictionary while recovering damaged file

Where am I going wrong?

Thanks in advance for any answers, cheers!

- sdaau

2 Answers


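Ah, darn: I had copied only part of the code. The original listing starts on page 701, and a page footer there confused me; the code actually continues on page 702.

(EDIT: see also Introduction to PDF - GNUpdf (archive) for a similar, more detailed example)

So here is the complete code: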

%PDF-1.4
1 0 obj
  << /Type /Catalog
      /Outlines 2 0 R
      /Pages 3 0 R
  >>
endobj

2 0 obj
  << /Type /Outlines
      /Count 0
  >>
endobj

3 0 obj
  << /Type /Pages
      /Kids [ 4 0 R ]
      /Count 1
  >>
endobj

4 0 obj
  << /Type /Page
      /Parent 3 0 R
      /MediaBox [ 0 0 612 792 ]
      /Contents 5 0 R
      /Resources << /ProcSet 6 0 R
      /Font << /F1 7 0 R >>
  >>
>>
endobj

5 0 obj
  << /Length 73 >>
stream
  BT
    /F1 24 Tf
    100 100 Td
    ( Hello World ) Tj
  ET
endstream
endobj

6 0 obj
  [ /PDF /Text ]
endobj

7 0 obj
  << /Type /Font
    /Subtype /Type1
    /Name /F1
    /BaseFont /Helvetica
    /Encoding /MacRomanEncoding
  >>
endobj

xref
0 8
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000120 00000 n
0000000179 00000 n
0000000364 00000 n
0000000466 00000 n
0000000496 00000 n

trailer
  << /Size 8
    /Root 1 0 R
  >>
startxref
625
%%EOF

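Indeed, just as the error messages said, the cross-reference section and trailer were missing!

However, that is not the end of it: although this document now opens in evince, evince still complains:

$ evince hello.pdf
Error: PDF file is damaged - attempting to reconstruct xref table...

... and so does qpdf: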

$ qpdf --check hello.pdf
WARNING: hello.pdf: file is damaged
WARNING: hello.pdf (file position 625): xref not found
WARNING: hello.pdf: Attempting to reconstruct cross-reference table
checking hello.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
WARNING: hello.pdf (object 5 0, file position 436): attempting to recover stream length
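
The "xref not found" warning means that the number after startxref does not point at the xref keyword in the saved file. The same kind of mismatch can affect each offset inside the table. As a rough illustration, here is a minimal Python sketch that checks both by hand. The function name check_xref is hypothetical, and the sketch assumes a single classic xref table (no xref streams, no incremental updates); it does not check stream lengths.

```python
import re


def check_xref(pdf: bytes) -> list[str]:
    """Return human-readable problems found in a classic xref table."""
    problems = []

    # startxref must hold the byte offset of the 'xref' keyword.
    m = re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf)
    if m is None:
        return ["no startxref/%%EOF found"]
    start = int(m.group(1))
    if pdf[start:start + 4] != b"xref":
        problems.append(f"startxref {start}: xref not found")

    # Locate the table by searching, instead of trusting startxref.
    table = re.search(
        rb"(?<!start)xref\s+(\d+)\s+(\d+)\s+((?:\d{10} \d{5} [nf]\s+)+)", pdf
    )
    if table is None:
        return problems + ["no xref table found"]

    first = int(table.group(1))
    entries = re.findall(rb"(\d{10}) (\d{5}) ([nf])", table.group(3))
    for num, (offset, gen, kind) in enumerate(entries, start=first):
        if kind != b"n":
            continue  # free entries carry no usable offset
        expected = f"{num} {int(gen)} obj".encode()
        # An in-use entry must point at the first byte of "N G obj".
        if not pdf[int(offset):].startswith(expected):
            problems.append(f"object {num}: offset {int(offset)} is wrong")
    return problems
```

Reading hello.pdf in binary mode and passing the bytes to check_xref gives a list of mismatches; an empty list means that every offset checked lines up with the file as saved.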

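To get a correct example, as pointed out in Adobe Forums: Simple Text String Example in specification broken., the cross-reference table needs to be rebuilt (with correct byte offsets).

To do that, we can use pdftk to "Repair a PDF's Corrupted XREF Table and Stream Lengths (If Possible)":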

$ pdftk hello.pdf output hello_repair.pdf

... and now hello_repair.pdf opens fine in evince, and qpdf reports:

$ qpdf --check hello_repair.pdf
checking hello_repair.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
No errors found
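
An alternative to repairing the file afterwards is to generate it with the offsets and stream length computed rather than typed in by hand. The following is a minimal Python sketch of that idea, not what pdftk does internally. The helper names build_pdf and stream and the output name hello_generated.pdf are hypothetical, and the object bodies are the ones from the listing above, with the whitespace collapsed.

```python
def build_pdf(objects: list[bytes]) -> bytes:
    """Serialize object bodies (numbered 1..n) with a computed xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)  # byte offset of the 'xref' keyword
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def stream(data: bytes) -> bytes:
    """Wrap content-stream data, computing /Length instead of hard-coding it."""
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"


content = b"BT\n/F1 24 Tf\n100 100 Td\n(Hello World) Tj\nET"
objects = [
    b"<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>",
    b"<< /Type /Outlines /Count 0 >>",
    b"<< /Type /Pages /Kids [ 4 0 R ] /Count 1 >>",
    b"<< /Type /Page /Parent 3 0 R /MediaBox [ 0 0 612 792 ] /Contents 5 0 R"
    b" /Resources << /ProcSet 6 0 R /Font << /F1 7 0 R >> >> >>",
    stream(content),
    b"[ /PDF /Text ]",
    b"<< /Type /Font /Subtype /Type1 /Name /F1 /BaseFont /Helvetica"
    b" /Encoding /MacRomanEncoding >>",
]
```

Writing the result of build_pdf(objects) to a file opened in binary mode (for example hello_generated.pdf) gives a file that can be checked with qpdf --check in the same way as hello_repair.pdf. Because every offset is taken from the buffer length at the moment the object is written, editing an object body cannot leave the table stale.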

Hope this helps someone, cheers!

- sdaau


- Kurt Pfeifle · Accepted Answer

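You need to append a (syntactically valid) xref and trailer section to the end of the file. That is: every object in your PDF needs a line in the xref table, even if the byte offsets are incorrect. Ghostscript, pdftk, or qpdf can then re-establish the correct xref and render the file: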

[...]
endobj
xref 
0 8 
0000000000 65535 f 
0000000010 00000 n 
0000000020 00000 n 
0000000030 00000 n 
0000000040 00000 n 
0000000050 00000 n 
0000000060 00000 n 
0000000070 00000 n 
trailer 
<</Size 8/Root 1 0 R>> 
startxref 
555 
%%EOF
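
For files with many objects, a placeholder section like the one above can be generated instead of typed. This is a minimal Python sketch under stated assumptions: the function name placeholder_tail is hypothetical, the offsets and the startxref value are deliberately dummy values, and the catalog is assumed to be object 1. Each entry keeps the trailing space before the newline so that it is exactly 20 bytes long, as in the listing above.

```python
def placeholder_tail(n_objects: int) -> bytes:
    """Build a well-formed xref/trailer with dummy offsets for objects 1..n."""
    size = n_objects + 1  # entry 0 is the free-list head
    lines = [b"xref", b"0 %d" % size, b"0000000000 65535 f "]
    for num in range(1, size):
        # Dummy offset, deliberately wrong; a repair tool is expected to fix it.
        lines.append(b"%010d 00000 n " % (num * 10))
    lines += [
        b"trailer",
        b"<</Size %d/Root 1 0 R>>" % size,  # assumes the catalog is object 1
        b"startxref",
        b"0",  # dummy value, also left for the repair tool
        b"%%EOF",
    ]
    return b"\n".join(lines) + b"\n"
```

Appending placeholder_tail(7) to the truncated hello.pdf and then running it through one of the tools named above follows the approach described in this answer; the output still has to be verified, for example with qpdf --check.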