Splitting a PDF with Ghostscript

Question

45

I am trying to split a multi-page PDF with Ghostscript. Many sites, including ghostscript.com, give the same solution:

gs -sDEVICE=pdfwrite -dSAFER -o outname.%d.pdf input.pdf

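This does not seem to work for me: it produces a single file named outname.1.pdf that contains all the pages.

When I add a first and last page it works, but I would like it to work without knowing those parameters.

In the gs-devel archive I found a solution: http://ghostscript.com/pipermail/gs-devel/2009-April/008310.html, but I would like to do it without pdf_info.

When I use another device, for example pswrite, with the same parameters, it works correctly and produces as many ps files as my input.pdf has pages.

Is this normal when using pdfwrite? Am I doing something wrong?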

- zseder

8 Answers

15

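What you are seeing is "normal" behavior: the current version of Ghostscript's pdfwrite output device does not support this feature. This is also documented (admittedly, somewhat vaguely) in Use.htm:

"Note, however, that the one-page-per-file feature may not be supported by all devices...."

I recall one of the Ghostscript developers mentioning on IRC that they might add this feature to pdfwrite in a future release, but that it seems to require a major code rewrite, which is why they have not done it yet...

Update: As Gordon's comment hinted, as of version 9.06 (released on July 31, 2012), Ghostscript also supports the command line quoted in the question for pdfwrite. (Gordon may have discovered unofficial support for it already in 9.05, or he compiled his own executable from pre-release sources that had not yet been tagged as 9.06.)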

- Kurt Pfeifle

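Yes, I saw that line, but what I meant by "normal behavior" was "is pdfwrite one of the devices that may not support this feature?" Your recollection of the IRC conversation is good enough for me, thanks. - zseder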

4

For anyone finding this answer through a search: as of version 9.05, the OP's command produces one file per page for me. - Gordon

1

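@Gordon: As of version 9.06, support for the "-o out_%d.pdf" syntax (splitting a multi-page PDF into one file per page) is an official feature. I have already mentioned this in other answers (for example, *Splitting a multi-page PDF file into single pages*). I forgot to update this answer. Thanks for the hint. - Kurt Pfeifle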
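
As a rough illustration of the command discussed in this thread, the following Python sketch builds the `-o outname.%d.pdf` invocation as an argument list. The helper names `build_split_command` and `split_pdf` are made up for this example, and it assumes a `gs` executable (9.06 or later, per the comments above) is on the PATH.

```python
import re
import subprocess


def build_split_command(input_pdf, output_pattern="outname.%d.pdf", gs="gs"):
    """Return the argument list for a one-page-per-file split.

    Hypothetical helper: the %d (or e.g. %03d) placeholder is expanded
    by Ghostscript itself, not by Python, to the page number.
    """
    if not re.search(r"%\d*d", output_pattern):
        raise ValueError("output_pattern needs a %d placeholder")
    return [gs, "-sDEVICE=pdfwrite", "-dSAFER", "-o", output_pattern, input_pdf]


def split_pdf(input_pdf, output_pattern="outname.%d.pdf", gs="gs"):
    """Run the split; a list (no shell) avoids quoting issues with spaces."""
    subprocess.run(build_split_command(input_pdf, output_pattern, gs), check=True)
```

Calling `split_pdf("input.pdf")` would then be the equivalent of typing the command from the question.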

4

Here is a script for the Windows command prompt (it also works with drag and drop), assuming you have Ghostscript installed:

@echo off
chcp 65001
setlocal enabledelayedexpansion

rem Customize or remove this line if you already have Ghostscript folders in your system PATH
set path=C:\Program Files\gs\gs9.22\lib;C:\Program Files\gs\gs9.22\bin;%path%

:start

echo Splitting "%~n1%~x1" into standalone single pages...
rem /d also switches drives; the quotes protect folder names that contain spaces
cd /d "%~dp1"
rem getting number of pages of PDF with GhostScript
for /f "usebackq delims=" %%a in (`gswin64c -q -dNODISPLAY -c "(%~n1%~x1) (r) file runpdfbegin pdfpagecount = quit"`) do set "numpages=%%a"

for /L %%n in (1,1,%numpages%) do (
echo Extracting page %%n of %numpages%...
set "x=00%%n"
set "x=!x:~-3!"
gswin64c.exe -dNumRenderingThreads=2 -dBATCH -dNOPAUSE -dQUIET -dFirstPage=%%n -dLastPage=%%n -sDEVICE=pdfwrite -sOutputFile="%~d1%~p1%~n1-!x!.pdf" %1
)

shift
if NOT x%1==x goto start

pause

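Name the script something like split PDF.bat and put it on your desktop. Drag and drop one or more multi-page PDF files onto it, and it will create a standalone PDF file for each page, appending the suffixes -001, -002, and so on to the name to tell the pages apart.

If the Ghostscript folders are already in your system PATH environment variable, you can remove the set path=... line; otherwise, adjust it to match your Ghostscript version.

It works for me on Windows 10 with Ghostscript 9.22. Check the comments to see whether it works with newer versions of Ghostscript.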

- mmj

+1 for getting the page count with GS, nice job! If anyone wants the page count on Linux/macOS, use gs -q -dNODISPLAY -c "(../escaped\ file \name.pdf) (r) file runpdfbegin pdfpagecount = quit". - Gus Neves

1

Very helpful. It runs on GS 9.22 but does not work with 9.50 and 9.52. Does anyone know how to fix this? - tstone-1

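@user18258 I don't know how to fix it, but I found it more convenient to split PDF files on Windows with another command-line tool, the Sejda console. Here is a drag-and-drop batch file: https://www.codepile.net/pile/6lWv3wzY - mmj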

1

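@mmj Thanks for the Sejda-based code! I use Ghostscript for many "shell:sendto" tasks and would still be interested in a 9.52-compatible solution, although I understand that you will not provide one. I found a small bug in your GS-based code above (I am still using GS version 9.27!): I think gswin64c.exe ... "%1" should be gswin64c.exe ... %1, otherwise paths containing spaces cause problems. - tstone-1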

@tstone-1 It looks like for Ghostscript 9.50+ you need to add the -dNOSAFER option (along with -dNODISPLAY). See: https://dev59.com/gJvga4cB1Zd3GeqP7c2g - mmj
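
To make the page-count trick from this thread concrete, here is a minimal Python sketch that builds the same `runpdfbegin pdfpagecount` command with `-dNOSAFER` added, as the last comment suggests for 9.50+. The function names are invented for this example, and `gs` is assumed to be on the PATH (on Windows you would pass `gs="gswin64c"`). Escaping backslashes and parentheses matters because the file name ends up inside a PostScript string literal.

```python
import subprocess


def ps_string(path):
    """Escape a path for use inside a PostScript (...) string literal."""
    return path.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def page_count_command(pdf_path, gs="gs"):
    """Build the page-count command from the answer, plus -dNOSAFER.

    Assumption: -dNOSAFER is what Ghostscript 9.50+ needs here, as the
    comment above suggests; it relaxes file-access restrictions, so use
    it only on files you trust.
    """
    program = "({}) (r) file runpdfbegin pdfpagecount = quit".format(ps_string(pdf_path))
    return [gs, "-q", "-dNODISPLAY", "-dNOSAFER", "-c", program]


def get_page_count(pdf_path, gs="gs"):
    """Run Ghostscript and parse the single number it prints."""
    result = subprocess.run(page_count_command(pdf_path, gs),
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

Because the command is passed as a list, a file name with spaces needs no shell quoting.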

4

#!/bin/bash
# $1 is the input filename

# ask Ghostscript for the page count; keeping $1 inside the quotes handles spaces
ournum=$(gs -q -dNODISPLAY -c "($1) (r) file runpdfbegin pdfpagecount = quit" 2>/dev/null)
echo "Processing $ournum pages"
# strip only a trailing .pdf from the name (done once, outside the loop)
newname="${1%.pdf}"
counter=1
while [ "$counter" -le "$ournum" ] ; do
    reallynewname="$newname-$counter.pdf"
    # make the individual pdf page; -dNOPAUSE replaces piping "yes" into gs
    gs -dBATCH -dNOPAUSE -sOutputFile="$reallynewname" -dFirstPage="$counter" -dLastPage="$counter" -sDEVICE=pdfwrite "$1" >/dev/null 2>&1
    counter=$((counter+1))
done

- John Ostrowick

2

Here's a simple Python script that does it:

#!/usr/bin/python3

import os

number_of_pages = 68
input_pdf = "abstracts_rev09.pdf"

for i in range(1, number_of_pages +1):
    os.system("gs -q -dBATCH -dNOPAUSE -sOutputFile=page{page:04d}.pdf"
              " -dFirstPage={page} -dLastPage={page}"
              " -sDEVICE=pdfwrite {input_pdf}"
              .format(page=i, input_pdf=input_pdf))

- Adobe
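
One possible variation on the script above, sketched under a few assumptions: it passes each Ghostscript call as an argument list to `subprocess.run` instead of building a shell string for `os.system`, so an input name containing spaces needs no quoting. The function names `page_commands` and `split_pages` are hypothetical, and the page count is still supplied by the caller.

```python
import subprocess


def page_commands(input_pdf, number_of_pages, gs="gs"):
    """Yield one Ghostscript argument list per page (pages are 1-based)."""
    for page in range(1, number_of_pages + 1):
        yield [
            gs, "-q", "-dBATCH", "-dNOPAUSE",
            "-sOutputFile=page{:04d}.pdf".format(page),
            "-dFirstPage={}".format(page),
            "-dLastPage={}".format(page),
            "-sDEVICE=pdfwrite",
            input_pdf,
        ]


def split_pages(input_pdf, number_of_pages, gs="gs"):
    """Run the commands; check=True stops at the first Ghostscript failure."""
    for command in page_commands(input_pdf, number_of_pages, gs):
        subprocess.run(command, check=True)
```

With the values from the script above, the call would be `split_pages("abstracts_rev09.pdf", 68)`.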

0

PowerShell version. (Batch files are so 1990s.)

Based on https://dev59.com/NWkv5IYBdhLWcg3w9lei#70438840


function expdf ($pdf, $pages, $out)
{
    $f = ((get-item $pdf).FullName.Replace('\', '/'))
    $o = "$out".Replace('\', '/')
    $count = gswin64c.exe -q -dNODISPLAY "--permit-file-read=$f" -c "($f) (r) file runpdfbegin pdfpagecount = quit"
    (1..$count) | foreach-object { gswin64c.exe -q -dBATCH -sDEVICE=pdfwrite "-sPageList=$_" -dNOPAUSE "-sOutputFile=tmp-$_.pdf" $f }
    $pages = $pages | foreach-object { $_ } #flatten
    $pdfs = get-childitem "tmp-*.pdf" | where-object { $_.BaseName.Replace('tmp-','') -in $pages } | select-object -expand name
    gswin64c.exe -dBATCH -sDEVICE=pdfwrite -dNOPAUSE "-sOutputFile=$o" $pdfs
    remove-item "tmp-*.pdf"
}

expdf -pdf './test.pdf' -pages (1..3),6 -out out.pdf

- Luiz Felipe

0

gs only accepts pages in ascending order. To shuffle pages from the source (for example pages 7, 8, 5), I wrote a function in ~/.bashrc:

function expdf
{
local str=""
local arr=($(echo $1 | tr "," "\n"))
#          splitting
for i in "${arr[@]}";do
  gs -dBATCH -sDEVICE=pdfwrite -sPageList=$i -dNOPAUSE -sOutputFile=$i.tmp $2
#          reordering for combining
  str="$str $i.tmp"
done
#          combining to combine.pdf
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=combine.pdf -dBATCH $str
#          removing temporary files
for i in "${arr[@]}";do rm $i.tmp;done
}

Usage example: expdf 7-8,5 source.pdf

- oktobris
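
The splitting step of the function above can be sketched in Python as follows. This is only an illustration: `parse_page_spec` and `extract_commands` are made-up names, the temporary file names are arbitrary, and the sketch swaps `-sPageList` for `-dFirstPage`/`-dLastPage` so that each chunk is an explicit range. The chunks keep the order given in the spec, which is what allows the later merge to reorder pages.

```python
def parse_page_spec(spec):
    """Split '7-8,5' into [(7, 8), (5, 5)], preserving the given order."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start  # int("") raises for an open range like "7-"
        if start < 1 or end < start:
            raise ValueError("bad page range: {!r}".format(part))
        ranges.append((start, end))
    return ranges


def extract_commands(spec, source_pdf, gs="gs"):
    """Build one extraction command per chunk, plus the temp file names."""
    commands, temp_files = [], []
    for index, (start, end) in enumerate(parse_page_spec(spec)):
        temp = "chunk-{:03d}.tmp.pdf".format(index)
        commands.append([gs, "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
                         "-dFirstPage={}".format(start),
                         "-dLastPage={}".format(end),
                         "-sOutputFile=" + temp, source_pdf])
        temp_files.append(temp)
    return commands, temp_files
```

The temporary files would then be merged in list order with a final pdfwrite call, as in the bash function, and deleted afterwards.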

0

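An updated answer that relies only on pdftk.exe and does not call Ghostscript

The answer by @mmj used to work well for me, but it stopped working somewhere between GS versions 9.20 and 9.50. I am also aware of the solution by @Adobe. However, I like to do repetitive tasks from Windows (10) Explorer by simply selecting one or more files and choosing right-click → Send to. Here is a Python script (compatible with 3.8) that uses pdftk.exe (tested with 2.02) to count the total number of pages and extract each page into its own file. It should accept multiple PDFs as input. Make sure Python and pdftk.exe are on your PATH.

Save this as extract-pdf-pages-py.cmd and place it in shell:sendto: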

python %APPDATA%\Microsoft\Windows\SendTo\extract-pdf-pages-py.py %*

Put the following into extract-pdf-pages-py.py in the same folder:

#!/usr/bin/python3
# put as extract-pdf-pages-py.py to shell:sendto

import os
import subprocess
import re
import sys
import mimetypes


def is_tool(name):
    from shutil import which
    return which(name) is not None


if not is_tool('pdftk'):
    input('pdftk.exe not within PATH. Aborting...')
    raise SystemExit("pdftk.exe not within PATH.")

sys.argv.pop(0)

for j in range(len(sys.argv)):
    input_pdf = sys.argv[j]

    if 'application/pdf' not in mimetypes.guess_type(input_pdf):
        input(f"File {input_pdf} is not a PDF. Skipping...")
        continue

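    # splitext removes only the extension; rstrip('.pdf') would also eat trailing 'p', 'd', 'f' characters
    savefile = os.path.splitext(input_pdf)[0]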

    numpages = subprocess.Popen(f"pdftk \"{input_pdf}\" dump_data", shell=True, stdout=subprocess.PIPE)
    output1 = str(numpages.communicate()[0])
    output2 = re.search("NumberOfPages: ([0-9]*)", output1)
    number_of_pages = int(output2.group(1))

    for i in range(1, number_of_pages + 1):
        os.system(f"pdftk \"{input_pdf}\" cat {i} output \"{savefile}\"{i:04d}.pdf")

I used code from this answer (the script written by @Adobe) and from that one (is_tool).

- tstone-1
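
Two details of the script above are easy to get wrong, so here is a small, hedged sketch of them in isolation: reading the page count out of the `pdftk ... dump_data` text, and deriving the output name without `str.rstrip`, which strips a set of characters rather than a suffix. The helper names and the sample text in the usage note are invented for illustration.

```python
import os
import re


def parse_number_of_pages(dump_data_text):
    """Extract the page count from `pdftk <file> dump_data` output."""
    match = re.search(r"^NumberOfPages:\s*(\d+)\s*$", dump_data_text, re.MULTILINE)
    if match is None:
        raise ValueError("no NumberOfPages line found")
    return int(match.group(1))


def output_name(input_pdf, page):
    """Name for a single-page file, e.g. 'report.pdf', 3 -> 'report0003.pdf'."""
    stem, _ext = os.path.splitext(input_pdf)
    return "{}{:04d}.pdf".format(stem, page)
```

For example, given made-up dump text containing the line `NumberOfPages: 12`, `parse_number_of_pages` returns 12, and `output_name("flipped.pdf", 3)` gives `flipped0003.pdf`, whereas `"flipped.pdf".rstrip(".pdf")` would yield `flippe`.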


- Juanito Fatas · Accepted Answer

I found this script written by Mr. Weimer very useful:

#!/bin/sh
#
# pdfsplit [input.pdf] [first_page] [last_page] [output.pdf] 
#
# Example: pdfsplit big_file.pdf 10 20 pages_ten_to_twenty.pdf
#
# written by: Westley Weimer, Wed Mar 19 17:58:09 EDT 2008
#
# The trick: ghostscript (gs) will do PDF splitting for you, it's just not
# obvious and the required defines are not listed in the manual page. 

if [ $# -lt 4 ] 
then
        echo "Usage: pdfsplit input.pdf first_page last_page output.pdf"
        exit 1
fi
gs -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$4" -dFirstPage=$2 -dLastPage=$3 -sDEVICE=pdfwrite "$1"

Source: http://www.cs.virginia.edu/~weimer/pdfsplit/pdfsplit

Save it as pdfsplit.sh and watch the magic happen.

PDFSAM can also do the job. It is available for Windows and Mac.