Batch converting PDF files to images

3
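I am working on a solution for converting PDF files to images. I used the following example from CodeProject: http://www.codeproject.com/Articles/317700/Convert-a-PDF-into-a-series-of-images-using-Csharp?msg=4134859#xx4134859xx
Now I am trying to generate new images from more than 1000 PDF files with the following code: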
using Cyotek.GhostScript;
using Cyotek.GhostScript.PdfConversion;
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace RefClass_PDF2Image
{
    class Program
    {
        static void Main(string[] args)
        {
            string outputPath = Properties.Settings.Default.outputPath;
            string pdfPath = Properties.Settings.Default.pdfPath;

            if (!Directory.Exists(outputPath))
            {
                Console.WriteLine("The specified export path " + outputPath + " was not found. Please change the path (outputPath) in the App.Config file.");
                return;
            }
            else
            {
                Console.WriteLine("Output path: " + outputPath + " found.");
            }

            if (!Directory.Exists(pdfPath))
            {
                Console.WriteLine("The specified path " + pdfPath + " to the PDF drawings was not found. Please change the path (pdfPath) in the App.Config file.");
                return;
            }
            else
            {
                Console.WriteLine("PDF path: " + pdfPath + " found.");
            }


            Pdf2ImageSettings settings = GetPDFSettings();

            DateTime start = DateTime.Now;
            TimeSpan span;

            Console.WriteLine("");
            Console.WriteLine("Extraction of the PDF drawings started: " + start.ToShortTimeString());
            Console.WriteLine("");

            DirectoryInfo directoryInfo = new DirectoryInfo(pdfPath);
            DirectoryInfo[] directories = directoryInfo.GetDirectories();

            Console.WriteLine("");
            Console.WriteLine(directories.Length + " different directories were found.");
            Console.WriteLine("");

            List<string> filenamesPDF = Directory.GetFiles(pdfPath, "*.pdf*", SearchOption.AllDirectories).Select(x => Path.GetFullPath(x)).ToList();
            List<string> filenamesOutput = Directory.GetFiles(outputPath, "*.*", SearchOption.AllDirectories).Select(x => Path.GetFullPath(x)).ToList();

            Console.WriteLine("");
            Console.WriteLine(filenamesPDF.Count + " different PDF drawings were found.");
            Console.WriteLine("");

            List<string> newFileNames = new List<string>();
            int cutLength = pdfPath.Length;


            for (int i = 0; i < filenamesPDF.Count; i++)
            {
                string temp = filenamesPDF[i].Remove(0, cutLength);
                temp = outputPath + temp;
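                // Swap only the extension; Replace("pdf", "jpg") would also rewrite "pdf" inside folder or file names.
                temp = Path.ChangeExtension(temp, ".jpg");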
                newFileNames.Add(temp);
            }

            for (int i = 0; i < filenamesPDF.Count; i++)
            {
                FileInfo fi = new FileInfo(newFileNames[i]);
                if (!fi.Exists)
                {
                    if (!Directory.Exists(fi.DirectoryName))
                    {
                        Directory.CreateDirectory(fi.DirectoryName);
                    }

                    Bitmap firstPage = new Pdf2Image(filenamesPDF[i], settings).GetImage();
                    firstPage.Save(newFileNames[i], System.Drawing.Imaging.ImageFormat.Jpeg);
                    firstPage.Dispose();
                }

                //if (i % 20 == 0)
                //{
                //  GC.Collect();
                //  GC.WaitForPendingFinalizers();
                //}
            }


            Console.ReadLine();
        }

        private static Pdf2ImageSettings GetPDFSettings()
        {
            Pdf2ImageSettings settings;
            settings = new Pdf2ImageSettings();
            settings.AntiAliasMode = AntiAliasMode.Medium;
            settings.Dpi = 150;
            settings.GridFitMode = GridFitMode.Topological;
            settings.ImageFormat = ImageFormat.Png24;
            settings.TrimMode = PdfTrimMode.CropBox;
            return settings;
        }
    }
}

Unfortunately, I always get an out-of-memory exception in Pdf2Image.cs. Here is the code:

public Bitmap GetImage(int pageNumber)
{
  Bitmap result;
  string workFile;

  //if (pageNumber < 1 || pageNumber > this.PageCount)
  //    throw new ArgumentException("Page number is out of bounds", "pageNumber");

  if (pageNumber < 1)
      throw new ArgumentException("Page number is out of bounds", "pageNumber");

  workFile = Path.GetTempFileName();

  try
  {
    this.ConvertPdfPageToImage(workFile, pageNumber);
    using (FileStream stream = new FileStream(workFile, FileMode.Open, FileAccess.Read))
    {
        result = new Bitmap(stream); // --->>> here is the out of memory exception
        stream.Close();
        stream.Dispose();
    }

  }
  finally
  {
    File.Delete(workFile);
  }

  return result;
}

How can I fix this to avoid the exception?

Any help is much appreciated, tro


Yes, that is what I do: firstPage.Dispose(); - tro
3 Answers

3

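I don't know whether this is of any value to you, but it seems you can do what you want without using a Bitmap at all. Pdf2Image contains the following code: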

public void ConvertPdfPageToImage(string outputFileName, int pageNumber)
{
  if (pageNumber < 1 || pageNumber > this.PageCount)
    throw new ArgumentException("Page number is out of bounds", "pageNumber");

  using (GhostScriptAPI api = new GhostScriptAPI())
    api.Execute(this.GetConversionArguments(this._pdfFileName, outputFileName, pageNumber, this.PdfPassword, this.Settings));
}

This method writes a file for you at the location you want. Why not call it directly instead of reading the image and then writing it back out?
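
As a rough sketch of that idea, the batch loop could look like the following. The names BatchConverter, MapOutputPath, ConvertAll and the convertFirstPage hook are hypothetical and only serve the illustration; the hook is where the call to ConvertPdfPageToImage would go. It is assumed here that the written file format follows settings.ImageFormat (Png24 in the question), so the extension is passed in rather than hard-coded to jpg.

```csharp
using System;
using System.IO;

public static class BatchConverter
{
    // Maps a PDF path below pdfRoot to the matching image path below outputRoot,
    // changing only the file extension.
    public static string MapOutputPath(string pdfRoot, string outputRoot, string pdfFile, string extension)
    {
        string relative = pdfFile.Substring(pdfRoot.Length)
            .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
        return Path.ChangeExtension(Path.Combine(outputRoot, relative), extension);
    }

    // Converts every PDF that has no output file yet and returns how many were converted.
    // convertFirstPage(pdfFile, targetFile) is a hypothetical hook. With the library from
    // the question it might be (untested assumption):
    //   (pdf, img) => new Pdf2Image(pdf, settings).ConvertPdfPageToImage(img, 1)
    public static int ConvertAll(string pdfRoot, string outputRoot, string extension,
                                 Action<string, string> convertFirstPage)
    {
        int converted = 0;
        foreach (string pdfFile in Directory.EnumerateFiles(pdfRoot, "*.pdf", SearchOption.AllDirectories))
        {
            string target = MapOutputPath(pdfRoot, outputRoot, pdfFile, extension);
            if (File.Exists(target))
                continue; // already converted in an earlier run

            Directory.CreateDirectory(Path.GetDirectoryName(target));
            convertFirstPage(pdfFile, target); // no Bitmap is created in this process
            converted++;
        }
        return converted;
    }
}
```

Because no Bitmap is ever loaded, nothing in the loop needs to be disposed, and the files are enumerated lazily instead of being collected into lists first.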


Great! That is exactly what I was looking for! Upvoted! - tro

2
Add a using statement for the resulting Bitmap:
using (FileStream stream = new FileStream(workFile, FileMode.Open, FileAccess.Read))
using (Bitmap result = new Bitmap(stream))
{
...
}

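This solution looks more elegant than just calling Dispose. - Malhotra
A using statement wraps the enclosed block in try/finally and calls Dispose in the finally block. This ensures that Dispose is called even if an exception occurs. [link](https://dev59.com/BWgu5IYBdhLWcg3w6bOo) - Adam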
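
The behaviour described in the comment above can be checked with a small sketch that does not depend on System.Drawing. TrackedResource and UsingDemo are hypothetical names standing in for the Bitmap and the calling code. One caveat: inside GetImage the Bitmap is returned to the caller, so a using block around it there would dispose it before it is returned; the using belongs at the call site, around firstPage.

```csharp
using System;

// Stand-in for a disposable such as Bitmap; it records whether Dispose ran.
public sealed class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
    }
}

public static class UsingDemo
{
    // Returns true if the resource was disposed even though the using body threw.
    public static bool DisposedAfterException()
    {
        var resource = new TrackedResource();
        try
        {
            using (resource)
            {
                // Simulates a failure such as Save throwing halfway through.
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed only for the demonstration.
        }
        return resource.Disposed;
    }
}
```

With the manual firstPage.Dispose() from the question, an exception thrown by Save would skip the Dispose call; the using form does not have that gap.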

2

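This may not answer your question directly, but it may still be useful: ImageMagick offers a simple batch mode for creating images from PDFs.

Convert a single PDF file into multiple images: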

convert -geometry 1024x768 -density 200 -colorspace RGB test.pdf +adjoin test_%0d.jpg

Or, if you want to process multiple PDF files:

mogrify -format jpg -alpha off -density 150 -quality 80 -resize 768 -unsharp 1.5 *.pdf

(Obviously, the settings should be adjusted to your needs :))
To do this programmatically in C#, you can use the .NET ImageMagick wrapper: http://imagemagick.codeplex.com
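
If the wrapper is not an option, another possibility is to start the mogrify command line from C#. The following is only a sketch: it assumes ImageMagick is installed and mogrify is on the PATH, and MogrifyRunner, BuildArguments and Run are hypothetical names. The options are copied from the command above; the files are passed explicitly rather than as *.pdf because no shell is involved to expand the wildcard.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

public static class MogrifyRunner
{
    // Same options as the mogrify command line shown above.
    private const string Options =
        "-format jpg -alpha off -density 150 -quality 80 -resize 768 -unsharp 1.5";

    // Builds the argument string; each file name is quoted so that spaces survive.
    // File names that themselves contain double quotes are not handled in this sketch.
    public static string BuildArguments(IEnumerable<string> pdfFiles)
    {
        var builder = new StringBuilder(Options);
        foreach (string file in pdfFiles)
        {
            builder.Append(" \"").Append(file).Append('"');
        }
        return builder.ToString();
    }

    // Runs mogrify in workingDirectory and returns its exit code.
    // Assumes the mogrify executable can be found on the PATH.
    public static int Run(string workingDirectory, IEnumerable<string> pdfFiles)
    {
        var startInfo = new ProcessStartInfo("mogrify", BuildArguments(pdfFiles))
        {
            WorkingDirectory = workingDirectory,
            UseShellExecute = false
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Since the conversion runs in a separate process, its memory is released when that process exits, whatever happens inside the .NET application.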
