Converting PDF file pages to images with iTextSharp

23
I want to convert PDF pages to images using the iTextSharp library.
Any ideas on how to convert each page to an image file?
5 Answers

11

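iText/iTextSharp can generate new PDF files and/or modify existing ones, but they do not perform any rendering, which is what you need here. I would recommend looking at Ghostscript or another library that knows how to actually render a PDF.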


6

You can use ImageMagick to convert a PDF to images.

convert -density 300 "d:\1.pdf" -scale @1500000 "d:\a.jpg"
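To make the two switches in the command above more concrete: -density 300 asks for the PDF to be rasterized at 300 DPI, and, going by the ImageMagick geometry syntax, the @ prefix in -scale @1500000 requests a target area of about 1.5 million pixels with the aspect ratio preserved. The sketch below only reproduces that arithmetic. The class and method names are hypothetical, and the A4 page size (595 x 842 pt) is an assumption for illustration; ImageMagick's own rounding may differ by a pixel.

```csharp
using System;

public static class AreaScale
{
    // Estimates the final pixel size of a page that is rasterized at `dpi`
    // and then scaled to roughly `targetArea` pixels, keeping the aspect ratio.
    // widthPt and heightPt are the page size in PDF points (1 pt = 1/72 inch).
    public static (int Width, int Height) Estimate(
        double widthPt, double heightPt, double dpi, double targetArea)
    {
        // Size of the intermediate raster produced at the requested density.
        double rasterWidth = widthPt / 72.0 * dpi;
        double rasterHeight = heightPt / 72.0 * dpi;

        // Uniform factor that brings width * height to the target area.
        double factor = Math.Sqrt(targetArea / (rasterWidth * rasterHeight));

        return ((int)Math.Round(rasterWidth * factor),
                (int)Math.Round(rasterHeight * factor));
    }
}
```

For an assumed A4 page, AreaScale.Estimate(595, 842, 300, 1500000) comes out at roughly 1030 x 1457 pixels. The final size depends only on the page's aspect ratio and the target area; the density mainly controls how much detail is available before downscaling.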

Splitting the PDF into single pages, on the other hand, can be done with iTextSharp.

Here is code provided by someone else.

void SplitePDF(string filepath)
{
    // Open the source once; iTextSharp page numbers are 1-based.
    iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(filepath);
    try
    {
        reader.RemoveUnusedObjects();
        int pageCount = reader.NumberOfPages;
        string directory = System.IO.Path.GetDirectoryName(filepath) ?? "";
        string baseName = System.IO.Path.GetFileNameWithoutExtension(filepath);
        string ext = System.IO.Path.GetExtension(filepath);

        for (int i = 1; i <= pageCount; i++)
        {
            // Page i is written to "<name>_<i><ext>" next to the source file.
            string outfile = System.IO.Path.Combine(directory, baseName + "_" + i + ext);
            iTextSharp.text.Document doc = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(i));
            using (System.IO.FileStream output = new System.IO.FileStream(outfile, System.IO.FileMode.Create))
            {
                iTextSharp.text.pdf.PdfCopy pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, output);
                // Set full compression before opening the document; later versions reject it afterwards.
                pdfCpy.SetFullCompression();
                doc.Open();
                pdfCpy.AddPage(pdfCpy.GetImportedPage(reader, i));
                doc.Close(); // Closing the document also finishes the PdfCopy output.
            }
        }
    }
    finally
    {
        // Close the reader only after every page has been copied.
        reader.Close();
    }
}

2
ImageMagick cannot process PostScript and PDF files by itself. For that it uses a third-party program called Ghostscript as a "delegate". - Amer Sawan

5
You can use Ghostscript to convert PDF files to images. I use the following parameters to convert the PDFs I need into a multi-frame TIFF image:
gswin32c.exe   -sDEVICE=tiff12nc -dBATCH -r200 -dNOPAUSE  -sOutputFile=[Output].tiff [PDF FileName]

You can use the -q parameter for quiet mode. The Ghostscript documentation has more information about its output devices.
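If you want to run that command from C# rather than from a console, one option is System.Diagnostics.Process. This is only a sketch: the class and method names are hypothetical, the executable path is whatever your Ghostscript installation provides, and ProcessStartInfo.ArgumentList assumes .NET Core 2.1 or later (on .NET Framework you would build a single Arguments string instead).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class GhostscriptRunner
{
    // Builds the same switches as the command line above, plus -q for quiet mode.
    public static List<string> BuildArguments(string pdfPath, string tiffPath, int dpi = 200)
    {
        return new List<string>
        {
            "-q",
            "-sDEVICE=tiff12nc",
            "-dBATCH",
            "-dNOPAUSE",
            "-r" + dpi,
            "-sOutputFile=" + tiffPath,
            pdfPath
        };
    }

    // Starts Ghostscript, waits for it to finish and returns its exit code.
    public static int Run(string gsExe, string pdfPath, string tiffPath, int dpi = 200)
    {
        var startInfo = new ProcessStartInfo(gsExe)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        // ArgumentList quotes each argument, so paths with spaces are handled.
        foreach (string arg in BuildArguments(pdfPath, tiffPath, dpi))
        {
            startInfo.ArgumentList.Add(arg);
        }

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call might look like GhostscriptRunner.Run("gswin32c.exe", @"d:\1.pdf", @"d:\1.tiff"); a non-zero return value would suggest that the conversion failed.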

After that, you can easily load the TIFF frames like this:

using (FileStream stream = new FileStream(@"C:\tEMP\image_$i.tiff", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    BitmapDecoder dec = BitmapDecoder.Create(stream, BitmapCreateOptions.IgnoreImageCache, BitmapCacheOption.None);
    BitmapEncoder enc = BitmapEncoder.Create(dec.CodecInfo.ContainerFormat);
    enc.Frames.Add(dec.Frames[frameIndex]);
}
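The snippet above stops once the frame has been added to the encoder. A possible continuation, sketched under the assumption that you want one PNG file per TIFF frame, is shown below. The class name, method name and output file naming are hypothetical; the WPF imaging types require a reference to PresentationCore and WindowsBase and only run on Windows.

```csharp
using System.IO;
using System.Windows.Media.Imaging;

public static class TiffFrameExporter
{
    // Writes every frame of a multi-frame TIFF to "page_<n>.png" in outputDirectory
    // and returns the number of frames written.
    public static int SaveFramesAsPng(string tiffPath, string outputDirectory)
    {
        using (FileStream input = new FileStream(tiffPath, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            // OnLoad decodes the frames while the input stream is still open.
            BitmapDecoder decoder = BitmapDecoder.Create(
                input, BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.OnLoad);

            for (int i = 0; i < decoder.Frames.Count; i++)
            {
                // One encoder per output file, holding a single frame.
                PngBitmapEncoder encoder = new PngBitmapEncoder();
                encoder.Frames.Add(decoder.Frames[i]);

                string outPath = Path.Combine(outputDirectory, "page_" + (i + 1) + ".png");
                using (FileStream output = new FileStream(outPath, FileMode.Create, FileAccess.Write))
                {
                    encoder.Save(output);
                }
            }
            return decoder.Frames.Count;
        }
    }
}
```

PngBitmapEncoder can be swapped for JpegBitmapEncoder if JPG output is preferred.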

It is AGPL-licensed :( - chintan310

0

I did it with the MuPDFCore NuGet package. Here is the link to the guide I used: https://giorgiobianchini.com/MuPDFCore/MuPDFCore.pdf

using System;
using System.Threading.Tasks;
using MuPDFCore;
using VectSharp.Raster;

MuPDFContext context = new MuPDFContext();
MuPDFDocument document = new MuPDFDocument(context, @"C:\install\test.pdf");

//Renderers: one per page
MuPDFMultiThreadedPageRenderer[] renderers = new MuPDFMultiThreadedPageRenderer[document.Pages.Count];
//Page size: one per page
RoundedSize[] renderedPageSizes = new RoundedSize[document.Pages.Count];
//Boundaries of the tiles that make up each page: one array per page, with one element per thread
RoundedRectangle[][] tileBounds = new RoundedRectangle[document.Pages.Count][];
//Addresses of the memory areas where the image data of the tiles will be stored: one array per page, with one element per thread
IntPtr[][] destinations = new IntPtr[document.Pages.Count][];

//Cycle through the pages in the document to initialise everything
for (int i = 0; i < document.Pages.Count; i++)
{
    //Initialise the renderer for the current page, using two threads (total number of threads: number of pages x 2)
    renderers[i] = document.GetMultiThreadedRenderer(i, 2);
    //Determine the boundaries of the page when it is rendered with a 2x zoom factor (the quality setting; 0.5, 1 etc. also work)
    RoundedRectangle roundedBounds = document.Pages[i].Bounds.Round(2);
    renderedPageSizes[i] = new RoundedSize(roundedBounds.Width, roundedBounds.Height);
    //Determine the boundaries of each tile by splitting the total size of the page by the number of threads.
    tileBounds[i] = renderedPageSizes[i].Split(renderers[i].ThreadCount);
    destinations[i] = new IntPtr[renderers[i].ThreadCount];
    for (int j = 0; j < renderers[i].ThreadCount; j++)
    {
        //Allocate the required memory for the j-th tile of the i-th page.
        //Since we will be rendering with a 24-bit-per-pixel format, the required memory in bytes is height x width x 3.
        destinations[i][j] = System.Runtime.InteropServices.Marshal.AllocHGlobal(tileBounds[i][j].Height * tileBounds[i][j].Width * 3);
    }
}

//Start the actual rendering operations in parallel.
Parallel.For(0, document.Pages.Count, i =>
{
    renderers[i].Render(renderedPageSizes[i], document.Pages[i].Bounds, destinations[i], PixelFormats.RGB);
});

//The code in this for-loop is not really part of MuPDFCore - it just shows an example of using VectSharp to "stitch" the tiles up and produce the full image.
for (int i = 0; i < document.Pages.Count; i++)
{
    //Create a new (empty) image to hold the whole page.
    VectSharp.Page renderedPage = new VectSharp.Page(renderedPageSizes[i].Width, renderedPageSizes[i].Height);
    //Draw each tile onto the image.
    for (int j = 0; j < renderers[i].ThreadCount; j++)
    {
        //Create a raster image object containing the pixel data. Yay, we do not need to copy/marshal anything!
        VectSharp.RasterImage tile = new VectSharp.RasterImage(destinations[i][j], tileBounds[i][j].Width, tileBounds[i][j].Height, false, false);
        //Draw the tile on the main image page.
        renderedPage.Graphics.DrawRasterImage(tileBounds[i][j].X0, tileBounds[i][j].Y0, tile);
    }
    //Save the full page as a PNG image.
    renderedPage.SaveAsPNG(@"C:\install\page" + i.ToString() + ".png");
}

//Clean-up code.
for (int i = 0; i < document.Pages.Count; i++)
{
    //Release the allocated memory.
    for (int j = 0; j < renderers[i].ThreadCount; j++)
    {
        System.Runtime.InteropServices.Marshal.FreeHGlobal(destinations[i][j]);
    }
    //Release the renderer (if you skip this, the quiescent renderer's threads will not be stopped, and your application will never exit!)
    renderers[i].Dispose();
}
document.Dispose();
context.Dispose();

-4

You can extract images from the PDF and save them as JPG. Here is sample code; you need the iTextSharp library.

// Requires: using System.Collections.Generic; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
public IEnumerable<System.Drawing.Image> ExtractImagesFromPDF(string sourcePdf)
{
    // NOTE: This will only get the first image it finds per page.
    var pdf = new PdfReader(sourcePdf);

    try
    {
        for (int pageNum = 1; pageNum <= pdf.NumberOfPages; pageNum++)
        {
            PdfDictionary pg = pdf.GetPageN(pageNum);

            // Recursively search pages, forms and groups for images.
            // ExtractImagesFromPDF_FindImageInPDFDictionary is a custom helper that is not shown here.
            PdfObject obj = ExtractImagesFromPDF_FindImageInPDFDictionary(pg);
            if (obj != null)
            {
                // Resolve the indirect reference to the image's stream object.
                int xrefIndex = ((PRIndirectReference)obj).Number;
                PdfObject pdfObj = pdf.GetPdfObject(xrefIndex);
                PdfStream pdfStream = (PdfStream)pdfObj;
                PdfImageObject pdfImage = new PdfImageObject((PRStream)pdfStream);
                yield return pdfImage.GetDrawingImage();
            }
        }
    }
    finally
    {
        pdf.Close();
    }
}

1
What is ExtractImagesFromPDF_FindImageInPDFDictionary? - JDPeckham
@JDPeckham I think the two custom functions at the link below are very useful: https://carra-lucia-ltd.co.uk/2014/12/09/save-images-from-pdf-files-using-itextsharp/ - Stephan
