Can anybody give me an example of how to use Apache PDFBox to convert a PDF into separate images (one image per PDF page)?
Solution for 1.8.*:
PDDocument document = PDDocument.loadNonSeq(new File(pdfFilename), null);
List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
int page = 0;
for (PDPage pdPage : pdPages)
{
++page;
BufferedImage bim = pdPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
ImageIOUtil.writeImage(bim, pdfFilename + "-" + page + ".png", 300);
}
document.close();
Don't forget to read the 1.8 dependencies page before you build.
Solution for 2.0:
PDDocument document = PDDocument.load(new File(pdfFilename));
PDFRenderer pdfRenderer = new PDFRenderer(document);
for (int page = 0; page < document.getNumberOfPages(); ++page)
{
BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
// suffix in filename will be used as the file format
ImageIOUtil.writeImage(bim, pdfFilename + "-" + (page+1) + ".png", 300);
}
document.close();
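To get a feel for the output size: a PDF point is 1/72 inch, so rendering at 300 DPI scales the page by 300/72. The hypothetical helper below estimates the pixel dimensions from the page size in points. Rounding to the nearest pixel is an assumption on my part; PDFBox's own rounding may differ by one pixel.

```java
public class RenderSize {
    // Converts a length in PDF points (1/72 inch) to pixels at the given DPI.
    // Assumption: rounds to the nearest pixel; PDFBox may round differently by one pixel.
    public static long toPixels(double points, double dpi) {
        return Math.round(points * dpi / 72.0);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches)
        System.out.println(toPixels(612, 300) + " x " + toPixels(792, 300)); // 2550 x 3300
    }
}
```

So a 300 DPI PNG of a Letter page holds about 8.4 million pixels, which is worth keeping in mind for long documents.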
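The ImageIOUtil class is in a separate download/artifact (pdfbox-tools). Read the 2.0 dependencies page before you build: you need additional jar files for PDFs that contain JBIG2 images, for saving to TIFF images, and for reading encrypted files.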
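Before relying on an output format, you can ask ImageIO which writers the running JRE actually has. As far as I know, JDK 8's built-in ImageIO has no TIFF writer, while JDK 9 and later include one in javax.imageio, which is why TIFF output on older JDKs needs an extra plugin jar. A minimal sketch (the class and method names are made up for illustration):

```java
import javax.imageio.ImageIO;
import java.util.Arrays;

public class WriterCheck {
    // True if an ImageIO writer is registered for the format name (case-insensitive)
    public static boolean canWrite(String format) {
        return Arrays.stream(ImageIO.getWriterFormatNames())
                .anyMatch(name -> name.equalsIgnoreCase(format));
    }

    public static void main(String[] args) {
        for (String format : new String[] {"png", "jpeg", "bmp", "gif", "tiff"}) {
            System.out.println(format + ": " + canWrite(format));
        }
    }
}
```

Note that ImageIO.write returns false, rather than throwing, when no writer matches the format name, so it is worth checking its return value.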
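Make sure to use the latest release of whatever JDK version you are on: e.g. if you use JDK 8, don't use 1.8.0_5 but 1.8.0_191 or whatever is the latest at the time you read this. Early releases are very slow.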
I tried this today with PdfBox 2.0.15:
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.rendering.*;
import java.awt.image.*;
import java.io.*;
import javax.imageio.*;

public class PdfToJpg {
    // Renders only the first page (index 0) of the PDF to a JPEG at 300 DPI
    public static void PDFtoJPG(String in, String out) throws Exception {
        // try-with-resources closes the document when done
        try (PDDocument pd = PDDocument.load(new File(in))) {
            PDFRenderer pr = new PDFRenderer(pd);
            BufferedImage bi = pr.renderImageWithDPI(0, 300);
            ImageIO.write(bi, "JPEG", new File(out));
        }
    }
}
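The method above lets ImageIO.write use the JPEG writer's default quality. If you want to control it, one option is the standard javax.imageio ImageWriteParam API. In this sketch writeJpeg is a hypothetical helper name, and a blank RGB image stands in for the page returned by renderImageWithDPI:

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class JpegQuality {
    // Writes an RGB image as JPEG with an explicit compression quality between 0.0f and 1.0f
    public static void writeJpeg(BufferedImage image, float quality, OutputStream out) throws IOException {
        // The JDK ships a JPEG writer, so next() finds one on a standard JRE
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        // Closing the ImageOutputStream flushes the encoded bytes to `out`
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for pr.renderImageWithDPI(0, 300): a plain RGB image
        BufferedImage bi = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        writeJpeg(bi, 0.9f, baos);
        System.out.println(baos.size() + " bytes");
    }
}
```

Lower quality values give smaller files with more compression artifacts; for text-heavy pages PNG is often the cleaner choice.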
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PDFtoJPGConverter {
    public List<File> convertPdfToImage(File file, String destination) throws Exception {
        File destinationFile = new File(destination);
        if (!destinationFile.exists()) {
            if (destinationFile.mkdirs()) {
                System.out.println("DESTINATION FOLDER CREATED -> " + destinationFile.getAbsolutePath());
            } else {
                System.out.println("DESTINATION FOLDER NOT CREATED!!!");
            }
        } else {
            System.out.println("DESTINATION FOLDER ALREADY EXISTS!!!");
        }
        if (file.exists()) {
            List<File> fileList = new ArrayList<File>();
            String fileName = file.getName().replace(".pdf", "");
            System.out.println("CONVERTER START.....");
            // try-with-resources closes the document even if rendering throws
            try (PDDocument doc = PDDocument.load(file)) {
                PDFRenderer renderer = new PDFRenderer(doc);
                for (int i = 0; i < doc.getNumberOfPages(); i++) {
                    // images go into the destination folder; new File(parent, child)
                    // works whether or not destination ends with "/"
                    File fileTemp = new File(destinationFile, fileName + "_" + i + ".jpg"); // jpg or png
                    // 200 is the sample dots per inch; change it if necessary
                    BufferedImage image = renderer.renderImageWithDPI(i, 200);
                    ImageIO.write(image, "JPEG", fileTemp); // JPEG or PNG
                    fileList.add(fileTemp);
                }
            }
            System.out.println("CONVERTER STOPPED.....");
            System.out.println("IMAGE SAVED AT -> " + destinationFile.getAbsolutePath());
            return fileList;
        } else {
            System.err.println(file.getName() + " FILE DOES NOT EXIST");
        }
        return null;
    }

    public static void main(String[] args) {
        try {
            PDFtoJPGConverter converter = new PDFtoJPGConverter();
            Scanner sc = new Scanner(System.in);
            System.out.print("Enter the destination folder where the images will be saved \n");
            // Destination = D:/PPL/
            String destination = sc.nextLine();
            System.out.print("Enter the PDF file path(s), separated by commas \n");
            // Source Path = D:/PDF/ant.pdf,D:/PDF/abc.pdf,D:/PDF/xyz.pdf
            String sourcePathWithFileName = sc.nextLine();
            // only proceed when the input is non-null and not blank
            if (sourcePathWithFileName != null && !sourcePathWithFileName.trim().isEmpty()) {
                String[] files = sourcePathWithFileName.split(",");
                for (String file : files) {
                    File pdf = new File(file.trim());
                    System.out.println("FILE:>> " + pdf);
                    converter.convertPdfToImage(pdf, destination);
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
I'm using the Apache pdfbox-2.0.8, commons-logging-1.2 and fontbox-2.0.8 libraries here.
Happy coding :)
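One caveat with `file.getName().replace(".pdf", "")` in the converter above: it is case-sensitive and removes every occurrence of ".pdf", so "Report.PDF" keeps its extension and "my.pdf.notes.pdf" becomes "my.notes". A hypothetical helper that strips only a trailing extension, ignoring case:

```java
public class BaseName {
    // Removes a trailing ".pdf" (any letter case); other occurrences of ".pdf" are kept
    public static String stripPdfExtension(String name) {
        // regionMatches returns false when the name is shorter than 4 characters
        if (name.regionMatches(true, name.length() - 4, ".pdf", 0, 4)) {
            return name.substring(0, name.length() - 4);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(stripPdfExtension("ant.pdf"));          // ant
        System.out.println(stripPdfExtension("Report.PDF"));       // Report
        System.out.println(stripPdfExtension("my.pdf.notes.pdf")); // my.pdf.notes
    }
}
```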
If you don't want any additional dependencies, you can simply use the PDFToImage class that already ships with PDFBox.
Kotlin code:
PDFToImage.main(arrayOf<String>("-outputPrefix", "newImgFilenamePrefix", existingPdfFilename))
For the other configuration options, see: https://pdfbox.apache.org/docs/2.0.8/javadocs/org/apache/pdfbox/tools/PDFToImage.html
For the new Apache PDFBox version 3 (3.0.0-RC1), just add the following snippet:
try (PDDocument pddDoc = Loader.loadPDF(docFile)) {
    PDFRenderer pr = new PDFRenderer(pddDoc);
    // renders the first page (index 0) at scale 1, i.e. 72 DPI
    BufferedImage backImage = pr.renderImage(0);
} catch (IOException e) {
    e.printStackTrace();
}
Note: PDDocument.load and similar methods have been replaced by the new org.apache.pdfbox.Loader class.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.nio.file.Path;
public class Pdf2Image {
    // Renders every page of fileInput to a PNG (renderImage uses scale 1, i.e. 72 DPI)
    // and returns the path of the last page image, or "" if no page was processed
    public String convertPdf2Img(String fileInput, Path path) {
        String destDir = "";
        try {
            String destinationDir = path.toString();
            File sourceFile = new File(fileInput);
            File destinationFile = new File(destinationDir);
            if (!destinationFile.exists()) {
                destinationFile.mkdirs();
                System.out.println("Folder Created -> " + destinationFile.getAbsolutePath());
            }
            if (sourceFile.exists()) {
                // try-with-resources closes the document even if rendering throws
                try (PDDocument document = PDDocument.load(sourceFile)) {
                    PDFRenderer pdfRenderer = new PDFRenderer(document);
                    String fileName = sourceFile.getName().replace(".pdf", "");
                    for (int pageNumber = 0; pageNumber < document.getNumberOfPages(); ++pageNumber) {
                        BufferedImage bim = pdfRenderer.renderImage(pageNumber);
                        destDir = destinationDir + File.separator + fileName + "_" + pageNumber + ".png";
                        ImageIO.write(bim, "png", new File(destDir));
                    }
                }
                System.out.println("Image saved at -> " + destinationFile.getAbsolutePath());
            } else {
                System.err.println(sourceFile.getName() + " File does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return destDir;
    }
}
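The converters above name their files fileName_0.png, fileName_1.png and so on, so for documents with ten or more pages a plain lexicographic sort of the names (for example Arrays.sort on the file names) puts page 10 before page 2. A possible workaround, sketched with a hypothetical pageFileName helper, is to zero-pad the page index to the width of the largest index:

```java
import java.util.Locale;

public class PageNames {
    // Builds "<base>_<index>.<ext>" with the index zero-padded to the width of the
    // largest index (pageCount - 1), so lexicographic order matches page order
    public static String pageFileName(String base, int pageIndex, int pageCount, String ext) {
        int width = String.valueOf(Math.max(pageCount - 1, 0)).length();
        // Locale.ROOT keeps the digits ASCII regardless of the default locale
        return String.format(Locale.ROOT, "%s_%0" + width + "d.%s", base, pageIndex, ext);
    }

    public static void main(String[] args) {
        System.out.println(pageFileName("ant", 2, 12, "png"));  // ant_02.png
        System.out.println(pageFileName("ant", 10, 12, "png")); // ant_10.png
        System.out.println(pageFileName("ant", 0, 5, "png"));   // ant_0.png
    }
}
```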
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;

public class PdfThumbnail {
    // Renders the first page of a PDF (passed in as bytes) at 72 DPI and returns it
    // as a Base64-encoded JPEG, or "" if no ImageIO writer for "jpg" is available
    private static String generatePdfThumbnail(byte[] pdfBytes) throws IOException {
        try (PDDocument document = PDDocument.load(pdfBytes)) {
            PDFRenderer renderer = new PDFRenderer(document);
            BufferedImage bufferedImage = renderer.renderImage(0);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            boolean foundWriter = ImageIO.write(bufferedImage, "jpg", baos);
            if (!foundWriter) {
                return "";
            }
            return Base64.getEncoder().encodeToString(baos.toByteArray());
        }
    }
}
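renderer.renderImage(0) renders the whole first page at 72 DPI, so this "thumbnail" is full page size. If a smaller image is wanted, one option is to scale the rendered image with Graphics2D before encoding it; another is PDFRenderer.renderImage(pageIndex, scale) with a scale below 1. The sketch below shows the first option; scaleToWidth and the 200-pixel width are illustrative choices, and a blank image stands in for the rendered page:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class Thumbnail {
    // Scales an image down so its width is at most maxWidth, keeping the aspect ratio.
    // Images that are already narrow enough are returned unchanged.
    public static BufferedImage scaleToWidth(BufferedImage src, int maxWidth) {
        if (src.getWidth() <= maxWidth) {
            return src;
        }
        int height = Math.max(1, (int) Math.round(src.getHeight() * (double) maxWidth / src.getWidth()));
        // TYPE_INT_RGB has no alpha channel, so the result can be written as JPEG
        BufferedImage dst = new BufferedImage(maxWidth, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, maxWidth, height, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        // Stand-in for renderer.renderImage(0): 612 x 792 pixels (US Letter at 72 DPI)
        BufferedImage page = new BufferedImage(612, 792, BufferedImage.TYPE_INT_RGB);
        BufferedImage thumb = scaleToWidth(page, 200);
        System.out.println(thumb.getWidth() + " x " + thumb.getHeight()); // 200 x 259
    }
}
```

Rendering directly at a smaller scale avoids creating the full-size image first, while scaling afterwards lets you reuse an image you already rendered.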