Is there a simple way to count the pages in a Word document (either .doc or .docx)?
Thanks.
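You could try Apache POI, the Apache library for reading Microsoft Office files, including Word documents.
One way to get the page count is this method of its SummaryInformation class (org.apache.poi.hpsf.SummaryInformation):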
public int getPageCount()
Returns: the page count, or 0 if the SummaryInformation does not contain a page count.
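I found a very handy snippet that uses Apache POI to count the pages of Word, Excel and PowerPoint files, and it handles both the old .doc and the new .docx formats. Note that "pages" means something different for each format: for Word it is the page count stored in the file's metadata, for Excel it is an estimate based on the first sheet's row breaks, and for PowerPoint it is the number of slides.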
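// Assumes Apache POI 4.x/5.x (poi, poi-ooxml, and poi-scratchpad for HWPF/HSLF), using the current HSLFSlideShow and XMLSlideShow classes.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class PageCounter {
    public static int getPageCount(String filePath) throws IOException {
        // Lower-case only for the extension check; open the file with the original path,
        // because the lower-cased path may not exist on a case-sensitive file system.
        String lowerFilePath = filePath.toLowerCase();
        try (InputStream in = new FileInputStream(filePath)) {
            if (lowerFilePath.endsWith(".xls")) {
                try (HSSFWorkbook workbook = new HSSFWorkbook(in)) {
                    // Excel stores no page count: estimate from the first sheet's row breaks
                    return workbook.getNumberOfSheets() > 0 ? workbook.getSheetAt(0).getRowBreaks().length + 1 : 0;
                }
            } else if (lowerFilePath.endsWith(".xlsx")) {
                try (XSSFWorkbook workbook = new XSSFWorkbook(in)) {
                    return workbook.getNumberOfSheets() > 0 ? workbook.getSheetAt(0).getRowBreaks().length + 1 : 0;
                }
            } else if (lowerFilePath.endsWith(".docx")) {
                try (XWPFDocument docx = new XWPFDocument(in)) {
                    // <Pages> from the extended properties, as last saved by the authoring application
                    return docx.getProperties().getExtendedProperties().getUnderlyingProperties().getPages();
                }
            } else if (lowerFilePath.endsWith(".doc")) {
                try (HWPFDocument wordDoc = new HWPFDocument(in)) {
                    SummaryInformation info = wordDoc.getSummaryInformation();
                    return info == null ? 0 : info.getPageCount();
                }
            } else if (lowerFilePath.endsWith(".ppt")) {
                try (HSLFSlideShow slideShow = new HSLFSlideShow(in)) {
                    return slideShow.getSlides().size();
                }
            } else if (lowerFilePath.endsWith(".pptx")) {
                try (XMLSlideShow xslideShow = new XMLSlideShow(in)) {
                    return xslideShow.getSlides().size();
                }
            }
        }
        throw new IllegalArgumentException("Unsupported file type: " + filePath);
    }
}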
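For .pptx files the slide count is structural rather than metadata: a .pptx is a ZIP package whose ppt/presentation.xml lists every slide in a `<p:sldIdLst>` element. The following is a minimal JDK-only sketch that counts those entries, assuming the presentation part sits at the conventional path ppt/presentation.xml (strictly, its location is given by the package relationships in _rels/.rels). The class name PptxSlideCount is hypothetical.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class PptxSlideCount {
    /**
     * Counts the sldId entries in ppt/presentation.xml of a .pptx package.
     * Assumption: the presentation part is stored at the conventional path.
     * Returns 0 if that part is missing.
     */
    public static int countSlides(String pptxPath) throws Exception {
        try (ZipFile zip = new ZipFile(pptxPath)) {
            ZipEntry presentation = zip.getEntry("ppt/presentation.xml");
            if (presentation == null) {
                return 0;
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations to guard against XXE in untrusted files
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            try (InputStream in = zip.getInputStream(presentation)) {
                Document xml = factory.newDocumentBuilder().parse(in);
                // Each slide in the deck, hidden ones included, has one sldId element
                return xml.getElementsByTagNameNS("*", "sldId").getLength();
            }
        }
    }
}
```

Reading sldIdLst rather than counting ppt/slides/slideN.xml entries avoids relying on PowerPoint's part-naming convention.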
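// Library: Aspose.Words for Java (a commercial library)
import com.aspose.words.Document;
public class AsposePageCount {
    public static void main(String[] args) throws Exception {
        // Open the Word document
        Document doc = new Document("C:\\Temp\\file.doc");
        // Get the page count, computed from Aspose's own page layout of the document
        int pageCount = doc.getPageCount();
        System.out.println(pageCount);
    }
}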
docx4j can read the total page count from the document's extended properties (docProps/app.xml):
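// wordMLPkg: an already loaded WordprocessingMLPackage, e.g. WordprocessingMLPackage.load(new java.io.File("file.docx"))
org.docx4j.openpackaging.parts.DocPropsExtendedPart docPropsExtendedPart = wordMLPkg.getDocPropsExtendedPart();
org.docx4j.docProps.extended.Properties extendedProps = (org.docx4j.docProps.extended.Properties)docPropsExtendedPart.getJaxbElement();
// <Pages> is optional in app.xml, so getPages() can return null
Integer pages = extendedProps.getPages();
int numPages = (pages == null) ? 0 : pages;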
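Both the POI .docx branch and docx4j end up reading the same value: the `<Pages>` element of the extended-properties part, conventionally stored at docProps/app.xml inside the .docx ZIP package. A minimal JDK-only sketch that reads it directly makes the source of the number visible. The path docProps/app.xml is an assumption (the authoritative location is given by the package relationships in _rels/.rels), and the class name DocxPageCount is hypothetical. Like POI and docx4j, this returns whatever the authoring application last saved; it does not lay out the document.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DocxPageCount {
    /**
     * Returns the Pages value from docProps/app.xml of a .docx package,
     * or 0 if the part or the element is missing.
     */
    public static int readPages(String docxPath) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            // Assumption: extended properties live at the conventional path docProps/app.xml
            ZipEntry app = zip.getEntry("docProps/app.xml");
            if (app == null) {
                return 0;
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations to guard against XXE in untrusted files
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            try (InputStream in = zip.getInputStream(app)) {
                Document xml = factory.newDocumentBuilder().parse(in);
                NodeList pages = xml.getElementsByTagNameNS("*", "Pages");
                if (pages.getLength() == 0) {
                    return 0; // <Pages> is optional
                }
                return Integer.parseInt(pages.item(0).getTextContent().trim());
            }
        }
    }
}
```

If the value matters (for billing or printing, say), treat it as a hint: a file produced by a tool other than Word may omit `<Pages>` or leave a stale number.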