Getting the pages of a Word document

9

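I'm trying to get all the pages of an MS Word document through Microsoft.Office.Interop.Word (I'm using C# in VS2012). What I want is a List<String> Pages in which the index corresponds to the page number. As far as I understand (at least I think so), there is no direct way to do this, so I came up with the following solution: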

        List<String> Pages = new List<String>();
        int NumberOfPreviousPage = -1;
        int NumberOfPage = -1;
        string InnerText = "";
        for (int i = 0; i < Doc.Paragraphs.Count; i++)
        {
            Paragraph CurrentParagraph = Doc.Paragraphs[i + 1];
            InnerText = CurrentParagraph.Range.Text;
            NumberOfPage = CurrentParagraph.Range.get_Information(WdInformation.wdActiveEndPageNumber);
            if (NumberOfPage == NumberOfPreviousPage)
                Pages[Pages.Count - 1] += String.Format("\r\n{0}", InnerText);
            else
            {
                Pages.Add(InnerText);
                NumberOfPreviousPage = NumberOfPage;
            }
        }

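However, when the algorithm meets a paragraph that starts on one page and ends on another, it assigns the whole paragraph to the next page (wdActiveEndPageNumber reports the page on which the range ends). I would like to split such a paragraph between the two pages, but I don't know how to detect where the split should go.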


See https://dev59.com/aWjWa4cB1Zd3GeqPoj2y#12339771. - Matthew Lock
2 Answers

10

In the end I got this working. It's terrible and ugly, but it does what it is supposed to do:

    public string[] GetPagesDoc(object Path)
    {
        List<string> Pages = new List<string>();

        // Get application object
        Microsoft.Office.Interop.Word.Application WordApplication = new Microsoft.Office.Interop.Word.Application();

        // Get document object (opened without showing a window)
        object Miss = System.Reflection.Missing.Value;
        object ReadOnly = false;
        object Visible = false;
        Document Doc = WordApplication.Documents.Open(ref Path, ref Miss, ref ReadOnly, ref Miss, ref Miss, ref Miss, ref Miss, ref Miss, ref Miss, ref Miss, ref Miss, ref Visible, ref Miss, ref Miss, ref Miss, ref Miss);

        // Get pages count
        Microsoft.Office.Interop.Word.WdStatistic PagesCountStat = Microsoft.Office.Interop.Word.WdStatistic.wdStatisticPages;
        int PagesCount = Doc.ComputeStatistics(PagesCountStat, ref Miss);

        // Get pages
        object What = Microsoft.Office.Interop.Word.WdGoToItem.wdGoToPage;
        object Which = Microsoft.Office.Interop.Word.WdGoToDirection.wdGoToAbsolute;
        object Start;
        object End;
        object CurrentPageNumber;
        object NextPageNumber;

        for (int Index = 1; Index <= PagesCount; Index++)
        {
            CurrentPageNumber = Index;
            NextPageNumber = Index + 1;

            // Get start position of current page
            Start = WordApplication.Selection.GoTo(ref What, ref Which, ref CurrentPageNumber, ref Miss).Start;

            // Get end position of current page (the position reached when jumping to the next page)
            End = WordApplication.Selection.GoTo(ref What, ref Which, ref NextPageNumber, ref Miss).End;

            // Get text. If both positions are equal (e.g. there is no next page to jump to),
            // read from Start to the end of the document instead.
            if ((int)Start != (int)End)
                Pages.Add(Doc.Range(ref Start, ref End).Text);
            else
                Pages.Add(Doc.Range(ref Start).Text);
        }

        // Close the document without saving and quit Word so that no hidden Word process is left running
        object SaveChanges = false;
        ((Microsoft.Office.Interop.Word._Document)Doc).Close(ref SaveChanges, ref Miss, ref Miss);
        ((Microsoft.Office.Interop.Word._Application)WordApplication).Quit(ref SaveChanges, ref Miss, ref Miss);

        return Pages.ToArray();
    }

0
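A simpler solution.
Pseudocode:
  • Get the total number of pages.
  • For each page:
    • Take the characters between the last character index of the previous page and the last character index of this page.
Implementation: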
    /// <summary>
    /// Reads each page of the Word document into a string and returns the list of the page strings.
    /// </summary>
    public static IEnumerable<string> ReadPages(string filePath)
    {
        ICollection<string> pageStrings = new List<string>();
        Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
        Document doc = app.Documents.Open(filePath);

        int pageCount = doc.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticPages);
        int lastPageEnd = 0; // The document starts at 0.
        for (int i = 0; i < pageCount; i++)
        {
            // The "range" of the page break. This is actually a range of 0 elements; both start and end are
            // the location of the page break.
            Range pageBreakRange = app.Selection.GoToNext(Microsoft.Office.Interop.Word.WdGoToItem.wdGoToPage);
            // The last page has no page break after it, so it ends where the document ends.
            int pageEnd = (i == pageCount - 1) ? doc.Content.End : pageBreakRange.End;
            string currentPageText = doc.Range(lastPageEnd, pageEnd).Text;
            lastPageEnd = pageEnd;
            pageStrings.Add(currentPageText);
        }

        // Close the document without saving and quit Word so that no hidden Word process is left running.
        ((Microsoft.Office.Interop.Word._Document)doc).Close(false);
        ((Microsoft.Office.Interop.Word._Application)app).Quit(false);
        return pageStrings;
    }
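
The slicing step from the pseudocode above can be tried without Word installed. The following is only a minimal sketch, assuming the page-end offsets have already been collected (for example with GoToNext as in the implementation above) and that they index directly into a plain string. PageSlicer and SplitByPageEnds are hypothetical names used for illustration, not part of the Interop API, and positions in a real Word range do not necessarily map one-to-one onto indexes of Range.Text.

```csharp
using System;
using System.Collections.Generic;

public static class PageSlicer
{
    // Hypothetical helper: cuts 'text' at each page-end offset (exclusive).
    // Any text after the last offset becomes the final page.
    public static List<string> SplitByPageEnds(string text, IEnumerable<int> pageEnds)
    {
        var pages = new List<string>();
        int lastPageEnd = 0; // The document starts at 0.
        foreach (int pageEnd in pageEnds)
        {
            // Offsets must be non-decreasing and lie inside the text.
            if (pageEnd < lastPageEnd || pageEnd > text.Length)
                throw new ArgumentOutOfRangeException("pageEnds");

            pages.Add(text.Substring(lastPageEnd, pageEnd - lastPageEnd));
            lastPageEnd = pageEnd;
        }

        // The last page runs to the end of the text.
        if (lastPageEnd < text.Length)
            pages.Add(text.Substring(lastPageEnd));

        return pages;
    }
}
```

For example, PageSlicer.SplitByPageEnds("first page|second page|third", new[] { 11, 23 }) cuts the string after each "|" and treats the remainder as the last page, which is the same handling the last page needs in the Interop version.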
