Getting the headings from a Word document

27

How do I get a list of all the headings in a Word document using VBA?

7 Answers

22

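Do you mean something like this CreateOutline routine (which actually copies all the headings from the source Word document into a new Word document):

(I believe the astrHeadings = docSource.GetCrossReferenceItems(wdRefTypeHeading) call is the key to this program, and it should let you retrieve what you need.)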

Public Sub CreateOutline()
    Dim docOutline As Word.Document
    Dim docSource As Word.Document
    Dim rng As Word.Range
    
    Dim astrHeadings As Variant
    Dim strText As String
    Dim intLevel As Integer
    Dim intItem As Integer
        
    Set docSource = ActiveDocument
    Set docOutline = Documents.Add
    
    ' Content returns only the main body of the document, not the headers/footer.        
    Set rng = docOutline.Content
    ' GetCrossReferenceItems(wdRefTypeHeading) returns an array with references to all headings in the document
    astrHeadings = docSource.GetCrossReferenceItems(wdRefTypeHeading)
    
    For intItem = LBound(astrHeadings) To UBound(astrHeadings)
        ' Get the text and the level.
        strText = Trim$(astrHeadings(intItem))
        intLevel = GetLevel(CStr(astrHeadings(intItem)))
        
        ' Add the text to the document.
        rng.InsertAfter strText & vbNewLine
        
        ' Set the style of the selected range and
        ' then collapse the range for the next entry.
        rng.Style = "Heading " & intLevel
        rng.Collapse wdCollapseEnd
    Next intItem
End Sub

Private Function GetLevel(strItem As String) As Integer
    ' Return the heading level of a header from the
    ' array returned by Word.
    
    ' The number of leading spaces indicates the
    ' outline level (2 spaces per level: H1 has
    ' 0 spaces, H2 has 2 spaces, H3 has 4 spaces).
        
    Dim strTemp As String
    Dim strOriginal As String
    Dim intDiff As Integer
    
    ' Get rid of all trailing spaces.
    strOriginal = RTrim$(strItem)
    
    ' Trim leading spaces, and then compare with
    ' the original.
    strTemp = LTrim$(strOriginal)
    
    ' Subtract to find the number of
    ' leading spaces in the original string.
    intDiff = Len(strOriginal) - Len(strTemp)
    GetLevel = (intDiff / 2) + 1
End Function
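
As a quick way to inspect what GetCrossReferenceItems returns before building a new document, the following minimal sketch prints each heading with its inferred level to the Immediate window. The routine name PrintHeadingLevels and its variable names are hypothetical, and it assumes the active document contains at least one heading and that Word indents each level by two leading spaces, as the comments in GetLevel describe.

```vba
Public Sub PrintHeadingLevels()
    ' Sketch only: lists every heading with its inferred level.
    ' Assumes the active document has at least one heading.
    Dim vHeadings As Variant
    Dim i As Long
    Dim sItem As String
    Dim nLeading As Long

    vHeadings = ActiveDocument.GetCrossReferenceItems(wdRefTypeHeading)

    For i = LBound(vHeadings) To UBound(vHeadings)
        ' Drop trailing spaces, then count the leading ones.
        sItem = RTrim$(CStr(vHeadings(i)))
        nLeading = Len(sItem) - Len(LTrim$(sItem))

        ' Assumption taken from GetLevel: two leading spaces per level.
        Debug.Print (nLeading \ 2) + 1, LTrim$(sItem)
    Next i
End Sub
```

Run it from Word's VBA editor with the Immediate window open (Ctrl+G) to see the output.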

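Update by @kol, March 6, 2018

Although astrHeadings is an array (IsArray returns True, and TypeName returns String()), I get a type mismatch error when I try to access its elements in VBScript (v5.8.16384 on Windows 10 Pro 1709 16299.248). This must be a VBScript-specific problem, because I can access the elements if I run the same code in Word's VBA editor. I ended up iterating over the lines of the TOC instead, since that works even from VBScript: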

For Each Paragraph In Doc.TablesOfContents(1).Range.Paragraphs
  WScript.Echo Paragraph.Range.Text
Next

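Nothing to change, except changing "int" to "long" to increase the macro's speed. - Reinstate Monica - Goodbye SE
Following @Wiki's suggestion, I replaced every "int" in the function with "long", but it threw error 9, "Subscript out of range". Some of the ints can be replaced, but not all. See the answer I posted for which ones can be. (In Word Pro 2013) - MagTun
Beware that headings may come back truncated with this approach (GetCrossReferenceItems). See: http://windowssecrets.com/forums/showthread.php/158870-Word-2007-VBA-GetCrossReferenceItems(wdRefTypeHeading)-returns-truncated-variant-array - Fuhrmanator
Although astrHeadings is an array (IsArray returns True, TypeName returns String()), I get a "type mismatch" error when I try to access its elements (VBScript 5.8.16384 on Windows 10 Pro 1709 16299.248). - kol
@kol Nine years later, that's possible. I didn't test it on Windows 10 back then ;) - VonC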

18

The simplest way to get a list of headings is to loop through the paragraphs in the document, for example:

Sub ReadPara()
    Dim DocPara As Paragraph

    For Each DocPara In ActiveDocument.Paragraphs
        ' Match any paragraph whose style name starts with "Heading".
        If Left(DocPara.Range.Style, Len("Heading")) = "Heading" Then
            Debug.Print DocPara.Range.Text
        End If
    Next
End Sub

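By the way, I think it's a good idea to strip the last character of the paragraph's range. Otherwise, if you send the string to a message box or a document, Word displays an extra control character. For example: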

Left(DocPara.Range.Text, Len(DocPara.Range.Text) - 1)
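
The expression above removes the last character unconditionally. A slightly more defensive variant could be wrapped in a small helper like the one below. The name StripParagraphMark is hypothetical, and the sketch assumes the trailing control character is the paragraph mark (vbCr).

```vba
Public Function StripParagraphMark(ByVal s As String) As String
    ' Remove one trailing paragraph mark (vbCr) if present;
    ' otherwise return the string unchanged.
    If Len(s) > 0 And Right$(s, 1) = vbCr Then
        StripParagraphMark = Left$(s, Len(s) - 1)
    Else
        StripParagraphMark = s
    End If
End Function
```

Inside the loop it could be used as Debug.Print StripParagraphMark(DocPara.Range.Text).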

2
I prefer this answer to the accepted one: it gave me better results and more flexibility. - Praesagus
1
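I tried this, but it is unbearably slow... looping through my document (which has many tables, so more than 45,000 paragraphs) takes about 15 minutes of processing time. - FraggaMuffin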

2

This macro works great on my Word 2010. I extended it a little: it now prompts the user for a minimum level and suppresses subheadings below that level.

Public Sub CreateOutline()
' from https://dev59.com/THVC5IYBdhLWcg3wfxQ8
    Dim docOutline As Word.Document
    Dim docSource As Word.Document
    Dim rng As Word.Range

    Dim astrHeadings As Variant
    Dim strText As String
    Dim intLevel As Integer
    Dim intItem As Integer
    Dim minLevel As Integer

    Set docSource = ActiveDocument
    Set docOutline = Documents.Add

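    minLevel = 1  ' Headings deeper than this level won't be copied.
    ' InputBox takes (Prompt, Title, Default), so "2" is passed as the
    ' third argument to act as the default value rather than the title.
    minLevel = CInt(InputBox("This macro will generate a new document that contains only the headers from the existing document. What is the lowest level heading you want?", , "2"))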

    ' Content returns only the
    ' main body of the document, not
    ' the headers and footer.
    Set rng = docOutline.Content
    astrHeadings = _
     docSource.GetCrossReferenceItems(wdRefTypeHeading)

    For intItem = LBound(astrHeadings) To UBound(astrHeadings)
        ' Get the text and the level.
        strText = Trim$(astrHeadings(intItem))
        intLevel = GetLevel(CStr(astrHeadings(intItem)))

        If intLevel <= minLevel Then

            ' Add the text to the document.
            rng.InsertAfter strText & vbNewLine

            ' Set the style of the selected range and
            ' then collapse the range for the next entry.
            rng.Style = "Heading " & intLevel
            rng.Collapse wdCollapseEnd
        End If
    Next intItem
End Sub

Private Function GetLevel(strItem As String) As Integer
    ' from https://dev59.com/THVC5IYBdhLWcg3wfxQ8
    ' Return the heading level of a header from the
    ' array returned by Word.

    ' The number of leading spaces indicates the
    ' outline level (2 spaces per level: H1 has
    ' 0 spaces, H2 has 2 spaces, H3 has 4 spaces).

    Dim strTemp As String
    Dim strOriginal As String
    Dim intDiff As Integer

    ' Get rid of all trailing spaces.
    strOriginal = RTrim$(strItem)

    ' Trim leading spaces, and then compare with
    ' the original.
    strTemp = LTrim$(strOriginal)

    ' Subtract to find the number of
    ' leading spaces in the original string.
    intDiff = Len(strOriginal) - Len(strTemp)
    GetLevel = (intDiff / 2) + 1
End Function

1

Based on Wiki's comment on VonC's answer, this is the code that worked for me. It makes the function faster.

Public Sub CopyHeadingsInNewDoc()
    Dim docOutline As Word.Document
    Dim docSource As Word.Document
    Dim rng As Word.Range

    Dim astrHeadings As Variant
    Dim strText As String
    ' Declared As Long, following the "int" to "long" suggestion, and
    ' used under the same names below (the loop previously referred to
    ' undeclared intItem/intLevel variables).
    Dim longLevel As Long
    Dim longItem As Long

    Set docSource = ActiveDocument
    Set docOutline = Documents.Add

    ' Content returns only the
    ' main body of the document, not
    ' the headers and footer.
    Set rng = docOutline.Content
    astrHeadings = _
     docSource.GetCrossReferenceItems(wdRefTypeHeading)

    For longItem = LBound(astrHeadings) To UBound(astrHeadings)
        ' Get the text and the level.
        strText = Trim$(astrHeadings(longItem))
        longLevel = GetLevel(CStr(astrHeadings(longItem)))

        ' Add the text to the document.
        rng.InsertAfter strText & vbNewLine

        ' Set the style of the selected range and
        ' then collapse the range for the next entry.
        rng.Style = "Heading " & longLevel
        rng.Collapse wdCollapseEnd
    Next longItem
End Sub

Private Function GetLevel(strItem As String) As Integer
    ' Return the heading level of a header from the
    ' array returned by Word.

    ' The number of leading spaces indicates the
    ' outline level (2 spaces per level: H1 has
    ' 0 spaces, H2 has 2 spaces, H3 has 4 spaces).

    Dim strTemp As String
    Dim strOriginal As String
    Dim longDiff As Integer

    ' Get rid of all trailing spaces.
    strOriginal = RTrim$(strItem)

    ' Trim leading spaces, and then compare with
    ' the original.
    strTemp = LTrim$(strOriginal)

    ' Subtract to find the number of
    ' leading spaces in the original string.
    longDiff = Len(strOriginal) - Len(strTemp)
    GetLevel = (longDiff / 2) + 1
End Function

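Interesting take on my six-year-old answer. +1 - VonC
I could have edited your answer, but since you hadn't edited it following Wiki's comment, I wasn't sure it was a good idea! (I'm still a newbie at VBA) - MagTun
@VonC By the way, is there a way for this function to select only Heading 1 and Heading 2? (Feel free to edit my answer to reflect the change ;-)!) - MagTun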

1

The fastest way to extract all headings (up to level 5).

Sub EXTRACT_HDNGS()
    ' Note: RemoveSpecialChar is a helper that is not shown in this answer.
    Dim WDApp As Word.Application    ' Word application
    Dim WDDoc As Word.Document       ' Word document
    Dim Head_n As Long               ' Heading level being searched (1 to 5)
    Dim Head As String               ' Style name, e.g. "Heading 1"
    Dim Heading_txt As Variant       ' Return type of the helper is not shown
    Dim Heading_lvl As Long
    Dim Heading_lne As Long
    Dim Heading_pge As Long

    Set WDApp = Word.Application
    Set WDDoc = WDApp.ActiveDocument

    For Head_n = 1 To 5
        Head = "Heading " & Head_n
        WDApp.Selection.HomeKey wdStory, wdMove

        Do
            ' Step past the current line so the next search moves forward.
            With WDApp.Selection
                .MoveStart Unit:=wdLine, Count:=1
                .Collapse Direction:=wdCollapseEnd
            End With

            ' Search for the next run of text in the current heading style.
            With WDApp.Selection.Find
                .ClearFormatting
                .Text = ""
                .MatchWildcards = False
                .Forward = True
                .Style = WDDoc.Styles(Head)
                If .Execute = False Then GoTo Level_exit
                .ClearFormatting
            End With

            Heading_txt = RemoveSpecialChar(WDApp.Selection.Range.Text, 1)
            Debug.Print Heading_txt
            Heading_lvl = WDApp.Selection.Range.ListFormat.ListLevelNumber
            Debug.Print Heading_lvl
            Heading_lne = WDDoc.Range(0, WDApp.Selection.Range.End).Paragraphs.Count
            Debug.Print Heading_lne
            Heading_pge = WDApp.Selection.Information(wdActiveEndPageNumber)
            Debug.Print Heading_pge

            ' As written, this ends the "Heading 1" pass after its first match.
            If WDApp.Selection.Style = "Heading 1" Then GoTo Level_exit
            WDApp.Selection.Collapse Direction:=wdCollapseStart
        Loop
Level_exit:
    Next Head_n

End Sub

1

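Why reinvent the wheel?!?

A "list of all headings" is just the standard Word table of contents for the document!

This is what I got when I recorded a macro while adding a table of contents to the document: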

Sub Macro1()
    ActiveDocument.TablesOfContents.Add Range:=Selection.Range, _
        RightAlignPageNumbers:=True, _
        UseHeadingStyles:=True, _
        UpperHeadingLevel:=1, _
        LowerHeadingLevel:=5, _
        IncludePageNumbers:=True, _
        AddedStyles:="", _
        UseHyperlinks:=True, _
        HidePageNumbersInWeb:=True, _
        UseOutlineLevels:=True
End Sub

0

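You can also create a table of contents in the document and copy it. This separates the paragraph references from the headings, which is handy if you need to present them in another context. If you don't want a table of contents in the document, delete it after copying and pasting. JK.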
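
The create-copy-delete idea can also be sketched in VBA. The routine below is a rough illustration under stated assumptions, not a definitive implementation: the name ListHeadingsViaTemporaryToc is hypothetical, the table of contents is inserted at the very start of the active document, and levels 1 to 5 are used to match the recorded macro in the previous answer. Each printed line is expected to carry the heading text followed by a tab and the page number.

```vba
Public Sub ListHeadingsViaTemporaryToc()
    ' Sketch only: build a temporary table of contents, read its
    ' lines, then delete it so the document is left as it was.
    Dim toc As Word.TableOfContents
    Dim para As Word.Paragraph

    ' Assumption: insert the temporary TOC at the start of the document.
    Set toc = ActiveDocument.TablesOfContents.Add( _
        Range:=ActiveDocument.Range(0, 0), _
        UseHeadingStyles:=True, _
        UpperHeadingLevel:=1, _
        LowerHeadingLevel:=5)

    ' Each TOC paragraph corresponds to one heading.
    For Each para In toc.Range.Paragraphs
        Debug.Print para.Range.Text
    Next para

    ' Remove the temporary TOC again.
    toc.Delete
End Sub
```

Because the temporary table occupies space while it exists, the page numbers it reports may differ from those in the untouched document.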

