每个字典项的第一个成员需要是一个数组。

4
我有一段代码可以循环遍历文档,然后将每个单词存储在字典中。字典中的每个键都是“单词”本身,与该键对应的每个字典项都是一个数组,其中包含该单词的(起始和结束)范围。如果单词出现多次,我们只需将新范围添加到字典项数组中,这意味着该项数组已成为一个包含单词实例“n”的范围的子数组的数组,如图所示。

dictionary structure

问题是:每个字典项中的第一个子数组被分成两个数组,一个存储起始范围,一个存储结束范围,如监视窗口所示。我只需要一个数组来存储这些值,后面的子数组都可以。

watches window

当我涉及到高亮代码时,我希望能够根据用户输入高亮显示出现“x”次数的单词。然而,在上述问题存在的情况下,每个单词似乎都会显示出现“x+1”次,例如,如果一个单词出现了“3”次,那么该单词的项数将显示为“4”次,这是错误的,我可以通过减去(1)来修复它,但我不想这样做。是的,我可以编写一段代码,使用“find”方法来突出显示具有“x”频率的单词。

enter image description here

这里是完整的代码

Sub MapWordsAndHighlight()
    Dim WordsDict As Object

    'a dict. to hold words and their range values
    Set WordsDict = CreateObject("Scripting.Dictionary")

    'an object representing each word in the cgosen document
    Dim WordObject As Variant

    'a temp. arr. to hold range values while adding new ones
    Dim TmpRangeArrOfDupWords() As Variant

    'the new upper bound of the tmp arr to recive the new values
    Dim TmpArrayNewUpperBound As Long

    'string that represents each words in the chosen document
    Dim SingleWord As String

    Dim i As Long
    'loop through each word in current document
    For Each WordObject In ActiveDocument.Range.Words

        'remove the surrounding spaces and store the word string
        SingleWord = Trim(WordObject.Text)

        'skip single characters
        If Len(SingleWord) > 1 Then

            'check if the word is not stored previously in the dict
            If Not WordsDict.Exists(SingleWord) Then
                WordsDict.Add Key:=SingleWord, Item:=GetWordRangeArray(WordObject)
            Else

                'dump old range vlaues into tmp array
                TmpRangeArrOfDupWords = WordsDict(SingleWord)

                'make a new place in tmp arr for new ranges
                TmpArrayNewUpperBound = UBound(TmpRangeArrOfDupWords) + 1

                'expand the tmp array
                ReDim Preserve TmpRangeArrOfDupWords(1 To TmpArrayNewUpperBound)

                'store new ranges at the last placein tmp arr
                TmpRangeArrOfDupWords(TmpArrayNewUpperBound) = GetWordRangeArray(WordObject)
                WordsDict(SingleWord) = TmpRangeArrOfDupWords
                Erase TmpRangeArrOfDupWords
            End If
        End If
    Next
    '============================================================
    '** this part highlights words that are repeated "n" times
    'loop through dict. items array
    For Each var In WordsDict.Keys
        'replace the "2" with "n" if you want to highlight the words that are repeated "n" times
        If UBound(WordsDict.Item(var)) = 2 Then
            ThisDocument.Range(WordsDict.Item(var)(1), WordsDict.Item(var)(2)).HighlightColorIndex = wdBrightGreen
        End If
    Next
    '============================================================
End Sub

Function GetWordRangeArray(WordObject) As Variant()
'static variant array of two item
    Dim RangeValue(1 To 2)

    'store the starting range
    RangeValue(1) = WordObject.Start

    'store the end range, (-1) neglect space at the end of word
    RangeValue(2) = WordObject.End - 1

    'returned value
    GetWordRangeArray = RangeValue

    'clear the array
    Erase RangeValue
End Function

1
鉴于单词长度容易确定,为什么不只存储每个出现的起始位置呢? - John Coleman
@John Coleman,谢谢你的想法,它确实有效。但我需要解决这个问题。以下是新的代码更改,根据你的想法行内 code Dim RangeValue(1 To 1) RangeValue(1) = WordObject.Start and code ThisDocument.Range(WordsDict.Item(var)(1), WordsDict.Item(var)(1) + Len(var))。 - Ali_R4v3n
数组必须从1开始吗? - omegastripes
@omegastripes... 不是必须的,我尝试了从零开始,结果相同。 - Ali_R4v3n
还有一个问题是WordObject.End - 1:单词后面的句点或逗号会截断最后一个字母。 - omegastripes
2个回答

3

我稍微修改了你的代码,现在它可以正常工作。

修改的内容如下:

  1. 检查词典中是否存在某个单词的范围数组的方法。如果不存在,则从空数组开始。这种方法修复了词典中每个项目的第一个成员的问题。
  2. Function GetWordRangeArray() 被废除,使用原生的 Array() 函数创建范围数组,因此变成了以零为基础的数组。
  3. 计算单词结尾的方法是 WordObject.Start + Len(SingleWord)),因为当单词后面跟着句点或逗号时,末尾没有尾随空格,所以原始代码切掉了最后一个字母而不是尾随空格。
  4. 添加了嵌套循环来突出显示某个长度的每个单词的每个出现。
Sub MapWordsAndHighlight()

    'a dict. to hold words and their range values
    Dim WordsDict As Object

    'an object representing each word in the cgosen document
    Dim WordObject As Variant

    'a temp. arr. to hold range values while adding new ones
    Dim TmpRangeArrOfDupWords As Variant

    'the new upper bound of the tmp arr to recive the new values
    Dim TmpArrayNewUpperBound As Long

    'string that represents each words in the chosen document
    Dim SingleWord As String

    Dim i As Long

    Dim CurrentWord As Variant

    Dim CurrentArr As Variant

    Set WordsDict = CreateObject("Scripting.Dictionary")

    'loop through each word in current document
    For Each WordObject In ActiveDocument.Range.Words

        'remove the surrounding spaces and store the word string
        SingleWord = Trim(WordObject.Text)

        'skip single characters
        If Len(SingleWord) > 1 Then

            'dump old range values into tmp array, empty element will be created if not exists
            TmpRangeArrOfDupWords = WordsDict(SingleWord)

            'check if the old range didn't exist
            If Not IsArray(TmpRangeArrOfDupWords) Then TmpRangeArrOfDupWords = Array()

            'make a new place in tmp arr for new ranges
            TmpArrayNewUpperBound = UBound(TmpRangeArrOfDupWords) + 1

            'expand the tmp array
            ReDim Preserve TmpRangeArrOfDupWords(TmpArrayNewUpperBound)

            'store new ranges at the last place in tmp arr
            TmpRangeArrOfDupWords(TmpArrayNewUpperBound) = Array(WordObject.Start, WordObject.Start + Len(SingleWord))
            WordsDict(SingleWord) = TmpRangeArrOfDupWords
            Erase TmpRangeArrOfDupWords
        End If
    Next

    '============================================================
    '** this part highlights words that are repeated "n" times
    'loop through dict. items array
    For Each CurrentWord In WordsDict.Keys
        'replace the "2" with "n" if you want to highlight the words that are repeated "n" times
        If UBound(WordsDict(CurrentWord)) + 1 = 2 Then
            For Each CurrentArr In WordsDict(CurrentWord)
                ThisDocument.Range(CurrentArr(0), CurrentArr(1)).HighlightColorIndex = wdBrightGreen
            Next
        End If
    Next
    '============================================================
End Sub

感谢您的努力。关于WordObject.End-1,我的意思是为了去掉单词末尾的空格,因为 MS Word 将该单词视为带有末尾空格的字符串。 - Ali_R4v3n
@Ali_R4v3n,当单词后面跟着句点或逗号时,单词末尾没有尾随空格,因此您的初始代码会截断最后一个字母而不是尾随空格,这是一个问题。这就是为什么我使用另一种方法来获取单词结尾的原因。 - omegastripes
@Ali_R4v3n,我已经添加了说明所做的更改。 - omegastripes
非常感谢,我在另一个项目中遇到了点号问题,并通过测试每个单词的最后一个字母是否为字符来处理它。但是在这段代码中我没有这样做,再次感谢您提醒我。 - Ali_R4v3n

2
如果您希望您的字典值每个都是由2个成员数组组成,那么您需要从以下内容开始:
Array(Array(a,b))

您可以将其扩展到

Array(Array(a,b), Array(c,d))

而不是

Array(a,b) >> Array(a,b, Array(c,d))

这就是您当前代码的功能。

修复后:

Sub MapWordsAndHighlight()

    Dim WordObject As Variant
    Dim TmpRangeArrOfDupWords() As Variant
    Dim TmpArrayNewUpperBound As Long
    Dim SingleWord As String
    Dim i As Long, var, arr
    Dim WordsDict As Object

    Set WordsDict = CreateObject("Scripting.Dictionary")

    For Each WordObject In ActiveDocument.Range.Words

        SingleWord = Trim(WordObject.Text)


        If Len(SingleWord) > 1 Then
            If Not WordsDict.Exists(SingleWord) Then
                WordsDict.Add Key:=SingleWord, Item:=GetWordRangeArray(WordObject)
            Else
                WordsDict(SingleWord) = GetWordRangeArray(WordObject, _
                                               WordsDict(SingleWord))
            End If
        End If
    Next

    For Each var In WordsDict.Keys
        If UBound(WordsDict.Item(var)) = 2 Then
            arr = WordsDict.Item(var)
            ThisDocument.Range(arr(1)(1), arr(1)(2)).HighlightColorIndex = wdBrightGreen
        End If
    Next
    '============================================================
End Sub

Function GetWordRangeArray(WordObject, Optional arr) As Variant()

    Dim RangeValue(1 To 2), ub

    RangeValue(1) = WordObject.Start
    RangeValue(2) = WordObject.End - 1

    If IsMissing(arr) Then
        Dim rv(1 To 1)
        rv(1) = RangeValue
        GetWordRangeArray = rv
    Else
        ub = UBound(arr) + 1
        ReDim Preserve arr(1 To ub)
        arr(ub) = RangeValue
        GetWordRangeArray = arr
    End If

End Function

谢谢Tim,这实际上是你的代码稍微修改了一下,因为我想将字符串存储在字典而不是对象中,我会尝试这段新代码。 - Ali_R4v3n
我确实认识它... ;-) - Tim Williams
您IP地址为143.198.54.68,由于运营成本限制,当前对于免费用户的使用频率限制为每个IP每72小时10次对话,如需解除限制,请点击左下角设置图标按钮(手机用户先点击左上角菜单按钮)。 - Ali_R4v3n

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接