有没有一种更快的方式在VBA中比较动态数组之间的数据?

6
我已经成功地编写了以下代码,但它必须在两个每个包含130k+行的数组上运行。目前完整数据集的运行时间约为24分钟,并且在某个时刻添加了一个计数器后,它循环了98亿次。
我阅读了关于使用Match、Vlookup等的文章,它们都建议使用迭代循环(如我所用)是最快的方法,但是我无法理解如何使其他方法与动态数组一起工作,从而适当地测试。
是否有人能告诉我是否有更快的完成此操作的方法,并在必要时演示如何实现?
Sub TESTVLOOKUPARRAY()
    Dim PSORG1() As Variant
    Dim PSORG1Tot As Variant
    Dim PSORG1RT As Variant
    Dim PSORG2() As Variant
    Dim PSORG2Tot As Variant
    Dim PSORG2RT As Variant

    Sheets("Sheet1").Select
    PSORG2RT = Application.CountA(Range("A:A"))
    PSORG2Tot = "A1:B" & PSORG2RT
    PSORG2 = Range(PSORG2Tot) ' PSORG2 is now an allocated array

    Sheets("Sheet2").Select
    PSORG1RT = Application.CountA(Range("A:A"))
    PSORG1Tot = "A1:B" & PSORG1RT
    PSORG1 = Range(PSORG1Tot) ' PSORG1 is now an allocated array

    a = 2 ' to increment ORG values in PSORG1

    Do
        Finish = "No"
        b = 1 ' to increment ORG values in PSORG2
        Do
            If PSORG1(a, 1) = PSORG2(b, 1) Then
                PSORG1(a, 2) = PSORG2(b, 2)
                Finish = "True"
            ElseIf b = PSORG2RT Then
                PSORG1(a, 2) = "NULL"
                Finish = "True"
            End If
            b = b + 1
        Loop Until Finish = "True"
        a = a + 1
    Loop Until a = PSORG1RT + 1

    Sheets("Sheet2").Select
    Set Destination = Range("A1")
    Destination.Resize(UBound(PSORG1, 1), UBound(PSORG1, 2)).Value = PSORG1

End Sub

我会使用ArrayList,因为它有 .Exists() 方法可以进行快速比较。 - SierraOscar
1
请务必告诉我们谁是获胜者(在您的数据中)。 :P - user4039065
由于您的代码按预期运行,只是速度比您想要的慢,因此将其放在Code Review上会更合适。 - FreeMan
3个回答

4

我同意使用Scripting.Dictionary方法。

此过程利用Scripting.Dictionsry。您需要进入VBE的工具►引用并添加对Microsoft Scripting Runtime的引用。

Sub TESTVLOOKUPARRAY()
    Dim PSORG1 As Variant, PSORG2 As Variant
    Dim a As Long, b As Long
    Dim dPSORG2 As New Scripting.dictionary

    dPSORG2.CompareMode = TextCompare

    Debug.Print Timer

    With Sheets("Sheet1")
        a = .Cells(Rows.Count, 1).End(xlUp).Row
        PSORG2 = .Cells(1, 1).Resize(a, 2).Value2 ' PSORG2 is now an allocated array
        For b = LBound(PSORG2, 1) To UBound(PSORG2, 1)
            dPSORG2.Item(PSORG2(b, 1)) = PSORG2(b, 2)
        Next b
    End With

    With Sheets("Sheet2")
        a = .Cells(Rows.Count, 1).End(xlUp).Row
        PSORG1 = .Cells(1, 1).Resize(a, 2).Value2 ' PSORG1 is now an allocated array
    End With

    Debug.Print dPSORG2.Count
    Debug.Print LBound(PSORG2, 1) & ":" & UBound(PSORG2, 1)
    Debug.Print LBound(PSORG2, 2) & ":" & UBound(PSORG2, 2)
    Debug.Print LBound(PSORG1, 1) & ":" & UBound(PSORG1, 1)
    Debug.Print LBound(PSORG1, 2) & ":" & UBound(PSORG1, 2)

    For b = LBound(PSORG1, 1) To UBound(PSORG1, 1)
        If dPSORG2.Exists(PSORG1(b, 1)) Then
            PSORG1(b, 2) = dPSORG2.Item(PSORG1(b, 1))
        Else
            PSORG1(b, 2) = "NULL"
        End If
    Next b


    With Sheets("Sheet2")
        .Cells(1, 1).Resize(UBound(PSORG1, 1), UBound(PSORG1, 2)) = PSORG1
    End With

    Debug.Print Timer

End Sub

就我样本数据而言,Sheet1有110K行,Sheet2有95K行,使用原始代码运行时间为20分钟40秒。使用上述方法在相同数据上只需1.72秒。


所以,我选择了这个选项,它似乎是最有意义的,在短时间测试中速度快得多。 然后,我尝试将其修改为实时数据集,其中我正在尝试将三列数据拉入原始数据集。 对于一列数据,它大约需要7-8分钟才能拉取。但是,要拉取多列数据,我找不到不创建另一个循环或定义多个字典的方法。这使开销增加了三倍,并导致运行时间比原始时间更长。希望我能更好地理解它 :( - VBACrazy
这里的逻辑路径似乎很复杂,真的需要内联注释。 - Mark Kramer

2

我认为使用字典会使代码更快。

下面的代码执行相同的任务,但它使用了字典对象。 在我的电脑上,它比你自己的代码快了大约100倍(在两个包含5K行数据的工作表上进行测试,对于更大的数据集,收益应该更好)。

Public Function TestVLookupArray2()
    Dim dict As Object
    Dim result As Variant
    Dim i As Long
    Dim destination As Excel.Range


    'Load values from Sheet1 into Dictionary.
    Set dict = getDataFromSheetAsDictionary(Sheets("Sheet1"))

    result = getDataFromSheet(Sheets("Sheet2"))

    For i = LBound(result, 1) To UBound(result, 1)

        With dict
            If .exists(result(i, 1)) Then
                result(i, 2) = .Item(result(i, 1))
            Else
                result(i, 2) = "NULL"
            End If
        End With

    Next i

    With Sheets("Sheet2")
        Set destination = .Range(.Cells(1, 1), .Cells(UBound(result, 1), UBound(result, 2)))
        destination = result
    End With

End Function


Private Function getDataFromSheet(wks As Excel.Worksheet) As Variant
    Dim lastRow As Long

    With wks
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        getDataFromSheet = .Range(.Cells(1, 1), .Cells(lastRow, 2))
    End With

End Function


Private Function getDataFromSheetAsDictionary(wks As Excel.Worksheet) As Object
    Dim i As Long
    Dim key As String
    Dim value As Variant
    Dim arr As Variant

    Set getDataFromSheetAsDictionary = VBA.CreateObject("Scripting.Dictionary")

    arr = getDataFromSheet(wks)

    With getDataFromSheetAsDictionary
        For i = LBound(arr, 1) To UBound(arr, 1)

            If Not .exists(arr(i, 1)) Then
                Call .Add(arr(i, 1), arr(i, 2))
            End If

        Next i
    End With

End Function

请注意,此代码由3个不同的函数组成,您需要包含它们所有。
以下是介绍字典的文章:http://www.techbookreport.com/tutorials/vba_dictionary.html 如果您对此代码有任何疑问,请在评论中让我知道。

你是如何从函数中将值返回到Sheet2的? - user4039065
@Jeeped destination = result -> 目标 = 结果 - mielk
我想我会把它保留为子函数,因为它没有返回值。 - user4039065

1
我使用了一个for next循环,而不是你使用的do循环。
Sub speed_up2()
    Dim PSORG1() As Variant, PSORG2() As Variant
    Dim PSORG1Tot As Range, PSORG2Tot As Range, Destination As Range
    Dim PSORG1RT As Long, PSORG2RT As Long
    Dim wb As Workbook, ws_1 As Worksheet, ws_2 As Worksheet
    Dim i As Byte, j As Byte

    Set wb = ThisWorkbook
    Set ws_1 = wb.Sheets("Sheet1")
    Set ws_2 = wb.Sheets("Sheet2")

    with ws_1
        PSORG2RT = .Cells(Rows.Count, 1).End(xlUp).Row ' Get last row
        Set PSORG2Tot = .Range("A1:B" & PSORG2RT)
        PSORG2 = PSORG2Tot ' PSORG2 is now an allocated array
    End With

    With ws_2
        PSORG1RT = .Cells(Rows.Count, 1).End(xlUp).Row
        Set PSORG1Tot = .Range("A1:B" & PSORG1RT)
        PSORG1 = PSORG1Tot ' PSORG1 is now an allocated array
    End With

    For i = 1 To UBound(PSORG1)
        For j = 1 To UBound(PSORG2)
            PSORG1(i, 2) = "NULL"
            If PSORG1(i, 1) = PSORG2(j, 1) Then
                PSORG1(i, 2) = PSORG2(j, 2)
                Exit For
            End If
        Next j
    Next i

    Set Destination = ws_2.Range("A1")
    Destination.Resize(UBound(PSORG1, 1), UBound(PSORG1, 2)).Value = PSORG1
End Sub

请点击这里查看在for next循环和do循环之间执行的速度测试。
如文章所述,for next循环为您执行下一次迭代的计算,而使用do循环时,您需要自己每次增加迭代。这可以节省大量时间。
我还改进了获取最后一行的方法,用于创建范围。这是我的个人偏好;与Application.COUNTA相比,这也可能更安全。

如果在进入 For j = 1 To UBound(PSORG2) 循环之前分配 PSORG2(i, 2) = "NULL",则如果找到该值,它将被覆盖(并退出),否则它将保持不变。这可能比不断检查是否已到达循环结尾略微更快。 - user4039065
没错。@Jeeped 给了一个好建议。我已经修改了代码。谢谢! - luke_t

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接