在vba中是否有内置功能可以从一维数组中获取唯一值?如果只是想要摆脱重复项呢?
如果没有,那么我该如何从数组中获取唯一值?
这篇文章包含两个示例,我喜欢第二个:
Sub unique()
Dim arr As New Collection, a
Dim aFirstArray() As Variant
Dim i As Long
aFirstArray() = Array("Banana", "Apple", "Orange", "Tomato", "Apple", _
"Lemon", "Lime", "Lime", "Apple")
On Error Resume Next
For Each a In aFirstArray
arr.Add a, a
Next
On Error Goto 0 ' added to original example by PEH
For i = 1 To arr.Count
Cells(i, 1) = arr(i)
Next
End Sub
没有内置的功能可以从数组中去除重复项。Raj的答案似乎很优雅,但我更喜欢使用字典。
Dim d As Object
Set d = CreateObject("Scripting.Dictionary")
'Set d = New Scripting.Dictionary
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
d(myArray(i)) = 1
Next i
Dim v As Variant
For Each v In d.Keys()
'd.Keys() is a Variant array of the unique values in myArray.
'v will iterate through each of them.
Next v
编辑:我更改了循环以使用LBound
和UBound
,正如Tomalak所建议的答案。
编辑:d.Keys()
是一种变体数组,而不是集合。
New
语法时,才会这样。 只要安装并从引用窗口中可用Microsoft Scripting Runtime,CreateObject
将无需引用也能工作。 - eksortsod.Keys()
是一种变体数组。(我会更正我的答案。) 如果你不相信我,可以试试这个:set d = CreateObject("Scripting.Dictionary")
d(45) = 1
d(33.33) = 1
d("45") = 1
for each i in d.keys(): ?i, typename(i): next
你将得到一个 Integer
、一个 Double
和一个 String
。 - eksortsoDim d As Dictionary
)。在大量数据的数组中,声明为对象并进行后期绑定可能会导致性能问题。 - Mikhail Tumashenkod(myArray(i)) = 1
所做的操作,字典 d
检查键是否存在。如果键不存在,则创建该键,如果键已经存在,则重用该键。因此,字典的键保证是唯一的。我们分配给每个键的值 1
只是一个虚拟值;我们只是使用字典的键来获取原始集合元素的不同集合。 - eksortso我创建了更详细的基准测试。首先,正如@ChaimG指出的那样,早期绑定会产生很大的差异(我最初直接使用了@eksortso上面的代码,该代码使用了后期绑定)。其次,我的原始基准测试只包括创建唯一对象的时间,但是它没有测试使用对象的效率。我这样做的目的是,如果我创建的对象笨重且使我在以后移动时变慢,那么我真的无法快速创建对象。
旧注释:事实证明,循环遍历集合对象非常低效
事实证明,如果您知道如何做到这一点(我不知道),则循环遍历集合可能非常有效。正如@ChaimG(再次)在评论中指出的那样,使用For Each
结构比简单地使用For
循环要好得多。为了给您一个想法,在更改循环结构之前,对于Test Case Size = 10^6
的Collection2
,时间超过了1400秒(即约23分钟)。现在只有0.195s(快了7000倍)。
对于Collection
方法,有两个时间。第一个(我的原始基准测试Collection1
)显示创建唯一对象的时间。第二部分(Collection2
)显示循环遍历对象(非常自然)以创建可返回的数组,就像其他函数所做的那样。
在下面的图表中,黄色背景表示它是该测试用例最快的,红色表示最慢的(“未测试”算法被排除在外)。Collection
方法的总时间是Collection1
和Collection2
的总和。青绿色表示不管原来的顺序如何,它都是最快的。
以下是我创建的原始算法(我稍微修改了它,例如我不再实例化自己的数据类型)。它以非常可观的时间返回具有原始顺序的数组的唯一值,并且可以修改为使用任何数据类型。除了IndexMethod
之外,它是非常大的数组的最快算法。
以下是此算法背后的主要思想:
以下是一个例子:
Let myArray = (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
1. (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
(1 , 2, 3, 4, 5, 6, 7, 8, 9, 10) <<-- Indexing
2. (19, 19, 19, 33, 33, 86, 100, 100, 703, 703) <<-- sort by values
(4, 7, 10, 3, 5, 1, 2, 8, 6, 9)
3. (19, 33, 86, 100, 703) <<-- remove duplicates
(4, 3, 1, 2, 6)
4. (86, 100, 33, 19, 703)
( 1, 2, 3, 4, 6) <<-- sort by index
这里是代码:
Function SortingUniqueTest(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
Dim MyUniqueArr() As Long, i As Long, intInd As Integer
Dim StrtTime As Double, Endtime As Double, HighB As Long, LowB As Long
LowB = LBound(myArray): HighB = UBound(myArray)
ReDim MyUniqueArr(1 To 2, LowB To HighB)
intInd = 1 - LowB 'Guarantees the indices span 1 to Lim
For i = LowB To HighB
MyUniqueArr(1, i) = myArray(i)
MyUniqueArr(2, i) = i + intInd
Next i
QSLong2D MyUniqueArr, 1, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
Call UniqueArray2D(MyUniqueArr)
If bOrigIndex Then QSLong2D MyUniqueArr, 2, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
SortingUniqueTest = MyUniqueArr()
End Function
Public Sub UniqueArray2D(ByRef myArray() As Long)
Dim i As Long, j As Long, Count As Long, Count1 As Long, DuplicateArr() As Long
Dim lngTemp As Long, HighB As Long, LowB As Long
LowB = LBound(myArray, 2): Count = LowB: i = LowB: HighB = UBound(myArray, 2)
Do While i < HighB
j = i + 1
If myArray(1, i) = myArray(1, j) Then
Do While myArray(1, i) = myArray(1, j)
ReDim Preserve DuplicateArr(1 To Count)
DuplicateArr(Count) = j
Count = Count + 1
j = j + 1
If j > HighB Then Exit Do
Loop
QSLong2D myArray, 2, i, j - 1, 2
End If
i = j
Loop
Count1 = HighB
If Count > 1 Then
For i = UBound(DuplicateArr) To LBound(DuplicateArr) Step -1
myArray(1, DuplicateArr(i)) = myArray(1, Count1)
myArray(2, DuplicateArr(i)) = myArray(2, Count1)
Count1 = Count1 - 1
ReDim Preserve myArray(1 To 2, LowB To Count1)
Next i
End If
End Sub
这是我使用的排序算法(更多关于该算法的信息在此处)。
Sub QSLong2D(ByRef saArray() As Long, bytDim As Byte, lLow1 As Long, lHigh1 As Long, bytNum As Byte)
Dim lLow2 As Long, lHigh2 As Long
Dim sKey As Long, sSwap As Long, i As Byte
On Error GoTo ErrorExit
If IsMissing(lLow1) Then lLow1 = LBound(saArray, bytDim)
If IsMissing(lHigh1) Then lHigh1 = UBound(saArray, bytDim)
lLow2 = lLow1
lHigh2 = lHigh1
sKey = saArray(bytDim, (lLow1 + lHigh1) \ 2)
Do While lLow2 < lHigh2
Do While saArray(bytDim, lLow2) < sKey And lLow2 < lHigh1: lLow2 = lLow2 + 1: Loop
Do While saArray(bytDim, lHigh2) > sKey And lHigh2 > lLow1: lHigh2 = lHigh2 - 1: Loop
If lLow2 < lHigh2 Then
For i = 1 To bytNum
sSwap = saArray(i, lLow2)
saArray(i, lLow2) = saArray(i, lHigh2)
saArray(i, lHigh2) = sSwap
Next i
End If
If lLow2 <= lHigh2 Then
lLow2 = lLow2 + 1
lHigh2 = lHigh2 - 1
End If
Loop
If lHigh2 > lLow1 Then QSLong2D saArray(), bytDim, lLow1, lHigh2, bytNum
If lLow2 < lHigh1 Then QSLong2D saArray(), bytDim, lLow2, lHigh1, bytNum
ErrorExit:
End Sub
以下是一种特殊的算法,如果您的数据包含整数,则运行速度非常快。它利用了索引和布尔数据类型。
Function IndexSort(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
'' Modified to take both positive and negative integers
Dim arrVals() As Long, arrSort() As Long, arrBool() As Boolean
Dim i As Long, HighB As Long, myMax As Long, myMin As Long, OffSet As Long
Dim LowB As Long, myIndex As Long, count As Long, myRange As Long
HighB = UBound(myArray)
LowB = LBound(myArray)
For i = LowB To HighB
If myArray(i) > myMax Then myMax = myArray(i)
If myArray(i) < myMin Then myMin = myArray(i)
Next i
OffSet = Abs(myMin) '' Number that will be added to every element
'' to guarantee every index is non-negative
If myMax > 0 Then
myRange = myMax + OffSet '' E.g. if myMax = 10 & myMin = -2, then myRange = 12
Else
myRange = OffSet
End If
If bOrigIndex Then
ReDim arrSort(1 To 2, 1 To HighB)
ReDim arrVals(1 To 2, 0 To myRange)
ReDim arrBool(0 To myRange)
For i = LowB To HighB
myIndex = myArray(i) + OffSet
arrBool(myIndex) = True
arrVals(1, myIndex) = myArray(i)
If arrVals(2, myIndex) = 0 Then arrVals(2, myIndex) = i
Next i
For i = 0 To myRange
If arrBool(i) Then
count = count + 1
arrSort(1, count) = arrVals(1, i)
arrSort(2, count) = arrVals(2, i)
End If
Next i
QSLong2D arrSort, 2, 1, count, 2
ReDim Preserve arrSort(1 To 2, 1 To count)
Else
ReDim arrSort(1 To HighB)
ReDim arrVals(0 To myRange)
ReDim arrBool(0 To myRange)
For i = LowB To HighB
myIndex = myArray(i) + OffSet
arrBool(myIndex) = True
arrVals(myIndex) = myArray(i)
Next i
For i = 0 To myRange
If arrBool(i) Then
count = count + 1
arrSort(count) = arrVals(i)
End If
Next i
ReDim Preserve arrSort(1 To count)
End If
ReDim arrVals(0)
ReDim arrBool(0)
IndexSort = arrSort
End Function
这里是由@DocBrown创建的集合(Collection)和由@eksortso创建的字典(Dictionary)函数。
Function CollectionTest(ByRef arrIn() As Long, Lim As Long) As Variant
Dim arr As New Collection, a, i As Long, arrOut() As Variant, aFirstArray As Variant
Dim StrtTime As Double, EndTime1 As Double, EndTime2 As Double, count As Long
On Error Resume Next
ReDim arrOut(1 To UBound(arrIn))
ReDim aFirstArray(1 To UBound(arrIn))
StrtTime = Timer
For i = 1 To UBound(arrIn): aFirstArray(i) = CStr(arrIn(i)): Next i '' Convert to string
For Each a In aFirstArray ''' This part is actually creating the unique set
arr.Add a, a
Next
EndTime1 = Timer - StrtTime
StrtTime = Timer ''' This part is writing back to an array for return
For Each a In arr: count = count + 1: arrOut(count) = a: Next a
EndTime2 = Timer - StrtTime
CollectionTest = Array(arrOut, EndTime1, EndTime2)
End Function
Function DictionaryTest(ByRef myArray() As Long, Lim As Long) As Variant
Dim StrtTime As Double, Endtime As Double
Dim d As Scripting.Dictionary, i As Long '' Early Binding
Set d = New Scripting.Dictionary
For i = LBound(myArray) To UBound(myArray): d(myArray(i)) = 1: Next i
DictionaryTest = d.Keys()
End Function
这里是@IsraelHoletz提供的直接方法。
Function ArrayUnique(ByRef aArrayIn() As Long) As Variant
Dim aArrayOut() As Variant, bFlag As Boolean, vIn As Variant, vOut As Variant
Dim i As Long, j As Long, k As Long
ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
i = LBound(aArrayIn)
j = i
For Each vIn In aArrayIn
For k = j To i - 1
If vIn = aArrayOut(k) Then bFlag = True: Exit For
Next
If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
bFlag = False
Next
If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
ArrayUnique = aArrayOut
End Function
Function DirectTest(ByRef aArray() As Long, Lim As Long) As Variant
Dim aReturn() As Variant
Dim StrtTime As Long, Endtime As Long, i As Long
aReturn = ArrayUnique(aArray)
DirectTest = aReturn
End Function
以下是比较所有函数的基准测试函数。需要注意的是,由于内存问题,最后两个测试用例的处理方式略有不同。另外请注意,我没有针对Test Case Size = 10,000,000
测试Collection
方法。出于某种原因,它返回了不正确的结果并表现不寻常(我猜测集合对象在可以放置多少个项方面存在限制。我尝试搜索,但没有找到任何相关文献)。
Function UltimateTest(Lim As Long, bTestDirect As Boolean, bTestDictionary, bytCase As Byte) As Variant
Dim dictionTest, collectTest, sortingTest1, indexTest1, directT '' all variants
Dim arrTest() As Long, i As Long, bEquality As Boolean, SizeUnique As Long
Dim myArray() As Long, StrtTime As Double, EndTime1 As Variant
Dim EndTime2 As Double, EndTime3 As Variant, EndTime4 As Double
Dim EndTime5 As Double, EndTime6 As Double, sortingTest2, indexTest2
ReDim myArray(1 To Lim): Rnd (-2) '' If you want to test negative numbers,
'' insert this to the left of CLng(Int(Lim... : (-1) ^ (Int(2 * Rnd())) *
For i = LBound(myArray) To UBound(myArray): myArray(i) = CLng(Int(Lim * Rnd() + 1)): Next i
arrTest = myArray
If bytCase = 1 Then
If bTestDictionary Then
StrtTime = Timer: dictionTest = DictionaryTest(arrTest, Lim): EndTime1 = Timer - StrtTime
Else
EndTime1 = "Not Tested"
End If
arrTest = myArray
collectTest = CollectionTest(arrTest, Lim)
arrTest = myArray
StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
SizeUnique = UBound(sortingTest1, 2)
If bTestDirect Then
arrTest = myArray: StrtTime = Timer: directT = DirectTest(arrTest, Lim): EndTime3 = Timer - StrtTime
Else
EndTime3 = "Not Tested"
End If
arrTest = myArray
StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
arrTest = myArray
StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
arrTest = myArray
StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
bEquality = True
For i = LBound(sortingTest1, 2) To UBound(sortingTest1, 2)
If Not CLng(collectTest(0)(i)) = sortingTest1(1, i) Then
bEquality = False
Exit For
End If
Next i
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = sortingTest1(1, i + 1) Then
bEquality = False
Exit For
End If
Next i
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = indexTest1(1, i + 1) Then
bEquality = False
Exit For
End If
Next i
If bTestDirect Then
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = directT(i + 1) Then
bEquality = False
Exit For
End If
Next i
End If
UltimateTest = Array(bEquality, EndTime1, EndTime2, EndTime3, EndTime4, _
EndTime5, EndTime6, collectTest(1), collectTest(2), SizeUnique)
ElseIf bytCase = 2 Then
arrTest = myArray
collectTest = CollectionTest(arrTest, Lim)
UltimateTest = Array(collectTest(1), collectTest(2))
ElseIf bytCase = 3 Then
arrTest = myArray
StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
SizeUnique = UBound(sortingTest1, 2)
UltimateTest = Array(EndTime2, SizeUnique)
ElseIf bytCase = 4 Then
arrTest = myArray
StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
UltimateTest = EndTime4
ElseIf bytCase = 5 Then
arrTest = myArray
StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
UltimateTest = EndTime5
ElseIf bytCase = 6 Then
arrTest = myArray
StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
UltimateTest = EndTime6
End If
End Function
最后,这是生成上述表格的子程序。
Sub GetBenchmarks()
Dim myVar, i As Long, TestCases As Variant, j As Long, temp
TestCases = Array(1000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000, 2000000, 5000000, 10000000)
For j = 0 To 11
If j < 6 Then
myVar = UltimateTest(CLng(TestCases(j)), True, True, 1)
ElseIf j < 10 Then
myVar = UltimateTest(CLng(TestCases(j)), False, True, 1)
ElseIf j < 11 Then
myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, 0, 0, 0)
temp = UltimateTest(CLng(TestCases(j)), False, False, 2)
myVar(7) = temp(0): myVar(8) = temp(1)
temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
myVar(2) = temp(0): myVar(9) = temp(1)
myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
Else
myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, "Not Tested", "Not Tested", 0)
temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
myVar(2) = temp(0): myVar(9) = temp(1)
myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
End If
Cells(4 + j, 6) = TestCases(j)
For i = 1 To 9: Cells(4 + j, 6 + i) = myVar(i - 1): Next i
Cells(4 + j, 17) = myVar(9)
Next j
End Sub
从结果表格中可以看出,对于小于约500,000的情况,Dictionary
方法效果非常好,但是在此之后,IndexMethod
真正开始占优势。当顺序无关紧要且数据由正整数组成时,IndexMethod
算法无与伦比(它仅需不到1秒即可从包含1000万元素的数组中返回唯一值!!!太惊人了!)。下面我将详细介绍各种情况下更推荐哪种算法。
情况1IndexMethod
情况2Dictionary Method
情况3Collection Method
如果必须选择一种算法,我认为 Collection
方法仍然是最佳选择,因为它只需要几行代码,非常通用,而且速度足够快。IndexMethod
只能用于整数,SortingAlgo
可以扩展到任何标准数据类型,Collection
和Dictionary
方法都可以用于任何标准数据类型。注意:我还没有对上述任何方法进行除整数以外的基准测试。 - Joseph Wood从365版本开始,它支持使用UNIQUE函数,但如果您的输入不是一个范围,则需要先转置。为了增加乐趣,可以使用SORT函数。
Option Explicit
Sub testIt()
Dim arr() As Variant
arr = [{1,2,3,4,1,2}]
uniquify arr
arr = Array( _
"Banana", "Apple", "Orange", "Tomato", "Apple", _
"Lemon", "Lime", "Lime", "Apple" _
)
uniquify arr
arr = [{"a", "b", "a", "c", "a", "b", "a"}]
uniquify arr
End Sub
Sub uniquify(arr As Variant)
Dim buffer() As Variant, b As Variant
buffer = WorksheetFunction.Sort( _
WorksheetFunction.Unique( _
WorksheetFunction.Transpose(arr) _
) _
)
For Each b In buffer
Debug.Print b
Next
End Sub
我不知道VBA中是否有内置功能。最好的方法是使用一个集合,将值用作键,并仅在不存在值时才添加到其中。
没有内置功能,需要自己实现:
Scripting.Dictionary
对象For
循环来遍历您的数组(一定要使用 LBound()
和 UBound()
而不是从 0 到 x 循环!)Exists()
。将每个数组值(尚未存在的)作为字典键添加到字典中 (在CStr()
时必须转换成字符串作为键Keys()
(或 Items()
)将字典的所有值作为新的、唯一的数组返回。Scripting.Dictionary
在 Mac 版本中不可用。 - ekkis使用由我主要维护的stdVBA库,您可以使用以下功能:
uniqueValues = stdEnumerator.CreateFromArray(myArray).Unique().AsArray()
注意:
您还可以为某些实现了 IEnumVARIANT
接口的集合/任何对象获取其 Unique
值:
uniqueValues = stdEnumerator.CreateFromIEnumVARIANT(myCollection).Unique().AsCollection()
您还可以通过某个对象的属性来获取Unique
:
uniqueValues = stdEnumerator.CreateFromIEnumVARIANT(ThisWorkbook.Sheets).Unique(stdLambda("$1.range(""A1"").value")).AsCollection()
集合和字典解决方案对于简短的方法都很好,但如果您想要速度,请尝试使用更直接的方法:
Function ArrayUnique(ByVal aArrayIn As Variant) As Variant
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' ArrayUnique
' This function removes duplicated values from a single dimension array
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim aArrayOut() As Variant
Dim bFlag As Boolean
Dim vIn As Variant
Dim vOut As Variant
Dim i%, j%, k%
ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
i = LBound(aArrayIn)
j = i
For Each vIn In aArrayIn
For k = j To i - 1
If vIn = aArrayOut(k) Then bFlag = True: Exit For
Next
If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
bFlag = False
Next
If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
ArrayUnique = aArrayOut
End Function
调用它:
Sub Test()
Dim aReturn As Variant
Dim aArray As Variant
aArray = Array(1, 2, 3, 1, 2, 3, "Test", "Test")
aReturn = ArrayUnique(aArray)
End Sub
为了速度比较,这个解决方案将比字典解决方案快100倍到130倍,比集合解决方案快约8000倍到13000倍。
不,VBA没有这个功能。您可以使用将每个项目添加到集合中并使用该项作为键的技术。由于集合不允许重复键,因此结果是不同的值,如果需要,可以将其复制到数组中。
您可能还需要更强大的功能。请参见http://www.cpearson.com/excel/distinctvalues.aspx上的Distinct Values Function。
Distinct Values Function 这是一个VBA函数,它可以返回一个输入值范围或数组中不同值的数组。没有VBA内置的功能可以从数组中删除重复项,但您可以使用以下函数:
Function RemoveDuplicates(MyArray As Variant) As Variant
With CreateObject("scripting.dictionary")
For Each item In MyArray
c00 = .Item(item)
Next
sn = .keys ' the array .keys contains all unique keys
MsgBox Join(.keys, vbLf) ' you can join the array into a string
RemoveDuplicates = .keys ' return an array without duplicates
End With
End Function
arr.Add a
替换了arr.Add a, a
,这样是行不通的。请注意,Collection.add(item,key)
仅在之前未使用过该键时才会添加新项。 - Doc Brown