在HashSet上迭代后从中移除元素失败

Question

在HashSet上迭代后从中移除元素失败

3

我正在用Java编写一个聚类算法，但在删除操作时遇到了问题。当簇的数量达到初始数量的一半时，它似乎总是失败。

在下面的示例代码中，`clusters` 是一个 `Collection>` 类型。

      while(clusters.size() > K){
           // determine smallest distance between clusters
           Collection<Integer> minclust1 = null;
           Collection<Integer> minclust2 = null;
           double mindist = Double.POSITIVE_INFINITY;

           for(Collection<Integer> cluster1 : clusters){
                for(Collection<Integer> cluster2 : clusters){
                     if( cluster1 != cluster2 && getDistance(cluster1, cluster2) < mindist){
                          minclust1 = cluster1;
                          minclust2 = cluster2;
                          mindist = getDistance(cluster1, cluster2);
                     }
                }
           }

           // merge the two clusters
           minclust1.addAll(minclust2);
           clusters.remove(minclust2);
      }

通过循环几次后，clusters.remove(minclust2)最终会返回false，但我不明白为什么。

我通过先创建了10个集群，每个集群包含1到10之间的一个整数来测试此代码。距离是0到1之间的随机数。这是添加了一些println语句后的输出结果。在打印出集群数量后，我打印出实际的集群，合并操作和clusters.remove(minclust2)的结果。

Clustering: 10 clusters
[[3], [1], [10], [5], [9], [7], [2], [4], [6], [8]]
[5] <- [6]
true
Clustering: 9 clusters
[[3], [1], [10], [5, 6], [9], [7], [2], [4], [8]]
[7] <- [8]
true
Clustering: 8 clusters
[[3], [1], [10], [5, 6], [9], [7, 8], [2], [4]]
[10] <- [9]
true
Clustering: 7 clusters
[[3], [1], [10, 9], [5, 6], [7, 8], [2], [4]]
[5, 6] <- [4]
true
Clustering: 6 clusters
[[3], [1], [10, 9], [5, 6, 4], [7, 8], [2]]
[3] <- [2]
true
Clustering: 5 clusters
[[3, 2], [1], [10, 9], [5, 6, 4], [7, 8]]
[10, 9] <- [5, 6, 4]
false
Clustering: 5 clusters
[[3, 2], [1], [10, 9, 5, 6, 4], [5, 6, 4], [7, 8]]
[10, 9, 5, 6, 4] <- [5, 6, 4]
false
Clustering: 5 clusters
[[3, 2], [1], [10, 9, 5, 6, 4, 5, 6, 4], [5, 6, 4], [7, 8]]
[10, 9, 5, 6, 4, 5, 6, 4] <- [5, 6, 4]
false

这个集合 [10, 9, 5, 6, 4, 5, 6, 4, ...] 只会无限增长。

编辑说明：为了澄清，我在 clusters（一个 HashSet>）中为每个群集使用 HashSet。

- weiyin

[10, 9, 5, 6, 4, 5, 6, 4, ...] 明显不是一个集合。它是一个列表吗？ - Tom Hawtin - tackline

是的，说得对。HashSet 不应该包含重复的对象。这里有些奇怪。 - Skip Head

3个回答

1

在所示的测试中，当您尝试删除包含多个整数的集合时，remove 第一次失败。这种情况总是发生吗？

使用的集合的具体类型是什么？

- Skip Head

是的，你走在正確的軌道上。當我嘗試刪除一個具有多個整數的集合時，它總是第一次失敗。我為每個群集在clusters中使用了HashSet<Integer>（HashSet<HashSet<Integer>>）。 - weiyin

很奇怪。如果你正在使用HashSet，为什么会在集合中获得多个值。 - Tom Hawtin - tackline

就像我之前所说的，HashSet 不应该包含重复的对象。我认为这里有一个更深层次的问题。 - Skip Head

而且显然保持顺序。 - Tom Hawtin - tackline

0

显然的问题在于clusters.remove可能使用equals来查找要删除的元素。不幸的是，集合上的equals通常比较的是元素是否相同，而不是它们是否是相同的集合（我认为C#在这方面做出了更好的选择）。

一个简单的解决方法是将clusters创建为Collections.newSetFromMap(new IdentityHashMap<Collection<Integer>, Boolean>())（我想是这样）。

- Tom Hawtin - tackline

关于 equals 的观点很好，但即使使用 equals，为什么删除操作会失败呢？ - weiyin

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Tom Hawtin - tackline · Accepted Answer

5

啊。当您更改已经在Set（或Map键）中的值时，它不一定在正确的位置，并且散列码将被缓存。您需要将其删除、修改，然后重新插入。

- Tom Hawtin - tackline

1

没错，你懂了！解决方案是创建一个新的集群，将minclust1和minclust2中的所有元素添加到其中，从clusters中删除minclust1和minclust2，然后添加新的集群。看起来在HashSet中修改对象是个不好的主意。 - weiyin

1

太好了。不变性很棒。从技术上讲，只要不影响其equals和hashCode，您就可以更改HashSet中的元素，但这些应该依赖于所有数据或不依赖于任何数据。如果您有HashSet<java.awt.Component>，则可以毫无顾虑地更改组件。 - Tom Hawtin - tackline