How do I select the duplicate values from a list in Java?

Question

java list duplicates unique

9

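For example, my list contains {4, 6, 6, 7, 7, 8}, and the final result I want is {6, 6, 7, 7}.

One way is to loop through the list and eliminate the unique values (4 and 8 in this case).

Is there an efficient way to do this other than looping through the list? I ask because the list I am working with is very large.

My code is: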

List<Long> duplicate = new ArrayList<>();
for (int i = 0; i < list.size(); i++) {
    Long item = (Long) list.get(i);
    if (!duplicate.contains(item)) {
        duplicate.add(item);
    }
}

- Script_Junkie

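If you want to find all the duplicates, you have to loop over the whole list at least once. If you must compare every value in the list, there is no more efficient way to do it with a list. To make it more efficient, the solution lies in how the list is created. - Java Devil

You need at least one loop. If you want more efficient code (though this is not guaranteed in every case), you can sort the list first and then check whether the "adjacent" elements differ (if they do, the element is unique and can simply be removed from the list). - morgano

If you don't want to use a loop, you can always print out the list and count the duplicates by hand. - Tdorno

Are you aware that your code does not do what you ask for in the question? - jarnbjo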

13 Answers

6

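Is there an efficient way other than looping through the list?

You could hire a magic elf to do it for you. Without looping through the list you cannot even look at its elements. It is like wanting to add up many numbers without looking at them. Summing elements is actually much easier than searching for duplicates or for unique elements. In general, 97% of what code does is loop over lists and data, processing and updating them.

So you have to loop. Now you probably want to pick the most efficient way. Here are some approaches: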

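Sort all the numbers, then find the duplicates in a single pass (they will be next to each other). Keep in mind, though, that the sorting algorithm loops over the data too.
For every element in the list, check whether another element with the same value exists. (This is what you did. It means you have two loops, one nested inside the other; contains loops over the list, of course.)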
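
The first approach above (sort, then one pass over neighbours) might be sketched as follows. This is only an illustration: the names SortedDuplicates and findDuplicates are made up, the elements are assumed to be non-null and comparable, and the result comes back in sorted order rather than in the original list order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedDuplicates {

    // Hypothetical helper: returns every occurrence of each value that
    // appears more than once. Assumes non-null, mutually comparable elements.
    static <T extends Comparable<? super T>> List<T> findDuplicates(List<T> input) {
        List<T> sorted = new ArrayList<>(input); // copy, so the caller's list is untouched
        Collections.sort(sorted);                // O(n log n); equal values end up adjacent

        List<T> result = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            T current = sorted.get(i);
            boolean sameAsPrevious = i > 0 && current.equals(sorted.get(i - 1));
            boolean sameAsNext = i + 1 < sorted.size() && current.equals(sorted.get(i + 1));
            if (sameAsPrevious || sameAsNext) {
                result.add(current); // has an equal neighbour, so it is not unique
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> list = Arrays.asList(4L, 6L, 6L, 7L, 7L, 8L);
        System.out.println(findDuplicates(list)); // [6, 6, 7, 7]
    }
}
```

The sort dominates the cost, so this runs in O(n log n) instead of the quadratic time of the nested-loop approach.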

- Martijn Courteaux

4

I like this answer: Java 8, Streams to find the duplicate elements. Note that this solution returns each duplicated value only once.

 Integer[] numbers = new Integer[] { 1, 2, 1, 3, 4, 4 };
 Set<Integer> allItems = new HashSet<>();
 Set<Integer> duplicates = Arrays.stream(numbers)
    .filter(n -> !allItems.add(n)) //Set.add() returns false if the item was already in the set.
    .collect(Collectors.toSet());
 System.out.println(duplicates); // [1, 4]

- Grigory Kislin

1

This does not answer the question; he wants all of the duplicated numbers. In your example that would be: [1,1,4,4]. - karlihnos

4

List<Number> inputList = Arrays.asList(4, 6, 6, 7, 7, 8);
List<Number> result = new ArrayList<Number>();
for(Number num : inputList) {
   if(Collections.frequency(inputList, num) > 1) {
       result.add(num);
   }
}

I'm not sure about the efficiency, but I find the code easy to read (and that should be preferred).

Edit: changed Lists.newArrayList() to new ArrayList<Number>();

- Jiri Kremser

I guess you are using some third-party library here... (Lists.newArrayList())? But you could just use new ArrayList<>() directly. - Puce

@Junaid I know. I was referring to Lists.newArrayList(). - Puce

1

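This is actually a nice task for Guava, with its filters and predicates. - Jiri Kremser

I realised that after I commented. Sorry. ;) - JHS

Sure, readability is very important, but this has quadratic complexity, which is too bad. As a Guava user you can enjoy Multiset, as I do. Without Guava, it may take one more line of code using a HashMap. - maaartinus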

1

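Keep a

Map<Integer, Integer> numberToOccurance = new HashMap<Integer, Integer>();

that maps each number to its count, then at the end iterate over the key set and pick the values whose count is greater than one.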
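
A minimal sketch of this counting idea, under a few assumptions: the names CountingDuplicates, findDuplicates and counts are hypothetical, and instead of iterating over the key set at the end it makes a second pass over the input, so that every occurrence is kept and the original order is preserved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountingDuplicates {

    // Hypothetical helper: two linear passes, one to count and one to filter.
    static <T> List<T> findDuplicates(List<T> input) {
        Map<T, Integer> counts = new HashMap<>();
        for (T item : input) {
            counts.merge(item, 1, Integer::sum); // first sighting stores 1, later ones add 1
        }

        List<T> result = new ArrayList<>();
        for (T item : input) {
            if (counts.get(item) > 1) {
                result.add(item); // keep every occurrence of a repeated value
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> list = Arrays.asList(4L, 6L, 6L, 7L, 7L, 8L);
        System.out.println(findDuplicates(list)); // [6, 6, 7, 7]
    }
}
```

Both passes are linear with a hash map, so the expected running time is O(n), at the price of one map entry per distinct value.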

- jmj

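Or use a TreeMap, if you want the numbers sorted. - Puce

Why would we pay the overhead of sorting? Hashing is faster! - jmj

Fine, sorting is only needed if the OP wants the numbers sorted. In the example the numbers are already sorted. - Puce

@Jigar, whether I use a tree or a hash map, I still have to iterate. The actual data I am working with is a series of large encrypted text items. - Script_Junkie

If the result should be sorted for some reason (although nobody has mentioned that), sorting the result list afterwards would be faster than counting the occurrences with a TreeMap just for the possible later benefit of the result already being sorted. - jarnbjo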

0

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FindDuplicate {

    public static void main(String[] args) {

        // Populate the list with sample data
        List<String> list = new ArrayList<String>();
        list.add("Jhon");
        list.add("Jency");
        list.add("Mike");
        list.add("Dmitri");
        list.add("Mike");

        // Set will not allow duplicates
        Set<String> checkDuplicates = new HashSet<String>();

        System.out.println("Actual list " + list);
        for (int i = 0; i < list.size(); i++) {
            String items = list.get(i);
            if (!checkDuplicates.add(items)) {
                // add() returned false, so this item has been seen before
                System.out.println("Duplicate in that list " + items);
            }
        }

    }
}

- Dhamu

0

With Guava and Java 8, this is trivial and fast:

Multiset<Integer> multiset = HashMultiset.create(list);
return list.stream()
    .filter(i -> multiset.count(i) > 1)
    .collect(Collectors.toList());

The first line computes the counts using a hash map. The rest is pretty obvious.

Something like this can simulate the multiset:

HashMap<Integer, Integer> multiset = new HashMap<>();
list.stream().forEach(i ->
    multiset.compute(i, (ignored, old) -> old == null ? 1 : old + 1));

- maaartinus

0

The power of lambdas once again:

List<Long> duplicates = duplicate.stream()
  .collect( Collectors.collectingAndThen( Collectors.groupingBy( Function.identity() ),
    map -> {
      map.values().removeIf( v -> v.size() < 2 );  // eliminate unique values (4, 8 in this case)
      return( map.values().stream().flatMap( List::stream ).collect( Collectors.toList() ) );
    } ) );  // [6, 6, 7, 7]

A speed-optimised version of the solution above:

List<Long> duplicates = duplicate.stream().collect( Collectors.collectingAndThen(
    Collectors.groupingBy( Function.identity(), Collectors.counting() ),
    map -> {
      map.values().removeIf( v -> v < 2 );  // eliminate unique values (4, 8 in this case)
      return( map.entrySet().stream().collect( Collector.of( ArrayList<Long>::new, (list, e) -> {
        for( long n = 0; n < e.getValue(); n++ )
          list.add( e.getKey() );
      }, (l1, l2) -> { l1.addAll( l2 ); return l1; } ) ) );
    } ) );  // [6, 6, 7, 7]

The long values of duplicate are not stored but only counted, which is most likely the fastest and most space-saving variant.

- Kaplan

0

Try this:

Inspired by this answer: https://stackoverflow.com/a/41262509/11256849

for (String s : yourList) {
    if (indexOfNth(yourList, s, 2) != -1) {
        Log.d(TAG, s);
    }
}

Using this method:

public static <T> int indexOfNth(List<T> list, T find, int nthOccurrence) {
    if (list == null || list.isEmpty()) return -1;
    int hitCount = 0;
    for (int index = 0; index < list.size(); index++) {
        if (list.get(index).equals(find)) {
            hitCount++;
        }
        if (hitCount == nthOccurrence) return index;
    }
    return -1;
}

- Creesch 2.0

0

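Your List should ideally have been a Set, which does not allow duplicates in the first place. As an alternative to looping, you could convert it and switch to a Set, or use one as an intermediate step to eliminate the duplicates, as follows: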

List<Long> dupesList = Arrays.asList(4L, 6L, 6L, 7L, 7L, 8L);

Set<Long> noDupesSet = new HashSet<Long>(dupesList);
System.out.println(noDupesSet); // prints: [4, 6, 7, 8]

// To convert back to List
Long[] noDupesArr = noDupesSet.toArray(new Long[noDupesSet.size()]);
List<Long> noDupesList = Arrays.asList(noDupesArr);
System.out.println(noDupesList); // prints: [4, 6, 7, 8]

- Ravi K Thapliyal

This does not answer the question; he wants all of the duplicated numbers. In your example that would be: [6,6,7,7]. - karlihnos

- wobblycogs · Accepted Answer

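There are some good answers so far, but here is another option, just for fun. Loop through the list and try to put each number into a Set, e.g. a HashSet. If the add method returns false, you know the number is a duplicate and should go into the duplicate list.

Edit: something like this should do it: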

Set<Number> unique = new HashSet<>();
List<Number> duplicates = new ArrayList<>();
for( Number n : inputList ) {
    if( !unique.add( n ) ) {
        duplicates.add( n );
    }
}
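
Note that the loop above records a value only when it is seen again, so for {4, 6, 6, 7, 7, 8} it collects [6, 7]. If every occurrence is wanted, as in the question's {6, 6, 7, 7}, one possible variation is to remember the duplicated values in a second set and then filter the input. The sketch below is an assumption about how that could look, not part of the original answer; the names SetBasedDuplicates, findAllOccurrences, seen and duplicated are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetBasedDuplicates {

    // Hypothetical helper: returns every occurrence of each repeated value,
    // in the order in which the values appear in the input.
    static <T> List<T> findAllOccurrences(List<T> input) {
        Set<T> seen = new HashSet<>();
        Set<T> duplicated = new HashSet<>();
        for (T item : input) {
            if (!seen.add(item)) {    // add() returns false if the item was already present
                duplicated.add(item);
            }
        }

        List<T> result = new ArrayList<>();
        for (T item : input) {
            if (duplicated.contains(item)) {
                result.add(item);     // includes the first occurrence as well
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> inputList = Arrays.asList(4L, 6L, 6L, 7L, 7L, 8L);
        System.out.println(findAllOccurrences(inputList)); // [6, 6, 7, 7]
    }
}
```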