Remove duplicates from a list of objects based on a property in Java 8

Question

120

I am trying to remove duplicates from a list of objects based on some property. Can we do it in a simple way using Java 8?

List<Employee> employee

Can we remove duplicates from it based on the id property of Employee? I have seen posts that remove duplicate strings from an ArrayList of strings.

- Patan

10

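Why are you using a List for that? Use a Set instead of a List. - Ranjeet

Do you want to find duplicates by employee name? Or what is your goal? Please provide more information. - Dude

1

@Ranjeet That only works if Employee implements equals and hashCode in a way that correctly identifies duplicates. - Madbreaks

An excellent answer: https://howtodoinjava.com/java8/java-stream-distinct-examples/ - Dusman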

9 Answers

104

The simplest way to do it directly in the list is

HashSet<Object> seen = new HashSet<>();
employee.removeIf(e -> !seen.add(e.getID()));

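removeIf removes an element if it meets the specified criteria
Set.add returns false if it did not modify the Set, i.e. the Set already contains the value
Combining these two, it removes every element (employee) whose ID has been encountered before, so the first employee with each ID is kept

Of course, this only works if the list supports removal of elements.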
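
A minimal, self-contained sketch of the approach above. The `Employee` class, its `getID()` accessor, and the names `RemoveIfDemo` and `dedupeById` are assumptions made for illustration only. It also demonstrates the caveat about removal support: the fixed-size list returned by `Arrays.asList` throws when `removeIf` tries to remove a duplicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RemoveIfDemo {
    // Hypothetical Employee type; only the ID matters for this sketch.
    static class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getID() { return id; }
        String getName() { return name; }
    }

    // Removes, in place, every employee whose ID was already seen earlier in the list.
    static void dedupeById(List<Employee> employees) {
        Set<Integer> seen = new HashSet<>();
        employees.removeIf(e -> !seen.add(e.getID()));
    }

    public static void main(String[] args) {
        List<Employee> fixedSize = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));

        // Copy into an ArrayList, which supports removal.
        List<Employee> employees = new ArrayList<>(fixedSize);
        dedupeById(employees);
        employees.forEach(e -> System.out.println(e.getID() + " " + e.getName()));

        // The fixed-size list cannot remove the duplicate and throws instead.
        try {
            dedupeById(fixedSize);
        } catch (UnsupportedOperationException ex) {
            System.out.println("This list does not support removal");
        }
    }
}
```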

- Holger

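You are assuming the id is unique. What if I have a composite key? - Kamil Nękanowicz

3

You need an object that holds the composite key and has appropriate equals and hashCode implementations, e.g. yourList.removeIf(e -> !seen.add(Arrays.asList(e.getFirstKeyPart(), e.getSecondKeyPart()))); A key composed via Arrays.asList works with an arbitrary number of components, whereas for a small number of components a dedicated key type might be more efficient. - Holger

What do you mean by "all"? I need to keep at least one. - user25

4

Good answer! This worked better for me than the accepted answer, although both are good! - OmNiOwNeR

Very nice solution, works for me. - VXp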

64

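If you can make use of equals, then filter the list by using distinct within a stream (see the other answers). If you cannot or do not want to override the equals method, you can filter the stream in the following way for any property, e.g. the name property (the same works for the id property):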

Set<String> nameSet = new HashSet<>();
List<Employee> employeesDistinctByName = employees.stream()
            .filter(e -> nameSet.add(e.getName()))
            .collect(Collectors.toList());

- Rolch2015

1

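This is nice. It exploits the simple behaviour of filter, which keeps or drops each element based on a predicate, and here the predicate is the insertion of the property (a String) into a set: true if it was newly inserted, false if it was already present. That is really clever! It worked very well for me! - Shessuky

1

This example is good and simple. - Nathani Software

Does it work correctly in multithreaded scenarios or with parallel streams? I mean, is it thread-safe? - Arun Gowda

This is a nice solution for removing duplicate items from a list. But my problem is getting the items whose IDs differ between two lists. - Masi Boo

Wow! Great and simple. - logbasex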

23

Another solution is to use a Predicate, which you can then use in any filter:

public static <T> Predicate<T> distinctBy(Function<? super T, ?> f) {
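  // ConcurrentHashSet is not a JDK class; with the JDK alone, use ConcurrentHashMap.newKeySet() instead.
  Set<Object> objects = new ConcurrentHashSet<>();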
  return t -> objects.add(f.apply(t));
}

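Then simply reuse the predicate anywhere:

employees.stream().filter(distinctBy(e -> e.getId()));

Note: the JavaDoc of filter says it takes a stateless Predicate. In practice, this works fine even if the stream is parallel, because the backing set is thread-safe, although which of several duplicates survives is then not deterministic.

About the other solutions:

1) Using .collect(Collectors.toConcurrentMap(..)).values() is a good solution, but it is annoying if you want to sort and keep the order.

2) list.removeIf(e->!seen.add(e.getID())) is another very good solution. But we need to make sure the collection implements removeIf; for example, it will throw an exception if we construct the collection with Arrays.asList(..).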
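
A self-contained sketch of the predicate approach using only JDK classes, with `ConcurrentHashMap.newKeySet()` standing in for `ConcurrentHashSet`. The `Employee` class and the names `DistinctByDemo` and `uniqueById` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByDemo {
    // Hypothetical Employee type for this sketch.
    static class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }
    }

    // Returns a stateful predicate that is true only the first time a key is seen.
    // ConcurrentHashMap-backed sets reject null keys, so the extractor must not return null.
    static <T> Predicate<T> distinctBy(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    static List<Employee> uniqueById(List<Employee> employees) {
        return employees.stream()
                .filter(distinctBy(Employee::getId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        uniqueById(employees).forEach(e -> System.out.println(e.getId() + " " + e.getName()));
    }
}
```

Each call to `distinctBy` creates a fresh set, so a predicate instance should be created per stream rather than stored and shared between pipelines.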

- navins

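Very nice solution. You can use it when you cannot override equals and do not want to use a Set/List conversion inside the lambda, as the accepted answer does. Thanks! - Ramy Arbid

4

I am not sure why this feature was not added to the Java 8 library. It would be very convenient to be able to write something like stream().distinctBy(Employee::Id). - Arun Gowda

f could be null and throw a NullPointerException. - Zon

3

Nice! If you do not have ConcurrentHashSet, you can change new ConcurrentHashSet to ConcurrentHashMap.newKeySet(). - KeKru

Amazing solution, it is a pity it has not made it into the JDK yet. - Marian Klühspies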

18

Try this code:

Collection<Employee> nonDuplicatedEmployees = employees.stream()
   .<Map<Integer, Employee>> collect(HashMap::new,(m,e)->m.put(e.getId(), e), Map::putAll)
   .values();
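
Because `Map.put` overwrites an existing entry, the snippet above keeps the last employee seen for each id, and `HashMap` does not preserve encounter order. The following is a sketch of a variant that keeps the first employee per id in encounter order, using `putIfAbsent` and a `LinkedHashMap`; the `Employee` class and the names `FirstPerIdDemo` and `firstPerId` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstPerIdDemo {
    // Hypothetical Employee type for this sketch.
    static class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }
    }

    // Collects into a LinkedHashMap keyed by id; putIfAbsent keeps the first employee per id.
    static Collection<Employee> firstPerId(List<Employee> employees) {
        return employees.stream()
                .<Map<Integer, Employee>>collect(
                        LinkedHashMap::new,
                        (m, e) -> m.putIfAbsent(e.getId(), e),
                        // Combiner (only used by parallel streams): earlier entries win.
                        (left, right) -> right.forEach(left::putIfAbsent))
                .values();
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        firstPerId(employees).forEach(e -> System.out.println(e.getId() + " " + e.getName()));
    }
}
```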

- Tho

13

This worked for me:

list.stream().distinct().collect(Collectors.toList());

Of course, you need to implement equals.
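
A sketch of what that could look like, assuming two employees should count as equal whenever their ids match. The `Employee` class and the names `DistinctEqualsDemo` and `unique` are hypothetical. `hashCode` is overridden together with `equals` so the two stay consistent. Note that this definition of equality then applies everywhere `Employee` is used, for example in sets and as map keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctEqualsDemo {
    // Hypothetical Employee type whose equality is defined by id alone.
    static final class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Employee)) return false;
            return id == ((Employee) o).id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // distinct() relies on equals/hashCode; for an ordered stream the first occurrence is kept.
    static List<Employee> unique(List<Employee> employees) {
        return employees.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        unique(employees).forEach(e -> System.out.println(e.getId() + " " + e.getName()));
    }
}
```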

- Sebastian D'Agostino

You need to implement equals, of course. - Sebastian D'Agostino

1

@Andronicus I added my comment to my answer. - Sebastian D'Agostino

hashCode() should be overridden too, but according to the Stream API's distinct() method (https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#distinct--), only equals should be overridden. - jhenya-d

11

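If order does not matter and running in parallel performs better, collect the elements into a Map (here with Collectors.toConcurrentMap()) and then get the values: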

employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values()
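
If encounter order does matter, one possible variation (not part of the answer above) is to collect sequentially with `Collectors.toMap` into a `LinkedHashMap` and copy the values into a list. The `Employee` class and the names `OrderedToMapDemo` and `uniqueByIdInOrder` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class OrderedToMapDemo {
    // Hypothetical Employee type for this sketch.
    static class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }
    }

    // Keeps the first employee per id; LinkedHashMap preserves insertion order.
    static List<Employee> uniqueByIdInOrder(List<Employee> employees) {
        Map<Integer, Employee> byId = employees.stream()
                .collect(Collectors.toMap(
                        Employee::getId,
                        Function.identity(),
                        (first, second) -> first,   // on a duplicate id, keep the earlier one
                        LinkedHashMap::new));
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(2, "Alice"), new Employee(1, "John"), new Employee(1, "Bob"));
        uniqueByIdInOrder(employees).forEach(e -> System.out.println(e.getId() + " " + e.getName()));
    }
}
```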

- Xiao Liu

2

So, if you need a List back, you could do this: employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values().stream().collect(Collectors.toList()). As for parallel, can you use it here or not? I mean the parallelStream API. - Rok T.

1

@RokT There is no need to create the stream again, just wrap the result in an ArrayList. For example: new ArrayList<>(.stream().collect()......values()); - Tharindu Eranga

2

There are a lot of good answers here, but I did not find one that uses the reduce method. So for your case, you can apply it as follows:

 List<Employee> employeeList = employees.stream()
      .reduce(new ArrayList<>(), (List<Employee> accumulator, Employee employee) ->
      {
        if (accumulator.stream().noneMatch(emp -> emp.getId().equals(employee.getId())))
        {
          accumulator.add(employee);
        }
        return accumulator;
      }, (acc1, acc2) ->
      {
        acc1.addAll(acc2);
        return acc1;
      });

- Alex

With parallel stream processing, the combiner may add employees with the same ID again. In that case, you also need to check for duplicates there. - Sven Dhaens

0

Another simple version

BiFunction<TreeSet<Employee>,List<Employee> ,TreeSet<Employee>> appendTree = (y,x) -> (y.addAll(x))? y:y;

TreeSet<Employee> outputList = appendTree.apply(new TreeSet<Employee>(Comparator.comparing(p->p.getId())),personList);

- zawhtut

4

This is obfuscated code for: TreeSet<Employee> outputList = new TreeSet<>(Comparator.comparing(p->p.getId())); outputList.addAll(personList); The straightforward code is much simpler. - Holger

- Alexis C. · Accepted Answer

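You can get a stream from the List and put it into a TreeSet, to which you provide a custom comparator that compares ids uniquely. Then, if you really need a list, you can put this collection back into an ArrayList.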

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;

...
List<Employee> unique = employee.stream()
                                .collect(collectingAndThen(toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))),
                                                           ArrayList::new));

Given the example:

List<Employee> employee = Arrays.asList(new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));

It will output:

[Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]

Another idea is to use a wrapper that wraps an employee and bases its equals and hashCode methods on the employee's id:

class WrapperEmployee {
    private Employee e;

    public WrapperEmployee(Employee e) {
        this.e = e;
    }

    public Employee unwrap() {
        return this.e;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        WrapperEmployee that = (WrapperEmployee) o;
        return Objects.equals(e.getId(), that.e.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(e.getId());
    }
}

Then you wrap each instance, call distinct(), unwrap them, and collect the result into a list.

List<Employee> unique = employee.stream()
                                .map(WrapperEmployee::new)
                                .distinct()
                                .map(WrapperEmployee::unwrap)
                                .collect(Collectors.toList());

In fact, I think you can make this wrapper generic by providing a function that does the comparison:

public class Wrapper<T, U> {
    private T t;
    private Function<T, U> equalityFunction;

    public Wrapper(T t, Function<T, U> equalityFunction) {
        this.t = t;
        this.equalityFunction = equalityFunction;
    }

    public T unwrap() {
        return this.t;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        @SuppressWarnings("unchecked")
        Wrapper<T, U> that = (Wrapper<T, U>) o;
        return Objects.equals(equalityFunction.apply(this.t), that.equalityFunction.apply(that.t));
    }

    @Override
    public int hashCode() {
        return Objects.hash(equalityFunction.apply(this.t));
    }
}

The mapping would then be:

.map(e -> new Wrapper<>(e, Employee::getId))