Java 8 parallel stream concurrent grouping

Question

7

Suppose I have a class like this:

class Person {
  String name;
  String uid;
  String phone;
}

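I want to group by all of the fields of the class. How can I use a parallel stream in Java 8 to convert

List<Person> into Map<String, Set<Person>>

The Java 8 example below groups by a single field. How can I group by all fields of the class into a single map, where each key is the value of one of the fields?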

ConcurrentMap<Person.Sex, List<Person>> byGender =
roster
    .parallelStream()
    .collect(
        Collectors.groupingByConcurrent(Person::getGender));

- user3665053

2 Answers

5

You can do this with the static factory method Collector.of:

Map<String, Set<Person>> groupBy = persons.parallelStream()
    .collect(Collector.of(
        ConcurrentHashMap::new,
        ( map, person ) -> {
            map.computeIfAbsent(person.name, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.uid, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.phone, k -> new HashSet<>()).add(person);
        },
        ( a, b ) -> {
            b.forEach(( key, set ) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set));
            return a;
        }
    ));

As Holger suggested in the comments, the following approach may be preferable to the one above:

Map<String, Set<Person>> groupBy = persons.parallelStream()
     .collect(HashMap::new, (m, p) -> {
         m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p);
         m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p);
         m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p);
     }, (a, b) -> b.forEach((key, set) -> {
         a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set);
     }));

It uses the overloaded three-argument collect method and does exactly the same thing as the statement I suggested above.
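For readers who want to try this, here is a minimal, self-contained sketch of the three-argument collect approach. The class name MultiKeyGrouping, the helper groupByAllFields, the Person constructor, and the sample data are assumptions made for illustration; they are not part of the original answer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MultiKeyGrouping {

    // Minimal stand-in for the Person class from the question (constructor is an assumption).
    static class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    // Registers every person under each of its three field values.
    static Map<String, Set<Person>> groupByAllFields(List<Person> persons) {
        return persons.parallelStream().collect(
            HashMap::new,                       // one plain HashMap per worker thread
            (m, p) -> {
                m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p);
                m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p);
                m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p);
            },
            // Merge the partial map b into a, key by key.
            (a, b) -> b.forEach((key, set) ->
                a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Alice", "u1", "555-0100"),
            new Person("Bob", "u2", "555-0100"),
            new Person("Alice", "u3", "555-0199"));

        Map<String, Set<Person>> index = groupByAllFields(persons);
        System.out.println(index.get("Alice").size());    // prints 2
        System.out.println(index.get("555-0100").size()); // prints 2
        System.out.println(index.get("u2").size());       // prints 1
    }
}
```

Note that a name, a uid, and a phone number that happen to be the same string would share one key, since all three fields feed the same map.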

- Lino

2

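There is no need for a ConcurrentHashMap here; a plain HashMap will do. A ConcurrentHashMap is only needed when the collector specifies the CONCURRENT characteristic. By the way, you can simplify the solution further by using the three-argument version of collect: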

persons.parallelStream() .collect(HashMap::new, (m, p) -> { m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p); m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p); m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p); }, (a, b) -> b.forEach((key, set) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));

- Holger

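@Holger, I used a ConcurrentHashMap because the OP mentioned parallel, but I forgot to add the characteristic. I have also edited your snippet into the answer. - Lino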

1

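A Collector without the CONCURRENT characteristic can still be used with a parallel stream. In that case the Stream implementation takes care of using the functions appropriately (each thread accumulates into its own container, and the combiner merges the partial results), which is why it works with arbitrary collections and maps without needing a ConcurrentHashMap. See this Q&A for a discussion of the differences. - Holger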
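To illustrate the distinction discussed in these comments, here is a rough sketch of what a collector that does declare the CONCURRENT characteristic could look like. Because all threads may then accumulate into one shared container, both the map and the value sets have to be thread-safe. The class name, the byAllFields helper, and the use of ConcurrentHashMap.newKeySet() are illustrative choices, not something taken from the thread, and the sketch assumes no field is null, since ConcurrentHashMap rejects null keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;

public class ConcurrentMultiKeyGrouping {

    // Minimal stand-in for the Person class from the question (constructor is an assumption).
    static class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    // A collector that declares CONCURRENT, so a parallel stream may call the
    // accumulator from several threads on the same map.
    static Collector<Person, ?, ConcurrentMap<String, Set<Person>>> byAllFields() {
        return Collector.<Person, ConcurrentMap<String, Set<Person>>>of(
            ConcurrentHashMap::new,
            (map, p) -> {
                // Thread-safe sets, because several threads may add to the same set.
                map.computeIfAbsent(p.name, k -> ConcurrentHashMap.<Person>newKeySet()).add(p);
                map.computeIfAbsent(p.uid, k -> ConcurrentHashMap.<Person>newKeySet()).add(p);
                map.computeIfAbsent(p.phone, k -> ConcurrentHashMap.<Person>newKeySet()).add(p);
            },
            (a, b) -> {
                // Still required by the API, and used if the stream chooses to merge partial maps.
                b.forEach((key, set) ->
                    a.computeIfAbsent(key, k -> ConcurrentHashMap.<Person>newKeySet()).addAll(set));
                return a;
            },
            Collector.Characteristics.CONCURRENT,
            Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Alice", "u1", "555-0100"),
            new Person("Bob", "u2", "555-0100"));

        ConcurrentMap<String, Set<Person>> index =
            persons.parallelStream().collect(byAllFields());
        System.out.println(index.get("555-0100").size()); // prints 2
    }
}
```

Whether this is faster than the HashMap version depends on the data and on contention, so it is worth measuring before preferring one over the other.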

- Ousmane D. · Accepted Answer

You can chain grouping collectors to get a multi-level map. However, this is not ideal if you want to group by more than two fields.
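As a rough illustration of chaining, the sketch below nests one groupingByConcurrent inside another to group by name and then by uid. The class name NestedGrouping, the byNameThenUid helper, the getters, and the sample data are assumptions for the example only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class NestedGrouping {

    // Minimal stand-in for the Person class (constructor and getters are assumptions).
    static class Person {
        private final String name;
        private final String uid;
        private final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        String getName() { return name; }
        String getUid() { return uid; }
        String getPhone() { return phone; }
    }

    // Outer map keyed by name, inner map keyed by uid.
    static ConcurrentMap<String, ConcurrentMap<String, List<Person>>> byNameThenUid(List<Person> persons) {
        return persons.parallelStream().collect(
            Collectors.groupingByConcurrent(Person::getName,
                Collectors.groupingByConcurrent(Person::getUid)));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Alice", "u1", "555-0100"),
            new Person("Alice", "u3", "555-0199"),
            new Person("Bob", "u2", "555-0100"));

        ConcurrentMap<String, ConcurrentMap<String, List<Person>>> nested = byNameThenUid(persons);
        System.out.println(nested.get("Alice").size());          // prints 2
        System.out.println(nested.get("Alice").get("u1").size()); // prints 1
    }
}
```

Adding phone as a third level would require yet another nested collector and a three-deep map type, which is why this gets unwieldy beyond two fields.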

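A better option is to override the equals and hashCode methods in your Person class to define the equality of two given objects, which in this case would be based on all of the said fields. You can then group by Person itself, i.e. groupingByConcurrent(Function.identity()), in which case you end up with: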

ConcurrentMap<Person, List<Person>> resultSet = ....

Example:

class Person {
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Person person = (Person) o;

        if (name != null ? !name.equals(person.name) : person.name != null) return false;
        if (uid != null ? !uid.equals(person.uid) : person.uid != null) return false;
        return phone != null ? phone.equals(person.phone) : person.phone == null;
    }

    @Override
    public int hashCode() {
        int result = name != null ? name.hashCode() : 0;
        result = 31 * result + (uid != null ? uid.hashCode() : 0);
        result = 31 * result + (phone != null ? phone.hashCode() : 0);
        return result;
    }

    private String name;
    private String uid; // these should be private, don't expose
    private String phone;

   // getters where necessary
   // setters where necessary
}

Then:

ConcurrentMap<Person, List<Person>> resultSet = list.parallelStream()
                .collect(Collectors.groupingByConcurrent(Function.identity()));
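
Putting the pieces together, a runnable sketch might look like the following. It swaps the hand-written null checks for java.util.Objects.equals and Objects.hash, which is a substitution made here for brevity rather than what the answer above uses; the class name IdentityGrouping, the groupByEquality helper, the constructor, and the sample data are likewise assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class IdentityGrouping {

    static final class Person {
        private final String name;
        private final String uid;
        private final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        // Two persons are equal when all three fields are equal (null-safe).
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return Objects.equals(name, other.name)
                && Objects.equals(uid, other.uid)
                && Objects.equals(phone, other.phone);
        }

        // Must be consistent with equals: built from the same three fields.
        @Override
        public int hashCode() {
            return Objects.hash(name, uid, phone);
        }
    }

    // Groups persons that are equal on all fields under one key.
    static ConcurrentMap<Person, List<Person>> groupByEquality(List<Person> list) {
        return list.parallelStream()
            .collect(Collectors.groupingByConcurrent(Function.identity()));
    }

    public static void main(String[] args) {
        List<Person> list = Arrays.asList(
            new Person("Alice", "u1", "555-0100"),
            new Person("Alice", "u1", "555-0100"),
            new Person("Bob", "u2", "555-0199"));

        ConcurrentMap<Person, List<Person>> resultSet = groupByEquality(list);
        System.out.println(resultSet.size());                                            // prints 2
        System.out.println(resultSet.get(new Person("Alice", "u1", "555-0100")).size()); // prints 2
    }
}
```

Here the two identical Alice entries collapse into one group, so this groups duplicates of a whole Person rather than indexing each person under every field value.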