Java 8 - Count words and arrange them in descending order

Question

4

I have a list of words:

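// Note the trailing space before the concatenation and the split(" "):
// without them the list would hold a single string containing "youand".
List<String> words = Arrays.asList(("Hello alan i am here where are you " +
  "and what are you doing hello are you there").split(" "));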

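How can I get the first seven words that are repeated most often in the list, arranged in descending order of count? After that, the words that occur only once should be arranged alphabetically. So the output for the above should be these first seven words: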

you (3)
are (2)
hello (2)
alan (1)
am (1)
and (1)
doing (1)

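I would like this to be done with Java 8 streams and lambda expressions.

I tried the following: first sort the list, then get a map of the words and their counts in the word list.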

List<String> sortedWords = Arrays.asList("Hello alan i am here where are you and what are you doing hello you there".split(" "))
            .stream().sorted().collect(toList());

Map<String, Long> collect = 
            sortedWords.stream().collect(groupingBy(Function.identity(), counting()));

- bhupen

5 Answers

4

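Although @Tunaki's solution is good, it is interesting that with my StreamEx library the problem can be solved in a single stream pipeline (no actual work is performed until the single terminal operation is called):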

Map<String, Long> map = StreamEx.of(words)
    .map(String::toLowerCase)
    .sorted() // sort original words, so now repeating words are next to each other
    .runLengths() // StreamEx feature: squash repeating words into Entry<String, Long>
    .sorted(Entry.<String, Long> comparingByValue().reversed()
                 .thenComparing(Entry.comparingByKey()))
    .limit(7) // Sort and limit
    .toCustomMap(LinkedHashMap::new); // Single terminal operation: store to LinkedHashMap

Or, if only the words are needed:

List<String> list = StreamEx.of(words)
    .map(String::toLowerCase)
    .sorted() // sort original words, so now repeating words are next to each other
    .runLengths() // StreamEx feature: squash repeating words into Entry<String, Long>
    .sorted(Entry.<String, Long> comparingByValue().reversed()
                 .thenComparing(Entry.comparingByKey()))
    .limit(7) // Sort and limit
    .keys() // Drop counts leaving only words
    .toList(); // Single terminal operation: store to List
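For readers without StreamEx on the classpath, the effect of the sorted() plus runLengths() steps can be approximated with plain JDK code. This is only a minimal sketch of the idea (collapsing adjacent equal elements into word/count entries), not StreamEx's implementation; the class name RunLengthSketch and its helper method are made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map.Entry;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of StreamEx.
class RunLengthSketch {

    // Collapse runs of adjacent equal elements into (element, run length) entries.
    // The input is assumed to be sorted so that equal words sit next to each other.
    static List<Entry<String, Long>> runLengths(List<String> sorted) {
        List<Entry<String, Long>> runs = new ArrayList<>();
        for (String word : sorted) {
            int last = runs.size() - 1;
            if (last >= 0 && runs.get(last).getKey().equals(word)) {
                // Same word as the previous element: extend the current run.
                Entry<String, Long> run = runs.get(last);
                run.setValue(run.getValue() + 1);
            } else {
                // A different word: start a new run of length 1.
                runs.add(new SimpleEntry<>(word, 1L));
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.stream("you are hello you are you".split(" "))
                .sorted()
                .collect(Collectors.toList());
        System.out.println(runLengths(sorted)); // [are=2, hello=1, you=3]
    }
}
```

The resulting entries can then be sorted by value and key exactly as in the pipelines above.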

- Tagir Valeev

2

The more I read your answers, the more convinced I am that your StreamEx library should be first in the API! - Tunaki

@Tagir, brilliant, I will check out your StreamEx. - bhupen

2

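Sometimes the best solution to a problem is not an algorithm but a data structure. I think what you need here is a Bag. Since you want the output sorted by number of occurrences and then by key, you should use a specific data structure: TreeBag. The following code works with Java 8 Streams and Eclipse Collections: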

String string =
    "Hello alan i am here where are you and what are you doing hello are you there";
List<ObjectIntPair<String>> pairs =
    Stream.of(string.toLowerCase().split(" "))
        .collect(Collectors.toCollection(TreeBag::new))
        .topOccurrences(7);
System.out.println(pairs);

This code will output:

// Strings with occurrences
[are:3, you:3, hello:2, alan:1, am:1, and:1, doing:1, here:1, i:1, there:1, what:1, where:1]

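The topOccurrences() method has logic for handling ties, which essentially leaves it to the developer to decide how to deal with them. If you want exactly the first seven items from this list, you can chain a call to .take(7).

The code can also be simplified further to: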

List<ObjectIntPair<String>> pairs =
    TreeBag.newBagWith(string.toLowerCase().split(" ")).topOccurrences(7); // lower-case first so "Hello" and "hello" are counted together
System.out.println(pairs);

The static factory method TreeBag.newBagWith() takes varargs, so you can pass the result of String.split() to it directly.
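Judging from the output shown earlier, asking for the top 7 returns 12 entries because every word tied with the seventh one is kept. The same tie-preserving behaviour can be mimicked with the plain JDK, as in the rough sketch below. It is not how Eclipse Collections implements topOccurrences(); the class TopWithTiesSketch and its method are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of Eclipse Collections.
class TopWithTiesSketch {

    // Return the n most frequent words, plus any further words tied with the n-th one.
    static List<Map.Entry<String, Long>> topWithTies(String text, int n) {
        if (n <= 0) {
            return Collections.emptyList();
        }
        // Count the words, then sort by count descending and by word ascending.
        List<Map.Entry<String, Long>> sorted = Stream.of(text.toLowerCase().split(" "))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .collect(Collectors.toList());
        if (sorted.size() <= n) {
            return sorted;
        }
        // Keep everything whose count is at least the count of the n-th entry.
        long cutoff = sorted.get(n - 1).getValue();
        return sorted.stream()
                .filter(e -> e.getValue() >= cutoff)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String string =
            "Hello alan i am here where are you and what are you doing hello are you there";
        System.out.println(topWithTies(string, 7));
    }
}
```

Calling .stream().limit(7) on the returned list gives exactly seven entries, in the same spirit as .take(7) above.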

Note: I am a committer for Eclipse Collections.

- Donald Raab

1

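I'm a simple person, so I would first use a Map<String, Integer> to count each word. Then create a TreeSet for each count and store them in a TreeMap<Integer, TreeSet>. From there it should be fairly straightforward.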
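One way this could look in code is sketched below. The answer gives no implementation, so the details are assumptions: the class and method names are invented, the TreeMap is given a reversed comparator so the highest counts come first, and words are lower-cased as in the other answers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class name; a sketch of the Map + TreeMap<Integer, TreeSet> idea.
class TreeMapWordCountSketch {

    static List<String> topWords(String text, int limit) {
        // Step 1: count each word.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }

        // Step 2: bucket the words by count. The reversed comparator puts the
        // highest count first; each TreeSet keeps its words in alphabetical order.
        TreeMap<Integer, TreeSet<String>> byCount = new TreeMap<>(Comparator.reverseOrder());
        counts.forEach((word, count) ->
                byCount.computeIfAbsent(count, c -> new TreeSet<>()).add(word));

        // Step 3: walk the buckets in order until the limit is reached.
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, TreeSet<String>> bucket : byCount.entrySet()) {
            for (String word : bucket.getValue()) {
                if (result.size() == limit) {
                    return result;
                }
                result.add(word + " (" + bucket.getKey() + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence =
            "Hello alan i am here where are you and what are you doing hello are you there";
        topWords(sentence, 7).forEach(System.out::println);
    }
}
```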

- Robert Benson

0

A two-step solution: group and count first, then process by count in descending order.

List<String> words = Arrays.asList("Hello alan i am here where are you and what are you doing hello you there".split(" "));

Map<String, Long> collect = words.stream()
        .map(String::toLowerCase) // convert to lower case
        .collect( // group and count by name
                Collectors.groupingBy(Function.identity(), Collectors.counting()));

collect.keySet().stream()
        .sorted( // order by count descending, then by name
                Comparator
                        .comparing(collect::get)
                        .reversed()
                        .thenComparing(Collator.getInstance()))
        .map(k -> k + " (" + collect.get(k) + ")") // map to name and count string
        .limit(7) // only first 7 entries
        .forEach(System.out::println); // output

- Peter Walser

- Tunaki · Accepted Answer

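The hardest part is the sorting. Since you want to keep only the first 7 elements of the result and you want the Map sorted by its values, we need to build a Map of all the results, sort it, and then keep the first 7 results.

In the code below, each word is converted to lower case and grouped by itself, counting the number of occurrences. We then need to sort this map, so we create a Stream over its entries and sort them by value (in descending order) and then by key. The first 7 elements are retained, mapped to their keys (the corresponding words), and collected into a List, which preserves the encounter order.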

public static void main(String[] args) {
    String sentence = "Hello alan i am here where are you and what are you doing hello are you there";
    List<String> words = Arrays.asList(sentence.split(" "));

    List<String> result = 
            words.stream()
                 .map(String::toLowerCase)
                 .collect(groupingBy(identity(), counting()))
                 .entrySet().stream()
                 .sorted(Map.Entry.<String, Long> comparingByValue(reverseOrder()).thenComparing(Map.Entry.comparingByKey()))
                 .limit(7)
                 .map(Map.Entry::getKey)
                 .collect(toList());

    System.out.println(result);
}

Output:

[are, you, hello, alan, am, and, doing]

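Note that you made a mistake in your expected output: "are" actually appears 3 times, just like "you", so it should come before it.

Note: this code assumes a number of static imports, namely: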

import static java.util.Comparator.reverseOrder;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;
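If the counts are needed alongside the words, the same pipeline can collect into a LinkedHashMap instead of a List, using only the JDK. The following is a sketch under that assumption, with a made-up class name and regular imports instead of static ones; the four-argument Collectors.toMap is used so that the map keeps the sorted encounter order.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, for illustration only.
class TopWordsWithCounts {

    static Map<String, Long> topWords(String sentence, int limit) {
        return Stream.of(sentence.split(" "))
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                // Count descending, then word ascending.
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,          // keys are already unique, so this is never called
                        LinkedHashMap::new)); // keeps the sorted encounter order
    }

    public static void main(String[] args) {
        String sentence =
            "Hello alan i am here where are you and what are you doing hello are you there";
        // Prints each word in the "word (count)" format used in the question.
        topWords(sentence, 7).forEach((word, count) ->
                System.out.println(word + " (" + count + ")"));
    }
}
```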