更新:
设计一个数据结构,高效支持add(String word)
、remove(String word)
和List<String> currentTopK(int k)
currentTopK
返回前k个频繁出现的单词,如果有并列,则按字母顺序排列
类似于设计LRU,可以使用双向链表+哈希表来实现
add()
和remove()
时间复杂度:O(1);如果存在并列(多个单词具有相同的计数),则时间复杂度为O(log m),其中m是具有相同计数的单词数量,因为我使用TreeSet来保持字母顺序。如果并列不重要,我可以以任何顺序返回相同计数的单词,那么我就不需要使用TreeSet,我们可以获得O(1)的保证。
currentTopK(int k)
时间复杂度:O(k)
public class TopK {
private class Node {
int count;
TreeSet<Item> itemSet = new TreeSet<>();
Node prev, next;
Node(int count){
this.count = count;
}
}
private class Item implements Comparable<Item> {
String word;
Node countNode;
Item(String w, Node node) {
word = w;
countNode = node;
}
@Override
public int compareTo(Item o) {
return this.word.compareTo(o.word);
}
}
private final Node head = new Node(0);
private Map<String, Item> countMap;
public TopK(){
head.next = head;
head.prev = head;
countMap = new HashMap<>();
}
public void add(String word) {
Item item = countMap.get(word);
Node countNode = item == null? null : item.countNode;
if (countNode == null) {
if (head.next.count == 1) {
countNode = head.next;
} else {
countNode = new Node(1);
insertNode(head, countNode);
}
item = new Item(word, countNode);
countMap.put(word, item);
} else {
Node oldCountNode = countNode;
if (oldCountNode.next.count == oldCountNode.count + 1) {
countNode = oldCountNode.next;
} else {
countNode = new Node(oldCountNode.count + 1);
insertNode(oldCountNode, countNode);
}
oldCountNode.itemSet.remove(item);
if (oldCountNode.itemSet.isEmpty()) removeNode(oldCountNode);
item.countNode = countNode;
}
countNode.itemSet.add(item);
}
public void remove(String word) {
Item item = countMap.get(word);
if (item == null) return;
Node countNode = item.countNode;
if (countNode.count == 1) {
countNode.itemSet.remove(item);
countMap.remove(word);
} else {
Node oldCountNode = countNode;
if (oldCountNode.prev.count == oldCountNode.count - 1) {
countNode = oldCountNode.prev;
} else {
countNode = new Node(oldCountNode.count - 1);
insertNode(oldCountNode.prev, countNode);
}
oldCountNode.itemSet.remove(item);
if (oldCountNode.itemSet.isEmpty()) removeNode(oldCountNode);
item.countNode = countNode;
countNode.itemSet.add(item);
}
}
public List<String> currentTopK(int k){
List<String> res = new ArrayList<>(k);
Node cur = head.prev;
while (cur != head) {
for (Item item : cur.itemSet){
res.add(item.word);
if (res.size() == k) return res;
}
cur = cur.prev;
}
return res;
}
private void insertNode(Node prev, Node cur) {
cur.next = prev.next;
prev.next.prev = cur;
prev.next = cur;
cur.prev = prev;
}
private void removeNode(Node cur) {
Node prev = cur.prev;
Node next = cur.next;
prev.next = next;
next.prev = prev;
cur.prev = null;
cur.next = null;
}
}
旧答案
使用 HashMap + TreeSet
初始化
时间复杂度 O(1)
添加一个新单词:
时间复杂度 O(logk)
输出当前前 K 个单词
时间复杂度 O(k)
处理长度为 N 的数据流:
时间复杂度:O(N log k)
空间复杂度:O(数据流中不同单词的数量) <= O(N)
import java.util.*;
public class TopK {
private final int k;
private Map<String, Integer> counts;
private TreeSet<String> topk;
private Comparator<String> comp;
public TopK(int k) {
this.k = k;
counts = new HashMap<>();
comp = (w1, w2) -> {
int c1 = counts.getOrDefault(w1, 0), c2 = counts.getOrDefault(w2, 0);
return c1 == c2 ? w2.compareTo(w1) : c1 < c2 ? -1 : 1;
};
topk = new TreeSet<>(comp);
}
public void add(String word) {
int newCount = counts.getOrDefault(word, 0) + 1;
if (topk.size() < k) {
topk.remove(word);
counts.put(word, newCount);
topk.add(word);
} else {
if (topk.remove(word)) {
counts.put(word, newCount);
topk.add(word);
} else {
counts.put(word, newCount);
if (comp.compare(word, topk.first()) > 0) {
topk.pollFirst();
topk.add(word);
}
}
}
}
public List<String> currentTopK() {
return new ArrayList<>(topk.descendingSet());
}
}