Top K Frequent Words
Problem
Metadata
- tags: Pocket Gems, Hash Table, Amazon, Priority Queue, Bloomberg, Yelp, Heap, Uber, EditorsChoice
- difficulty: Medium
- source(lintcode): https://www.lintcode.com/problem/top-k-frequent-words/
- source(leetcode): https://leetcode.com/problems/top-k-frequent-words/
Description
Given a list of words and an integer k, return the top k frequent words in the list.
Notice
Order the words in the returned list by frequency, with the most frequent word first. If two words have the same frequency, the one that comes first alphabetically comes first.
Example
Given
[
"yes", "lint", "code",
"yes", "code", "baby",
"you", "baby", "chrome",
"safari", "lint", "code",
"body", "lint", "code"
]
for k = 3, return ["code", "lint", "baby"].
for k = 4, return ["code", "lint", "baby", "yes"].
Challenge
Do it in O(nlogk) time and O(n) extra space.
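Solution
The task is to output the K most frequent words, with words of equal frequency ordered alphabetically. If we keep every word in a max-heap, we can produce the answer by removing the root K times. That works, but it inevitably wastes a fair amount of space; ideally we only need to maintain a heap of size K. The remaining question is how best to maintain such a size-K heap so that, when a new element arrives, the one evicted is the last (smallest) node.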
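The size-K heap idea is easier to see on plain integers first. The following is a minimal sketch, not the solution to this problem: the class name `BoundedHeapSketch` and the method `kLargest` are hypothetical names chosen for illustration. It keeps a min-heap whose root is always the weakest of the candidates kept so far, so evicting the root whenever the heap grows past K leaves the K largest values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only.
final class BoundedHeapSketch {
    /** Returns the k largest values of nums in descending order. */
    static List<Integer> kLargest(int[] nums, int k) {
        List<Integer> result = new ArrayList<>();
        if (nums == null || k <= 0) {
            return result;
        }
        // java.util.PriorityQueue is a min-heap under natural ordering,
        // so the root is the smallest candidate currently kept.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int x : nums) {
            heap.offer(x);
            if (heap.size() > k) {
                heap.poll(); // evict the current minimum
            }
        }
        // Polling yields ascending order, so reverse at the end.
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }
}
```

The heap briefly holds K + 1 elements after each offer, so every operation still costs O(log K). The Top K Frequent Words problem applies the same pattern to (word, frequency) pairs with a custom ordering.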
Java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class Solution {
    /**
     * @param words: an array of string
     * @param k: An integer
     * @return: an array of string
     */
    public String[] topKFrequentWords(String[] words, int k) {
        if (words == null || words.length == 0) return words;
        if (k <= 0) return new String[0];

        // Count how often each word occurs.
        Map<String, Integer> wordFreq = new HashMap<>();
        for (String word : words) {
            wordFreq.merge(word, 1, Integer::sum);
        }

        // Min-heap of at most k entries; the root is the weakest candidate.
        PriorityQueue<KeyFreq> pq = new PriorityQueue<>(k);
        for (Map.Entry<String, Integer> entry : wordFreq.entrySet()) {
            KeyFreq kf = new KeyFreq(entry.getKey(), entry.getValue());
            if (pq.size() < k) {
                pq.offer(kf);
            } else {
                // Replace the root only if the new entry ranks higher.
                KeyFreq peek = pq.peek();
                if (peek.compareTo(kf) <= 0) {
                    pq.poll();
                    pq.offer(kf);
                }
            }
        }

        // Polling yields the weakest entry first.
        int topKSize = Math.min(k, pq.size());
        String[] topK = new String[topKSize];
        for (int i = 0; i < k && !pq.isEmpty(); i++) {
            topK[i] = pq.poll().key;
        }

        // Reverse the array so the most frequent word comes first.
        for (int i = 0, j = topKSize - 1; i < j; i++, j--) {
            String temp = topK[i];
            topK[i] = topK[j];
            topK[j] = temp;
        }
        return topK;
    }

    class KeyFreq implements Comparable<KeyFreq> {
        String key;
        int freq;

        public KeyFreq(String key, int freq) {
            this.key = key;
            this.freq = freq;
        }

        @Override
        public int compareTo(KeyFreq kf) {
            // Lower frequency sorts first (closer to the heap root).
            if (this.freq != kf.freq) {
                return Integer.compare(this.freq, kf.freq);
            }
            // On ties the comparison is reversed: the lexicographically
            // larger word sorts first and is therefore evicted first.
            return kf.key.compareTo(this.key);
        }
    }
}
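Source Code Analysis
The heap is implemented with Java's built-in PriorityQueue. Because a custom ordering is needed, the helper class implements the compareTo method of the Comparable interface. Note that PriorityQueue is a min-heap by default, so the string comparison in compareTo needs care: words with the same frequency are ordered alphabetically, which means the lexicographically smaller word should be kept in preference, so the string comparison runs in the opposite direction to the freq comparison. Because the heap is a min-heap, the polled result must also be reversed before it is returned. The PriorityQueue used in this Java implementation is not thread-safe; in real use, consider whether the thread-safe PriorityBlockingQueue is needed instead.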
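The reversed tie-break described above can also be written as a standalone Comparator instead of a Comparable class. This is a sketch under assumptions, not the author's code: `TopKComparatorSketch`, `topK`, and `worstFirst` are hypothetical names, and the result is returned as a List rather than an array. The comparator puts the "worst" entry at the heap root: lower frequency first, and on equal frequency the lexicographically larger word first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical variant for illustration only.
final class TopKComparatorSketch {
    static List<String> topK(String[] words, int k) {
        List<String> result = new ArrayList<>();
        if (words == null || k <= 0) {
            return result;
        }
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w, 1, Integer::sum);
        }
        // Weakest entry first: ascending frequency, then descending word.
        Comparator<Map.Entry<String, Integer>> worstFirst = (a, b) -> {
            int byFreq = Integer.compare(a.getValue(), b.getValue());
            if (byFreq != 0) {
                return byFreq;
            }
            return b.getKey().compareTo(a.getKey());
        };
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(worstFirst);
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the weakest of the k + 1 candidates
            }
        }
        // Polling returns weakest first, so reverse to get the final order.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

With the word list from the example, "baby" and "yes" both occur twice; "yes" is lexicographically larger, so it sits nearer the root and is the one evicted when k = 3.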
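For Java, the standard library does not yet provide a fixed-size PriorityQueue, but the widely used Google Guava library already has a similar implementation, MinMaxPriorityQueue, so there is no need to reinvent the wheel.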
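Complexity Analysis
Each heap insertion or removal works on a heap of fixed size K, and there are at most n elements, so the time complexity is about O(n log K). The space complexity is O(n), dominated by the hash map of word frequencies.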