
LeetCode (347) - Top K Frequent Elements

2022-07-05 20:19:00 【SmileGuy17】

LeetCode (347): Top K Frequent Elements

Problem

Solutions

Method 1: Bucket sort

Idea

As the name suggests, bucket sort sets up one bucket per value, records in each bucket the number of times that value occurs (or some other property), and then sorts the buckets. For the example, we first obtain four buckets [1,2,3,4] whose values are [4,2,1,1], the number of times each number appears.
Next, we sort the buckets by frequency; the k buckets with the highest frequency are the k most frequent numbers. Any sorting algorithm works here, and we can even bucket-sort a second time, placing each old bucket into a new bucket according to its frequency. For the example, since the largest frequency is 4, we set up four new buckets [1,2,3,4]; the old buckets placed into them are [[3,4],[2],[],[1]], which groups the numbers by frequency. Finally, we traverse the new buckets from back to front until we have found k old buckets.
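To make the second round of bucketing concrete, here is a minimal sketch that builds the frequency-indexed buckets. The helper name `buildFrequencyBuckets` is hypothetical, and the input used in the note below is an assumption chosen to match the counts [4,2,1,1] quoted above.

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns buckets where buckets[f] holds every value
// that occurs exactly f times in nums (index 0 is always empty).
std::vector<std::vector<int>> buildFrequencyBuckets(const std::vector<int>& nums) {
    std::unordered_map<int, int> counts;
    int maxCount = 0;
    for (int x : nums) {
        maxCount = std::max(maxCount, ++counts[x]);
    }

    std::vector<std::vector<int>> buckets(maxCount + 1);
    for (const auto& entry : counts) {
        buckets[entry.second].push_back(entry.first);
    }

    // Sorting inside each bucket is not part of the algorithm; it only makes
    // the result deterministic, since unordered_map iteration order is unspecified.
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end());
    }
    return buckets;
}
```

For the assumed input {1,1,1,1,2,2,3,4}, buckets 1 through 4 come out as {3,4}, {2}, {}, {1}, matching the layout described above. Reading them from the highest index down yields the most frequent values first.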

Code implementation

My implementation:

#include <algorithm>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> topKFrequent(vector<int>& nums, int k) {
        if (nums.size() == 1) return nums;

        // Count occurrences and track the highest frequency seen.
        unordered_map<int, int> times;
        int maxcount = 0;
        for (auto& it : nums) maxcount = max(maxcount, ++times[it]);

        // bucket[f] holds every value that occurs exactly f times.
        vector<vector<int>> bucket(maxcount + 1);
        for (auto& it : times) bucket[it.second].push_back(it.first);

        vector<int> ans;
        // The answer is guaranteed to be unique, so k reaches exactly 0
        // before maxcount runs out of buckets; no bounds check is needed.
        while (k > 0) {
            if (!bucket[maxcount].empty()) {
                k -= bucket[maxcount].size();
                ans.insert(ans.end(), bucket[maxcount].begin(), bucket[maxcount].end());
            }
            maxcount--;
        }
        return ans;
    }
};

Complexity analysis

Time complexity: $O(n)$, where $n$ is the length of the array
Space complexity: $O(\max(n, k))$, where $n$ is the length of the array; since $k \le n$, this is simply $O(n)$

Method 2: Heap sort

Idea

First, traverse the entire array and use a hash table to record how many times each number occurs, forming an "occurrence count array". Finding the top $k$ frequent elements of the original array is then equivalent to finding the $k$ largest values in the "occurrence count array".

The simplest approach is to sort the "occurrence count array". However, since there may be $O(N)$ distinct occurrence counts (where $N$ is the length of the original array), the overall complexity would reach $O(N\log N)$, which does not meet the problem's requirement.
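For comparison, the sorting baseline just described might look like the sketch below. The function name `topKBySorting` is made up for illustration; the point is only to show where the $O(N\log N)$ cost comes from.

```cpp
#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>

// Baseline sketch: count, sort all (value, count) pairs by count, take the first k.
std::vector<int> topKBySorting(const std::vector<int>& nums, int k) {
    std::unordered_map<int, int> occurrences;
    for (int x : nums) {
        ++occurrences[x];
    }

    // Copy the (value, count) pairs into a vector so they can be sorted.
    std::vector<std::pair<int, int>> entries(occurrences.begin(), occurrences.end());

    // Sorting every distinct value is the O(N log N) step.
    std::sort(entries.begin(), entries.end(),
              [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                  return a.second > b.second;  // larger count first
              });

    std::vector<int> result;
    for (int i = 0; i < k && i < static_cast<int>(entries.size()); ++i) {
        result.push_back(entries[i].first);
    }
    return result;
}
```

The sort orders all distinct values even though only the first $k$ are needed, which is exactly the work the heap and quickselect approaches avoid.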

Here we can use a heap: build a min-heap, then traverse the "occurrence count array":

If the heap holds fewer than $k$ elements, insert the current value directly.
If the heap holds exactly $k$ elements, compare the heap top with the current occurrence count. If the heap top is at least as large, then at least $k$ numbers have an occurrence count no smaller than the current one, so the current value is discarded; otherwise, pop the heap top and insert the current value.

After the traversal, the elements in the heap are the $k$ largest values of the "occurrence count array".
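The two rules above can be sketched on a plain array of counts, leaving out the value-to-count pairing for brevity. This is an illustrative helper (the name `kLargest` is an assumption), built on `std::priority_queue` with `std::greater` so that the smallest retained count sits on top.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Keeps the k largest entries of values in a size-bounded min-heap.
// Returns them in ascending order (the order in which a min-heap pops).
std::vector<int> kLargest(const std::vector<int>& values, int k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap;
    for (int v : values) {
        if (static_cast<int>(minHeap.size()) < k) {
            // Rule 1: heap not full yet, insert directly.
            minHeap.push(v);
        } else if (!minHeap.empty() && minHeap.top() < v) {
            // Rule 2: heap full and the current value beats the smallest kept one.
            minHeap.pop();
            minHeap.push(v);
        }
        // Otherwise the current value is discarded.
    }

    std::vector<int> result;
    while (!minHeap.empty()) {
        result.push_back(minHeap.top());
        minHeap.pop();
    }
    return result;
}
```

With the counts {4, 2, 1, 1} and $k = 2$, the heap ends up holding 2 and 4. Because the heap never grows beyond $k$ entries, each push or pop costs $O(\log k)$.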

Code implementation

LeetCode official solution:

#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

class Solution {
public:
    // Compares by occurrence count so that the priority_queue acts as a min-heap.
    static bool cmp(const pair<int, int>& m, const pair<int, int>& n) {
        return m.second > n.second;
    }

    vector<int> topKFrequent(vector<int>& nums, int k) {
        unordered_map<int, int> occurrences;
        for (auto& v : nums) {
            occurrences[v]++;
        }

        // pair.first is the array value; pair.second is the number of times it occurs
        priority_queue<pair<int, int>, vector<pair<int, int>>, decltype(&cmp)> q(cmp);
        for (auto& [num, count] : occurrences) {
            if (static_cast<int>(q.size()) == k) {
                // Heap is full: replace the top only if the current count is larger.
                if (q.top().second < count) {
                    q.pop();
                    q.emplace(num, count);
                }
            } else {
                q.emplace(num, count);
            }
        }
        vector<int> ret;
        while (!q.empty()) {
            ret.emplace_back(q.top().first);
            q.pop();
        }
        return ret;
    }
};

Complexity analysis

Time complexity: $O(N\log k)$, where $N$ is the length of the array. We first traverse the original array and record occurrence counts in a hash table; each element takes $O(1)$ time, for $O(N)$ in total. We then traverse the "occurrence count array"; since the heap holds at most $k$ elements, each heap operation takes $O(\log k)$ time, for $O(N\log k)$ in total. The sum of the two is $O(N\log k)$.
Space complexity: $O(N)$. The hash table takes $O(N)$ space and the heap takes $O(k)$, for a total of $O(N)$.

Method 3: (Modified) quick sort, i.e. quickselect

Idea

We can use the quickselect algorithm to find the $k$ largest values in the "occurrence count array".

First, we traverse the original array to obtain the occurrence count of each number and store these counts in the array $\textit{arr}$. We then apply quick sort to $\textit{arr}$, modified as described below.

While quick-sorting $\textit{arr}[l \ldots r]$, we first partition it into two parts, $\textit{arr}[l \ldots q-1]$ and $\textit{arr}[q+1 \ldots r]$, such that every value in $\textit{arr}[l \ldots q-1]$ is no smaller than $\textit{arr}[q]$ and every value in $\textit{arr}[q+1 \ldots r]$ is smaller than $\textit{arr}[q]$.

We then compare $k$ with the length of the left subarray $\textit{arr}[l \ldots q-1]$, which is $q - l$:

If $k \le q - l$, the $k$ largest values of $\textit{arr}[l \ldots r]$ are the $k$ largest values of the subarray $\textit{arr}[l \ldots q-1]$.
Otherwise, the $k$ largest values of $\textit{arr}[l \ldots r]$ are all elements of the left subarray together with the pivot $\textit{arr}[q]$, plus the $k - (q - l + 1)$ largest values of the right subarray $\textit{arr}[q+1 \ldots r]$.

The average time complexity of the original quick sort algorithm is $O(N\log N)$. Our algorithm recurses into only one branch each time, so its average time complexity drops to $O(N)$.
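As a point of reference, the same selection step can be delegated to the standard library. The sketch below swaps a hand-written partition for `std::nth_element`, which also runs in linear time on average; it is an alternative illustration, not the approach used in the solution that follows, and the function name `topKWithNthElement` is an assumption.

```cpp
#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch: select the k most frequent values with std::nth_element.
// The returned values are in unspecified order.
std::vector<int> topKWithNthElement(const std::vector<int>& nums, int k) {
    std::unordered_map<int, int> occurrences;
    for (int x : nums) {
        ++occurrences[x];
    }
    std::vector<std::pair<int, int>> entries(occurrences.begin(), occurrences.end());

    const int total = static_cast<int>(entries.size());
    k = std::max(0, std::min(k, total));

    // Rearrange so that the k entries with the largest counts occupy the
    // first k positions. When k == total everything is selected already.
    if (k < total) {
        std::nth_element(entries.begin(), entries.begin() + k, entries.end(),
                         [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                             return a.second > b.second;  // larger count first
                         });
    }

    std::vector<int> result;
    for (int i = 0; i < k; ++i) {
        result.push_back(entries[i].first);
    }
    return result;
}
```

Like the recursive version, this only guarantees which elements land in the first $k$ positions, not their relative order, which is all the problem requires.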

Code implementation

LeetCode official solution:

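#include <cstdlib>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

class Solution {
public:
    // Partitions v[start..end] in descending order of count and appends the
    // k most frequent values in that range to ret.
    void qsort(vector<pair<int, int>>& v, int start, int end, vector<int>& ret, int k) {
        // Move a randomly chosen pivot to the front.
        int picked = rand() % (end - start + 1) + start;
        swap(v[picked], v[start]);

        int pivot = v[start].second;
        int index = start;
        // Gather every entry whose count is >= pivot right after the pivot.
        for (int i = start + 1; i <= end; i++) {
            if (v[i].second >= pivot) {
                swap(v[index + 1], v[i]);
                index++;
            }
        }
        // Put the pivot in its final place: v[start..index-1] >= pivot > v[index+1..end].
        swap(v[start], v[index]);

        if (k <= index - start) {
            // All k answers lie in the left part.
            qsort(v, start, index - 1, ret, k);
        } else {
            // The left part and the pivot all belong to the answer.
            for (int i = start; i <= index; i++) {
                ret.push_back(v[i].first);
            }
            if (k > index - start + 1) {
                qsort(v, index + 1, end, ret, k - (index - start + 1));
            }
        }
    }

    vector<int> topKFrequent(vector<int>& nums, int k) {
        unordered_map<int, int> occurrences;
        for (auto& v : nums) {
            occurrences[v]++;
        }

        // Copy the (value, count) pairs into a vector that can be partitioned.
        vector<pair<int, int>> values;
        for (auto& kv : occurrences) {
            values.push_back(kv);
        }
        vector<int> ret;
        qsort(values, 0, static_cast<int>(values.size()) - 1, ret, k);
        return ret;
    }
};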

Complexity analysis

Time complexity:
Here $N$ is the length of the array. Let $f(N)$ be the time needed to process an array of length $N$. Each call performs one traversal and recurses into one sub-branch, so in the best case $f(N) = O(N) + f(N/2)$, and the master theorem gives $f(N) = O(N)$.
In the worst case, every pivot lands at one end of the array and the time complexity degrades to $O(N^2)$. However, because the pivot is chosen at random at the start of each recursive call, the worst case is very unlikely.
On average, the time complexity is $O(N)$.
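Both bounds can also be checked by unrolling the recurrences directly. In the sketch below the per-call traversal cost is written as $cN$, where the constant $c$ is introduced only for illustration.

Best case, where each pivot halves the range:

$$f(N) = cN + f(N/2) = cN + \frac{cN}{2} + \frac{cN}{4} + \cdots \le cN \sum_{i=0}^{\infty} 2^{-i} = 2cN = O(N)$$

Worst case, where each pivot removes only one element:

$$f(N) = cN + f(N-1) = c\,\bigl(N + (N-1) + \cdots + 1\bigr) = \frac{cN(N+1)}{2} = O(N^2)$$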

Space complexity: $O(N)$. The hash table takes $O(N)$ space, and the auxiliary array used for sorting also takes $O(N)$. The recursion stack of quick sort takes $O(\log N)$ space in the best case and $O(N)$ in the worst case.