After SQL group query, get the first n records of each group

2020-11-09 10:51:00 osc A 3gbrdm

Contents

1. Background

2. Practical analysis

3. Summary


1. Background

Recently a feature requirement came up during development. The system has an information query module, and the information has to be presented as cards, as in the figure below:

The cards are shown per project team, and each project team shows its two most-read items (TOP 2).

Requirement analysis: group by project team, then take the 2 records with the highest pageviews in each group.

2. Practical analysis

The examples are based on a MySQL database.

Table definitions

1. Project team table: team

id          Primary key
name        Project team name

2. Information table: info

id          Primary key
team_id     Project team id
title       Information title
pageviews   Number of page views
content     Information content

The data in the info table is shown in the figure below.

First, let's review the basics of SELECT.

Writing order:

select *columns*
from *tables*
where *predicate*
group by *columns*
having *predicate*
order by *columns*
limit *offset*, *row_count*;

Logical execution order:

from *tables*
where *predicate*
group by *columns*
having *predicate*
select *columns*
order by *columns*
limit *offset*, *row_count*;
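The practical consequence of this order is that WHERE filters individual rows before grouping, while HAVING filters whole groups afterwards. The sketch below illustrates that with Python's built-in sqlite3 module standing in for MySQL; the reduced info table and its sample rows are assumptions made up for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE info (id INTEGER PRIMARY KEY, team_id INTEGER, pageviews INTEGER)"
)
conn.executemany(
    "INSERT INTO info (team_id, pageviews) VALUES (?, ?)",
    [(1, 5), (1, 20), (1, 30), (2, 8), (2, 40)],
)

# WHERE removes rows with pageviews < 10 first. The surviving rows are then
# grouped and counted, and HAVING keeps only groups with at least 2 rows.
rows = conn.execute(
    """
    SELECT team_id, count(*) AS n
    FROM info
    WHERE pageviews >= 10
    GROUP BY team_id
    HAVING count(*) >= 2
    ORDER BY team_id
    """
).fetchall()
print(rows)
```

Team 2 has two rows in the table, but only one of them survives the WHERE filter, so the HAVING condition removes the whole group.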

 

count(column) # returns the number of rows in which column is not NULL

DISTINCT column # removes duplicate values of column
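As a quick check of how these two behave together, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The single-column table and its values (including one NULL and one duplicate) are assumptions chosen for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (id INTEGER PRIMARY KEY, pageviews INTEGER)")
conn.executemany(
    "INSERT INTO info (pageviews) VALUES (?)",
    [(1,), (2,), (2,), (4,), (None,)],
)

# count(*) counts rows, count(pageviews) skips NULLs,
# count(DISTINCT pageviews) also collapses the duplicate 2.
total, non_null, distinct = conn.execute(
    "SELECT count(*), count(pageviews), count(DISTINCT pageviews) FROM info"
).fetchone()
print(total, non_null, distinct)
```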

 

Step 1: find the two most-read records of each project team in the info table.

Correlate the info table with itself using a correlated subquery:

SELECT a.*
FROM info a
WHERE (
        SELECT count(DISTINCT b.pageviews)
        FROM info b
        WHERE a.pageviews < b.pageviews
          AND a.team_id = b.team_id
      ) < 2;

This is hard to understand at first glance, so here is an example.

Suppose the pageviews values seen through both a and b are [1, 2, 3, 4]:

a.pageviews = 1: b.pageviews can be [2, 3, 4], count(DISTINCT b.pageviews) = 3
a.pageviews = 2: b.pageviews can be [3, 4], count(DISTINCT b.pageviews) = 2 # 2 larger values, so this is third place
a.pageviews = 3: b.pageviews can be [4], count(DISTINCT b.pageviews) = 1 # 1 larger value, so this is second place
a.pageviews = 4: b.pageviews can be [], count(DISTINCT b.pageviews) = 0 # 0 larger values, so this is the largest, first place

count(DISTINCT b.pageviews) is the number of distinct values larger than the current row's value.

a.team_id = b.team_id is the self-correlation condition; it restricts the comparison to rows of the same team, which has roughly the effect of grouping.

So "top two" is equivalent to count(DISTINCT b.pageviews) < 2. Here a.pageviews = 3 and 4 satisfy it, which are the two highest values in the set.
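The walkthrough above can be reproduced with a small runnable sketch. It uses Python's sqlite3 module as a stand-in for MySQL, a reduced info table, and made-up pageviews for two teams; all of these are assumptions for the illustration, and the query itself is the one from Step 1.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE info (id INTEGER PRIMARY KEY, team_id INTEGER, pageviews INTEGER)"
)
conn.executemany(
    "INSERT INTO info (team_id, pageviews) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (2, 10), (2, 30), (2, 20)],
)

# For each row a, count the distinct larger pageviews within the same team;
# the row is in the top 2 when fewer than 2 distinct values are larger.
top2 = conn.execute(
    """
    SELECT a.team_id, a.pageviews
    FROM info a
    WHERE (
            SELECT count(DISTINCT b.pageviews)
            FROM info b
            WHERE a.pageviews < b.pageviews
              AND a.team_id = b.team_id
          ) < 2
    ORDER BY a.team_id, a.pageviews DESC
    """
).fetchall()
print(top2)
```

Each team contributes exactly its two highest values because no pageviews value repeats within a team in this sample.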

Step 2: join the team table to the info table.

SELECT a.id, t.NAME, a.team_id, a.pageviews
FROM info a
  LEFT JOIN team t ON a.team_id = t.id
WHERE (
        SELECT count(DISTINCT b.pageviews)
        FROM info b
        WHERE a.pageviews < b.pageviews
          AND a.team_id = b.team_id
      ) < 2
ORDER BY a.team_id, a.pageviews DESC;

The results are as follows:

There is another way to look at it.

Use a self join with GROUP BY + HAVING. With this form you can debug the result step by step:

SELECT a.id, t.NAME, a.team_id, a.pageviews, COUNT( DISTINCT b.pageviews ) 
  FROM info a 
    LEFT JOIN info b ON ( a.pageviews < b.pageviews AND a.team_id = b.team_id ) 
    LEFT JOIN team t ON a.team_id = t.id 
GROUP BY a.id, t.NAME, a.team_id, a.pageviews 
HAVING COUNT( DISTINCT b.pageviews ) < 2 
ORDER BY a.team_id, a.pageviews DESC
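One way to do that step-by-step debugging is to run the self join first without HAVING, inspect the count computed for every row, and only then add the filter. The sketch below does this with Python's sqlite3 module standing in for MySQL; the reduced info table (no team join) and its sample rows are assumptions for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE info (id INTEGER PRIMARY KEY, team_id INTEGER, pageviews INTEGER)"
)
conn.executemany(
    "INSERT INTO info (team_id, pageviews) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4)],
)

base = """
    SELECT a.pageviews, COUNT(DISTINCT b.pageviews) AS larger
    FROM info a
      LEFT JOIN info b ON (a.pageviews < b.pageviews AND a.team_id = b.team_id)
    GROUP BY a.id, a.pageviews
"""
order = " ORDER BY a.pageviews DESC"

# Debug step: no HAVING, so every row is shown with its count of larger values.
# The largest row has no join partner; COUNT ignores the resulting NULL and gives 0.
counts = conn.execute(base + order).fetchall()

# Final step: keep only the rows with fewer than 2 larger distinct values.
kept = conn.execute(base + " HAVING COUNT(DISTINCT b.pageviews) < 2" + order).fetchall()
print(counts)
print(kept)
```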

Problem: if several records have the same pageviews, this approach returns more rows than expected.

An example:

Suppose the pageviews values seen through both a and b are [1, 2, 2, 4]:

a.pageviews = 1: b.pageviews can be [2, 2, 4], count(DISTINCT b.pageviews) = 2 # DISTINCT collapses the duplicate 2, so this is third place
a.pageviews = 2: b.pageviews can be [4], count(DISTINCT b.pageviews) = 1 # 1 larger value, so this is tied for second place
a.pageviews = 2: b.pageviews can be [4], count(DISTINCT b.pageviews) = 1 # 1 larger value, so this is also second place
a.pageviews = 4: b.pageviews can be [], count(DISTINCT b.pageviews) = 0 # 0 larger values, so this is the largest, first place

count(DISTINCT b.pageviews) < 2 is satisfied by a.pageviews = 2, 2 and 4. These are the two highest values in the set, but three rows are returned.
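If exactly two rows per team are needed, one possible workaround (not part of the original approach) is to count rows instead of distinct values and break ties with a second column. The sketch below assumes that the row with the lower id should win a tie, which is an arbitrary choice made for the illustration; it again uses Python's sqlite3 module as a stand-in for MySQL with made-up rows.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE info (id INTEGER PRIMARY KEY, team_id INTEGER, pageviews INTEGER)"
)
conn.executemany(
    "INSERT INTO info (team_id, pageviews) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 2), (1, 4)],  # ids 2 and 3 tie on pageviews = 2
)

# Original query: both tied rows pass the filter, so 3 rows come back.
with_ties = conn.execute(
    """
    SELECT a.id, a.pageviews
    FROM info a
    WHERE (
            SELECT count(DISTINCT b.pageviews)
            FROM info b
            WHERE a.pageviews < b.pageviews AND a.team_id = b.team_id
          ) < 2
    ORDER BY a.pageviews DESC, a.id
    """
).fetchall()

# Tie-break variant (assumption: lower id wins): a row b "outranks" a when it
# has more pageviews, or the same pageviews and a smaller id.
tie_broken = conn.execute(
    """
    SELECT a.id, a.pageviews
    FROM info a
    WHERE (
            SELECT count(*)
            FROM info b
            WHERE a.team_id = b.team_id
              AND (b.pageviews > a.pageviews
                   OR (b.pageviews = a.pageviews AND b.id < a.id))
          ) < 2
    ORDER BY a.pageviews DESC, a.id
    """
).fetchall()
print(with_ties)
print(tie_broken)
```

Because id is unique, no two rows can outrank each other equally, so the variant returns at most two rows per team. Whether dropping one of the tied rows is acceptable depends on the requirement.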

3. Summary

Reframing the requirement: finding the top N of each group becomes a self-correlation that counts, for each row, how many values in the same group are larger than it.

This is in fact similar to a LeetCode database problem rated Hard:

185. Department Top Three Salaries

Reference:

https://leetcode-cn.com/problems/department-top-three-salaries/solution/185-bu-men-gong-zi-qian-san-gao-de-yuan-gong-by-li/

 

Copyright notice
This article was written by [osc A 3gbrdm]. Please include a link to the original when reposting. Thank you.