3 frequently tested SQL data analysis questions (including data and code)
2022-06-29 05:35:00 【one thousand four hundred and eighty】
When hiring for data roles, I often test candidates' SQL skills. Here are 3 frequently tested SQL data analysis questions, ordered from simple to complex. Try them and see how you do.
PS: The SQL code below runs on MySQL 8.0 and above.
Question 1: Find the second-highest-paid employee in each department
There is an employee information table, employee, which contains the following 4 fields.
- employee_id (employee ID): VARCHAR.
- employee_name (employee name): VARCHAR.
- employee_salary (employee salary): INT.
- department (ID of the department the employee belongs to): VARCHAR.
The data in the employee table is shown in the following table.
There is also a department information table, department, which contains the following two fields.
- department_id (department ID): VARCHAR.
- department_name (department name): VARCHAR.
The data in the department table is shown in the following table.
The data import code is as follows:
DROP TABLE IF EXISTS employee;
CREATE TABLE employee(
employee_id VARCHAR(8),
employee_name VARCHAR(8),
employee_salary INT(8),
department VARCHAR(8)
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8;
INSERT INTO
employee (employee_id,employee_name,employee_salary,department)
VALUE ('a001','Bob',7000,'b1')
,('a002','Jack',9000,'b1')
,('a003','Alice',8000,'b2')
,('a004','Ben',5000,'b2')
,('a005','Candy',4000,'b2')
,('a006','Allen',5000,'b2')
,('a007','Linda',10000,'b3');
DROP TABLE IF EXISTS department;
CREATE TABLE department(
department_id VARCHAR(8),
department_name VARCHAR(8)
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8;
INSERT INTO
department (department_id,department_name)
VALUE ('b1','Sales')
,('b2','IT')
,('b3','Product');
Problem: Query the information of the employee with the second-highest salary in each department.
The output should include:
- employee_id (employee ID)
- employee_name (employee name)
- employee_salary (employee salary)
- department_id (ID of the department the employee belongs to)
An example of the result is shown in the figure below.
Suggested approach: Use a window function to partition the rows by department ID and rank them within each partition in descending order of salary, naming the result ranking. Then inner-join this derived table with the department table to associate the department information. Finally, filter the joined result with WHERE ranking = 2 to keep the second-highest salary, and select the required columns.
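The ranking step described above can be inspected on its own before any filtering. The following is a minimal sketch against the employee table defined earlier; the DENSE_RANK() column and the alias dense_ranking are additions for comparison and are not part of the reference solution.

```sql
-- Rank employees within each department by salary, highest first.
-- RANK() leaves a gap after ties; DENSE_RANK() does not.
SELECT employee_id
      ,employee_name
      ,employee_salary
      ,department
      ,RANK()       OVER (PARTITION BY department ORDER BY employee_salary DESC) AS ranking
      ,DENSE_RANK() OVER (PARTITION BY department ORDER BY employee_salary DESC) AS dense_ranking
FROM employee
ORDER BY department, ranking;
```

With the sample data, Ben and Allen in department b2 both earn 5000 and both receive ranking = 2, so both pass a ranking = 2 filter, while Candy gets 4 under RANK() and 3 under DENSE_RANK(). Department b3 has a single employee and therefore no row with ranking = 2. If two employees tied for the top salary, RANK() would assign 1, 1, 3 and no row would have ranking = 2; DENSE_RANK() is the usual choice when "second highest" should mean the second distinct salary value.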
Knowledge points involved: window functions, subqueries, multi-table joins.
The SQL code for this question is as follows, for reference:
SELECT a.employee_id
,a.employee_name
,a.employee_salary
,b.department_id
FROM
(
SELECT *
,RANK() OVER (PARTITION BY department ORDER BY employee_salary DESC) AS ranking
FROM employee
) AS a
INNER JOIN department AS b
ON a.department = b.department_id
WHERE a.ranking = 2;
Question 2: Statistics on website login intervals
There is a website login table, login_info, which records the login information of all users and contains the following two fields.
- user_id (user ID): VARCHAR.
- login_time (user login date): DATE.
The data in the login_info table is shown in the following table.
The data import code is as follows:
DROP TABLE IF EXISTS login_info;
CREATE TABLE login_info(
user_id VARCHAR(8),
login_time DATE
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8;
INSERT INTO
login_info (user_id,login_time)
VALUE ('a001','2021-01-01')
,('b001','2021-01-01')
,('a001','2021-01-03')
,('a001','2021-01-06')
,('a001','2021-01-07')
,('b001','2021-01-07')
,('a001','2021-01-08')
,('a001','2021-01-09')
,('b001','2021-01-09')
,('b001','2021-01-10')
,('b001','2021-01-15')
,('a001','2021-01-16')
,('a001','2021-01-18')
,('a001','2021-01-19')
,('b001','2021-01-20')
,('a001','2021-01-23');
Problem: For each user, count the number of times the interval between consecutive login dates is less than 5 days.
The output should include:
- user_id (user ID)
- num (number of times the user's login date interval is less than 5 days)
An example of the result is shown in the figure below.
Suggested approach: This question tests the use of the LEAD() function on time-interval problems. Start with the inner query, which uses LEAD() to create a new date field from the original login_time field (namely, the user's next login date). The inner query is as follows:
SELECT user_id
,login_time
,LEAD(login_time,1) OVER (PARTITION BY user_id ORDER BY login_time) AS next_login_time
FROM login_info;
The query results are shown in the following figure.
As the figure above shows, LEAD() partitions the data by user_id and orders each partition by login_time. With the inner query in place, the outer query only needs to keep the rows where the difference between next_login_time and login_time is less than 5 days, which are the rows to be counted. The date difference is calculated with TIMESTAMPDIFF(DAY, login_time, next_login_time). Finally, grouping by user_id and counting the records gives the number of times each user's login date interval is less than 5 days.
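To see which rows the outer filter keeps, the day difference can be exposed as its own column. This is a minimal sketch, assuming the login_info table above; the alias diff_days is introduced here for illustration only.

```sql
-- Show each login, the next login of the same user, and the gap in days.
SELECT user_id
      ,login_time
      ,next_login_time
      ,TIMESTAMPDIFF(DAY, login_time, next_login_time) AS diff_days
FROM
(
    SELECT user_id
          ,login_time
          ,LEAD(login_time, 1) OVER (PARTITION BY user_id ORDER BY login_time) AS next_login_time
    FROM login_info
) AS a
ORDER BY user_id, login_time;
```

The last login of each user has no following row, so next_login_time and diff_days are NULL there. Because NULL < 5 does not evaluate to true, a WHERE diff_days < 5 style filter drops those rows without an extra IS NOT NULL condition. With the sample data, 8 rows for a001 and 2 rows for b001 have a gap below 5 days.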
Knowledge points involved: window functions, subqueries, group aggregation, date and time functions.
The SQL code for this question is as follows, for reference:
SELECT a.user_id
,COUNT(*) AS num
FROM
(
SELECT user_id
,login_time
,LEAD(login_time,1) OVER (PARTITION BY user_id ORDER BY login_time) AS next_login_time
FROM login_info
) AS a
WHERE TIMESTAMPDIFF(DAY, login_time, next_login_time) < 5
GROUP BY user_id;
Question 3: User purchase channel analysis
There is a user purchase table, purchase_channel, which records users' purchases on a shopping platform. The platform can be accessed in two ways, through the web (web) and through a mobile app (app). The table contains the following 4 fields.
- user_id (user ID): VARCHAR.
- channel (purchase channel): VARCHAR.
- purchase_date (purchase date): DATE.
- purchase_amount (purchase amount): INT.
The data in the purchase_channel table is shown in the following table.
The data import code is as follows:
DROP TABLE IF EXISTS purchase_channel;
CREATE TABLE purchase_channel(
user_id VARCHAR(8),
channel VARCHAR(8),
purchase_date DATE,
purchase_amount INT(8)
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8;
INSERT INTO
purchase_channel (user_id,channel,purchase_date,purchase_amount)
VALUE ('a001','app','2021-03-14',200)
,('a001','web','2021-03-14',100)
,('a002','app','2021-03-14',400)
,('a001','web','2021-03-15',3000)
,('a002','app','2021-03-15',900)
,('a003','app','2021-03-15',1000);
Problem: For each day, query the number of distinct users and the total purchase amount for users who used only the mobile app, users who used only the web, and users who used both the web and the mobile app (both). A channel must still be shown for a day even if no user has purchase records in it on that day.
The output should include:
- purchase_date (date)
- channel (purchase channel)
- sum_amount (total purchase amount)
- total_users (number of distinct users)
An example of the result is shown in the figure below.
Suggested approach: Group by user ID and date, and count the distinct purchase channels each user used to determine how that user accessed the platform on that date (web, app, or both). The web and app cases can be handled by one SELECT statement and the both case by another. Combine the two parts with UNION, use the result as a subquery, and in the outer query aggregate the total purchase amount and the total number of users by purchase date and purchase channel.
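The per-user classification step can also be written as one grouped query with a CASE expression instead of two SELECT statements. This is an alternative sketch, not the book's solution; the aliases user_channel and user_amount are introduced here for illustration.

```sql
-- One row per (date, user): label the channel that user used on that day.
SELECT purchase_date
      ,user_id
      ,CASE WHEN COUNT(DISTINCT channel) > 1 THEN 'both'
            ELSE MIN(channel)
       END AS user_channel
      ,SUM(purchase_amount) AS user_amount
FROM purchase_channel
GROUP BY purchase_date
        ,user_id;
```

With the sample data, a001 on 2021-03-14 is labeled both with an amount of 300, and every other (date, user) pair keeps its single channel. Aggregating these rows by purchase_date and user_channel yields the per-channel totals and user counts.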
The SQL code for this part is as follows:
SELECT purchase_date
,channel
,SUM(sum_amount) sum_amount
,SUM(total_users) total_users
FROM
(
SELECT purchase_date
,MIN(channel) channel
,SUM(purchase_amount) sum_amount
,COUNT(DISTINCT user_id) total_users
FROM purchase_channel
GROUP BY purchase_date
,user_id
HAVING COUNT(DISTINCT channel) = 1
-- UNION ALL keeps identical per-user rows that plain UNION would merge
UNION ALL
SELECT purchase_date
,'both' channel
,SUM(purchase_amount) sum_amount
,COUNT(DISTINCT user_id) total_users
FROM purchase_channel
GROUP BY purchase_date
,user_id
HAVING COUNT(DISTINCT channel) > 1
) c
GROUP BY purchase_date
,channel;
The output of this part is shown in the figure below.
This seems to meet the requirements of the question, but on closer inspection it does not: the question requires a channel to be shown for a day even if no user has purchase records in it on that day. To show the complete result, build the full set of combinations (the Cartesian product of all dates and the 3 channels) and LEFT JOIN it to the result just obtained, matching on date and channel.
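The full set of date and channel combinations can be built and checked on its own. This is a minimal sketch, assuming the purchase_channel table above; the aliases d and c are introduced here for illustration, and an explicit CROSS JOIN is used to make the Cartesian product visible.

```sql
-- Every (date, channel) combination, whether or not purchases exist for it.
SELECT d.purchase_date
      ,c.channel
FROM (SELECT DISTINCT purchase_date FROM purchase_channel) AS d
CROSS JOIN
(
    SELECT 'app' AS channel
    UNION ALL
    SELECT 'web'
    UNION ALL
    SELECT 'both'
) AS c
ORDER BY d.purchase_date, c.channel;
```

With the sample data this gives 2 dates times 3 channels, that is 6 rows. After the LEFT JOIN, combinations without purchases carry NULL in sum_amount and total_users; wrapping those columns in COALESCE(..., 0) is one way to show zeros instead, if that is preferred.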
Knowledge points involved: UNION, group aggregation, deduplication.
The SQL code for this question is as follows, for reference:
SELECT t1.purchase_date
,t1.channel
,t2.sum_amount
,t2.total_users
FROM
(
SELECT DISTINCT a.purchase_date
,b.channel
FROM purchase_channel a,
(
SELECT "app" AS channel
UNION
SELECT "web" AS channel
UNION
SELECT "both" AS channel
) b
) t1
LEFT JOIN
(
SELECT
purchase_date,
channel,
SUM(sum_amount) sum_amount,
SUM(total_users) total_users
FROM (
SELECT purchase_date
,MIN(channel) channel
,SUM(purchase_amount) sum_amount
,COUNT(DISTINCT user_id) total_users
FROM purchase_channel
GROUP BY purchase_date,user_id
HAVING COUNT(DISTINCT channel) = 1
-- UNION ALL keeps identical per-user rows that plain UNION would merge
UNION ALL
SELECT purchase_date
,'both' channel
,SUM(purchase_amount) sum_amount
,COUNT(DISTINCT user_id) total_users
FROM purchase_channel
GROUP BY purchase_date,user_id
HAVING COUNT(DISTINCT channel) > 1
)c GROUP BY purchase_date, channel
) t2
ON t1.purchase_date = t2.purchase_date AND t1.channel = t2.channel;
Have you solved these questions?
The questions in this article are excerpted from the recently published book 《SQL Data Analysis: From Basic Ice-Breaking to Interview Problem Solving》.