3 frequently tested SQL data analysis questions (including data and code)

2022-06-29 05:35:00 【one thousand four hundred and eighty】

When recruiting for data roles, I often test candidates' SQL skills. Here are 3 frequently tested SQL data analysis questions, ordered from simple to complex. Test yourself: can you solve them?

PS: The SQL code below runs on MySQL 8.0 and above.

Question 1: Find the second-highest-paid employee in each department

There is an employee information table, employee, with the following 4 fields.

employee_id (employee ID): VARCHAR.
employee_name (employee name): VARCHAR.
employee_salary (employee salary): INT.
department (ID of the department the employee belongs to): VARCHAR.

The data in the employee table is shown in the following table.

There is also a department information table, department, with the following two fields.

department_id (department ID): VARCHAR.
department_name (department name): VARCHAR.

The data in the department table is shown in the following table.

The data import code is as follows:

DROP TABLE IF EXISTS employee;
CREATE TABLE employee(
    employee_id VARCHAR(8),
    employee_name VARCHAR(8),
    employee_salary INT,
    department VARCHAR(8)
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8;
INSERT INTO
employee (employee_id,employee_name,employee_salary,department)
VALUES ('a001','Bob',7000,'b1')
     ,('a002','Jack',9000,'b1')
     ,('a003','Alice',8000,'b2')
     ,('a004','Ben',5000,'b2')
     ,('a005','Candy',4000,'b2')
     ,('a006','Allen',5000,'b2')
     ,('a007','Linda',10000,'b3');


DROP TABLE IF EXISTS department;
CREATE TABLE department(
    department_id VARCHAR(8),
    department_name VARCHAR(8)
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8;
INSERT INTO
department (department_id,department_name)
VALUES ('b1','Sales')
     ,('b2','IT')
     ,('b3','Product');

Problem: Query the information of the employee with the second-highest salary in each department.

The output includes:

employee_id (employee ID)
employee_name (employee name)
employee_salary (employee salary)
department_id (ID of the department the employee belongs to)

An example of the result is shown in the figure below.

Suggested approach: Use a window function to partition the rows by department ID and rank them within each partition in descending order of salary, storing the rank as ranking. Then inner join the ranked table with the department table to bring in the department information. Finally, filter the joined table with WHERE ranking = 2, the condition for the second-highest salary, and select the required columns to get the result.

Knowledge points involved: window functions, subqueries, multi-table joins.

The SQL code for this question is as follows, for the reader's reference:

SELECT  a.employee_id
       ,a.employee_name
       ,a.employee_salary
       ,b.department_id
FROM 
(
    SELECT  *
           ,RANK() OVER (PARTITION BY department ORDER BY employee_salary DESC) AS ranking
    FROM employee 
) AS a
INNER JOIN department AS b
ON a.department = b.department_id
WHERE a.ranking = 2;
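
One edge case is worth noting: RANK() leaves gaps after ties, so if two employees tie for the highest salary in a department, no row receives ranking = 2 and that department drops out of the result. A minimal sketch of a variant, assuming tied salaries should share a rank without leaving gaps, swaps in DENSE_RANK() (a standard MySQL 8.0 window function). This is an alternative formulation, not the answer given in the source.

```sql
-- Variant sketch: DENSE_RANK() assigns consecutive ranks (1, 2, 3, ...)
-- even when salaries tie, so ranking = 2 always means the second-highest
-- distinct salary within the department.
SELECT  a.employee_id
       ,a.employee_name
       ,a.employee_salary
       ,b.department_id
FROM
(
    SELECT  employee_id
           ,employee_name
           ,employee_salary
           ,department
           ,DENSE_RANK() OVER (PARTITION BY department ORDER BY employee_salary DESC) AS ranking
    FROM employee
) AS a
INNER JOIN department AS b
ON a.department = b.department_id
WHERE a.ranking = 2;
```

On the sample data both versions return the same rows: Bob in department b1, and Ben and Allen (tied at 5000) in department b2. Department b3 has only one employee, so it has no second-highest salary and does not appear in the result.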

Question 2: Counting website login intervals

There is a website login table, login_info, which records the website login information of all users and contains the following two fields.

user_id (user ID): VARCHAR.
login_time (user login date): DATE.

The data in the login_info table is shown in the following table.

The data import code is as follows:

DROP TABLE IF EXISTS login_info;
CREATE TABLE login_info(
    user_id VARCHAR(8),
    login_time DATE
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8;
INSERT INTO
login_info (user_id,login_time)
VALUES ('a001','2021-01-01')
,('b001','2021-01-01')
,('a001','2021-01-03')
,('a001','2021-01-06')
,('a001','2021-01-07')
,('b001','2021-01-07')
,('a001','2021-01-08')
,('a001','2021-01-09')
,('b001','2021-01-09')
,('b001','2021-01-10')
,('b001','2021-01-15')
,('a001','2021-01-16')
,('a001','2021-01-18')
,('a001','2021-01-19')
,('b001','2021-01-20')
,('a001','2021-01-23');

Problem: For each user, count how many times the interval between two consecutive login dates is less than 5 days.

The output includes:

user_id (user ID)
num (number of times the user's login date interval is less than 5 days)

An example of the result is shown in the figure below.

Suggested approach: This question tests using the LEAD() function to handle time intervals. Start with the inner query, which uses LEAD() to create a new date field from the original login_time field (that is, the user's next login date). The inner query is as follows:

SELECT  user_id 
       ,login_time 
       ,LEAD(login_time,1) OVER (PARTITION BY user_id ORDER BY login_time) AS next_login_time
FROM login_info;

The query results are shown in the following figure.

As the figure above shows, after LEAD() is applied the data is partitioned by user_id and ordered by login_time within each partition. With the inner query in place, the outer query only needs to keep the rows where the difference between next_login_time and login_time is less than 5 days, which are exactly the rows to count. The date difference is computed with TIMESTAMPDIFF(DAY, login_time, next_login_time). For each user's last login, next_login_time is NULL, so the difference is also NULL, the < 5 comparison is not satisfied, and the row is excluded. Finally, grouping by user_id and counting the remaining rows gives the number of times each user's login interval is less than 5 days.
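
To see which rows the outer filter keeps, the gap can be exposed as a column of its own. A minimal sketch, assuming the login_info table defined above; the alias gap_days is a name introduced here for illustration and is not part of the original answer.

```sql
-- Illustrative sketch: show each login next to the following login and
-- the number of days between them. gap_days is a made-up alias.
SELECT  user_id
       ,login_time
       ,next_login_time
       ,TIMESTAMPDIFF(DAY, login_time, next_login_time) AS gap_days  -- NULL for a user's last login
FROM
(
    SELECT  user_id
           ,login_time
           ,LEAD(login_time,1) OVER (PARTITION BY user_id ORDER BY login_time) AS next_login_time
    FROM login_info
) AS a
ORDER BY user_id, login_time;
```

With the sample data, user a001 has gaps of 2, 3, 1, 1, 1, 7, 2, 1 and 4 days, and user b001 has gaps of 6, 2, 1, 5 and 5 days, so 8 and 2 rows respectively satisfy gap_days < 5.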

Knowledge points involved: window functions, subqueries, group aggregation, date and time functions.

The SQL code for this question is as follows, for the reader's reference:

SELECT  a.user_id
       ,COUNT(*) AS num
FROM 
(
    SELECT  user_id
           ,login_time
           ,LEAD(login_time,1) OVER (PARTITION BY user_id ORDER BY login_time) AS next_login_time
    FROM login_info
) AS a
WHERE TIMESTAMPDIFF(DAY, login_time, next_login_time) < 5 
GROUP BY user_id;

Question 3: User purchase channel analysis

There is a user purchase table, purchase_channel, which records users' purchases on a shopping platform. The platform can be accessed in two ways, the web (web) and the mobile app (app). The table contains the following 4 fields.

user_id (user ID): VARCHAR.
channel (purchase channel): VARCHAR.
purchase_date (purchase date): DATE.
purchase_amount (purchase amount): INT.

The data in the purchase_channel table is shown in the following table.

The data import code is as follows:

DROP TABLE IF EXISTS purchase_channel;
CREATE TABLE purchase_channel(
    user_id VARCHAR(8),
    channel VARCHAR(8),
    purchase_date DATE,
    purchase_amount INT
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8;
INSERT INTO
purchase_channel (user_id,channel,purchase_date,purchase_amount)
VALUES ('a001','app','2021-03-14',200)
     ,('a001','web','2021-03-14',100)
     ,('a002','app','2021-03-14',400)
     ,('a001','web','2021-03-15',3000)
     ,('a002','app','2021-03-15',900)
     ,('a003','app','2021-03-15',1000);

Problem: For each day, query the number of distinct users and the total purchase amount for users who purchased only through the app, users who purchased only through the web, and users who purchased through both the web and the app (both). Every date and channel combination must appear in the output, even if no user purchased through that channel on that day.

The output includes:

purchase_date (date)
channel (purchase channel)
sum_amount (total purchase amount)
total_users (number of distinct users)

An example of the result is shown in the figure below.

Suggested approach: Group by user ID and date, and count the distinct purchase channels in each group to determine how the user accessed the platform on that date (web, app, or both). The web and app cases are handled by one SELECT statement, and the both case by another. Combine the two parts with UNION and use the result as a subquery; the outer query then computes the total purchase amount and the total number of purchasing users for each purchase date and purchase channel.

The SQL code for this part is as follows:

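SELECT  purchase_date
       ,channel
       ,SUM(sum_amount) AS sum_amount
       ,SUM(total_users) AS total_users
FROM
(
    -- Users who used exactly one channel on the date (app only or web only)
    SELECT  purchase_date
           ,MIN(channel) AS channel
           ,SUM(purchase_amount) AS sum_amount
           ,COUNT(DISTINCT user_id) AS total_users
    FROM purchase_channel
    GROUP BY  purchase_date
             ,user_id
    HAVING COUNT(DISTINCT channel) = 1
    -- UNION ALL keeps every per-user row; plain UNION would merge two users
    -- with the same date, channel and amount into one row and undercount
    UNION ALL
    -- Users who used both channels on the date
    SELECT  purchase_date
           ,'both' AS channel
           ,SUM(purchase_amount) AS sum_amount
           ,COUNT(DISTINCT user_id) AS total_users
    FROM purchase_channel
    GROUP BY  purchase_date
             ,user_id
    HAVING COUNT(DISTINCT channel) > 1
) c
GROUP BY  purchase_date
         ,channel;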

The output results of this part are shown in the figure below .
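
The per-user classification described above can also be sketched in a single pass with a CASE expression, instead of two SELECT statements combined with UNION. This is an alternative formulation over the same purchase_channel table, not the answer given in the source; the alias u and the column user_amount are names introduced here for illustration.

```sql
-- Alternative sketch: label each (date, user) group once, then aggregate.
SELECT  purchase_date
       ,channel
       ,SUM(user_amount) AS sum_amount
       ,COUNT(*) AS total_users          -- u has one row per user per date
FROM
(
    SELECT  purchase_date
           ,user_id
           ,CASE WHEN COUNT(DISTINCT channel) > 1 THEN 'both'
                 ELSE MIN(channel)       -- the only channel the user used that day
            END AS channel
           ,SUM(purchase_amount) AS user_amount
    FROM purchase_channel
    GROUP BY  purchase_date
             ,user_id
) AS u
GROUP BY  purchase_date
         ,channel;
```

On the sample data this yields four rows: app (400, 1 user) and both (300, 1 user) on 2021-03-14, and app (1900, 2 users) and web (3000, 1 user) on 2021-03-15.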

The query above seems to satisfy the requirements, but on closer inspection the problem also requires showing a date and channel combination even when no user purchased through that channel on that day. To show the complete information, build the full set of combinations (the Cartesian product of all dates and the 3 channels) and LEFT JOIN it to the result just obtained, matching on date and channel. Combinations with no purchases then appear with NULL in sum_amount and total_users.
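
The scaffold of all date and channel combinations can be built and checked on its own first. A minimal sketch, assuming the purchase_channel table defined above; the aliases d and ch are illustrative names.

```sql
-- Sketch: pair every purchase date with each of the 3 channel labels.
SELECT  d.purchase_date
       ,ch.channel
FROM
(
    SELECT  DISTINCT purchase_date
    FROM purchase_channel
) AS d
CROSS JOIN
(
    SELECT  'app' AS channel
    UNION ALL
    SELECT  'web'
    UNION ALL
    SELECT  'both'
) AS ch
ORDER BY d.purchase_date, ch.channel;
```

The sample data has 2 distinct dates, so this returns 2 x 3 = 6 rows, one for each combination that must appear in the final output.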

Knowledge points involved: UNION, group aggregation, data deduplication.

The SQL code for this question is as follows, for the reader's reference:

SELECT  t1.purchase_date
       ,t1.channel
       ,t2.sum_amount
       ,t2.total_users
FROM
(
    -- All combinations of purchase dates and the 3 channel labels
    SELECT  DISTINCT a.purchase_date
           ,b.channel
    FROM purchase_channel a
    CROSS JOIN
    (
        SELECT  'app' AS channel
        UNION
        SELECT  'web' AS channel
        UNION
        SELECT  'both' AS channel
    ) b
) t1
LEFT JOIN
(
    -- Per-date, per-channel totals from the first part
    SELECT  purchase_date
           ,channel
           ,SUM(sum_amount) AS sum_amount
           ,SUM(total_users) AS total_users
    FROM
    (
        SELECT  purchase_date
               ,MIN(channel) AS channel
               ,SUM(purchase_amount) AS sum_amount
               ,COUNT(DISTINCT user_id) AS total_users
        FROM purchase_channel
        GROUP BY  purchase_date
                 ,user_id
        HAVING COUNT(DISTINCT channel) = 1
        -- UNION ALL so that identical per-user rows are not merged
        UNION ALL
        SELECT  purchase_date
               ,'both' AS channel
               ,SUM(purchase_amount) AS sum_amount
               ,COUNT(DISTINCT user_id) AS total_users
        FROM purchase_channel
        GROUP BY  purchase_date
                 ,user_id
        HAVING COUNT(DISTINCT channel) > 1
    ) c
    GROUP BY  purchase_date
             ,channel
) t2
ON t1.purchase_date = t2.purchase_date AND t1.channel = t2.channel;

Did you get these questions right?

The questions in this article are excerpted from the recently published book 《SQL Data Analysis: From Basic Ice Breaking to Interview Problem Solving》.

Original site

Copyright notice
This article was written by [one thousand four hundred and eighty]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/02/202202160951248391.html
