How to analyze a ride-hailing business problem: a Didi SQL interview question
2022-07-28 07:36:00 【Begin to change】
Contents
① Number of drivers per day in each city in August 2020
② Express order volume and express flow data
① Number of new drivers in August and September 2020
① Names of cities with more than 20 drivers
② The total online time of a driver is greater than 2 hours
③ The order quantity is greater than 1
④ The number of passengers is greater than 1
I. The problem
Company A runs an app (similar to Didi or Uber) that provides users with a ride-hailing service. There are four tables: "Driver data", "Order data", "Online duration data", and "City matching data". The business questions are:
1. Extract the number of drivers, the express order volume, and the express flow data for each city on each day of August 2020.
2. Extract, for August and September 2020, the number of drivers, the online duration, and the TPH (order quantity / online hours) of new and old drivers in Beijing for each month. A driver is new in the month that contains the date of their first order.
3. Extract, separately, the city name data where the number of drivers is greater than 20, where a driver's total online time is greater than 2 hours, where the order quantity is greater than 1, and where the number of passengers is greater than 1.
II. Steps
1. Data type conversion
Inspecting the table structure shows that the time columns are stored as varchar. Because the queries need to extract the month, the time columns have to be converted to a proper date format first. Several time columns in the data are affected.
-- Rewrites the stored strings as 'YYYY-MM-DD'; the column type itself is unchanged.
UPDATE `Driver data`
SET `date` = DATE_FORMAT(`date`, '%Y-%m-%d');
2. Extract the number of drivers, the express order volume, and the express flow data for each city on each day of August 2020
① Number of drivers per day in each city in August 2020
From the question and the table structure, the city matching data has to be joined to the driver data. The join condition is that the city id is equal.
One restriction: August 2020, that is, the date in the driver data table falls between 2020-08-01 and 2020-08-31.
One aggregation: count the driver id in the driver data table with COUNT().
Two grouping columns: "each city, each day of August" means the rows have to be grouped by date and by city id.
SELECT b.`The city name`, a.`date`, COUNT(a.`The driver id`) AS 'Number of drivers'
FROM `Driver data` AS a
LEFT JOIN `City matching data` AS b ON a.`City id` = b.`City id`
WHERE a.`date` BETWEEN '2020-08-01' AND '2020-08-31'
GROUP BY a.`City id`, b.`The city name`, a.`date`;
② Express order volume and express flow data
Of the four tables, only the order data contains express data. That table has no city id, so it has to be joined to the driver data on the driver id, and from there to the city matching data, to get the result.
Two restrictions: one on the time range, one on the order type.
Two aggregations: the order quantity and the flow data.
The order quantity is a count of the product line id values.
The flow data is the sum of the flow column in the table.
Two grouping columns: "each city, each day of August" means the rows have to be grouped by date and by city id.
SELECT c.`The city name`, a.`date`, COUNT(a.`product line id`) AS 'Express orders', SUM(a.`Running water`) AS 'Express flow data'
FROM `Order data` AS a
LEFT JOIN `Driver data` AS b ON a.`The driver id` = b.`The driver id`
LEFT JOIN `City matching data` AS c ON b.`City id` = c.`City id`
WHERE a.`date` BETWEEN '2020-08-01' AND '2020-08-31' AND a.`product line id` = 3
-- Group by the order date (a.`date`), the same column that is selected and filtered.
GROUP BY c.`City id`, c.`The city name`, a.`date`;
③ Summary
The first two steps each produce part of the required data. Because the two queries use different filter conditions, they cannot be merged into a single pass over the joined tables. Instead, each query is used as a sub table (derived table), and the final data is selected from those sub tables.
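The sub-table approach described above could look roughly like the following. This is a minimal sketch, not the author's query: it runs on SQLite through Python's `sqlite3` module instead of MySQL, and the table names, column names, and rows (`driver_data`, `order_data`, `city_data`, `flow`, and so on) are hypothetical stand-ins. It also assumes the driver table holds one row per driver per day, which is why the order join matches on both driver id and date.

```python
import sqlite3

# Hypothetical mini-schema: all table names, column names, and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_data   (city_id INTEGER, city_name TEXT);
CREATE TABLE driver_data (driver_id INTEGER, city_id INTEGER, date TEXT);
CREATE TABLE order_data  (order_id INTEGER, driver_id INTEGER, date TEXT,
                          product_line_id INTEGER, flow REAL);
INSERT INTO city_data VALUES (1, 'Beijing'), (2, 'Shanghai');
INSERT INTO driver_data VALUES
  (10, 1, '2020-08-01'), (11, 1, '2020-08-01'), (12, 2, '2020-08-01'),
  (10, 1, '2020-08-02');
INSERT INTO order_data VALUES
  (100, 10, '2020-08-01', 3, 20.0), (101, 11, '2020-08-01', 3, 30.0),
  (102, 12, '2020-08-01', 1, 50.0), (103, 10, '2020-08-02', 3, 15.0);
""")

query = """
SELECT d.city_name, d.date, d.drivers,
       COALESCE(o.express_orders, 0) AS express_orders,
       COALESCE(o.express_flow, 0)   AS express_flow
FROM (  -- sub table 1: drivers per city per day
    SELECT c.city_name, a.city_id, a.date, COUNT(a.driver_id) AS drivers
    FROM driver_data AS a
    LEFT JOIN city_data AS c ON a.city_id = c.city_id
    WHERE a.date BETWEEN '2020-08-01' AND '2020-08-31'
    GROUP BY a.city_id, c.city_name, a.date
) AS d
LEFT JOIN (  -- sub table 2: express orders and flow per city per day
    SELECT b.city_id, a.date, COUNT(a.order_id) AS express_orders,
           SUM(a.flow) AS express_flow
    FROM order_data AS a
    JOIN driver_data AS b ON a.driver_id = b.driver_id AND a.date = b.date
    WHERE a.date BETWEEN '2020-08-01' AND '2020-08-31'
      AND a.product_line_id = 3   -- 3 = express, as in the article
    GROUP BY b.city_id, a.date
) AS o ON d.city_id = o.city_id AND d.date = o.date
ORDER BY d.city_name, d.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each sub table is aggregated to one row per city and day before the two are combined, so neither count can inflate the other. A city with drivers but no express orders still appears, with zeros filled in by `COALESCE`.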
3. Extract, for August and September 2020, the number of drivers, the online duration, and the TPH (order quantity / online hours) of new and old drivers in Beijing for each month. A driver is new in the month that contains the date of their first order.
① Number of new drivers in August and September 2020
All the data needed for this step is in one table, so no join is required.
Three restrictions: the time range, the new-driver condition (the first order date is in the current month), and the city being Beijing.
The time condition in the previous question was August. This question covers August and September, so the range is extended to 2020-09-30.
The new-driver condition is that the first order date falls in the current month, that is, the year and month of the first order, obtained with the date functions, equal the year and month of the date.
Beijing's city id is looked up in the city matching data.
Grouping: group by month only.
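The three restrictions and the monthly grouping above can be checked on a toy dataset. The sketch below is an illustration under assumptions: it uses SQLite through Python's `sqlite3` module, so `strftime('%Y-%m', ...)` stands in for MySQL's `YEAR()`, `MONTH()`, and `DATE_FORMAT()`, and the table and column names (`driver_data`, `first_order_date`) and the rows are hypothetical. The Beijing city id 100000 is the value used in the article.

```python
import sqlite3

# Hypothetical table and rows; one row per driver per day is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE driver_data (driver_id INTEGER, city_id INTEGER,
                          date TEXT, first_order_date TEXT);
INSERT INTO driver_data VALUES
  (1, 100000, '2020-08-03', '2020-08-02'),   -- new in August
  (1, 100000, '2020-08-04', '2020-08-02'),   -- same driver, second August row
  (1, 100000, '2020-09-01', '2020-08-02'),   -- old by September
  (2, 100000, '2020-09-10', '2020-09-09'),   -- new in September
  (3, 100000, '2020-08-05', '2019-08-01'),   -- same month number, earlier year: old
  (4, 200000, '2020-08-05', '2020-08-01');   -- new, but not in Beijing
""")

new_per_month = conn.execute("""
SELECT strftime('%Y-%m', date) AS month,
       COUNT(DISTINCT driver_id) AS new_drivers
FROM driver_data
WHERE strftime('%Y-%m', date) = strftime('%Y-%m', first_order_date)
  AND date BETWEEN '2020-08-01' AND '2020-09-30'
  AND city_id = 100000
GROUP BY strftime('%Y-%m', date)
ORDER BY month
""").fetchall()
print(new_per_month)
```

Comparing year and month together is what keeps driver 3 out of the result: the month number matches, but the year does not. `COUNT(DISTINCT ...)` keeps driver 1 from being counted twice in August.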
SELECT
  DATE_FORMAT(`date`, '%Y-%m') AS `month`,
  -- DISTINCT keeps a driver who appears on several dates from being counted more than once.
  COUNT(DISTINCT `The driver id`) AS 'Number of new drivers'
FROM
  `Driver data`
WHERE
  YEAR(`First order completion time`) = YEAR(`date`)
  AND MONTH(`date`) = MONTH(`First order completion time`)
  AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
  AND `City id` = 100000
GROUP BY
  DATE_FORMAT(`date`, '%Y-%m');
② Online hours
The online duration is in the online duration data table, while the first order completion time is in the driver data table, so the two tables have to be joined. The join condition is the driver id.
In other words, the filtered driver rows from the previous step form a sub table, which is joined to the online duration data table.
The online duration is then summed.
SELECT
  DATE_FORMAT(a.`date`, '%Y-%m') AS `month`,
  SUM(b.`Online hours`) AS 'Online hours'
FROM
  (
    SELECT *
    FROM `Driver data`
    WHERE YEAR(`First order completion time`) = YEAR(`date`)
      AND MONTH(`date`) = MONTH(`First order completion time`)
      AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
      AND `City id` = 100000
  ) AS a
  LEFT JOIN `Online duration data` AS b ON a.`The driver id` = b.`The driver id`
GROUP BY
  DATE_FORMAT(a.`date`, '%Y-%m');
③ Order quantity
The order quantity is calculated on the same principle as the total online time. Only the joined table is different.
SELECT
  DATE_FORMAT(a.`date`, '%Y-%m') AS `month`,
  COUNT(b.`Order id`) AS 'Order quantity'
FROM
  (
    SELECT *
    FROM `Driver data`
    WHERE YEAR(`First order completion time`) = YEAR(`date`)
      AND MONTH(`date`) = MONTH(`First order completion time`)
      AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
      AND `City id` = 100000
  ) AS a
  LEFT JOIN `Order data` AS b ON a.`The driver id` = b.`The driver id`
GROUP BY
  DATE_FORMAT(a.`date`, '%Y-%m');
④ Summary
The queries above can now be used as sub tables, and the final TPH figures are obtained by selecting from them.
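As a cross-check on the TPH arithmetic (order quantity / online hours), the sketch below computes the figure for new and old drivers per month on a toy dataset. It is an illustration under assumptions, not the article's query: it runs on SQLite through Python's `sqlite3` module, uses common table expressions in place of nested sub tables, leaves out the Beijing filter for brevity, and all table names, column names, and rows are hypothetical.

```python
import sqlite3

# Hypothetical mini-schema; names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE driver_data (driver_id INTEGER, date TEXT, first_order_date TEXT);
CREATE TABLE online_data (driver_id INTEGER, date TEXT, online_hours REAL);
CREATE TABLE order_data  (order_id INTEGER PRIMARY KEY, driver_id INTEGER, date TEXT);
INSERT INTO driver_data VALUES
  (1, '2020-08-01', '2020-08-01'), (1, '2020-09-01', '2020-08-01'),
  (2, '2020-08-01', '2020-01-15'), (2, '2020-09-01', '2020-01-15');
INSERT INTO online_data VALUES
  (1, '2020-08-01', 4.0), (2, '2020-08-01', 5.0),
  (1, '2020-09-01', 2.0), (2, '2020-09-01', 2.0);
""")
# August: driver 1 has 2 orders, driver 2 has 5. September: 1 and 3.
orders = ([(1, "2020-08-01")] * 2 + [(2, "2020-08-01")] * 5
          + [(1, "2020-09-01")] * 1 + [(2, "2020-09-01")] * 3)
conn.executemany("INSERT INTO order_data (driver_id, date) VALUES (?, ?)", orders)

tph_rows = conn.execute("""
WITH labelled AS (   -- one row per driver per month, tagged new or old
    SELECT DISTINCT driver_id, strftime('%Y-%m', date) AS month,
           CASE WHEN strftime('%Y-%m', first_order_date) = strftime('%Y-%m', date)
                THEN 'new' ELSE 'old' END AS driver_type
    FROM driver_data
    WHERE date BETWEEN '2020-08-01' AND '2020-09-30'
),
hours AS (           -- online hours per driver per month
    SELECT driver_id, strftime('%Y-%m', date) AS month,
           SUM(online_hours) AS online_hours
    FROM online_data GROUP BY driver_id, strftime('%Y-%m', date)
),
orders AS (          -- order quantity per driver per month
    SELECT driver_id, strftime('%Y-%m', date) AS month, COUNT(*) AS order_qty
    FROM order_data GROUP BY driver_id, strftime('%Y-%m', date)
)
SELECT l.month, l.driver_type, COUNT(*) AS drivers,
       SUM(h.online_hours) AS online_hours,
       1.0 * SUM(o.order_qty) / SUM(h.online_hours) AS tph
FROM labelled AS l
LEFT JOIN hours  AS h ON h.driver_id = l.driver_id AND h.month = l.month
LEFT JOIN orders AS o ON o.driver_id = l.driver_id AND o.month = l.month
GROUP BY l.month, l.driver_type
ORDER BY l.month, l.driver_type
""").fetchall()
for row in tph_rows:
    print(row)
```

Hours and orders are aggregated per driver and month before they are joined, so a driver with several orders does not have their online hours counted several times. Driver 1 is new in August and old in September, which is why September has no "new" row in this toy data.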
SELECT
  n.`month`,
  n.`New driver TPH`,
  o.`Old driver TPH`
FROM
  (
    -- New drivers: the first order falls in the same year and month as the row date.
    SELECT h.`month`, q.`Order quantity` / h.`Online hours` AS `New driver TPH`
    FROM
      (
        SELECT DATE_FORMAT(a.`date`, '%Y-%m') AS `month`, SUM(b.`Online hours`) AS `Online hours`
        FROM
          (
            SELECT *
            FROM `Driver data`
            WHERE YEAR(`First order completion time`) = YEAR(`date`)
              AND MONTH(`date`) = MONTH(`First order completion time`)
              AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
              AND `City id` = 100000
          ) AS a
          LEFT JOIN `Online duration data` AS b ON a.`The driver id` = b.`The driver id`
        GROUP BY DATE_FORMAT(a.`date`, '%Y-%m')
      ) AS h
      JOIN
      (
        SELECT DATE_FORMAT(a.`date`, '%Y-%m') AS `month`, COUNT(b.`Order id`) AS `Order quantity`
        FROM
          (
            SELECT *
            FROM `Driver data`
            WHERE YEAR(`First order completion time`) = YEAR(`date`)
              AND MONTH(`date`) = MONTH(`First order completion time`)
              AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
              AND `City id` = 100000
          ) AS a
          LEFT JOIN `Order data` AS b ON a.`The driver id` = b.`The driver id`
        GROUP BY DATE_FORMAT(a.`date`, '%Y-%m')
      ) AS q ON h.`month` = q.`month`
  ) AS n
  JOIN
  (
    -- Old drivers: the first order falls in a different year or a different month.
    SELECT h.`month`, q.`Order quantity` / h.`Online hours` AS `Old driver TPH`
    FROM
      (
        SELECT DATE_FORMAT(a.`date`, '%Y-%m') AS `month`, SUM(b.`Online hours`) AS `Online hours`
        FROM
          (
            SELECT *
            FROM `Driver data`
            WHERE NOT ( YEAR(`First order completion time`) = YEAR(`date`)
                        AND MONTH(`date`) = MONTH(`First order completion time`) )
              AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
              AND `City id` = 100000
          ) AS a
          LEFT JOIN `Online duration data` AS b ON a.`The driver id` = b.`The driver id`
        GROUP BY DATE_FORMAT(a.`date`, '%Y-%m')
      ) AS h
      JOIN
      (
        SELECT DATE_FORMAT(a.`date`, '%Y-%m') AS `month`, COUNT(b.`Order id`) AS `Order quantity`
        FROM
          (
            SELECT *
            FROM `Driver data`
            WHERE NOT ( YEAR(`First order completion time`) = YEAR(`date`)
                        AND MONTH(`date`) = MONTH(`First order completion time`) )
              AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
              AND `City id` = 100000
          ) AS a
          LEFT JOIN `Order data` AS b ON a.`The driver id` = b.`The driver id`
        GROUP BY DATE_FORMAT(a.`date`, '%Y-%m')
      ) AS q ON h.`month` = q.`month`
  ) AS o ON n.`month` = o.`month`;
4. Extract, separately, the city name data where the number of drivers is greater than 20, where a driver's total online time is greater than 2 hours, where the order quantity is greater than 1, and where the number of passengers is greater than 1
① Names of cities with more than 20 drivers
The driver data is in the driver data table and the city name is in the city matching data table, so the two tables have to be joined. The join condition is the city id.
The rows are grouped by city id and the drivers in each group are counted; the count must be greater than 20. Because this condition applies to the grouped result, it cannot go in WHERE. It has to go in HAVING, which comes after GROUP BY.
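The difference between WHERE and HAVING described above can be seen on a small example. This is a sketch under assumptions: it uses SQLite through Python's `sqlite3` module instead of MySQL, and the table names, column names, and rows are hypothetical.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_data   (city_id INTEGER, city_name TEXT);
CREATE TABLE driver_data (driver_id INTEGER, city_id INTEGER);
INSERT INTO city_data VALUES (1, 'Beijing'), (2, 'Shanghai');
""")
# 25 drivers in city 1, 5 drivers in city 2
conn.executemany(
    "INSERT INTO driver_data VALUES (?, ?)",
    [(i, 1) for i in range(25)] + [(100 + i, 2) for i in range(5)],
)

base = """
SELECT b.city_name, COUNT(DISTINCT a.driver_id) AS drivers
FROM driver_data AS a
LEFT JOIN city_data AS b ON a.city_id = b.city_id
"""

# Filtering the grouped result with HAVING works.
big_cities = conn.execute(
    base + "GROUP BY b.city_id, b.city_name HAVING COUNT(DISTINCT a.driver_id) > 20"
).fetchall()
print(big_cities)

# Putting the same aggregate condition in WHERE is rejected by SQLite.
try:
    conn.execute(base + "WHERE COUNT(DISTINCT a.driver_id) > 20 GROUP BY b.city_id")
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True
print(where_rejected)
```

WHERE is evaluated row by row before any grouping happens, so no count exists yet at that point. HAVING runs after GROUP BY and sees one row per city.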
SELECT
  b.`The city name`,
  -- DISTINCT keeps a driver who appears on several dates from being counted more than once.
  COUNT(DISTINCT a.`The driver id`) AS 'Number of City drivers'
FROM
  `Driver data` AS a
  LEFT JOIN `City matching data` AS b ON a.`City id` = b.`City id`
GROUP BY
  b.`City id`, b.`The city name`
HAVING
  COUNT(DISTINCT a.`The driver id`) > 20;
② The total online time of a driver is greater than 2 hours
The online duration data and the city matching data share no key that can join them directly, so the driver data table is used as the intermediate table.
First, group the online duration data by driver id and sum it to get each driver's total online time, then keep only the totals greater than 2 hours.
Then join the filtered result to the driver data table and the city matching data table.
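The steps above can be sketched as follows. This is an illustration under assumptions, not the author's query: it runs on SQLite through Python's `sqlite3` module, and the table names, column names, and rows are hypothetical.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_data   (city_id INTEGER, city_name TEXT);
CREATE TABLE driver_data (driver_id INTEGER, city_id INTEGER, date TEXT);
CREATE TABLE online_data (driver_id INTEGER, date TEXT, online_hours REAL);
INSERT INTO city_data VALUES (1, 'Beijing'), (2, 'Shanghai');
INSERT INTO driver_data VALUES
  (1, 1, '2020-08-01'), (1, 1, '2020-08-02'), (2, 1, '2020-08-01'), (3, 2, '2020-08-01');
INSERT INTO online_data VALUES
  (1, '2020-08-01', 1.5), (1, '2020-08-02', 1.0),   -- total 2.5
  (2, '2020-08-01', 2.0),                           -- total 2.0, not greater than 2
  (3, '2020-08-01', 3.0);                           -- total 3.0
""")

long_online = conn.execute("""
SELECT a.driver_id, a.total_hours, c.city_name
FROM (  -- step 1: total online time per driver, filtered after grouping
    SELECT driver_id, SUM(online_hours) AS total_hours
    FROM online_data
    GROUP BY driver_id
    HAVING SUM(online_hours) > 2
) AS a
-- step 2: the driver table is the bridge to the city table;
-- DISTINCT guards against one driver appearing on several dates
LEFT JOIN (SELECT DISTINCT driver_id, city_id FROM driver_data) AS b
       ON a.driver_id = b.driver_id
JOIN city_data AS c ON b.city_id = c.city_id
ORDER BY a.driver_id
""").fetchall()
print(long_online)
```

Driver 2 is excluded because the condition is strictly greater than 2 hours.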
③ The order quantity is greater than 1
The approach is the same as for the total online time.
SELECT
  a.`The driver id`,
  a.`Order quantity`,
  c.`The city name`
FROM
  ( SELECT `The driver id`, COUNT(`The driver id`) AS `Order quantity` FROM `Order data` GROUP BY `The driver id` HAVING COUNT(`The driver id`) > 1 ) AS a
  -- DISTINCT keeps one row per driver and city even if the driver appears on several dates.
  LEFT JOIN ( SELECT DISTINCT `The driver id`, `City id` FROM `Driver data` ) AS b ON a.`The driver id` = b.`The driver id`
  JOIN `City matching data` AS c ON b.`City id` = c.`City id`;
④ The number of passengers is greater than 1
The approach is the same, this time counting passenger ids.
SELECT
  a.`The driver id`,
  a.`Number of passengers`,
  c.`The city name`
FROM
  -- DISTINCT counts passengers rather than orders, so repeat rides by one passenger count once.
  ( SELECT `The driver id`, COUNT(DISTINCT `Passenger id`) AS `Number of passengers` FROM `Order data` GROUP BY `The driver id` HAVING COUNT(DISTINCT `Passenger id`) > 1 ) AS a
  LEFT JOIN ( SELECT DISTINCT `The driver id`, `City id` FROM `Driver data` ) AS b ON a.`The driver id` = b.`The driver id`
  JOIN `City matching data` AS c ON b.`City id` = c.`City id`;
⑤ Summary
When many tables have to be joined, first filter the data in one table and only then join the result to the other tables. The filter usually keeps only a small part of the rows, and it is easy to make mistakes once all the tables have been joined.
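One mistake of the kind mentioned above is row multiplication. The sketch below shows it on a toy dataset; it is an illustration under assumptions, using SQLite through Python's `sqlite3` module, with hypothetical table names, column names, and rows. It assumes the driver table has one row per driver per day.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_data   (city_id INTEGER, city_name TEXT);
CREATE TABLE driver_data (driver_id INTEGER, city_id INTEGER, date TEXT);
CREATE TABLE order_data  (order_id INTEGER, driver_id INTEGER);
INSERT INTO city_data VALUES (1, 'Beijing');
INSERT INTO driver_data VALUES
  (1, 1, '2020-08-01'), (1, 1, '2020-08-02'), (1, 1, '2020-08-03');
INSERT INTO order_data VALUES (100, 1), (101, 1);
""")

# Join everything first, then count: each order is repeated once per driver row.
joined_first = conn.execute("""
SELECT d.driver_id, COUNT(o.order_id)
FROM order_data AS o
JOIN driver_data AS d ON o.driver_id = d.driver_id
GROUP BY d.driver_id
""").fetchall()

# Filter and aggregate one table first, then join the small result.
filtered_first = conn.execute("""
SELECT a.driver_id, a.order_qty, c.city_name
FROM (SELECT driver_id, COUNT(*) AS order_qty
      FROM order_data
      GROUP BY driver_id
      HAVING COUNT(*) > 1) AS a
LEFT JOIN (SELECT DISTINCT driver_id, city_id FROM driver_data) AS b
       ON a.driver_id = b.driver_id
JOIN city_data AS c ON b.city_id = c.city_id
""").fetchall()

print(joined_first)
print(filtered_first)
```

The driver has two orders. Joining first reports six, because each order is matched against three driver rows. Aggregating first reports two.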