How to analyze the taxi business problem of didi SQL interview question
2022-07-28 07:36:00 【Begin to change】
Contents
① Number of drivers per day in each city in August 2020
② Express order volume and express flow data
① Number of new drivers in August and September 2020
① Names of cities with more than 20 drivers
② Total driver online time greater than 2 hours
③ Order quantity greater than 1
④ Number of passengers greater than 1
I. Problem
Company A runs an app (similar to Didi or Uber) that provides users with a taxi service. There are four tables: "Driver data", "Order data", "Online duration data", and "City matching data". The business questions are:
1. Extract the number of drivers, the express order volume, and the express flow data for each city on each day of August 2020.
2. For August and September 2020, extract the number of drivers, the online duration, and the TPH (order quantity / online hours) of new and old drivers in Beijing for each month. A driver counts as new in a month if the first order date falls in that month.
3. Separately extract the names of cities where the number of drivers is greater than 20, where the total driver online time is greater than 2 hours, where the order quantity is greater than 1, and where the number of passengers is greater than 1.
II. Steps
1. Data type conversion
The table structure shows that time values are stored as varchar, but the queries need to extract the month, so the time columns must first be converted to a date format. (Several time columns in the data are affected.)
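As a hedged illustration of what such a conversion can look like when the column type itself is changed, the sketch below parses text into a separate DATE column with `STR_TO_DATE`. The table `driver_data_demo`, its columns, and the `'%Y/%m/%d'` input format are assumptions made for the example and do not come from the original schema.

```sql
-- Hypothetical demo table; names and the sample text format are assumptions.
CREATE TABLE driver_data_demo (
    driver_id INT,
    date_text VARCHAR(20)
);
INSERT INTO driver_data_demo VALUES (1, '2020/08/01'), (2, '2020/08/15');

-- Step 1: add a real DATE column next to the text column.
ALTER TABLE driver_data_demo ADD COLUMN stat_date DATE;

-- Step 2: parse the text into the DATE column.
UPDATE driver_data_demo
SET stat_date = STR_TO_DATE(date_text, '%Y/%m/%d');

-- Step 3: month extraction now works on a typed column.
SELECT driver_id, DATE_FORMAT(stat_date, '%Y-%m') AS stat_month
FROM driver_data_demo;
```

Keeping the text column until the parsed values have been checked makes it easy to spot rows that `STR_TO_DATE` could not parse, since those come back as NULL.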
-- Normalizes the stored text to 'YYYY-MM-DD'; the column itself stays varchar
update `Driver data`
set `date` = date_format(`date`, '%Y-%m-%d');
2. Extract the number of drivers, the express order volume, and the express flow data for each city on each day of August 2020
① Number of drivers per day in each city in August 2020
Based on the question and the table structure, the city matching data must be joined with the driver data. The join condition is that the city id is equal.
One filter: August 2020, that is, the date in the driver data table lies between 2020-08-01 and 2020-08-31.
One aggregation: count() over the driver id in the driver data table.
Two grouping columns: the result is needed for every day of August and for every city, so group by date and city id.
select b.`The city name`, a.`date`, count(a.`The driver id`) as 'Number of drivers'
from `Driver data` as a
left join `City matching data` as b
on a.`City id` = b.`City id`
where a.`date` between '2020-08-01' and '2020-08-31'
group by a.`City id`, b.`The city name`, a.`date`;
② Express order volume and express flow data
Of the four tables, only the order data contains express data, but that table has no city id, so it must be joined to the other tables through the driver id to get the result.
Two filters: the time range and the order type (product line).
Two aggregations: the order quantity and the flow data.
The order quantity is a count over the product line id.
The flow data is the sum of the flow column in the table.
Two grouping columns: the result is needed for every day of August and for every city, so group by date and city id.
select c.`The city name`, a.`date`, count(a.`product line id`) as 'Express orders', sum(a.`Running water`) as 'Express flow data'
from `Order data` as a
left join `Driver data` as b
on a.`The driver id` = b.`The driver id`
left join `City matching data` as c
on b.`City id` = c.`City id`
where a.`date` between '2020-08-01' and '2020-08-31' and a.`product line id` = 3
-- group by the order date that is selected, not the driver table's date
group by c.`City id`, c.`The city name`, a.`date`;
③ Summary
The two steps above each produce part of the required data. Because their filter conditions differ, the two queries cannot be merged into a single join. Instead, each one is used as a derived table (subquery), and the final result is obtained by querying those derived tables.
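The derived-table approach described here can be sketched as follows. This is a minimal example on a hypothetical miniature schema: the names `driver_data`, `order_data`, `city_data` and their columns are stand-ins, and it assumes both the driver and order tables carry a daily date. Only the express filter `product_line_id = 3` is taken from the queries above.

```sql
-- Hypothetical miniature schema; table and column names are stand-ins.
CREATE TABLE driver_data (driver_id INT, city_id INT, stat_date DATE);
CREATE TABLE order_data (order_id INT, driver_id INT, stat_date DATE,
                         product_line_id INT, flow DECIMAL(10,2));
CREATE TABLE city_data (city_id INT, city_name VARCHAR(20));

INSERT INTO city_data VALUES (1, 'Beijing'), (2, 'Shanghai');
INSERT INTO driver_data VALUES
    (101, 1, '2020-08-01'), (102, 1, '2020-08-01'), (201, 2, '2020-08-01');
INSERT INTO order_data VALUES
    (1, 101, '2020-08-01', 3, 20.00),
    (2, 101, '2020-08-01', 3, 15.50),
    (3, 102, '2020-08-01', 1, 30.00),  -- not express, filtered out
    (4, 201, '2020-08-01', 3, 12.00);

SELECT c.city_name, d.stat_date, d.driver_cnt,
       COALESCE(o.express_orders, 0) AS express_orders,
       COALESCE(o.express_flow, 0)   AS express_flow
FROM (
    -- Derived table 1: drivers per city per day.
    SELECT city_id, stat_date, COUNT(DISTINCT driver_id) AS driver_cnt
    FROM driver_data
    WHERE stat_date BETWEEN '2020-08-01' AND '2020-08-31'
    GROUP BY city_id, stat_date
) AS d
LEFT JOIN (
    -- Derived table 2: express orders and flow per city per day.
    SELECT dr.city_id, od.stat_date,
           COUNT(od.order_id) AS express_orders,
           SUM(od.flow)       AS express_flow
    FROM order_data AS od
    JOIN driver_data AS dr
      ON dr.driver_id = od.driver_id AND dr.stat_date = od.stat_date
    WHERE od.stat_date BETWEEN '2020-08-01' AND '2020-08-31'
      AND od.product_line_id = 3
    GROUP BY dr.city_id, od.stat_date
) AS o ON o.city_id = d.city_id AND o.stat_date = d.stat_date
JOIN city_data AS c ON c.city_id = d.city_id
ORDER BY c.city_name, d.stat_date;
```

With the sample rows, Beijing on 2020-08-01 has 2 drivers and 2 express orders, and Shanghai has 1 driver and 1 express order. The `LEFT JOIN` keeps city-days that have drivers but no express orders, and `COALESCE` reports them as 0.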
3. For August and September 2020, extract the number of drivers, the online duration, and the TPH (order quantity / online hours) of new and old drivers in Beijing for each month (a driver is new in a month if the first order date falls in that month)
① Number of new drivers in August and September 2020
All the required data is in one table, so no join is needed.
Three filters: the time range, the new-driver condition (first order date in the current month), and the city being Beijing.
The previous question covered August only. This one covers August and September, so the upper bound of the date range moves to 2020-09-30.
A driver is new when the first order date falls in the current month, so use the date functions to require that the year and month of the first order equal the year and month of the date.
The city matching data gives the city id that identifies Beijing.
Grouping: group by month only.
SELECT
    DATE_FORMAT(`date`, '%Y-%m') AS 'month', -- select the grouping expression, not the raw date
    COUNT(DISTINCT `The driver id`) AS 'Number of new drivers' -- count each driver once per month
FROM
    `Driver data`
WHERE
    YEAR(`First order completion time`) = YEAR(`date`)
    AND MONTH(`date`) = MONTH(`First order completion time`)
    AND `date` BETWEEN '2020-08-01' AND '2020-09-30' -- September has 30 days
    AND `City id` = 100000
GROUP BY
    DATE_FORMAT(`date`, '%Y-%m');
② Online hours
The online duration is in the online duration data table, while the first order completion time is in the driver data table, so the two tables must be joined on the driver id.
In other words, the filtered result from step ① is used as a derived table and joined to the online duration data table.
The online duration is then summed.
SELECT
    DATE_FORMAT(a.`date`, '%Y-%m') AS 'month',
    SUM(b.`Online hours`) AS 'Online hours'
FROM
(
    SELECT
        *
    FROM
        `Driver data`
    WHERE
        YEAR(`First order completion time`) = YEAR(`date`)
        AND MONTH(`date`) = MONTH(`First order completion time`)
        AND `date` BETWEEN '2020-08-01' AND '2020-09-30' -- September has 30 days
        AND `City id` = 100000
) AS a
LEFT JOIN `Online duration data` AS b ON a.`The driver id` = b.`The driver id`
GROUP BY
    DATE_FORMAT(a.`date`, '%Y-%m');
③ Order quantity
The order quantity is computed the same way as the total online time. Only the joined table differs.
SELECT
    DATE_FORMAT(a.`date`, '%Y-%m') AS 'month',
    COUNT(b.`Order id`) AS 'Order quantity'
FROM
(
    SELECT
        *
    FROM
        `Driver data`
    WHERE
        YEAR(`First order completion time`) = YEAR(`date`)
        AND MONTH(`date`) = MONTH(`First order completion time`)
        AND `date` BETWEEN '2020-08-01' AND '2020-09-30' -- September has 30 days
        AND `City id` = 100000
) AS a
LEFT JOIN `Order data` AS b ON a.`The driver id` = b.`The driver id`
GROUP BY
    DATE_FORMAT(a.`date`, '%Y-%m');
④ Summary
The queries above can be used as derived tables, and the final figures are obtained by querying them.
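A more compact variant of the same idea can be sketched with one derived table per measure and a new/old label per driver and month. This is a minimal example under stated assumptions: the tables `driver_data`, `online_data`, `order_data` and all column names are hypothetical, and it assumes the online duration and order tables each carry a date so that hours and orders can be assigned to a month. Only the city id 100000 for Beijing is taken from the queries in this article.

```sql
-- Hypothetical miniature schema; every name here is a stand-in.
CREATE TABLE driver_data (driver_id INT, city_id INT, stat_date DATE,
                          first_order_date DATE);
CREATE TABLE online_data (driver_id INT, stat_date DATE,
                          online_hours DECIMAL(6,2));
CREATE TABLE order_data (order_id INT, driver_id INT, stat_date DATE);

INSERT INTO driver_data VALUES
    (1, 100000, '2020-08-03', '2020-08-03'),   -- new in August
    (1, 100000, '2020-09-05', '2020-08-03'),   -- old in September
    (2, 100000, '2020-08-03', '2020-05-10');   -- old in August
INSERT INTO online_data VALUES
    (1, '2020-08-03', 4.00), (1, '2020-09-05', 5.00), (2, '2020-08-03', 8.00);
INSERT INTO order_data VALUES
    (1, 1, '2020-08-03'), (2, 1, '2020-08-03'), (3, 1, '2020-09-05'),
    (4, 2, '2020-08-03'), (5, 2, '2020-08-03'),
    (6, 2, '2020-08-03'), (7, 2, '2020-08-03');

SELECT d.stat_month,
       d.driver_type,
       COUNT(*)                                 AS drivers,
       SUM(COALESCE(h.hours, 0))                AS online_hours,
       SUM(COALESCE(o.orders, 0))
         / NULLIF(SUM(COALESCE(h.hours, 0)), 0) AS tph
FROM (
    -- One row per driver and month, labelled new or old.
    SELECT DISTINCT driver_id,
           DATE_FORMAT(stat_date, '%Y-%m') AS stat_month,
           CASE WHEN DATE_FORMAT(first_order_date, '%Y-%m')
                   = DATE_FORMAT(stat_date, '%Y-%m')
                THEN 'new' ELSE 'old' END AS driver_type
    FROM driver_data
    WHERE city_id = 100000
      AND stat_date BETWEEN '2020-08-01' AND '2020-09-30'
) AS d
LEFT JOIN (
    -- Online hours per driver and month.
    SELECT driver_id, DATE_FORMAT(stat_date, '%Y-%m') AS stat_month,
           SUM(online_hours) AS hours
    FROM online_data
    GROUP BY driver_id, DATE_FORMAT(stat_date, '%Y-%m')
) AS h ON h.driver_id = d.driver_id AND h.stat_month = d.stat_month
LEFT JOIN (
    -- Orders per driver and month.
    SELECT driver_id, DATE_FORMAT(stat_date, '%Y-%m') AS stat_month,
           COUNT(*) AS orders
    FROM order_data
    GROUP BY driver_id, DATE_FORMAT(stat_date, '%Y-%m')
) AS o ON o.driver_id = d.driver_id AND o.stat_month = d.stat_month
GROUP BY d.stat_month, d.driver_type
ORDER BY d.stat_month, d.driver_type;
```

In the sample data, the new driver in August has 2 orders over 4 online hours, and the old driver in August has 4 orders over 8 online hours. Aggregating hours and orders per driver and month before joining keeps each row from being repeated by the join, and `NULLIF` avoids a division by zero when a group has no online hours.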
SELECT
    a.`month`,
    a.`New driver TPH`,
    b.`Old driver TPH`
FROM
(
    -- New drivers: the first order was completed in the same month as the date
    SELECT
        a.`month`,
        b.`Order quantity` / a.`Online hours` AS 'New driver TPH'
    FROM
    (
        SELECT
            DATE_FORMAT(a.`date`, '%Y-%m') AS 'month',
            SUM(b.`Online hours`) AS 'Online hours'
        FROM
        (
            SELECT
                *
            FROM
                `Driver data`
            WHERE
                YEAR(`First order completion time`) = YEAR(`date`)
                AND MONTH(`date`) = MONTH(`First order completion time`)
                AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
                AND `City id` = 100000
        ) AS a
        LEFT JOIN `Online duration data` AS b ON a.`The driver id` = b.`The driver id`
        GROUP BY
            DATE_FORMAT(a.`date`, '%Y-%m')
    ) AS a
    JOIN
    (
        SELECT
            DATE_FORMAT(a.`date`, '%Y-%m') AS 'month',
            COUNT(b.`Order id`) AS 'Order quantity'
        FROM
        (
            SELECT
                *
            FROM
                `Driver data`
            WHERE
                YEAR(`First order completion time`) = YEAR(`date`)
                AND MONTH(`date`) = MONTH(`First order completion time`)
                AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
                AND `City id` = 100000
        ) AS a
        LEFT JOIN `Order data` AS b ON a.`The driver id` = b.`The driver id`
        GROUP BY
            DATE_FORMAT(a.`date`, '%Y-%m')
    ) AS b ON a.`month` = b.`month` -- pair hours and orders of the same month
) AS a
JOIN
(
    -- Old drivers: the first order was not completed in the same month as the date
    SELECT
        a.`month`,
        b.`Order quantity` / a.`Online hours` AS 'Old driver TPH'
    FROM
    (
        SELECT
            DATE_FORMAT(a.`date`, '%Y-%m') AS 'month',
            SUM(b.`Online hours`) AS 'Online hours'
        FROM
        (
            SELECT
                *
            FROM
                `Driver data`
            WHERE
                NOT ( YEAR(`First order completion time`) = YEAR(`date`)
                      AND MONTH(`date`) = MONTH(`First order completion time`) )
                AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
                AND `City id` = 100000
        ) AS a
        LEFT JOIN `Online duration data` AS b ON a.`The driver id` = b.`The driver id`
        GROUP BY
            DATE_FORMAT(a.`date`, '%Y-%m')
    ) AS a
    JOIN
    (
        SELECT
            DATE_FORMAT(a.`date`, '%Y-%m') AS 'month',
            COUNT(b.`Order id`) AS 'Order quantity'
        FROM
        (
            SELECT
                *
            FROM
                `Driver data`
            WHERE
                NOT ( YEAR(`First order completion time`) = YEAR(`date`)
                      AND MONTH(`date`) = MONTH(`First order completion time`) )
                AND `date` BETWEEN '2020-08-01' AND '2020-09-30'
                AND `City id` = 100000
        ) AS a
        LEFT JOIN `Order data` AS b ON a.`The driver id` = b.`The driver id`
        GROUP BY
            DATE_FORMAT(a.`date`, '%Y-%m')
    ) AS b ON a.`month` = b.`month`
) AS b ON a.`month` = b.`month`;
4. Separately extract the names of cities where the number of drivers is greater than 20, the total driver online time is greater than 2 hours, the order quantity is greater than 1, and the number of passengers is greater than 1
① Names of cities with more than 20 drivers
The driver data is in the driver data table and the city name is in the city matching data table, so the two tables must be joined on the city id.
Group by city id, then count the drivers and keep the groups whose count is greater than 20. Because this condition applies to the aggregated result, it cannot go in where. It must go in having, which comes after the group by.
SELECT
    b.`The city name`,
    COUNT(DISTINCT a.`The driver id`) AS 'Number of city drivers' -- count each driver once
FROM
    `Driver data` AS a
    LEFT JOIN `City matching data` AS b ON a.`City id` = b.`City id`
GROUP BY
    b.`City id`,
    b.`The city name`
HAVING
    COUNT(DISTINCT a.`The driver id`) > 20;
② Total driver online time greater than 2 hours
The online duration data and the city matching data share no key, so the driver data table is needed as an intermediate table for the join.
First aggregate the online duration table by driver id to get each driver's total online time, then keep the rows greater than 2 hours.
Then join the filtered result with the driver data table and the city matching data table.
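The steps above can be sketched as follows. The article gives no query for this case, so this is only one possible reading, written against a hypothetical miniature schema: `online_data`, `driver_data`, `city_data` and their columns are stand-in names, not the original tables.

```sql
-- Hypothetical miniature schema; table and column names are stand-ins.
CREATE TABLE online_data (driver_id INT, online_hours DECIMAL(6,2));
CREATE TABLE driver_data (driver_id INT, city_id INT);
CREATE TABLE city_data (city_id INT, city_name VARCHAR(20));

INSERT INTO city_data VALUES (1, 'Beijing'), (2, 'Shanghai');
INSERT INTO driver_data VALUES (101, 1), (102, 1), (201, 2);
INSERT INTO online_data VALUES
    (101, 1.50), (101, 1.00),   -- 2.50 hours in total, kept
    (102, 2.00),                -- exactly 2 hours, not greater than 2
    (201, 3.00);                -- kept

SELECT a.driver_id, a.total_online_hours, c.city_name
FROM (
    -- Aggregate and filter first: total online time per driver above 2 hours.
    SELECT driver_id, SUM(online_hours) AS total_online_hours
    FROM online_data
    GROUP BY driver_id
    HAVING SUM(online_hours) > 2
) AS a
LEFT JOIN (SELECT DISTINCT driver_id, city_id FROM driver_data) AS b
       ON a.driver_id = b.driver_id
JOIN city_data AS c ON b.city_id = c.city_id
ORDER BY a.driver_id;
```

With the sample rows, drivers 101 (Beijing) and 201 (Shanghai) are returned. If only the city names are needed, the outer select can be reduced to `SELECT DISTINCT c.city_name`.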
③ Order quantity greater than 1
The approach is the same as for the total online time.
SELECT
    a.`The driver id`,
    a.`Order quantity`,
    c.`The city name`
FROM
    ( SELECT `The driver id`, COUNT(`The driver id`) AS `Order quantity` FROM `Order data` GROUP BY `The driver id` HAVING COUNT(`The driver id`) > 1 ) AS a
    -- DISTINCT keeps one row per driver so the join does not repeat rows
    LEFT JOIN ( SELECT DISTINCT `The driver id`, `City id` FROM `Driver data` ) AS b ON a.`The driver id` = b.`The driver id`
    JOIN `City matching data` AS c ON b.`City id` = c.`City id`;
④ Number of passengers greater than 1
The approach is the same.
SELECT
    a.`The driver id`,
    a.`Number of passengers`,
    c.`The city name`
FROM
    -- COUNT(DISTINCT ...) counts passengers rather than orders
    ( SELECT `The driver id`, COUNT(DISTINCT `Passenger id`) AS `Number of passengers` FROM `Order data` GROUP BY `The driver id` HAVING COUNT(DISTINCT `Passenger id`) > 1 ) AS a
    LEFT JOIN ( SELECT DISTINCT `The driver id`, `City id` FROM `Driver data` ) AS b ON a.`The driver id` = b.`The driver id`
    JOIN `City matching data` AS c ON b.`City id` = c.`City id`;
⑤ Summary
When many tables need to be joined, first filter one table down to the rows you need and then join the result to the other tables. The filter usually keeps only a small range of rows, and joining all the tables first makes mistakes more likely.
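One kind of mistake that joining first can cause is row repetition, which inflates counts. The sketch below shows it on a hypothetical miniature schema. The names `driver_data` and `order_data` and their columns are stand-ins, and it assumes the driver table holds one row per driver per day.

```sql
-- Hypothetical miniature schema; table and column names are stand-ins.
CREATE TABLE driver_data (driver_id INT, city_id INT, stat_date DATE);
CREATE TABLE order_data (order_id INT, driver_id INT);

INSERT INTO driver_data VALUES (101, 1, '2020-08-01'), (101, 1, '2020-08-02');
INSERT INTO order_data VALUES (1, 101), (2, 101), (3, 101);

-- Join first: each order matches both driver rows, so the count doubles.
SELECT d.city_id, COUNT(o.order_id) AS orders_join_first
FROM order_data AS o
JOIN driver_data AS d ON d.driver_id = o.driver_id
GROUP BY d.city_id;   -- 6 for city 1

-- Aggregate first, then join to one row per driver.
SELECT d.city_id, SUM(o.orders) AS orders_aggregate_first
FROM (SELECT driver_id, COUNT(*) AS orders
      FROM order_data
      GROUP BY driver_id) AS o
JOIN (SELECT DISTINCT driver_id, city_id FROM driver_data) AS d
  ON d.driver_id = o.driver_id
GROUP BY d.city_id;   -- 3 for city 1
```

The second query returns the true order count because each side of the join has already been reduced to one row per driver.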