Looking at SQL Optimization Through the Whole Process of a Query


Overview

The previous article briefly introduced the order-placement flow of an order business on an external platform, and used the generated order data to analyze how the system architecture would evolve if the system ran into a performance bottleneck.

In an order business, the logic behind placing an order and querying the order list is not very complex. The difficulty is that large growth in data volume must not hurt the user experience: when users place orders, view their order list, or modify order data, the system should respond as quickly as possible.

The following articles analyze the optimization schemes proposed earlier, starting here with the first one: SQL optimization.

The Whole Process of a Query

First, let's analyze the whole process of an order query request. The user initiates a request, and the back-end application service receives it. The web container (Tomcat) starts a thread to process the request, which calls the application code; that code executes the query SQL statement against the database and returns the query results.

(Figure: business flow chart of a query request.)

(Figure: how MySQL executes a query SQL statement.)

The InnoDB storage engine caches data in the Buffer Pool, which is allocated in units called chunks. Each chunk is 128 MB by default (innodb_buffer_pool_chunk_size), and by default there is only one chunk.
Check the buffer pool size in MB: SELECT @@innodb_buffer_pool_size/1024/1024;
Check the number of buffer pool instances: SELECT @@innodb_buffer_pool_instances;
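To make the relationship between pool size, chunks, and instances concrete, here is a small sketch of the sizing rule described in the MySQL manual: innodb_buffer_pool_size is adjusted to a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances. The helper names are hypothetical, and the sketch ignores other edge cases the manual describes (such as the chunk size being truncated when chunk size times instances exceeds the pool size).

```python
# Sketch of the documented buffer pool sizing rule (MySQL 5.7.5+ / 8.0):
# innodb_buffer_pool_size is adjusted to a multiple of
# innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances.
MB = 1024 * 1024

def adjusted_pool_size(requested: int, chunk_size: int = 128 * MB, instances: int = 1) -> int:
    """Round the requested size up to the next multiple of chunk_size * instances."""
    unit = chunk_size * instances
    return -(-requested // unit) * unit  # ceiling division on integers

def chunks_per_instance(pool_size: int, chunk_size: int = 128 * MB, instances: int = 1) -> int:
    """Number of chunks each buffer pool instance is built from."""
    return pool_size // (chunk_size * instances)

# Defaults discussed above: a 128 MB pool, one instance, one 128 MB chunk.
default_size = adjusted_pool_size(128 * MB)
print(default_size // MB, chunks_per_instance(default_size))  # 128 1

# The manual's example: 9 GB requested with 16 instances is rounded up to 10 GB.
big = adjusted_pool_size(9 * 1024 * MB, instances=16)
print(big // MB, chunks_per_instance(big, instances=16))  # 10240 5
```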

Once the storage engine interface is called to execute the SQL, all operations work on memory. However, MySQL persists its data to disk, so data lives in both places: memory holds pages that were operated on recently or are being operated on, while the disk holds almost all of the latest data. "Almost", because newly written data may still be in memory and not yet flushed to disk. Therefore, if the requested data is not in memory, it must first be loaded from disk.
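The memory-first, disk-on-miss behavior can be sketched as a tiny page cache. This is a toy model under stated assumptions: the `BufferPool` class and `disk` dictionary are hypothetical, eviction is a plain LRU (InnoDB actually uses a modified LRU with a midpoint insertion point), and writes and dirty-page flushing are left out entirely.

```python
from collections import OrderedDict

class BufferPool:
    """Toy read-only page cache with plain LRU eviction (not InnoDB's real algorithm)."""

    def __init__(self, capacity_pages, disk):
        self.capacity = capacity_pages
        self.disk = disk            # page_id -> page contents; stands in for the tablespace file
        self.pages = OrderedDict()  # cached pages, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pages:           # hit: served from memory
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        self.disk_reads += 1                # miss: load the page from disk first
        page = self.disk[page_id]
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return page

disk = {n: f"page-{n}" for n in range(10)}
pool = BufferPool(capacity_pages=2, disk=disk)
for page_id in [1, 1, 2, 3, 1]:
    pool.get_page(page_id)
print(pool.disk_reads)  # 4
```

The second access to page 1 is a memory hit, but by the last access page 1 has been evicted to make room for pages 2 and 3, so it must be read from disk again.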

Query Process Summary

A query request goes through the following steps:
1. The request is sent over the network to the back-end server and handed to the corresponding web application; the web container starts a thread to process it.
2. The thread calls the business code, which executes a SQL query, so the application establishes a connection to the MySQL server. Once the connection succeeds, MySQL assigns a thread to handle the application's operations.
3. The application sends the SQL to be executed to MySQL, which parses it and optimizes the execution plan.
4. MySQL calls the storage engine interface to execute the SQL. The storage engine looks for the data in its own buffer pool; if the data is not there, the corresponding pages of the tablespace are loaded from disk into the buffer pool.
5. The data is returned to the application and, finally, to the user.

Why Is Disk I/O So Slow

The analysis of the query process shows that most steps are memory-based operations; only the last step may involve disk I/O, and disk I/O affects overall execution efficiency. So why is disk I/O so slow? Let's analyze the reasons further.

The main components of a hard disk are the platters, the actuator arm, the read/write heads, and the spindle motor. Data is actually written on the platters, and reading and writing are done by the read/write head on the actuator arm. In operation, the spindle motor spins the platters, and the actuator arm moves the head across them so that it can read and write. The physical structure of the disk is shown in the figure below.

Factors that affect hard disk performance:

Seek time T_seek: the time required to move the read/write head to the correct track. The shorter the seek time, the faster the I/O operation; the average seek time of current disks is generally 3-15 ms.
Rotational latency T_rotation: the time required for the platter to rotate the sector holding the requested data under the read/write head. It depends on the disk's rotational speed and is usually expressed as half the time of one revolution. For example, the average rotational latency of a 7200 rpm disk is about 60*1000/7200/2 = 4.17 ms, and that of a 15000 rpm disk is 2 ms.
Data transfer time T_transfer: the time required to transfer the requested data. It depends on the data rate and equals the data size divided by the data transfer rate. IDE/ATA interfaces can reach 133 MB/s and SATA II 300 MB/s, so the transfer time is usually far smaller than the first two components and can be ignored in rough calculations.
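The three components add up to the time of a single disk access. The sketch below just encodes those formulas; the 8 ms seek time (within the 3-15 ms range above) and the 16 KB request size are hypothetical example values, and MB is taken as 10**6 bytes.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 60 * 1000 / rpm / 2

def transfer_time_ms(size_bytes: int, rate_mb_per_s: float) -> float:
    """Data size divided by transfer rate (MB taken as 10**6 bytes)."""
    return size_bytes / (rate_mb_per_s * 1_000_000) * 1000

def access_time_ms(seek_ms: float, rpm: float, size_bytes: int, rate_mb_per_s: float) -> float:
    """T = T_seek + T_rotation + T_transfer."""
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_time_ms(size_bytes, rate_mb_per_s)

print(round(avg_rotational_latency_ms(7200), 2))   # 4.17
print(round(avg_rotational_latency_ms(15000), 2))  # 2.0
# Hypothetical request: 8 ms seek, 7200 rpm, one 16 KB page over SATA II (300 MB/s)
print(round(transfer_time_ms(16 * 1024, 300), 3))         # 0.055
print(round(access_time_ms(8, 7200, 16 * 1024, 300), 2))  # 12.22
```

The transfer term is under 0.1 ms, while seek plus rotation is over 12 ms, which is why the transfer time can be ignored in rough estimates.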

Performance metrics: a mechanical hard disk performs well for sequential reads and writes but poorly for random ones. This is mainly because moving the head to the correct track takes time; with random access the head has to keep moving, so most of the time is spent positioning the head rather than transferring data. The key metrics for a disk are IOPS and throughput.
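Using the same formulas, one can roughly estimate random IOPS (one request per seek + rotation + transfer) and the resulting throughput (IOPS times the size of each request). The disk parameters are the same hypothetical values as before: 8 ms seek, 7200 rpm, 16 KB requests, 300 MB/s transfer rate.

```python
def random_iops(seek_ms, rpm, io_bytes, rate_mb_per_s):
    """Random I/Os per second when every request pays seek + rotation + transfer."""
    rotation_ms = 60 * 1000 / rpm / 2
    transfer_ms = io_bytes / (rate_mb_per_s * 1_000_000) * 1000
    return 1000 / (seek_ms + rotation_ms + transfer_ms)

def throughput_mb_per_s(iops, io_bytes):
    """Throughput = IOPS x size of each I/O (MB taken as 10**6 bytes)."""
    return iops * io_bytes / 1_000_000

iops = random_iops(8, 7200, 16 * 1024, 300)
print(round(iops, 1))                                  # 81.8
print(round(throughput_mb_per_s(iops, 16 * 1024), 2))  # 1.34
```

About 80 random 16 KB reads per second yield only around 1.3 MB/s. Sequential access avoids most of the per-request seek and rotation, so its throughput is limited by how fast data comes off the platter instead of by head movement, which is the gap described above.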

How MySQL Indexes Are Formed

MySQL stores data on disk in units called data pages. A data page consists of a 38-byte File Header, a 56-byte Page Header, 26 bytes of minimum and maximum records (Infimum and Supremum), the User Records area, Free Space, the Page Directory, and an 8-byte File Trailer (see the figure).
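Adding up the fixed-size parts listed above shows how much of a page is left for rows. This assumes the default InnoDB page size of 16 KB (innodb_page_size = 16384); the dictionary is only an illustration.

```python
PAGE_SIZE = 16 * 1024  # default innodb_page_size

# Fixed-size parts of a data page, in bytes, as listed above
FIXED_PARTS = {
    "File Header": 38,
    "Page Header": 56,
    "Infimum + Supremum": 26,
    "File Trailer": 8,
}

fixed_bytes = sum(FIXED_PARTS.values())
# The rest is shared by User Records, Free Space and the Page Directory
variable_bytes = PAGE_SIZE - fixed_bytes
print(fixed_bytes, variable_bytes)  # 128 16256
```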

Rows are stored one after another in the User Records area; the expanded structure is shown in the figure.

To make lookups efficient, a data page contains two important structures: the minimum and maximum records, and the page directory. When MySQL looks up a row, it first uses the primary key ID of the target record to find the right data page by comparing it against each page's minimum and maximum records, and then uses that page's directory to find the address of the specific row. Data pages are connected by a doubly linked list, and the rows within a page form a singly linked list, which also helps query efficiency.
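The in-page part of this lookup can be sketched as follows: rows are kept in primary key order as a singly linked list, and the page directory holds "slots" that each point to the last record of a small group. A lookup binary-searches the slots and then walks the linked list inside one group. This is a simplified model: the `Page` class is hypothetical, groups have a fixed size here (real InnoDB groups hold a variable number of records), and Infimum/Supremum are left out.

```python
from bisect import bisect_left

class Page:
    """Hypothetical simplified data page with a page directory."""

    def __init__(self, rows, group_size=4):
        self.rows = sorted(rows)  # (pk, data) records in primary key order
        # next[i] is the index of the record after record i: a singly linked list
        self.next = list(range(1, len(self.rows))) + [None]
        # Each directory slot points at the last record of its group
        self.slots = [min(i + group_size - 1, len(self.rows) - 1)
                      for i in range(0, len(self.rows), group_size)]
        self.slot_keys = [self.rows[i][0] for i in self.slots]

    def find(self, pk):
        s = bisect_left(self.slot_keys, pk)  # first group whose largest key >= pk
        if s == len(self.slots):
            return None
        i = 0 if s == 0 else self.slots[s - 1] + 1  # first record of that group
        while i is not None and i <= self.slots[s]:  # walk the linked list within the group
            key, data = self.rows[i]
            if key == pk:
                return data
            i = self.next[i]
        return None

page = Page([(pk, f"row-{pk}") for pk in range(1, 11)], group_size=4)
print(page.slot_keys)  # [4, 8, 10]
print(page.find(6))    # row-6
print(page.find(11))   # None
```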

When the number of data pages grows large, locating the right data page itself becomes slow. To address this, MySQL adds a special kind of data page. It stores no business data; instead it stores the primary key IDs of rows in other data pages, together with the numbers of those pages. A query first consults this special page to locate the right data page, and then searches that page for the specific row. This special data page is the index page.
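A two-level version of this lookup is sketched below: the index page holds one entry per data page (that page's smallest primary key and its page number), so finding the right data page is a binary search over the index page instead of a walk over all data pages. The page numbers and rows are made-up example data, and the final scan inside the data page stands in for the page-directory search.

```python
from bisect import bisect_right

# Hypothetical data pages: page number -> rows sorted by primary key
data_pages = {
    10: [(1, "a"), (2, "b"), (3, "c")],
    11: [(4, "d"), (5, "e")],
    12: [(7, "f"), (9, "g")],
}

# Index page: (smallest primary key in the data page, page number), sorted by key
index_page = sorted((rows[0][0], page_no) for page_no, rows in data_pages.items())
index_keys = [key for key, _ in index_page]

def lookup(pk):
    pos = bisect_right(index_keys, pk) - 1  # last entry whose smallest key <= pk
    if pos < 0:
        return None
    page_no = index_page[pos][1]
    for key, row in data_pages[page_no]:    # stands in for the page-directory search
        if key == pk:
            return row
    return None

print(lookup(5))  # e
print(lookup(6))  # None
```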

Eventually this forms an inverted tree-like structure, which is MySQL's index tree (see the figure).
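Because each index page can point to many child pages, the tree stays shallow. The sketch below is a common back-of-the-envelope estimate, not an exact InnoDB calculation: it assumes an 8-byte BIGINT primary key, roughly 6 bytes per child-page pointer, 1 KB rows, 16 KB pages, and it ignores page headers and per-record overhead.

```python
def max_rows(height, page_size=16 * 1024, key_bytes=8, pointer_bytes=6, row_bytes=1024):
    """Rough upper bound on the rows an index tree of the given height can address."""
    fanout = page_size // (key_bytes + pointer_bytes)  # entries per index page
    rows_per_leaf = page_size // row_bytes             # rows per data page
    return fanout ** (height - 1) * rows_per_leaf

for h in (1, 2, 3):
    print(h, max_rows(h))
# 1 16
# 2 18720
# 3 21902400
```

Under these assumptions a tree only three levels high can address about 21.9 million rows, so a primary key lookup touches about three pages.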

SQL Optimization Process

Business Analysis and Optimization Process

Analyze the fields used in join conditions, WHERE conditions, aggregate functions, sorting, and so on, and create appropriate indexes based on them.

Of these, WHERE condition, aggregate function, and sort fields are relatively simple to analyze. What needs extra attention is index analysis for join fields, because it involves some algorithms specific to MySQL.

Consider MySQL using a join statement to associate two tables, for example when executing this SQL:

select t1.order_no, t2.product_id from order_info t1 left join order_item_detail t2 on t1.order_no = t2.order_no

How is this join query actually executed? That depends on the algorithm MySQL uses for the join statement. Three algorithms are discussed here, the most basic being the Simple Nested-Loop algorithm.

Join Algorithms

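Simple Nested-Loop algorithm: equivalent to a double for loop. For each row of the driving table (order_info in the example), the whole driven table (order_item_detail) is scanned and each row is checked against the join condition, so the number of comparisons is the number of driving-table rows times the number of driven-table rows.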
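A minimal Python sketch of Simple Nested-Loop for the LEFT JOIN above, with tables modeled as lists of dicts. The sample rows are made up for illustration; the comparison counter shows the rows × rows cost.

```python
def simple_nested_loop_join(driving, driven, key):
    """LEFT JOIN by brute force: every driving row is compared with every driven row."""
    result, comparisons = [], 0
    for r in driving:
        matched = False
        for s in driven:  # full scan of the driven table for each driving row
            comparisons += 1
            if r[key] == s[key]:
                result.append((r, s))
                matched = True
        if not matched:
            result.append((r, None))  # LEFT JOIN keeps unmatched driving rows
    return result, comparisons

# Hypothetical sample data for order_info (t1) and order_item_detail (t2)
order_info = [{"order_no": "A"}, {"order_no": "B"}, {"order_no": "C"}]
order_item_detail = [
    {"order_no": "A", "product_id": 1},
    {"order_no": "A", "product_id": 2},
    {"order_no": "B", "product_id": 3},
    {"order_no": "D", "product_id": 4},
]

rows, comparisons = simple_nested_loop_join(order_info, order_item_detail, "order_no")
print([(t1["order_no"], t2["product_id"] if t2 else None) for t1, t2 in rows])
# [('A', 1), ('A', 2), ('B', 3), ('C', None)]
print(comparisons)  # 12
```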

Block Nested-Loop Algorithm

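MySQL provides a memory area called the join buffer. Rather than scanning the driven table once per driving row, it loads a batch of driving-table rows into the join buffer and scans the driven table once per batch, comparing each driven row with all buffered rows in memory. The join buffer is only 256 KB by default, so its memory is limited; its size can be adjusted with the join_buffer_size variable.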
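A sketch of the batching idea, under a simplifying assumption: the join buffer is measured in rows (`buffer_rows`, a hypothetical parameter) instead of bytes as join_buffer_size is. The number of in-memory comparisons stays the same as in Simple Nested-Loop, but the driven table is scanned once per buffer fill instead of once per driving row.

```python
def block_nested_loop_join(driving, driven, key, buffer_rows):
    """LEFT JOIN that scans the driven table once per join-buffer fill."""
    result, driven_scans, comparisons = [], 0, 0
    for start in range(0, len(driving), buffer_rows):
        block = driving[start:start + buffer_rows]  # fill the join buffer
        matched = [False] * len(block)
        driven_scans += 1
        for s in driven:                  # one pass over the driven table per block
            for i, r in enumerate(block):  # compared against every buffered row in memory
                comparisons += 1
                if r[key] == s[key]:
                    result.append((r, s))
                    matched[i] = True
        for i, r in enumerate(block):     # LEFT JOIN: emit unmatched buffered rows
            if not matched[i]:
                result.append((r, None))
    return result, driven_scans, comparisons

# Hypothetical sample data, same shape as order_info / order_item_detail
order_info = [{"order_no": "A"}, {"order_no": "B"}, {"order_no": "C"}]
order_item_detail = [
    {"order_no": "A", "product_id": 1},
    {"order_no": "A", "product_id": 2},
    {"order_no": "B", "product_id": 3},
    {"order_no": "D", "product_id": 4},
]

rows, scans, comparisons = block_nested_loop_join(order_info, order_item_detail, "order_no", buffer_rows=2)
print(len(rows), scans, comparisons)  # 4 2 12
```

With a buffer of 2 rows the driven table is scanned twice instead of three times; a buffer large enough for all driving rows would need a single scan, which is why enlarging join_buffer_size can help this algorithm.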

Index Nested-Loop Algorithm

With Simple Nested-Loop, the number of matches is: number of driving-table rows * number of driven-table rows. With an index on the driven table's join column, it becomes: number of driving-table rows * height of the driven table's index tree. This greatly reduces the number of matches against the driven table and substantially improves join performance.
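A sketch of Index Nested-Loop, assuming an index exists on the driven table's join column. A sorted Python list searched with `bisect` stands in for the B+ tree, so each driving row costs one index probe instead of a full scan; the sample data is hypothetical.

```python
from bisect import bisect_left

def index_nested_loop_join(driving, driven, key):
    """LEFT JOIN that probes an ordered index on the driven table's join column."""
    # Stand-in for the B+ tree index: (join key, row position) sorted by key
    index = sorted((s[key], pos) for pos, s in enumerate(driven))
    index_keys = [k for k, _ in index]
    result, probes = [], 0
    for r in driving:
        probes += 1                             # one index lookup per driving row
        i = bisect_left(index_keys, r[key])
        matched = False
        while i < len(index_keys) and index_keys[i] == r[key]:
            result.append((r, driven[index[i][1]]))
            matched = True
            i += 1
        if not matched:
            result.append((r, None))            # LEFT JOIN keeps unmatched driving rows
    return result, probes

# Hypothetical sample data, same shape as order_info / order_item_detail
order_info = [{"order_no": "A"}, {"order_no": "B"}, {"order_no": "C"}]
order_item_detail = [
    {"order_no": "A", "product_id": 1},
    {"order_no": "A", "product_id": 2},
    {"order_no": "B", "product_id": 3},
    {"order_no": "D", "product_id": 4},
]

rows, probes = index_nested_loop_join(order_info, order_item_detail, "order_no")
print([(t1["order_no"], t2["product_id"] if t2 else None) for t1, t2 in rows])
# [('A', 1), ('A', 2), ('B', 3), ('C', None)]
print(probes)  # 3
```

Plugging hypothetical sizes into the formulas above: with 1,000 driving rows and 1,000,000 driven rows, Simple Nested-Loop needs 1,000 * 1,000,000 = 10^9 matches, while an index of height 3 needs about 1,000 * 3 = 3,000.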

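If the join query can use an index, MySQL uses the Index Nested-Loop algorithm; if it cannot, MySQL uses the Block Nested-Loop algorithm by default. (This describes MySQL 5.7 and earlier: MySQL 8.0.18 introduced hash join for equi-joins that cannot use an index, and starting with 8.0.20 hash join is used wherever Block Nested-Loop was used before.)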

Summary

Starting from the process of a single query, this article showed how MySQL indexes are formed, how to optimize SQL based on the characteristics of the index structure, and what the optimization process looks like. Solving a problem starts with finding it; only once a problem is found can a matching solution be found. Business SQL usually goes through two stages of optimization: at the start of the business, create indexes for the actual business SQL following the optimization process above; later, while the system is running, use monitoring to find slow SQL and optimize it.