After SQL optimization, database query speed increased 60-fold
2022-07-01 19:13:00 【Java Architect in Penghu】
Introduction
SQL performance optimization can cut query time substantially. This article walks through an optimization on a table of close to ten million rows that made a paging query more than 60 times faster.
Text
There is a financial records table that has not been split into separate databases or tables. It currently holds 9,555,695 rows, and paging queries use LIMIT. Before optimization the query took 16 s 938 ms (execution: 16 s 831 ms, fetching: 107 ms). After rewriting the SQL as described below, it took 347 ms (execution: 163 ms, fetching: 184 ms).
Approach:
Put the query conditions in a subquery that selects only the primary key IDs. Then join on those primary keys to fetch the remaining columns.
Why it works:
1. It reduces the number of lookups back to the clustered index.
2. See the Alibaba Java Development Manual (Taishan Edition), Chapter 5 "MySQL Database", section (2) "Index Specifications", item 7:
[Recommended] Use deferred joins or subqueries to optimize scenarios with very deep pagination.
Explanation:
MySQL does not skip the offset rows directly. It fetches offset+N rows, discards the first offset rows, and returns N rows. When offset is very large this is very inefficient. Either cap the total number of pages that can be returned, or rewrite the SQL for page numbers beyond a certain threshold.
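The paragraph above says that MySQL produces offset+N rows and throws away the first offset. The helper below is a toy Python model of that behavior, not MySQL's executor code. It reads rows one by one and counts how many it touched before it could return the page. The name `limit_offset` is hypothetical.

```python
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def limit_offset(rows: Iterable[T], offset: int, n: int) -> Tuple[List[T], int]:
    """Toy model of LIMIT offset, n: read rows in order, discard the
    first `offset`, keep the next `n`. Returns (page, rows_read)."""
    if n <= 0:
        return [], 0
    page: List[T] = []
    rows_read = 0
    for row in rows:
        rows_read += 1
        if rows_read <= offset:
            continue  # row was produced, then thrown away
        page.append(row)
        if len(page) == n:
            break
    return page, rows_read


page, rows_read = limit_offset(range(1_000_000), offset=300_000, n=5)
print(page, rows_read)  # [300000, 300001, 300002, 300003, 300004] 300005
```

The page size stays at 5 while the work grows linearly with offset. That is the inefficiency the manual's recommendation targets.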
Example:
First quickly locate the range of ids that is needed, then join:
SELECT a.* FROM table1 a, (SELECT id FROM table1 WHERE <conditions> LIMIT 100000,20) b WHERE a.id = b.id;
-- SQL before optimization
SELECT <columns> FROM `table_name` WHERE <conditions> LIMIT 0,10;
-- SQL after optimization
SELECT <columns> FROM `table_name` main_table
RIGHT JOIN (SELECT <primary key> FROM `table_name` WHERE <conditions> LIMIT 0,10) temp_table
ON temp_table.<primary key> = main_table.<primary key>;
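This sketch checks that the deferred-join rewrite returns the same page as the original query. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, an assumption made only so the example runs anywhere. It demonstrates equal results, not InnoDB performance.

The table mirrors the `test` table used later in this article. Several details are assumptions:
- the index name `idx_val`;
- the scaled-down row count;
- the added `ORDER BY id`, which gives both result sets a defined order;
- `INNER JOIN` in place of `RIGHT JOIN`, which older SQLite versions lack. Every id from the subquery exists in the main table, so both joins return the same rows here.

```python
import sqlite3


def build_test_table(n_rows: int) -> sqlite3.Connection:
    """Scaled-down stand-in for the article's MySQL `test` table (id, val, source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " val INTEGER NOT NULL DEFAULT 0,"
        " source INTEGER NOT NULL DEFAULT 0)"
    )
    # Hypothetical index name; plays the role of the non-unique index on val.
    conn.execute("CREATE INDEX idx_val ON test(val)")
    conn.executemany(
        "INSERT INTO test (val, source) VALUES (?, ?)",
        ((i % 10, i % 10) for i in range(n_rows)),
    )
    return conn


# Plain deep-offset query: whole rows are fetched, then `offset` are skipped.
PLAIN = "SELECT * FROM test WHERE val = ? ORDER BY id LIMIT ? OFFSET ?"

# Deferred join: the subquery pages over ids only, then the outer query
# fetches full rows for just the surviving ids.
DEFERRED = (
    "SELECT a.* FROM test a"
    " INNER JOIN (SELECT id FROM test WHERE val = ? ORDER BY id LIMIT ? OFFSET ?) b"
    " ON a.id = b.id ORDER BY a.id"
)


def compare(conn: sqlite3.Connection, val: int, offset: int, n: int):
    plain = conn.execute(PLAIN, (val, n, offset)).fetchall()
    deferred = conn.execute(DEFERRED, (val, n, offset)).fetchall()
    return plain, deferred


conn = build_test_table(100_000)
plain, deferred = compare(conn, val=4, offset=3_000, n=5)
print(plain == deferred, plain)
```

This should print `True` followed by the five rows with ids 30005 through 30045.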
I. Preface
First, the MySQL version:
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.17    |
+-----------+
1 row in set (0.00 sec)
Table structure:
mysql> desc test;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| val    | int(10) unsigned    | NO   | MUL | 0       |                |
| source | int(10) unsigned    | NO   |     | 0       |                |
+--------+---------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
id is an auto-increment primary key, and val has a non-unique index.
Load a large amount of data, about 5 million rows in total:
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|  5242882 |
+----------+
1 row in set (4.25 sec)
As we know, when the offset in LIMIT offset, rows is large, efficiency suffers:
mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id      | val | source |
+---------+-----+--------+
| 3327622 |   4 |      4 |
| 3327632 |   4 |      4 |
| 3327642 |   4 |      4 |
| 3327652 |   4 |      4 |
| 3327662 |   4 |      4 |
+---------+-----+--------+
5 rows in set (15.98 sec)
To achieve the same result, we usually rewrite the query as follows:
mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id      | val | source | id      |
+---------+-----+--------+---------+
| 3327622 |   4 |      4 | 3327622 |
| 3327632 |   4 |      4 | 3327632 |
| 3327642 |   4 |      4 | 3327642 |
| 3327652 |   4 |      4 | 3327652 |
| 3327662 |   4 |      4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.38 sec)
The time difference is obvious .
Why does this happen? Let's look at how select * from test where val=4 limit 300000,5; is executed:
- Read the leaf-node entries of the secondary index.
- Using the primary key value stored in each leaf node, look up all required column values in the clustered index.
The process looks roughly like the figure below:

As shown above, the query reads 300,005 secondary-index entries and performs 300,005 clustered-index lookups. It then discards the first 300,000 results and returns the last 5. MySQL spends a great deal of random I/O reading clustered-index data, yet the rows fetched by 300,000 of those random I/Os never appear in the result set.
Someone is bound to ask: the query starts from the index, so why not walk the index leaf nodes to the last 5 entries that are actually needed? Only then would it read the rows from the clustered index. That would take only 5 random I/Os, as in the process shown in the figure below:

In fact, I had the same question.
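The cost difference behind this question can be made concrete with a toy model. It is an assumption for illustration, not how InnoDB actually lays out pages:
- The secondary index is a sorted list of (val, id) pairs.
- The clustered index is a dict from id to the full row.
- Every clustered-index lookup counts as one random I/O.

The plain query fetches each row before the offset is applied. The deferred join applies the offset on index entries and fetches only the surviving ids. The names `build`, `matching_ids`, `LookupCounter`, `naive` and `deferred` are hypothetical.

```python
from bisect import bisect_left
from itertools import islice
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, int, int]  # (id, val, source)


def build(n_rows: int) -> Tuple[List[Tuple[int, int]], Dict[int, Row]]:
    # Clustered index: id -> full row; ids start at 1 like AUTO_INCREMENT.
    clustered = {i + 1: (i + 1, i % 10, i % 10) for i in range(n_rows)}
    # Secondary index on val: leaf entries are (val, primary key), sorted.
    secondary = sorted((row[1], row[0]) for row in clustered.values())
    return secondary, clustered


def matching_ids(secondary: List[Tuple[int, int]], val: int) -> Iterator[int]:
    """Walk the secondary-index leaf entries for `val` in index order."""
    i = bisect_left(secondary, (val, 0))
    while i < len(secondary) and secondary[i][0] == val:
        yield secondary[i][1]
        i += 1


class LookupCounter:
    """Counts clustered-index lookups; each stands in for one random I/O."""

    def __init__(self, clustered: Dict[int, Row]) -> None:
        self.clustered = clustered
        self.lookups = 0

    def fetch(self, row_id: int) -> Row:
        self.lookups += 1
        return self.clustered[row_id]


def naive(secondary, clustered, val: int, offset: int, n: int):
    # SELECT * ... LIMIT offset, n: every row is fetched, then the offset is applied.
    c = LookupCounter(clustered)
    rows = (c.fetch(rid) for rid in matching_ids(secondary, val))
    return list(islice(rows, offset, offset + n)), c.lookups


def deferred(secondary, clustered, val: int, offset: int, n: int):
    # Deferred join: skip `offset` index entries, then fetch only n rows.
    c = LookupCounter(clustered)
    ids = islice(matching_ids(secondary, val), offset, offset + n)
    return [c.fetch(rid) for rid in ids], c.lookups


secondary, clustered = build(100_000)
plain_rows, plain_cost = naive(secondary, clustered, val=4, offset=3_000, n=5)
joined_rows, joined_cost = deferred(secondary, clustered, val=4, offset=3_000, n=5)
print(plain_rows == joined_rows, plain_cost, joined_cost)  # True 3005 5
```

In this model the plain plan pays offset + N lookups and the deferred plan pays only N. This is a scaled-down version of the 300,005 versus 5 contrast described above.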
Verification
Now let's run an experiment to confirm the inference above.
The claim is that select * from test where val=4 limit 300000,5 scans 300,005 secondary-index entries and 300,005 data rows in the clustered index. Confirming it directly would require MySQL to count how many times a single SQL statement reaches a data row through an index entry. I first tried the Handler_read_* status variables, but unfortunately none of them provide that count.
So I could only prove it indirectly:
InnoDB has a buffer pool that caches recently accessed pages, both data pages and index pages. So we run the two SQL statements separately and compare how many data pages each one leaves in the buffer pool. Consider select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;. The prediction is that it leaves far fewer data pages in the buffer pool than select * from test where val=4 limit 300000,5; does. The former accesses data pages only 5 times, while the latter accesses them 300,005 times.
select * from test where val=4 limit 300000,5
mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.04 sec)
As you can see, the buffer pool currently holds no pages belonging to the test table.
mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id      | val | source |
+---------+-----+--------+
| 3327622 |   4 |      4 |
| 3327632 |   4 |      4 |
| 3327642 |   4 |      4 |
| 3327652 |   4 |      4 |
| 3327662 |   4 |      4 |
+---------+-----+--------+
5 rows in set (26.19 sec)
mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY    |     4098 |
| val        |      208 |
+------------+----------+
2 rows in set (0.04 sec)
Now the buffer pool holds 4098 data pages and 208 index pages for the test table.
select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id
To rule out any effect from the previous test, we need to empty the buffer pool by restarting mysql:
mysqladmin shutdown
/usr/local/bin/mysqld_safe &
mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.03 sec)
Run the SQL:
mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id      | val | source | id      |
+---------+-----+--------+---------+
| 3327622 |   4 |      4 | 3327622 |
| 3327632 |   4 |      4 | 3327632 |
| 3327642 |   4 |      4 | 3327642 |
| 3327652 |   4 |      4 | 3327652 |
| 3327662 |   4 |      4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.09 sec)
mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY    |        5 |
| val        |      390 |
+------------+----------+
2 rows in set (0.03 sec)
The difference between the two is clear. The first SQL statement loaded 4098 data pages into the buffer pool, while the second loaded only 5. This matches our prediction. It also confirms why the first statement is slow: it reads a large number of useless rows (300,000) and then throws them away.
It also causes another problem. Loading many data pages that are not particularly hot into the buffer pool pollutes the buffer pool and wastes its space.
Problems encountered
To make sure the buffer pool really is empty after each restart, we need to disable innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup. The first controls whether the buffer pool contents are dumped to disk when the database shuts down. The second controls whether that on-disk dump is loaded back into the buffer pool when the database starts. Both are enabled by default since MySQL 5.7.7.