当前位置:网站首页>The cost of returning tables in MySQL

The cost of returning tables in MySQL

2022-07-07 01:08:00 Xiu Qiang

The price of returning the watch

For the following query statement :
SELECT * FROM single_ table WHERE key1 > 'a' AND key1 < 'c';
We can choose the following two ways to execute .

  • Execute the query in the way of full table scanning
    That is, directly scan all clustered index records , For each clustered index record , To determine whether the search criteria are valid , If yes, send it to the client , Otherwise, skip the record .
  • Use idx_key1 Execute the query
    According to the search criteria key1>a’ AND key1 < ‘c’ Get the corresponding scanning interval (‘a’,‘c’), Then scan the secondary index records in the scanning interval . because idx_key1 The leaf node of the index stores incomplete user records , Contains only key1、id These two columns , The query list is *, This means that we need to get the cluster index record corresponding to each secondary index record , That is, perform the operation of returning to the table , After obtaining the complete user record, it will be sent to the client .

For the use of InnoDB For the table of the storage engine , All data pages in the index must be stored on disk , Wait until you need to load it into memory to use . These data pages will be stored in one or more files on the disk , The page number of the page corresponds to the offset of the page in the disk file . With 16KB Size pages as an example , Page No 0 The page of corresponds to the offset of 0 The location of , Page No 1 The page opposite these files has an offset of 16KB The location of .
B+ The nodes of each layer of the tree will be connected using a two-way linked list , The page numbers of the previous node and the next node do not need to be adjacent . But in practice ,InnoDB Try to arrange the page numbers of leaf nodes of the same index in order .
in other words ,idx_key1 In scan interval (‘a’,‘e’) The page number of the page where the secondary index record is located in will be as adjacent as possible . Even if the page numbers of these pages are not adjacent , But at least one page can store many records , That is to say, after executing a page I/O after , You can load many secondary index records from disk into memory . To make a long story short , Reading is in the scanning range (‘a’,‘e’) When the secondary index record in , The price paid is still small . But the scan range (‘a’,‘e’) Corresponding to the secondary index record in id The size of the value is irregular , Every time we read a secondary index record , You need to record according to the secondary index id Value to cluster index to perform back to table operation . If the page of the corresponding clustered index record is not in memory , You need to load the page from disk to memory . Due to a lot of reading id Clustered index records with discontinuous values , Moreover, these clustered index records are distributed in different data pages , The page numbers of these data pages are also irregular , Therefore, it will cause a lot of random I/O.
The more records that need to perform a table back operation , The lower the performance of queries using secondary indexes , Some queries prefer to use a full table scan rather than a secondary index . such as , hypothesis key1 Values in 'a’~’c’ The number of user records between accounts for 99% above , If you use idx_key1 Indexes , Will have a 99% The above id The value needs to be returned to the table . It's better to perform a full table scan directly .
When executing the query , When to use full table scanning , When to use secondary index + How to return to the table ? This is it. Query optimizer What should be done . The query optimizer will calculate some statistics for the records in the table in advance , Then use these statistics or access a small number of records in the table to calculate the number of records that need to be rowed back to the table . If you need to perform a table back operation, the more records , The more likely you are to use full table scanning , On the contrary, they tend to use secondary indexes + Back to the table . Of course , The analysis work done by the query optimizer is not so simple , But it's basically such a process .
In general , You can specify... For the query statement LIMIT Clause to limit the number of records returned by the query , This may make the query optimizer prefer to use secondary indexes + Query by returning to the table , The reason is that there are fewer records back to the table , The higher the performance . such as , The above query statement can be rewritten as follows :
SELECT * FROM single_table WHERE key1 >'a' AND key1 < 'c' LIMIT 10;
Added LIMIT 10 The query statement after clause makes it easier for the query optimizer to adopt secondary index + Back to the table .
For queries that need to sort the results , If the secondary index is used to execute the query, there are many records that need to perform the operation of returning to the table , Also prefer to use full table scanning + Execute the query in the way of file sorting . For example, the following query statement :
SELECT * FROM single_table ORDER BY key1;
Because the query list is *, If you use secondary index to sort , You need to perform a table back operation on all secondary index records . The cost of this operation is not as low as directly traversing the cluster index and then sorting the files , Therefore, the query optimizer will tend to use full table scanning to execute queries . If you add LIMIT Clause , For example, the following query statement :
SELECT * FROM single_table ORDER BY key1 LIMIT 10;
This query statement requires very few records to perform the operation of returning to the table , The query optimizer will tend to use secondary indexes + Back to the table .

原网站

版权声明
本文为[Xiu Qiang]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/188/202207061721405047.html