
Explain of SQL optimization

2022-07-04 14:29:00 Dying fish

explain

id

The id column is the sequence number of each select: a statement has as many ids as it has selects. The larger the id, the higher the execution priority. Rows with the same id are executed from top to bottom, and rows whose id is NULL are executed last.

select_type

Query type. It takes one of several values, for example SIMPLE, PRIMARY, SUBQUERY, DEPENDENT SUBQUERY, DERIVED, and UNION.

An example of a DEPENDENT SUBQUERY, where the subquery depends on the outer query:

explain select * from  (select erp_travel.*,erp_travel_detail.destination from  erp_travel left join erp_travel_detail on erp_travel.travel_no = erp_travel_detail.travel_no ) erp_travel
where erp_travel.travel_no in (
  select  max(erp_travel_cost.travel_no) from erp_travel_cost where erp_travel_cost.project_no = erp_travel.project_no
)
limit 1


Walking through the query process:
1. Following the execution order, the subquery with id 3 runs first. It reads the whole table directly (full-table I/O).
2. The rows returned by the outer query are then matched against the subquery one by one: for each outer row, the subquery looks up the largest travel_no under that row's condition.
3. This matching amounts to a Cartesian product, that is, the size of PRIMARY multiplied by the size of the subquery. If each side holds just over 10,000 rows, the loop runs more than 100 million times. So avoid DEPENDENT SUBQUERY wherever possible; once it appears, the statement is practically unusable.
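
One common way to sidestep a dependent subquery of this shape is to compute the aggregate once per group in a derived table and join to it. The statement below is only a sketch of that idea, not the author's method: the aliases t and c and the column alias max_travel_no are made up for illustration, and whether it is actually faster depends on the data and the available indexes, so the plan should be checked with explain.

```sql
-- Sketch: compute max(travel_no) once per project_no, then join,
-- instead of re-running the subquery for every outer row.
-- t, c and max_travel_no are hypothetical names.
EXPLAIN
SELECT t.*
FROM (
  SELECT erp_travel.*, erp_travel_detail.destination
  FROM erp_travel
  LEFT JOIN erp_travel_detail
    ON erp_travel.travel_no = erp_travel_detail.travel_no
) AS t
JOIN (
  SELECT project_no, MAX(travel_no) AS max_travel_no
  FROM erp_travel_cost
  GROUP BY project_no
) AS c
  ON c.project_no = t.project_no
 AND c.max_travel_no = t.travel_no
LIMIT 1;
```

The grouped derived table is built once, so the work no longer grows with the product of the two row counts.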
SUBQUERY example
Remove the correlation condition between the subquery and the outer query:

explain select * from  (select erp_travel.*,erp_travel_detail.destination from  erp_travel left join erp_travel_detail on erp_travel.travel_no = erp_travel_detail.travel_no ) erp_travel
where erp_travel.travel_no in (
  select  max(erp_travel_cost.travel_no) from erp_travel_cost where erp_travel_cost.is_manage is not null
)
limit 1;


A plain SUBQUERY executes very quickly, because the subquery runs only once, produces a single row, and is not correlated with the outer query. There is no Cartesian-product matching loop at all.
What happens with an ordinary subquery?
Remove the max aggregate from the subquery to avoid DEPENDENT SUBQUERY:

explain select * from  (select erp_travel.*,erp_travel_detail.destination from  erp_travel left join erp_travel_detail on erp_travel.travel_no = erp_travel_detail.travel_no ) erp_travel
where erp_travel.travel_no in (
  select  erp_travel_cost.travel_no from erp_travel_cost where erp_travel_cost.project_no = erp_travel.project_no
)
limit 1


Why is this query so fast? Let's analyze it.
After the SQL optimizer has done its work, the subquery has been rewritten as a join query.
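
To see the rewritten statement for yourself, one option is to run SHOW WARNINGS immediately after explain in the same session. This is a sketch assuming MySQL 5.7 or later, where explain produces extended information by default; the Message column of the warning then contains the query as the optimizer transformed it.

```sql
-- Run both statements in the same session, one right after the other.
EXPLAIN
SELECT * FROM (
  SELECT erp_travel.*, erp_travel_detail.destination
  FROM erp_travel
  LEFT JOIN erp_travel_detail
    ON erp_travel.travel_no = erp_travel_detail.travel_no
) erp_travel
WHERE erp_travel.travel_no IN (
  SELECT erp_travel_cost.travel_no
  FROM erp_travel_cost
  WHERE erp_travel_cost.project_no = erp_travel.project_no
)
LIMIT 1;

-- The Message column shows the statement after optimizer rewrites.
SHOW WARNINGS;
```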

Why turn the subquery into a join?

We know that a dependent subquery means a Cartesian product of the outer query and the inner query. What about a join? The join algorithm used is Nested Loop Join, and for inner joins the optimizer automatically lets the small table drive the large table.
1. The subquery on erp_travel_cost is read with full-table I/O, and its result serves as the driving table.
2. The join buffer is used to match the outer query's result set in batches.
Because project_no is not indexed and both tables hold about 150,000 rows, a full-table join like this is bound to be a disaster. It is almost impossible to get a result.
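
Since the missing index on project_no is the stated problem, the obvious experiment is to add one and compare the plan. The statements below are a hedged sketch: the index names are hypothetical, and building an index on a table of this size takes time and space, so try it on a test copy first.

```sql
-- Hypothetical index names; the column comes from the queries above.
ALTER TABLE erp_travel_cost ADD INDEX idx_cost_project_no (project_no);
ALTER TABLE erp_travel ADD INDEX idx_travel_project_no (project_no);

-- Re-check how erp_travel_cost is accessed once the index exists.
EXPLAIN
SELECT erp_travel.travel_no
FROM erp_travel
JOIN erp_travel_cost
  ON erp_travel_cost.project_no = erp_travel.project_no
LIMIT 1;
```

If the optimizer picks the new index, the type column for the driven table should move away from ALL.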


table

Shows which table this row of the plan accesses. If the SQL defines an alias, the alias is shown.

partitions

The partitions in which the query matches records. For tables that are not partitioned, the value is NULL.
Partitioning splits the data into separate blocks so that each block is smaller, which speeds up I/O.
To learn what partitions are, see the article on MySQL table partitioning.

type

| ALL | Full table scan |
| index | Full index scan |
| range | Index range scan, common with <, <=, >=, between, and similar operators |
| ref | Scan of a non-unique index or of a prefix of a unique index; returns the rows matching a single value and often appears in join queries |
| eq_ref | Similar to ref, except that a unique index is used; typical of joins on a primary key |
| const/system | Single row; the other columns of the matching row are treated as constants, as in a lookup by primary key or unique index |
| NULL | MySQL does not access any table or index and returns the result directly |

ALL Full table scan

When the query condition has no index it can use, the whole table is read.

explain select * from erp_travel_cost where is_manage is   null

This is the least efficient access type.

INDEX Full index scan

explain select project_no from erp_travel_cost ; -- all


explain select travel_no from erp_travel_cost  -- index

  • index: Full Index Scan. The difference between index and ALL is that the index type only traverses the index tree. This is usually faster than ALL, because index files are usually smaller than data files. (In other words, both ALL and index read the whole table, but index reads it from the index while ALL reads the data rows themselves.)

    index means scanning the whole index without going back to the table: the query is answered directly from either the primary key index or a secondary index.

RANGE

  • range: Retrieves only the rows in a given range, using an index to select them. The key column shows which index is used.
    It usually appears when the where clause uses between, <, >, in, and similar operators.
    A range scan on an index is better than a full table scan, because it only needs to start at one point in the index and stop at another, without scanning the whole index.

First of all, range requires the range to be on an indexed column; on a non-indexed column the result is necessarily a full table scan.


explain select * from erp_travel_cost where erp_travel_cost.create_time > '2022-01-01 00:00:00' -- all

explain select * from erp_travel_cost where travel_no> '2' -- range

Because an index is ordered, the B+ tree makes it easy to locate the index range. If the query has to go back to the table, the entries found in the index are looked up again in the table by primary key. When many rows need this lookup, and the index itself also costs I/O, the optimizer goes straight to a full table scan once the range is large.

ref

  • ref: Non-unique index scan that returns all rows matching a single value.
    In essence it is also an index access, but because it may find more than one matching row,
    it is a mixture of lookup and scan.
explain select * from erp_travel_cost where travel_no  = ( '1000127141') -- ref
explain select * from erp_travel_cost where travel_no in( '1000127141','1000127142')  -- range

eq_ref

One row is read from this table for each combination of rows from the previously read tables. Apart from the system and const types, this is the best join type. It is used when the join uses all parts of an index and the index is a primary key or a unique NOT NULL index.
eq_ref can be used for indexed columns compared with the = operator. The comparison value can be a constant or an expression that uses columns from tables read before this table. In the following example, MySQL can use an eq_ref join:

explain select * from  app_user_copy1 left  join app_user  on app_user.id = app_user_copy1.id


const

Indicates that the row is found with a single index lookup. const is used when a primary key or unique index is compared with a constant. Because only one row matches, it is very fast. If the primary key appears in the where list, MySQL can convert the query into a constant.
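
A minimal sketch of a query that would normally be planned as const. It assumes that app_user.id (the column joined on in the eq_ref example) is the table's primary key; the literal 1 is only a placeholder value.

```sql
-- Assumption: id is the primary key of app_user; 1 is a placeholder.
-- A lookup of the primary key against a constant matches at most one row.
EXPLAIN SELECT * FROM app_user WHERE id = 1;
```

If no row has that id, the plan may instead report in Extra that no matching row was found, so pick a value that exists when trying this out.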

system

  • system: The table has only one row (a system table). This is a special case of the const type. It rarely appears in practice and can be ignored.

index_merge

The question here is whether two different secondary index trees can be used at the same time. In theory they should be, otherwise MySQL would be rather stupid. This mostly happens with or conditions.

explain select * from erp_travel where project_no_form  = 'A21020028' or  user_no ='00022139'

The indexes on both columns are used.

possible_keys

The MySQL optimizer finds all the plans that could execute the statement, compares them, and picks the one with the lowest cost. This column lists the indexes that might be used.
It shows one or more indexes that could be applied to this table. If an index exists on a field involved in the query, it is listed, but the query does not necessarily use it.

key

There may be many candidate keys, but the optimizer chooses which index to query with. It does not have to choose only one index, as index_merge shows.
The key column shows the index the optimizer decided to use after calculating the cost of the different indexes, that is, the index actually used. NULL means no index is used.
If the query uses a covering index, the index columns cover the fields in the select list.

key_len

The key_len column shows the maximum length of the index record when the optimizer decides to execute a query using an index. It is made up of three parts:
① For index columns of fixed-length types, the maximum storage actually occupied is that fixed length. For variable-length index columns with a specified character set, for example an index column of type varchar(255) using utf8 (MySQL's utf8 character set uses up to 3 bytes per character), the maximum storage actually occupied is 255 × 3 + 2 = 767 bytes.
② If the index column can store NULL, key_len is 1 byte larger than when it cannot, giving 768.
③ Variable-length columns (such as varchar) need 2 bytes to store the actual length of the value.

With a composite index, key_len also shows how many columns of the index are used. The more columns are used, the more rows the index filters out and the fewer I/O operations are needed.

explain select * from erp_travel where user_no = '00022139'  -- key_len 768
explain select * from erp_travel where user_no = '00022139' and user_name = '00022139' -- key_len 1536
explain select * from erp_travel where user_no = '00022139' and user_name = '00022139' and creater = '00022139' -- key_len 2304
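
The three key_len values above can be reproduced with plain arithmetic. The column definition is an assumption inferred from the numbers, not something the article states: each of the three indexed columns is taken to be a nullable varchar(255) in utf8.

```sql
-- Assumed definition per column: varchar(255), utf8 (3 bytes per character), nullable.
SELECT 255 * 3               AS max_character_bytes,  -- 765
       255 * 3 + 2           AS with_length_prefix,   -- 767
       255 * 3 + 2 + 1       AS with_null_flag,       -- 768, one column used
       (255 * 3 + 2 + 1) * 2 AS two_columns,          -- 1536
       (255 * 3 + 2 + 1) * 3 AS three_columns;        -- 2304
```

Reading key_len this way tells you how many leading columns of a composite index a query actually used.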

ref

When a query is executed with an equality match on an index column, that is, when the access method is one of const, eq_ref, ref, ref_or_null, unique_subquery, or index_subquery, the ref column shows what the index column is being matched against, such as a constant or a column.

explain select * from erp_travel where   user_no ='00022139'

Here the index column user_no is matched against a specific string, which is a constant, so the ref column shows const.

explain select erp_travel.*,erp_travel_detail.destination from  erp_travel left join erp_travel_detail on erp_travel.travel_no = erp_travel_detail.travel_no

When erp_travel_detail is accessed, the equality match is against the travel_no field of erp_travel, so erp_travel.travel_no is shown here.

rows

Based on table statistics and the chosen index, this is a rough estimate of the number of rows that must be read to find the required records. It can be regarded as the most important indicator, because the ultimate goal of all optimization is to reduce the number of rows scanned, that is, to reduce the rows value.
How exactly is rows calculated?
The official documentation explains rows as follows:

rows (JSON name: rows)
The rows column indicates the number of rows MySQL believes it must examine to execute the query.
For InnoDB tables, this number is an estimate, and may not always be exact.
http://dev.mysql.com/doc/refman/5.7/en/explain-output.html#explain_rows
Put simply: the estimate of how many rows need to be scanned is based only on the given conditions on the index and on the index itself.
Summary
The rows value in MySQL explain:

  • is the number of rows MySQL believes it has to examine (for reference only), not the number of rows in the result set;
  • is not affected by a limit in the SQL.

In addition, many optimizations, such as the join buffer and the query cache, do not affect the rows display. MySQL may not actually have to read all the rows it estimates, and it knows nothing about operating system or hardware caches.
The value of rows lies in showing what role the index plays in the query and whether it is reasonable. It is only a rough estimate based on the indexes used by the query or join conditions. Without an index the estimate is necessarily the whole table, even with limit.
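
Because rows comes from statistics, it can help to refresh them and compare the estimate with a real count. The following is a sketch using the is_manage condition from the ALL example earlier; how close the two numbers end up depends on the table and on the storage engine's sampling.

```sql
-- Refresh the statistics that the optimizer's estimates are based on.
ANALYZE TABLE erp_travel_cost;

-- Estimated number of rows to examine (the rows column of the plan).
EXPLAIN SELECT * FROM erp_travel_cost WHERE is_manage IS NULL LIMIT 1;

-- Number of rows that actually satisfy the condition, for comparison.
SELECT COUNT(*) FROM erp_travel_cost WHERE is_manage IS NULL;
```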


explain select erp_travel.*,erp_travel_detail.destination from  erp_travel left join erp_travel_detail on erp_travel.travel_no = erp_travel_detail.travel_no 

First, erp_travel acts as the driving table and is read with a full table scan, so the estimated rows is the number of rows in the whole table.
Then erp_travel_detail is joined through the indexed join condition on travel_no. Based on the index, the estimate is that a single row matches each lookup, although in practice there will certainly be more than that.

filtered

How should the filtered field in the output of MySQL's explain statement be understood?

The official MySQL 5.7 documentation says:

The filtered column indicates an estimated percentage of table rows filtered by the table condition. The maximum value is 100, which means no filtering of rows occurred. Values decreasing from 100 indicate increasing amounts of filtering. rows shows the estimated number of rows examined and rows × filtered shows the number of rows joined with the following table. For example, if rows is 1000 and filtered is 50.00 (50%), the number of rows to be joined with the following table is 1000 × 50% = 500.

This passage is not easy to understand, so here is an example: the explain result of a query over three tables, in which filtered is 100 for tables b and c and 18 for table a.

+-------------+-------+--------+---------+---------+------+----------+
| select_type | table | type   | key     | key_len | rows | filtered |
+-------------+-------+--------+---------+---------+------+----------+
| PRIMARY     | a     | range  | search  | 4       |  174 |   18.00  |
| PRIMARY     | b     | eq_ref | PRIMARY | 4       |    1 |   100.00 |
| PRIMARY     | c     | ALL    | PRIMARY | 4       |    1 |   100.00 |
+-------------+-------+--------+---------+---------+------+----------+

How should the filtered value be read? What conclusion can be drawn from it? Is 100 better, or is 18 better?
First, filtered is the number of rows finally kept by the query conditions, as a percentage of the rows found by the access method shown in the type field.
Take the first row of the table above as an example. MySQL first uses the index (type is range) to scan table a and expects to get 174 rows, which is the number shown in the rows column. MySQL then filters those 174 rows a second time with the remaining query conditions and ends up with about 31 rows, that is, 18% of the 174. That 18% is the filtered value.
In a more ideal situation, an index would find those rows directly and exclude the other 82%.
So a low filtered value suggests that a better index is needed. If type=ALL, meaning 1000 rows are obtained by a full table scan, and filtered=0.1%, then only 1 row matches the search conditions. If an index were added that finds that 1 row directly, filtered would rise to 100%.
It follows that filtered=100% really is better than 18%.
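
The rows × filtered arithmetic used in this section can be checked directly. This is just a worked calculation with the numbers quoted above; the column aliases are made up for illustration.

```sql
-- rows * filtered / 100 = estimated rows passed on to the next table
SELECT 1000 * 50  / 100 AS manual_example,  -- 500, the example from the MySQL manual
       174  * 18  / 100 AS table_a,         -- about 31
       1000 * 0.1 / 100 AS full_scan_case;  -- 1
```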

Of course, filtered is not everything. It matters more to look at the other columns of the execution plan and optimize the query as a whole. For example, to avoid filesort (by using an index that can satisfy the order by), a low filtered value is acceptable. Likewise, in the filtered=0.1% scenario above, the point is to add an index that improves query performance, not to watch the filtered value.

Extra

The Extra column gives additional information that helps us understand more precisely how MySQL will execute the given query. MySQL provides dozens of such messages; rather than going through them one by one, here are some common ones.
① Using filesort: If the sort is on an indexed column (order by an index column), the index can be used: the engine reads in index order and then uses the primary key id of each record to go back to the table. If the sort field cannot use an index, the rows can only be sorted in memory or on disk. MySQL calls this sorting in memory or on disk file sorting (filesort). If a query has to be executed with a file sort, the Extra column of the execution plan shows Using filesort.

explain  select * from erp_travel_detail order by travel_no limit 1000

Sorting on an indexed column generally does not use filesort, but if the amount of data is large, the index will not be used.
Change 1000 to 10000:

explain  select * from erp_travel_detail order by travel_no limit 10000

Even though the sort follows an indexed column, filesort is still used.
② Using temporary
Using temporary indicates that an internal temporary table was created, for reasons such as sorting without an index, using union, subqueries or joins, or using certain views (see internal-temporary-tables). Note that this temporary table may be held in memory or on disk, and a memory-based temporary table certainly costs less time than a disk-based one.
A temporary table is easy to understand: the SQL first produces an intermediate result, and the final result is derived from it. That intermediate result is the temporary table.

mysql> show global status like '%tmp%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_files       | 5     |
| Created_tmp_tables      | 11    |
+-------------------------+-------+
3 rows in set

explain select updater   from  erp_travel
group by updater 


explain select travel_no   from  erp_travel
group by travel_no 

This uses the index. An index is already ordered, so the grouping can be done directly on the ordered index, without sorting and without creating a temporary table.

explain  select distinct updater from  erp_travel;

The column is not indexed, so a temporary table is created.

explain  select distinct updater from  erp_travel where updater  = '00022139';

With a small amount of data, no temporary table needs to be created.

explain  select distinct creater from  erp_travel

For a composite index, even when the leftmost-prefix rule is not satisfied, the index can still be used to reduce the number of I/O operations.
③ Using index: The corresponding select uses a covering index, which avoids going back to the table. Efficiency is good!
If Using where appears at the same time, the index is used to perform key-value lookups. If Using where does not appear, the index is used to read data rather than to perform lookups.
④ Using where: A where filter is applied.
⑤ Using join buffer: During a join query, when the driven table cannot use an index effectively to speed up access, MySQL usually allocates a block of memory called the join buffer to speed up the query.

explain select erp_travel.*,erp_travel_detail.destination from  erp_travel left join erp_travel_detail on erp_travel.user_no = erp_travel_detail.creater

The join condition is not indexed, so the join buffer is used.
⑥ Impossible WHERE: The where clause is always false, so no rows can be selected.
⑦ Select tables optimized away: Without a GROUP BY clause, MIN/MAX operations are optimized through an index, or COUNT(*) is optimized for the MyISAM storage engine. The calculation does not have to wait for the execution phase; it is completed while the execution plan is generated.

explain  select max(travel_no) from  erp_travel

⑧ Distinct: Optimizes distinct by stopping the search for identical values once the first matching row is found.

⑨ Using index condition: The search uses an index, with part of the condition checked against the index first (index condition pushdown), but still has to go back to the table to fetch the data.

explain select * from erp_travel_cost where travel_no in( '1000127141','1000127142')


Copyright notice
This article was written by [Dying fish]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/185/202207041213469189.html
