MySQL advanced - index optimization (super detailed)

2022-06-30 18:10:00 【Noblegasesgoo】

Performance analysis

MySQL Query Optimizer

MySQL has an optimizer module dedicated to optimizing SELECT statements. Its main job is to use the statistics collected by the system to produce what it considers the best execution plan for the client's query (best in the optimizer's view, **which is not necessarily what the DBA would consider best; this part takes the most time**).

When a client sends a query to MySQL, the command parser classifies the request. If it is a SELECT, it is forwarded to the MySQL Query Optimizer (MQO), which optimizes the query as a whole: it evaluates constant expressions and folds them into constant values, and it simplifies and transforms the query conditions, for example by removing useless or obviously redundant conditions and adjusting the structure. It then analyzes the hint information in the query, if any, to see whether the explicit hints fully determine the execution plan. If there are no hints, or the hints are not enough to fully determine the plan, it reads the statistics of the objects involved, performs the corresponding calculations and analysis for the query, and arrives at the final execution plan.

Common MySQL bottlenecks

CPU

CPU saturation usually occurs while data is being loaded into memory or read from disk.

Disk I/O: a bottleneck occurs when the data being loaded is much larger than the available memory.

Server hardware

For hardware performance bottlenecks, use top, free, iostat, and vmstat to check the performance status of the system.

EXPLAIN (key point)

What is it?

The EXPLAIN keyword simulates how the optimizer executes a SQL query, so you can see how MySQL handles your statement and analyze performance bottlenecks in the query or the table structure.

In MySQL 5.7, derived tables are merged into the outer query as an optimization. To see the select_type values clearly, this feature needs to be turned off temporarily (it is on by default).

-- Turn off derived-table merging globally (takes effect for new connections)
SET GLOBAL optimizer_switch='derived_merge=off';

-- Turn off derived-table merging for the current session only
SET SESSION optimizer_switch='derived_merge=off';

-- Turn derived-table merging back on for the current session only
SET SESSION optimizer_switch='derived_merge=on';

What can it do?

The read order of the tables
Which indexes could be used
The access type of each data read operation
Which indexes are actually used
The references between tables
How many rows of each table the optimizer estimates it must examine

How to use it

EXPLAIN + SQL statement.
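As a minimal sketch of the usage above (the table name `demo_explain` is hypothetical and used only for illustration), prefixing a statement with EXPLAIN returns its plan rather than the query result:

```sql
-- Hypothetical table, created only for this illustration
CREATE TABLE IF NOT EXISTS demo_explain (
  id INT AUTO_INCREMENT PRIMARY KEY,
  content VARCHAR(100) NULL
);

-- Prefix the statement with EXPLAIN to get its plan instead of its rows
EXPLAIN SELECT * FROM demo_explain WHERE id = 1;

-- In MySQL 5.7 the result has one row per table access, with the columns:
-- id, select_type, table, partitions, type, possible_keys,
-- key, key_len, ref, rows, filtered, Extra
```

The fields discussed in the rest of this section are the columns of that result.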

Analysis of each field

SQL for the test cases

CREATE TABLE t1(id INT(10) AUTO_INCREMENT, content VARCHAR(100) NULL, PRIMARY KEY (id));
CREATE TABLE t2(id INT(10) AUTO_INCREMENT, content VARCHAR(100) NULL, PRIMARY KEY (id));
CREATE TABLE t3(id INT(10) AUTO_INCREMENT, content VARCHAR(100) NULL, PRIMARY KEY (id));
CREATE TABLE t4(id INT(10) AUTO_INCREMENT, content VARCHAR(100) NULL, PRIMARY KEY (id));
INSERT INTO t1(content) VALUES(CONCAT('t1_',FLOOR(1+RAND()*1000)));
INSERT INTO t2(content) VALUES(CONCAT('t2_',FLOOR(1+RAND()*1000)));
INSERT INTO t3(content) VALUES(CONCAT('t3_',FLOOR(1+RAND()*1000)));
INSERT INTO t4(content) VALUES(CONCAT('t4_',FLOOR(1+RAND()*1000)));

The id field

The sequence number of the SELECT query: a set of numbers that indicates the order in which the SELECT clauses or table operations are executed within the query.

Case 1: same id

When the id values are the same, execution order is from top to bottom.

EXPLAIN SELECT * 
FROM t1, t2, t3
WHERE 
    t1.id = t2.id 
    AND t2.id = t3.id;

In this case, the first condition after WHERE, t1.id = t2.id, joins t1 to t2 through t1.id, and the t2.id result in turn depends on t2.id = t3.id.

The execution order may also turn out to be t1 -> t3 -> t2.

Case 2: different id

EXPLAIN SELECT t2.* 
FROM t2 
WHERE id = (SELECT t1.id            
            FROM t1            
            WHERE id = (
                SELECT t3.id                        
                FROM t3));

From the result we can see that for subqueries the id sequence number increases. The larger the id value, the higher the priority and the earlier the row is executed.

Think of it as the precedence of parentheses.

id increases -> larger id value -> higher priority -> executed first

Case 3: same and different ids at the same time

EXPLAIN SELECT t2.* 
FROM (SELECT t3.id
      FROM t3) AS s1, t2
WHERE s1.id = t2.id;

Rows with the same id can be treated as a group and are executed from top to bottom within the group. Across groups, the larger the id value, the higher the priority, and the group with the higher priority executes first.

The result surprised me at first, because no derived table appeared. After looking into it: a derived table is not a good query plan. It is essentially a temporary table, and if it is too large it may have to be created on disk rather than in memory, so creating and accessing it costs I/O. That is probably why the MySQL 5.7 optimizer automatically optimizes derived tables by merging them into the outer query.

For example:

--  Before optimization 
SELECT * FROM (SELECT * FROM t1) AS s1;

--  After optimization 
SELECT * FROM t1;
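
To observe the merge described above, one option is to run the same EXPLAIN with `derived_merge` switched off and then on. This is only a sketch, and it assumes the t2 and t3 test tables created earlier:

```sql
-- Assumes the t2 and t3 test tables from the test-case SQL.
SET SESSION optimizer_switch='derived_merge=off';

-- With merging off, the plan is expected to contain a DERIVED row for s1
EXPLAIN SELECT t2.*
FROM (SELECT t3.id FROM t3) AS s1, t2
WHERE s1.id = t2.id;

SET SESSION optimizer_switch='derived_merge=on';

-- With merging on (the 5.7 default), s1 is expected to be folded into the outer query
EXPLAIN SELECT t2.*
FROM (SELECT t3.id FROM t3) AS s1, t2
WHERE s1.id = t2.id;
```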

select_type

The type of query, mainly used to distinguish simple queries, union queries, subqueries, and other complex queries.

What are the values?

The values are: SIMPLE, PRIMARY, DERIVED, SUBQUERY, DEPENDENT SUBQUERY, UNCACHEABLE SUBQUERY, UNION, UNION RESULT.

Let's go through them one by one.

SIMPLE

A simple SELECT query that contains no subqueries or UNION.

PRIMARY
If the query contains any complex subparts, the outermost query is marked PRIMARY. It is the last one to be executed.

The 5.7 optimizer optimizes this case away directly, so the original screenshots here came from other sources.

DERIVED

Subqueries in the FROM list are marked DERIVED. MySQL executes these subqueries recursively and puts the results in a temporary table.

SUBQUERY

A subquery contained in the SELECT list or the WHERE clause.

DEPENDENT SUBQUERY (not reproduced on 5.7)

A subquery in the SELECT list or the WHERE clause that depends on the outer query.

The difference between DEPENDENT SUBQUERY and SUBQUERY

A dependent subquery references the outer query and is re-evaluated for each distinct set of outer values; a plain subquery does not depend on the outer query. In practice the dependent form tends to appear when the subquery result is multi-valued, and the plain form when the result is a single value.

UNCACHEABLE SUBQUERY (not reproduced on 5.7)

A subquery whose result cannot be cached and must be re-evaluated for each row of the outer query.

UNION

A second or later SELECT that appears after UNION is marked UNION.

If the UNION is contained in a subquery in the FROM clause, the outer SELECT is marked DERIVED.

UNION RESULT

Using the UNION keyword produces a temporary table; UNION RESULT is the SELECT that reads the result from that temporary table.
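
A small sketch of the UNION-related values, assuming the t1 and t2 test tables created earlier. The expected values in the comments are what a plain two-branch UNION typically produces:

```sql
-- Assumes the t1 and t2 test tables from the test-case SQL.
EXPLAIN
SELECT id, content FROM t1
UNION
SELECT id, content FROM t2;
-- Expected select_type values: PRIMARY for the first SELECT, UNION for the
-- second, and UNION RESULT for the <union1,2> row that combines them.
```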

table

The table that the row refers to.

When there is a subquery in the FROM clause, the table column has the form <derivedN>, meaning the row reads the result of the query whose id is N.
When there is a UNION, the table column has the form <unionM,N>, where M and N are the ids of the SELECT rows that take part in the union.

type (very important)

type is the join type of the row (also called the access type or query type). From this value you can tell roughly how much data the row's lookup has to scan.

Its value is closely tied to whether the SQL has been optimized.

The common values, from best to worst: system > const > eq_ref > ref > range > index > ALL

In general, to ensure efficiency, a statement should reach at least the range level, and preferably ref. range is generally used for range lookups, so in other words everything other than a range lookup should be optimized to at least ref.

What are the values?

Let's go through the cases one by one:

NULL
- MySQL can resolve the query during the optimization phase, so no table or index access is needed at execution time.
system / const
- MySQL can optimize part of the query and convert it into a constant (use SHOW WARNINGS to view the optimized result). It mainly appears when looking up a record by primary key or unique index: because there are no duplicates, at most one row matches, so it is fast. system is a special case of const, used when the table (for example a temporary table) contains only one row.

eq_ref
- Unique index scan: for each index key, exactly one row in the table matches. Common in joins on a primary key or unique index.
ref
- Non-unique index scan: returns all rows that match a single value. It is still an index access, but since it may find more than one matching row, it is a mix of lookup and scan.

range
- Retrieves only the rows in a given range, using an index to select rows; the key column shows which index is used. It typically appears for BETWEEN, <, >, IN and similar conditions in the WHERE clause.
- A range scan is better than a full table scan because it only needs to start at one point in the index and end at another, without scanning the whole index.

index
- Full index scan. The difference from ALL is that index only traverses the index tree. This is usually faster than ALL because index files are usually smaller than data files.
- In other words, both ALL and index read the whole table, but index reads from the index while ALL reads the data rows from disk.
ALL
- Full table scan: traverses the entire table to find the matching rows.
index_merge
- The query combines several indexes. It usually appears in SQL that contains the OR keyword.

ref_or_null
- When a column needs both an equality (join) condition and a NULL check, the optimizer chooses ref_or_null.
- By contrast, a primary key id generally cannot be NULL, so it does not produce ref_or_null.
index_subquery
- Uses an index to resolve the subquery instead of scanning the full table.

unique_subquery
- Similar to index_subquery, but the subquery uses a unique index.
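
The following sketch shows queries that typically produce the more common type values. It assumes the t1 and t2 test tables created earlier; `idx_t1_content` is a hypothetical index added only for this illustration, and the actual values can differ with data volume and MySQL version:

```sql
-- Assumes the t1 and t2 test tables; idx_t1_content is a hypothetical index.
EXPLAIN SELECT * FROM t1 WHERE content = 't1_100';    -- no index on content yet: typically ALL

CREATE INDEX idx_t1_content ON t1(content);

EXPLAIN SELECT * FROM t1 WHERE content = 't1_100';    -- non-unique index equality: typically ref
EXPLAIN SELECT * FROM t1 WHERE id = 1;                -- primary key equality: typically const
EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 1 AND 10;   -- range on the primary key: typically range
EXPLAIN SELECT id FROM t1;                            -- answered from an index alone: typically index
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.id = t2.id;    -- primary key join: one side typically eq_ref

-- Remove the illustration index so later examples are unaffected
DROP INDEX idx_t1_content ON t1;
```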

possible_keys

Shows the indexes, one or more, that could be applied to this table.
If an index exists on a field involved in the query, it is listed here, but it is not necessarily the one the query actually uses.

key

The index actually used. NULL means no index is used.
If the query uses a covering index, the index columns overlap with the fields in the SELECT list.
- That is, when the queried fields are all contained in an index, key shows that index rather than NULL.

key_len

The number of bytes of the index that are used. This column lets you work out how much of the index the query uses. Without losing precision, shorter is usually better.
The value shown is the maximum possible length of the index fields, not necessarily the actual length used.
That is, it is calculated from the table definition rather than retrieved from the table.
key_len helps you check whether the index is being fully used.

ref

Shows which columns or constants are compared against the index named in key to look up values.
If possible, it is a constant (const).

rows

The rows column shows the number of rows MySQL estimates it must examine to execute the query.

Extra (important)

Contains additional information that does not fit in the other columns but is very important.

Using filesort
- MySQL sorts the data with an external sort instead of reading the rows in index order.
- A sort operation that MySQL cannot complete with an index is called a "filesort".
- When this appears, the query is in serious trouble.
- For example, if only part of a composite index is used, the index cannot provide the order and filesort appears.

Using temporary
- A temporary table is used to hold intermediate results; MySQL uses temporary tables when sorting query results.
- It is common with ORDER BY and GROUP BY.
- To use an index for grouping, the GROUP BY columns must follow the order in which the index was defined, without skipping columns. Otherwise this Extra value easily appears, usually together with Using filesort above.

Using index
- The SELECT uses a covering index, which avoids accessing the table's data rows. Efficient!
- If Using where appears at the same time, the index is used to look up key values; if Using where does not appear, the index is only used to read data, not to perform a lookup.
- Covering index
  - Simply put, **the composite index I built contains all the fields I am querying, so the query is effectively an index scan**.
  - An index is an efficient way to find rows, but a database can also use an index to fetch column data directly without reading the whole row. Index leaf nodes store the data they index, so when the needed data can be obtained from the index alone, there is no need to read the row.
  - An index that contains (covers) all the fields needed by the SELECT list and the WHERE clause is called a covering index.
  - SELECT id , name FROM t_xxx WHERE age = 18;
  - Suppose a composite index idx_id_name_age_xxx contains the three fields id, name, and age. The query then reads the indexed columns directly instead of looking up the rest of the row, which is more efficient.
  - It seems best suited to large tables where many queries select a fixed set of fields.
  - Note: to use a covering index, the SELECT list must contain only the required columns, and all of them must be in the index. Do not use SELECT *, and do not index all fields together either, because the index file becomes too large and query performance drops.
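
A minimal covering-index sketch. The table `t_cover_demo` and index `idx_cover_age_name` are hypothetical names invented for this illustration, and the comments assume the InnoDB storage engine:

```sql
-- Hypothetical table and index, for illustration only
CREATE TABLE IF NOT EXISTS t_cover_demo (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(24) NOT NULL,
  age  INT NOT NULL,
  note VARCHAR(200) NULL
);
CREATE INDEX idx_cover_age_name ON t_cover_demo(age, name);

-- Every referenced column (id, name, age) is available in the index
-- (InnoDB secondary indexes also carry the primary key), so Extra is
-- expected to include "Using index".
EXPLAIN SELECT id, name FROM t_cover_demo WHERE age = 18;

-- note is not in the index, so the row itself must be read and
-- "Using index" is not expected.
EXPLAIN SELECT id, name, note FROM t_cover_demo WHERE age = 18;
```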

Using where
- A WHERE filter is applied.
Using join buffer
- The join buffer is used.
- Occurs when two tables are joined and
  - the driven table (the table that is looked up for each row of the other table; for a LEFT JOIN, the right-hand table) has no index on the join column. Adding an index on that join column solves the problem, and type becomes ref.
- If there are many joins, join_buffer_size in the configuration file can be increased a little.
Impossible WHERE
- The WHERE clause is always false, so it cannot select any rows.
Select tables optimized away
- Without a GROUP BY clause, MIN/MAX operations are optimized using an index, or COUNT(*) is optimized for the MyISAM storage engine. The result does not have to be computed in the execution phase, because the optimization is completed while the execution plan is generated.
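
A few statements that typically produce the last Extra values above, given as a sketch that assumes the t1 test table created earlier:

```sql
-- Assumes the t1 test table from the test-case SQL.
EXPLAIN SELECT * FROM t1 WHERE 1 = 0;     -- Extra is expected to show "Impossible WHERE"
EXPLAIN SELECT MAX(id) FROM t1;           -- Extra is expected to show "Select tables optimized away"

-- Current join buffer size in bytes (the value is server-specific)
SHOW VARIABLES LIKE 'join_buffer_size';
```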

Example

[Figure: EXPLAIN output for the example]

Execution order?

t2 -> t1 -> t3 -> <derived3> -> <union1,4>

Index optimization

Optimization analysis

When SQL performance degrades, with slow statements and long execution or waiting times, the usual causes are:

The query statement is poorly written
- No index, too many subqueries.
Index failure
- An index was created but is not used, or no index was created.
Too many joins in the query
- This may be caused by flaws in the database design, or be an unavoidable last resort.
- Often causes long execution times.
Server tuning and parameter settings
- Buffers, number of threads, and so on.
- Often causes long waiting times once the maximum concurrency is reached.

Single table optimization

Build table SQL

CREATE TABLE IF NOT EXISTS `article` (
`id` INT(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
`author_id` INT(10) UNSIGNED NOT NULL,
`category_id` INT(10) UNSIGNED NOT NULL,
`views` INT(10) UNSIGNED NOT NULL,
`comments` INT(10) UNSIGNED NOT NULL,
`title` VARBINARY(255) NOT NULL,
`content` TEXT NOT NULL
);

INSERT INTO `article`(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');

SELECT * FROM article;

Case SQL

-- Find the article id with the most views where category_id is 1 and comments is greater than 1
SELECT id,author_id FROM article WHERE category_id = 1 AND comments > 1 ORDER BY views DESC LIMIT 1;

Optimization starts

EXPLAIN SELECT `id`,`author_id` FROM `article` WHERE `category_id` = 1 AND `comments` > 1 ORDER BY `views` DESC LIMIT 1;

We can see that type is ALL (full table scan), and although this is a simple single-table query, Using filesort also appears. Taken together, this is the worst case.

We can add an index to move the access type toward range:

-- Build a composite index
ALTER TABLE `article` ADD INDEX idx_article_ccv(`category_id`,`comments`,`views`);

We can see that Using filesort has not been eliminated, but range is already acceptable.

Why is there still a filesort rather than an index sort even though we have an index?
- Because of how a BTree index works: it sorts by category_id first, then by comments for equal category_id, then by views for equal comments.
- In our SQL statement, comments sits in the middle of the composite index.
- The condition comments > 1 is a range (hence type range).
- So MySQL cannot use the index for the views part that follows: index columns after a range-condition column are not usable.

After this analysis, let's build a composite index that skips the field used in the range query.

--  Delete the first index 
DROP INDEX idx_article_ccv ON article;

--  Create a new index 
CREATE INDEX idx_article_cv ON article(`category_id`,`views`);

Analyze once more:

This is the result we want: type = ref, ref = const, and Using filesort is gone from Extra.
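
To re-check the final state, a sketch like the following can be run. It assumes the article table and the idx_article_cv index created above, and the comments restate the result described in the text rather than captured output:

```sql
-- Assumes the article table and the idx_article_cv index created above.
SHOW INDEX FROM article;   -- lists the indexes that currently exist on the table

EXPLAIN SELECT id, author_id
FROM article
WHERE category_id = 1 AND comments > 1
ORDER BY views DESC
LIMIT 1;
-- category_id = 1 is resolved through the index (type ref, ref const),
-- rows for that category are already ordered by views in the index,
-- and comments > 1 is applied afterwards as a row filter.
```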

Join query optimization

Build table SQL

CREATE TABLE IF NOT EXISTS `class` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`id`)
);

CREATE TABLE IF NOT EXISTS `book` (
`bookid` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`bookid`)
);

CREATE TABLE IF NOT EXISTS `phone` (
`phoneid` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`phoneid`)
);

INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO phone(card) VALUES(FLOOR(1 + (RAND() * 20)));

Case 1: two tables

Case SQL

-- Two-table join
SELECT * FROM class LEFT JOIN book ON class.card = book.card;

Optimization starts

--  Analyze the following statements 
EXPLAIN SELECT * FROM class LEFT JOIN book ON class.card = book.card;

We immediately get ALL and Using join buffer, which is not what we want. We aim to optimize toward ref, range, or const.

Try adding an index to eliminate ALL.

ALTER TABLE `book` ADD INDEX idx_c(`card`);

We can see that the second row is now ref and Using join buffer is gone, but the first row is still ALL. This is determined by the nature of a left join: the LEFT JOIN condition determines how rows are searched in the right table, while every row of the left table is always included. So the right table is the key point and must have an index.
Let's see whether we can get rid of the ALL in the first row by creating an index on the card field of the left table.
```
ALTER TABLE `class` ADD INDEX idx_c(`card`);
```
It works: the ALL in the first row becomes index, which is acceptable.

But generally it is enough to stop at the previous step, with an index on the right table only. Extra indexes take up space, and since the rows estimate is the same either way, we might as well save the space.

Case 2: three tables

Case SQL

-- Drop the extra indexes first
DROP INDEX idx_c ON class;
DROP INDEX idx_c ON book;

-- The SQL to analyze in this case
SELECT * FROM class LEFT JOIN book ON class.card = book.card LEFT JOIN phone ON book.card = phone.card;

Optimization starts

First analyze this statement

EXPLAIN SELECT * FROM class LEFT JOIN book ON class.card = book.card LEFT JOIN phone ON book.card = phone.card;

As you can see, this is an ordinary statement that has not been optimized yet. It can be optimized by the same rules as the two-table case.

We add indexes to the two right-hand tables that take part in the left joins:

--  Add a new index 
ALTER TABLE `book` ADD INDEX B (`card`);
ALTER TABLE `phone` ADD INDEX P (`card`);

This brings the plan to an acceptable level.

MySQL's FILESORT sorting (key point)

Single-pass sort

What is it?

It reads **all the columns the query needs from disk**, sorts them in the buffer by the ORDER BY columns, then scans the sorted list for output. It is more efficient because it avoids reading the data a second time and turns random I/O into sequential I/O, but it uses more space because it keeps every row in memory.

Two-pass sort

What is it?

Before MySQL 4.1, two-pass sorting was used, which literally scans the disk twice to get the data. It reads the row pointers and the ORDER BY columns, sorts them, then scans the sorted list and re-reads the corresponding rows according to the values in the list.

That is: read the sort fields from disk, sort them in the buffer, then read the other fields from disk.

Comparison

Two-pass sort: it goes to disk once for the sort fields and again for the row data, so it performs two I/O operations and is slower.
Single-pass sort: it keeps the sorted data in memory and saves one I/O operation, so it is faster, but it needs enough memory.

Conclusion and problems

Single-pass sorting came later, and overall it performs better than two-pass sorting.

But in sort_buffer, single-pass sorting (method B) takes much more space than two-pass sorting (method A), because it fetches all the fields. **The total size of the fetched data may therefore exceed the capacity of sort_buffer, so that only a sort_buffer-sized portion of the data can be fetched at a time**. That portion is sorted (creating a tmp file and doing a multi-way merge), then the next sort_buffer-sized portion is fetched and sorted, and so on, resulting in many I/O operations.

The intent was to save one I/O operation, but the result is a large number of I/O operations, which defeats the purpose.

At this point the operations staff or the DBA need to step in and adjust the sort_buffer size in the MySQL configuration file according to the workload.
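
As a sketch of where such tuning starts, the relevant server variables can be inspected as below. The variable `max_length_for_sort_data` is not mentioned in the original text; it is listed as an assumption because, in MySQL 5.7, it influences which filesort algorithm is chosen. The value 1048576 is an arbitrary illustration, not a recommendation:

```sql
-- Inspect the settings that influence filesort (values are server-specific)
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'max_length_for_sort_data';

-- Example only: set the sort buffer to 1 MB for the current session.
-- The right value depends on the workload; 1048576 is an arbitrary figure.
SET SESSION sort_buffer_size = 1048576;
```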

ORDER BY optimization

MySQL supports two ways of sorting, filesort and index. Index is efficient: MySQL scans the index itself to produce the order. Filesort is less efficient.
Before optimizing an ORDER BY clause we need to know what we are aiming for: get the sort done by index wherever possible and avoid filesort.

Case analysis

Build table SQL

CREATE TABLE tblA(
  id int primary key not null auto_increment,
  age INT,
  birth TIMESTAMP NOT NULL,
  name varchar(200)
);

INSERT INTO tblA(age,birth,name) VALUES(22,NOW(),'abc');
INSERT INTO tblA(age,birth,name) VALUES(23,NOW(),'bcd');
INSERT INTO tblA(age,birth,name) VALUES(24,NOW(),'def');

CREATE INDEX idx_A_ageBirth ON tblA(age,birth,name);

SELECT * FROM tblA;

Case 1

[Figure: EXPLAIN output for four ORDER BY statements on tblA]

Statement 1: the leading index column is present, but the sort puts a later column first and the leading column after it, so the order does not match the index and filesort appears.

Statement 2: the leading column is present, and the sort uses the leading column followed by the second column.

Statement 3: the leading column is present, and the sort uses the leading column, which is already indexed.

Statement 4: the leading column is present, but the sort uses a later column. An index field that is not in the WHERE clause and appears out of order in ORDER BY makes the index sort fail, so filesort appears.

Case 2

[Figure: EXPLAIN output for four more ORDER BY statements on tblA]

Statement 1: index sorting fails because the sort does not start with the leading column.

Statement 2: index sorting fails because the sort does not start with the leading column.

Statement 3: index sorting succeeds. The WHERE clause follows the index column order and does not lose the index before the range condition, so there is no full table scan, and the ORDER BY clause starts with the leading column.

Statement 4: index sorting fails. The index is stored in ascending order by default, and forcing a different direction on part of the sort (mixing ascending and descending order) makes the index unusable for sorting.

Case summary

For ORDER BY, get the sort done by index wherever possible and avoid filesort.
Sort on index columns as much as possible, following the leftmost prefix rule of the index.
If the sort is **not on index columns, filesort has two algorithms: two-pass sort and single-pass sort**.
ORDER BY uses index sorting in two cases:
- The ORDER BY clause uses the leftmost columns of the index.
- The condition columns of the WHERE clause combined with the ORDER BY columns satisfy the leftmost prefix rule of the index.
- If the WHERE clause contains a range query on the index (range in EXPLAIN), index sorting for ORDER BY may fail.
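
The rules above can be sketched with the tblA table and the idx_A_ageBirth(age, birth, name) index from this case. These statements are illustrative and are not necessarily the ones from the original screenshots; the expected outcomes in the comments follow the rules in the summary, and the optimizer may choose differently on very small tables:

```sql
-- Assumes tblA and idx_A_ageBirth(age, birth, name) from the case above.
EXPLAIN SELECT * FROM tblA WHERE age > 20 ORDER BY age;         -- sort starts at the leftmost column
EXPLAIN SELECT * FROM tblA WHERE age > 20 ORDER BY age, birth;  -- leftmost prefix, in index order
EXPLAIN SELECT * FROM tblA WHERE age = 20 ORDER BY birth;       -- WHERE constant + ORDER BY form a prefix
EXPLAIN SELECT * FROM tblA WHERE age > 20 ORDER BY birth;       -- skips the leading column: filesort expected
EXPLAIN SELECT * FROM tblA ORDER BY birth, age;                 -- wrong column order: filesort expected
```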

GROUP BY optimization

Let's start with an example. GROUP BY sorts first and then groups, so a temporary table may be produced.
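
A hedged sketch of the GROUP BY behaviour, reusing tblA and idx_A_ageBirth(age, birth, name) from the ORDER BY case. The expected Extra values are what MySQL 5.7 typically reports and may differ in other versions:

```sql
-- Assumes tblA and idx_A_ageBirth(age, birth, name) from the ORDER BY case.
EXPLAIN SELECT age, COUNT(*) FROM tblA GROUP BY age;      -- leftmost column: the index can supply the grouping order
EXPLAIN SELECT birth, COUNT(*) FROM tblA GROUP BY birth;  -- skips age: "Using temporary" is expected in Extra
```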

Prevent index failure

Case study SQL

CREATE TABLE staffs (
  id INT PRIMARY KEY AUTO_INCREMENT,
  NAME VARCHAR(24) NULL DEFAULT '' COMMENT 'Name',
  age INT NOT NULL DEFAULT 0 COMMENT 'Age',
  pos VARCHAR(20) NOT NULL DEFAULT '' COMMENT 'Position',
  add_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Hire date'
) CHARSET utf8 COMMENT 'Employee records';

INSERT INTO staffs(NAME,age,pos,add_time) VALUES('z3',22,'manager',NOW());
INSERT INTO staffs(NAME,age,pos,add_time) VALUES('July',23,'dev',NOW());
INSERT INTO staffs(NAME,age,pos,add_time) VALUES('2000',23,'dev',NOW());
INSERT INTO staffs(NAME,age,pos,add_time) VALUES(null,23,'dev',NOW());

ALTER TABLE staffs ADD INDEX idx_staffs_nameAgePos(name, age, pos);

The leftmost prefix rule (key point)

If an index covers more than one column, follow the leftmost prefix rule: **the query must start from the leftmost column of the index and must not skip columns in the index.**

Start with the leftmost column: if an index is to be used, the conditions must begin with the leftmost field of the index and must not drop or skip any field. It is like a building: the third floor cannot be built without the first and second floors.

The middle column cannot be skipped either: the third floor cannot be built directly on top of the first.

The index idx_staffs_nameAgePos was created in the order name, age, pos. A full-value match is a query that matches all of these columns in order.

Following the leftmost prefix rule therefore reduces the chance of index failure.

EXPLAIN SELECT * FROM staffs WHERE NAME = 'July';

EXPLAIN SELECT * FROM staffs WHERE NAME = 'July' AND age = 25;

-- Full-value match: the preferred case
EXPLAIN SELECT * FROM staffs WHERE NAME = 'July' AND age = 25 AND pos = 'dev';
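
For contrast, here is a sketch of queries that break the rule. It reuses the staffs table and idx_staffs_nameAgePos from the example SQL; the comments describe what the plan is expected to show, not output captured from a real run.

```sql
-- Skips the leftmost column NAME: the index cannot be used for the lookup
EXPLAIN SELECT * FROM staffs WHERE age = 25 AND pos = 'dev';
EXPLAIN SELECT * FROM staffs WHERE pos = 'dev';

-- Starts at NAME but skips age: only the NAME part of the index is used
-- for the lookup (key_len should match the NAME-only query)
EXPLAIN SELECT * FROM staffs WHERE NAME = 'July' AND pos = 'dev';
```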


Do not operate on index columns

Do not apply any operation to an index column in a condition, such as a calculation, a function, or an automatic or manual type conversion, because doing so invalidates the index and leads to a full table scan.

-- Wrong: applying the LEFT function to an indexed column
EXPLAIN SELECT * FROM staffs WHERE LEFT(NAME, 4) = 'July';
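
One common way around this is to move the operation off the column. The sketch below reuses the staffs table and assumes that a prefix match is an acceptable replacement for the LEFT() call, which holds for the sample data here.

```sql
-- Function on the indexed column: the index on NAME cannot be used for the lookup
EXPLAIN SELECT * FROM staffs WHERE LEFT(NAME, 4) = 'July';

-- Prefix match on the bare column: a range scan on the index becomes possible
EXPLAIN SELECT * FROM staffs WHERE NAME LIKE 'July%';
```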


Storage engine problem

The storage engine cannot use the index columns to the right of a range condition.

Comparing the two execution plans, the second query uses only two of the index columns, and the third column of the index is not used. Every index column after the range-condition column is left unused, so range queries like this should be optimized wherever possible.
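
The comparison can be reproduced roughly as follows, reusing the staffs table. The second index, idx_staffs_namePosAge, is a hypothetical addition that shows one possible fix (equality columns first, range column last); it is not part of the original example.

```sql
-- All three index columns used with equality conditions
EXPLAIN SELECT * FROM staffs WHERE NAME = 'July' AND age = 25 AND pos = 'dev';

-- Range on age: pos, to the right of age in the index, no longer narrows the lookup
-- (expect type range and a shorter key_len than in the plan above)
EXPLAIN SELECT * FROM staffs WHERE NAME = 'July' AND age > 25 AND pos = 'dev';

-- Hypothetical alternative index: equality columns first, the range column last
ALTER TABLE staffs ADD INDEX idx_staffs_namePosAge (NAME, pos, age);
EXPLAIN SELECT * FROM staffs WHERE NAME = 'July' AND pos = 'dev' AND age > 25;
```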

Prefer covering indexes

Avoid SELECT *. As the title says, select only the columns you need so that the query can be answered from the index alone.
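
A minimal sketch of the difference, reusing the staffs table and assuming it uses InnoDB (where every secondary index entry also carries the primary key):

```sql
-- Needs every column, so rows must be fetched from the table after the index lookup
EXPLAIN SELECT * FROM staffs WHERE NAME = 'July' AND age = 25;

-- Only indexed columns are selected: expect "Using index" in Extra
EXPLAIN SELECT NAME, age, pos FROM staffs WHERE NAME = 'July' AND age = 25;

-- The primary key is also available from the secondary index under InnoDB
EXPLAIN SELECT id, NAME FROM staffs WHERE NAME = 'July';
```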

Using not-equal in MySQL (important)

Using not-equal (!= or <>) in MySQL usually means the index cannot be used, which causes a full table scan. Note that the index on the field compared with **!=** is not used for the lookup.
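
To check this on your own server, compare the plans below. They reuse the staffs table; the optimizer's choice for not-equal conditions can differ with the MySQL version and the data distribution, so treat the comments as expectations rather than guaranteed output.

```sql
-- Equality: index lookup (expect type ref and key idx_staffs_nameAgePos)
EXPLAIN SELECT * FROM staffs WHERE NAME = 'July';

-- Not-equal: compare the type and key columns with the plan above
EXPLAIN SELECT * FROM staffs WHERE NAME != 'July';
EXPLAIN SELECT * FROM staffs WHERE NAME <> 'July';
```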


IS NOT NULL and IS NULL

A condition with IS NOT NULL generally cannot use the index, whereas IS NULL can.
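
A quick way to compare the two, reusing the staffs table (one of its sample rows has a NULL name). The plan for IS NOT NULL in particular depends on how many rows match, so this is a sketch to experiment with rather than a fixed result.

```sql
-- IS NULL can be resolved as an index lookup on NAME
EXPLAIN SELECT * FROM staffs WHERE NAME IS NULL;

-- IS NOT NULL matches most rows here; compare type and key with the plan above
EXPLAIN SELECT * FROM staffs WHERE NAME IS NOT NULL;
```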


Use the LIKE keyword sparingly (key point)

With a condition where **LIKE starts with a wildcard ('%xxxx')**, the MySQL index fails and the query becomes a full table scan.

With LIKE 'abc%' the type is range, so the index can be used.

But what about scenarios where %xxx% is unavoidable?

We can solve this with a covering index.

CREATE TABLE `tbl_user` (
 `id` INT(11) NOT NULL AUTO_INCREMENT,
 `NAME` VARCHAR(20) DEFAULT NULL,
 `age` INT(11) DEFAULT NULL,
 email VARCHAR(20) DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

INSERT INTO tbl_user(NAME,age,email) VALUES('1aa1',21,'[email protected]');
INSERT INTO tbl_user(NAME,age,email) VALUES('2aa2',222,'[email protected]');
INSERT INTO tbl_user(NAME,age,email) VALUES('3aa3',265,'[email protected]');
INSERT INTO tbl_user(NAME,age,email) VALUES('4aa4',21,'[email protected]');
INSERT INTO tbl_user(NAME,age,email) VALUES('aa',121,'[email protected]');

-- before index
-- There is no index yet, so the result is the same no matter which columns the following statements select
EXPLAIN SELECT NAME,age FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT id FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT NAME FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT age FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT id,NAME FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT id,NAME,age FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT NAME,age FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT * FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT id,NAME,age,email  FROM tbl_user WHERE NAME LIKE '%aa%';

-- create index: create a composite index
CREATE INDEX idx_user_nameAge ON tbl_user(NAME,age);

-- DROP INDEX idx_user_nameAge ON tbl_user

-- after index
EXPLAIN SELECT * FROM tbl_user WHERE NAME = 800 AND age = 33;

To sum up: to solve this problem, build a composite index and select only columns that are contained in that index. For example, with a composite index on the three fields a, b, c, a query that selects only field d, or all four fields a, b, c, d (one field more than the index holds), is not covered and the index fails; any other selection within the index does not invalidate it.
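
Applied to tbl_user after idx_user_nameAge has been created, the rule can be sketched like this. It assumes InnoDB, as in the table definition above, so the primary key id is available from the secondary index as well.

```sql
-- Covered by idx_user_nameAge (NAME, age): expect type index and "Using index" in Extra
EXPLAIN SELECT NAME, age FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT id, NAME, age FROM tbl_user WHERE NAME LIKE '%aa%';

-- Not covered: email is outside the index, so expect a full table scan (type ALL)
EXPLAIN SELECT id, NAME, age, email FROM tbl_user WHERE NAME LIKE '%aa%';
EXPLAIN SELECT * FROM tbl_user WHERE NAME LIKE '%aa%';
```

The leading wildcard still prevents a lookup by value; the gain comes from scanning the smaller index instead of the whole table.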

Type conversion

Type conversion invalidates the index. For example, a string value written without single quotation marks is automatically converted to another type.
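
A short sketch using the staffs table, which contains a row whose name is the string '2000':

```sql
-- String literal matches the VARCHAR column type: the index on NAME can be used
EXPLAIN SELECT * FROM staffs WHERE NAME = '2000';

-- Numeric literal: MySQL compares the values as numbers, converting NAME for
-- each row, so the index cannot be used for the lookup (expect type ALL)
EXPLAIN SELECT * FROM staffs WHERE NAME = 2000;
```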


Use the OR keyword sparingly

OR should also be used sparingly, because using the OR keyword can also invalidate the index.
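
One commonly suggested rewrite is to split the OR into separate indexed lookups. The sketch below reuses the staffs table. Whether the OR form actually loses the index depends on the MySQL version and statistics, so compare both plans instead of assuming either outcome.

```sql
-- OR on the indexed column: check type and key in the plan
EXPLAIN SELECT * FROM staffs WHERE NAME = 'July' OR NAME = 'z3';

-- Alternative: two equality lookups combined with UNION ALL
-- (the two conditions cannot match the same row, so no duplicates appear)
EXPLAIN SELECT * FROM staffs WHERE NAME = 'July'
UNION ALL
SELECT * FROM staffs WHERE NAME = 'z3';
```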


summary

Optimization summary chart （ important ）

[Figure: optimization summary chart]

Example summary diagram

[Figure: example summary diagram]

General advice

- For single-column indexes, choose the index that filters best for the current query.
- When choosing a composite index, place the most selective field of the current query as far to the left in the index column order as possible.
- When choosing a composite index, prefer one that contains as many of the fields in the current query's WHERE clause as possible.
- Wherever possible, analyze the statistics and adjust the query so that the appropriate index is selected.

Summary of association query optimization

- Make sure the join field of the driven table is indexed (driven table: the table after JOIN, the one that is looked up).
- MySQL automatically selects the small table as the driving table. The driving table is scanned in full anyway, so the fewer rows it has, the better.
- With LEFT JOIN, use the small table as the driving table and the large table as the driven table; the reverse applies to RIGHT JOIN (with LEFT JOIN the table on the left is always the driving table and the one on the right the driven table; RIGHT JOIN is the opposite).
- With INNER JOIN, MySQL picks the table with the smaller result set as the driving table for you.
- Do not put subqueries in the driven table, because indexes might not be used.
- Reduce the number of joins as much as possible.
- Always remember: small tables drive large tables.
- Optimize the statements in the inner loop of a nested loop first.
- When you cannot guarantee that the join condition field of the driven table is indexed, and memory is sufficient, do not be too stingy with the JoinBuffer setting.
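
The first and last points can be sketched as follows. The tables dept_small and emp_large and the index name are hypothetical, and the buffer size of 4 MB is only an example value, not a recommendation.

```sql
-- Hypothetical tables; all names here are illustrative only
CREATE TABLE dept_small (
  id INT PRIMARY KEY AUTO_INCREMENT,
  dept_no INT NOT NULL
);
CREATE TABLE emp_large (
  id INT PRIMARY KEY AUTO_INCREMENT,
  dept_no INT NOT NULL
);

-- Index the join field of the driven table (the right side of the LEFT JOIN)
ALTER TABLE emp_large ADD INDEX idx_empLarge_deptNo (dept_no);

-- Small table drives, large indexed table is driven
EXPLAIN SELECT * FROM dept_small d LEFT JOIN emp_large e ON d.dept_no = e.dept_no;

-- If the driven table's join field cannot be indexed, inspect and enlarge the join buffer
SHOW VARIABLES LIKE 'join_buffer_size';
SET SESSION join_buffer_size = 4194304;
```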

EXISTS And IN Subquery optimization summary

With an index:
- INNER JOIN is the best, IN comes second, and EXISTS is the worst.
- inner join > in > exists.

Without an index:
- Small table drives large table
  - The join approach needs DISTINCT, and DISTINCT without an index is expensive, so exists > in > join.
- Large table drives small table
  - IN and EXISTS perform about the same, and both relatively poorly; EXISTS is slightly better, by no more than 5%. INNER JOIN is much faster because it uses the join buffer.
  - inner join > exists > in
- LEFT JOIN is the slowest.
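
For reference, the three query shapes being compared look roughly like this. The tables demo_dept and demo_emp are hypothetical; the three statements return the same rows because dept_no is declared NOT NULL and the join form removes duplicates with DISTINCT.

```sql
-- Hypothetical tables; all names here are illustrative only
CREATE TABLE demo_dept (
  id INT PRIMARY KEY AUTO_INCREMENT,
  dept_no INT NOT NULL
);
CREATE TABLE demo_emp (
  id INT PRIMARY KEY AUTO_INCREMENT,
  dept_no INT NOT NULL
);

-- IN subquery
EXPLAIN SELECT e.* FROM demo_emp e
WHERE e.dept_no IN (SELECT d.dept_no FROM demo_dept d);

-- EXISTS subquery
EXPLAIN SELECT e.* FROM demo_emp e
WHERE EXISTS (SELECT 1 FROM demo_dept d WHERE d.dept_no = e.dept_no);

-- INNER JOIN; DISTINCT keeps one row per department so employees are not duplicated
EXPLAIN SELECT e.* FROM demo_emp e
INNER JOIN (SELECT DISTINCT dept_no FROM demo_dept) d ON d.dept_no = e.dept_no;
```

Adding or dropping an index on dept_no in each table and rerunning the three statements is a simple way to check the rankings above against your own data and MySQL version.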

The content of this article is based on the Shang Silicon Valley MySQL advanced tutorials.
The notes are synced to a Gitee repository. Feel free to take them; stars and corrections are welcome: https://gitee.com/noblegasesgoo/notes

If anything here is wrong, I hope the experts in the comment section will discuss and correct it. Let's work together to keep the community healthy; there is no tolerance for wrong knowledge.

Love you, noblegasesgoo

Original site

Copyright notice
This article was created by [Noblegasesgoo]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/02/202202160452159312.html
