当前位置：网站首页>Common skills and understanding of SQL optimization

Common skills and understanding of SQL optimization

2022-07-07 05:33:00 【Dying fish】

Small tables drive large tables

Why small tables drive large tables can be seen in this article
sql Optimized query optimizer

in and exsits

The principle is that small tables drive large tables
Widely spread words ： in and exists The connection mode of the drive table is different

hypothesis A The watch is the left watch ,B A table is a table of subqueries . When A A watch is a big watch , B When the table is a small table , Use in.

select * from A where id in (select id from B)
When A A watch is a small watch , B When a watch is big , Use exsits.（exists The subquery after is driven ）

– exists(subquery) Only return true or false, The official also said that the query column will be ignored during actual implementation . therefore ,select * and select 1 No difference .
– exists The actual execution process of subquery is optimized , It's not an itemized match as we understood before .
select * from A where exists (select 1 from B where B.id = A.id)

not in and not exists： If the query statement uses not in, Then the internal and external tables are scanned , No index is used ; and not extsts The subquery of can still use the index on the table . So no matter which watch is big , use not exists All ratio not in Be quick .

So is this the case
For example, we have two tables erp_travel and erp_travel_cost

explain select * from  erp_travel where   EXISTS (select travel_no  from erp_travel_cost where bearer  = erp_travel.user_no and 
erp_travel_cost.travel_no = erp_travel.travel_no)
and erp_travel.user_no = '00010413';

Insert picture description here

Conditions are added to the external query erp_travel.user_no = ‘00010413’;, So relatively speaking Subqueries are big tables , External query is a small table

Then rewrite it into in Inquire about

explain  select * from  erp_travel where   travel_no in (select travel_no  from erp_travel_cost where erp_travel_cost.bearer  = erp_travel.user_no)
and erp_travel.user_no = '00010413';

According to the above , Directly change to in,in The next subquery is a large table , It should be very slow . But the actual implementation speed is still very fast .
see explain result
Insert picture description here

You can see it ,in Query by sql Optimization becomes inner join query , And it is automatically converted into small table driven large table connection , So this efficiency is still very high , Thanks to the join The optimization of the , Even better than exists More efficient .
Look at another situation

explain  select * from  erp_travel where   erp_travel.project_no_form   in (select max(erp_travel_cost.project_no)  from erp_travel_cost where erp_travel_cost.travel_no  = erp_travel.travel_no 
GROUP BY erp_travel_cost.bearer,erp_travel_cost.project_no 
)  and erp_travel.user_no = '00010413';

show WARNINGS;

Insert picture description here

Looks like in Followed by a large table , But the efficiency is not low , So print out sql Optimizer optimized sql


/* select#1 */ select * from `test_bai`.`erp_travel` where <in_optimizer>(`test_bai`.`erp_travel`.`project_no_form`,<exists>(/* select#2 */ select 1 from `test_bai`.`erp_travel_cost` where ((`test_bai`.`erp_travel_cost`.`travel_no` = `test_bai`.`erp_travel`.`travel_no`) and (`test_bai`.`erp_travel_cost`.`creater` <> '00010413')) group by `test_bai`.`erp_travel_cost`.`bearer`,`test_bai`.`erp_travel_cost`.`project_no` having (<cache>(`test_bai`.`erp_travel`.`project_no_form`) = <ref_null_helper>(max(`test_bai`.`erp_travel_cost`.`project_no`)))))

Find out in It's optimized to exists,
After my test in mysql5.7 Next .mysql Yes in The optimization of has been very good , In appropriate cases, convert to exists And internal connection prompt efficiency
therefore ,in Watch , exists Big watch is also an inaccurate statement , Finally, it is necessary to analyze through the implementation plan , But as a standard, it's ok .
in other words , Even after use not in Subquery , If you are sql After the optimization , Still use the index , But this is the case ,not in Then it's not index Do not use microcosm

explain  select * from  erp_travel where  user_no not in( '00022139','0010413');

Insert picture description here

count Query optimization

There are many materials on the Internet that say , want count(id) perhaps count(1), Don't count(*), Is this the case ? Let's practice today .

explain  select count(id) from erp_travel; --  Through the index 

explain  select count(*) from erp_travel; --  Through the index 

explain select count(1) from erp_travel; --  Through the index 

explain  select count(uuid) from erp_travel;  --  Full table

It can be seen that except count Specify non indexed fields , The effect is the same

order by and group by Optimize

The optimization of sorting and grouping is actually very similar , The essence is to sort first and then group , Follow the leftmost matching principle of index creation order . therefore , Take sorting as an example .
Full field sorting （ Sort field does not use index ）
- When to use full field sorting ？

Fewer fields , The amount of data is small , Sorting can be done in memory ,Mysql Most of the sorting without index uses Complete sorting of all fields .
- Full field index sorting process
initialization sort_buffer, Make sure to put in name、city、age These three fields .
From the index city Find the first satisfaction city=' Hangzhou ’ The primary key of the condition id.
To primary key id Index takes out the whole line , take name、city、age The values of the three fields , Deposit in sort_buffer in ;
From the index city Take the primary key of the next record id;
Repeat step 3、4 until city The value of does not meet the query conditions .
Yes sort_buffer The data in is sorted by field name Do quick sort ;
Sort the results by Take before 1000 That's ok Return to the client .

- Process details

The whole sorting action , It could be done in memory , You may also need to use external sorting , It depends on the memory and parameters required for sorting sort_buffer_size.
sort_buffer_size, Namely MySQL Memory opened up for sorting （sort_buffer） Size .
If the amount of data to be sorted is less than sort_buffer_size, Sorting is done in memory .
But if the sorting data is too large , There's no memory , You have to use temporary disk files to help sort . External sorting generally uses merge sorting algorithm .
rowid Sort （ Sort field does not use index ）
- When to use rowid Sort ？
stay Full field sorting in , I only read the data of the original table once , The rest of the operation is in sort_buffer And temporary files .
But there is a problem , If the query returns many fields ,sort_buffer Too many fields , In this way, the number of rows that can be put down at the same time in memory is very small , It has to be divided into many temporary files , Sorting performance will be poor .
Mysql Think Full field sorting is too expensive , So using rowid Algorithmic sorting .

- rowid Sorting process

initialization sort_buffer, Make sure to put in two fields , namely name and id.
From the index city Find the first satisfaction city=' Hangzhou ’ The primary key of the condition id.
To primary key id Index takes out the whole line , take name、id These two fields , Deposit in sort_buffer in .
From the index city Take the primary key of the next record id.
Repeat step 3、4 Until not satisfied city=' Hangzhou ’ Until the conditions are met .
Yes sort_buffer The data in is sorted by field name Sort .
Traversal sort results , Take before 1000 That's ok , And in accordance with the ** id Return the value of to the original table **city、name and age Three fields are returned to the client .

- Process details

contrast Full field sorting process you will find ,rowid Sort visited the table more than once Primary key index of .
Full field sorting contrast rowid Sort ？
If MySQL I'm really worried that the sorting memory is too small , Will affect the sorting efficiency , To adopt rowid Sorting algorithm , In this way, you can sort more rows at a time in the sorting process , But you need to go back to the original table to get the data .
about InnoDB Table for example ,rowid Sorting will require more disk reads to go back to the table , So it won't be a priority .
Advantages of sorting field indexing
When the sorting field has an index , The query process does not require temporary tables , There is no need to sort .
meanwhile , It will not scan all qualified rows , Instead, finding suitable conditions will return data .
Other things that need attention in sorting .
- If there is only order by create_time, Even if create_time There's an index on , It will not use .
Because the optimizer thinks that it costs more to go through the secondary index and then go back to the table than to scan and sort the whole table . So choose to go full meter scan , Then choose one of the two ways to sort according to the teacher

- Unconditional query, but order by create_time limit m. If m Less value , It can be indexed .

Because the optimizer thinks that according to the index order, it will go back to the table to look up the data , Then get m Data , You can end the cycle , So it costs less than a full table scan , Then choose the secondary index .
Even if there is no secondary index ,mysql in the light of order by limit Also optimized , Use heap sort .

Index overlay


explain select  travel_no from erp_travel where travel_no not like '%sai%';



explain select user_no,user_name,creater from  erp_travel where user_name not like '%sai%';

Index overrides query results , So it seems that the index is invalid , In fact, indexes are also used

Why is it recommended that the primary key be self incremented

Insert picture description here

If another primary key is inserted at this time, the value is 9 The record of , The insertion position is shown as follows ：
Insert picture description here
But this data page is full , What if you plug in again ？ We need to put the current Page splitting In two pages , Move some records in this page to the newly created page . What does page splitting and record shifting mean ？ signify ： Performance loss ！ So if we want to try Avoid such unnecessary performance loss , It's best to let the inserted record The primary key values are incremented , In this way, there will be no such performance loss .
So we suggest ： Let the primary key have AUTO_INCREMENT , Let the storage engine generate the primary key for the table itself ,
When inserting records, the storage engine will automatically fill in the self increasing primary key value for us . Such a primary key takes up less space , Write in sequence , Reduce page splits .

Index failure

like With % start , Invalid index ; When like No prefix %, The suffix is % when , Index is valid .

The reason is also simple Indexed B+ The tree is sorted by the value of the index , And strings It is also sorted according to the prefix weight for example character string “12” Less than character string “2” Therefore If fuzzy query ,% You can still use the index if you are not ahead .

explain  select * from  erp_travel where travel_reason like ' test % merchant %'

Insert picture description here

or The index is not used before and after the statement .


explain  select * from  erp_travel where travel_reason like ' test % merchant %' or user_no= '00022139'

Insert picture description here

If or There are indexes on both sides , Then they will go through the corresponding indexes , Then merge together ,type by index_merge
If one is not an index , Direct full table , There is no need to go through another index

explain  select * from  erp_travel where travel_reason like ' test % merchant %' or creater= '00022139'

Insert picture description here

Composite index , Instead of using the first column index , Index failure .

Joint index user_no, user_name, creater

explain  select * from  erp_travel where creater = '00022139' and user_no= '00022139'

explain select * from erp_travel where creater = ‘00022139’ and user_no= ‘00022139’
so , For federated indexes The order of writing is not required to match the leftmost prefix , But the length of using the index only determines the length of using the joint index

If the column type is string , Be sure to quote the data in the condition , Otherwise, the index is not used

explain  select * from  erp_travel where  user_no= 00022139

Insert picture description here

Use on index columns IS NULL or IS NOT NULL operation （ Not necessarily ineffective ）

The index does not index null values , So you can't use an index for this operation , You can deal with it in other ways , for example ： Numeric type , The judgment is greater than 0, Set a default value for the string type , Judge whether it is equal to the default value .（ Here is the wrong statement ！）
test

explain  select * from  erp_travel where  user_no is null  --  Go to the index

From this, it can be found that the index is used
summary ： Use on index columns IS NULL or IS NOT NULL operation , The index does not necessarily fail ！！！

Use... On index fields not,<>,!=.

The not equal operator will never use an index , So processing it will only produce a full table scan . An optimization method ： key<>0 Change it to key>0 or key<0（ A stupid way , It's better to scan the whole table ）.


explain  select * from  erp_travel where  user_no > '00022139' --  Go to the index 

explain  select * from  erp_travel where  user_no > '00022139' or  user_no < '00022139' --  Don't walk index 

explain  select * from  erp_travel where  user_no != '00022139' --  Don't walk index

Calculate the index field 、 Use functions on fields .（ The index for emp(ename,empno,sal)）

explain  select max(create_time) from  erp_travel where  user_no like  CONCAT('000122','%') -- The function can go through the index without being on the index 
explain  select max(create_time) from  erp_travel where  left(user_no ,2) ='00'-- Don't walk index

When the whole table scanning speed is faster than the index speed ,mysql Can use full scan , At this time, the index fails .

If mysql It is estimated that full table scanning is faster than indexing , Index is not used

Paging without index

Paging query , A very common query in the system , It is suggested that after learning , Quickly check whether the paging function you are responsible for is indexed , Or whether the index is gone but it can be optimized . following , Let's take an example of some optimization methods .

select * from employees limit 10000, 10;

Direct inquiry , Don't walk index

explain select * from erp_travel order by travel_no limit 10000, 10; -- The secondary index does not go through the index 
explain select * from erp_travel order by id limit 10000, 10; --  Primary key   Go to the index ..
explain select * from erp_travel order by travel_no limit 10; -- Go to the index

sql Optimizer thinks The speed of secondary index returning to the table is not as fast as that of direct full table io Efficient , actually , Index takes up less memory , stay limit When the amount of data is large , Not only reduce io frequency , It also saves memory , so ,sql The optimizer is not necessarily right
Look at the paging optimization method on the Internet

explain select e.* from erp_travel e inner join (select id from erp_travel order by `travel_no` desc limit 10000, 10) t on t.id = e.id;

This idea is very interesting , Use the index itself to store id, A page can store a lot of data , Less io frequency ,limit And the data also saves a lot of memory , such limit It saves a lot of memory , promote limit The efficiency of , then , Take out the primary key , Then connect with the primary key erp_travel, Look up the leaf nodes of the cluster index in this way , Only checked 10 strip . Greatly reduced io frequency , When the amount of data is large, the effect is very obvious

I said before. ,**sql The optimizer is not necessarily right ,** Mandatory index specification can improve query efficiency


select * from app_user_copy1 force index(`key`)  order by app_user_copy1.key desc limit 100000, 10;

How to index

A cliche , Interviews often ask , Here's a summary .

How to build an index , Personally, I think we should think from the following perspectives ：

What scenarios need to be indexed
Which fields should be selected for indexing , The size of the field , Type of field
Number of indexes

What scenarios need to be indexed

High frequency query , And there are many data , Can filter more data through index
Table correlation
Statistics , Sort , Group aggregation

Which fields should be selected for indexing , The size of the field , Type of field

High frequency query , Update low frequency , And it can filter fields with more data
Associated fields for table Association
Used to sort , grouping , Statistics and other fields
The fields used for indexing should be as small as possible , Can reduce the height of the tree , See the following Alibaba specifications for specific rules

Number of indexes

The number of indexes should be as small as possible .

Because the index will take up space ;
When the record updates the database record , There is the cost of maintaining the index , The more the number of , The higher the maintenance cost ;
There are too many indexes in a table , When a condition finds that multiple indexes are valid , The optimizer will usually choose the index with the best performance to use , A large number , The cost of selecting the optimizer will also rise .

Try not to build indexes in fields with little filtered data , Such as ： Gender .

where And order by When the conflict , priority where.

原网站

版权声明
本文为[Dying fish]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/188/202207062336014515.html