Common skills and understanding of SQL optimization
2022-07-07 05:33:00 【Dying fish】
Small tables drive large tables
The reason the small table should drive the large table is explained in this article: SQL optimization, the query optimizer
in and exists
The principle is again that the small table drives the large table.
A widely circulated claim: in and exists differ in which table drives the join.
Suppose A is the outer (left) table and B is the table in the subquery. When A is large and B is small, use in:
select * from A where id in (select id from B)
When A is small and B is large, use exists (the subquery after exists is the driven side):
- exists(subquery) only returns true or false, and the official documentation says the select list of the subquery is ignored during execution, so select * and select 1 make no difference.
- The actual execution of an exists subquery is optimized; it is not the row-by-row matching we used to imagine.
select * from A where exists (select 1 from B where B.id = A.id)
not in and not exists: if the query uses not in, both the inner and outer tables are scanned in full and no index is used, whereas the subquery of not exists can still use the index on its table. So whichever table is larger, not exists is always faster than not in.
Is that really the case?
For example, take two tables, erp_travel and erp_travel_cost:
explain select * from erp_travel where EXISTS (select travel_no from erp_travel_cost where bearer = erp_travel.user_no and
erp_travel_cost.travel_no = erp_travel.travel_no)
and erp_travel.user_no = '00010413';
Because the outer query has the added condition erp_travel.user_no = '00010413', relatively speaking the subquery is the large table and the outer query is the small table.
Now rewrite it as an in query:
explain select * from erp_travel where travel_no in (select travel_no from erp_travel_cost where erp_travel_cost.bearer = erp_travel.user_no)
and erp_travel.user_no = '00010413';
By the rule above, switching directly to in puts the large table inside the in subquery, so it ought to be very slow. In practice, though, it still runs fast.
Look at the explain result:
As you can see, the optimizer turns the in query into an inner join and automatically makes the small table drive the large one, so it is still efficient. Thanks to join optimization, it is even more efficient than exists.
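Why the driving side matters can be pictured with a toy index nested-loop join: every row of the driving table costs one index lookup in the driven table, so the result is the same in either direction but the number of lookups is not. This is a simplified Python model, not MySQL code; a sorted list searched with `bisect` stands in for the B+ tree index, and the data is made up.

```python
from bisect import bisect_left

def index_nested_loop_join(driver_keys, driven_index):
    """Toy index nested-loop join.

    driver_keys:  join-key values of the driving table, scanned row by row
    driven_index: sorted join-key values of the driven table (stand-in for
                  its B+ tree index)
    Returns (matching keys, number of index lookups performed).
    """
    matches = []
    lookups = 0
    for key in driver_keys:
        lookups += 1  # one index lookup per driving row
        i = bisect_left(driven_index, key)
        if i < len(driven_index) and driven_index[i] == key:
            matches.append(key)
    return matches, lookups


small = [3, 7, 42]                  # e.g. rows left after user_no = '00010413'
large = list(range(0, 100_000, 7))  # a big table, already sorted by its index

small_drives, small_cost = index_nested_loop_join(small, large)
large_drives, large_cost = index_nested_loop_join(large, sorted(small))
print(small_drives, small_cost)  # [7, 42] 3
print(large_drives, large_cost)  # [7, 42] 14286
```

The optimizer's rewrite of in into a join is what lets it pick the cheap direction on its own.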
Now look at another case:
explain select * from erp_travel where erp_travel.project_no_form in (select max(erp_travel_cost.project_no) from erp_travel_cost where erp_travel_cost.travel_no = erp_travel.travel_no
GROUP BY erp_travel_cost.bearer,erp_travel_cost.project_no
) and erp_travel.user_no = '00010413';
show WARNINGS;
Here in seems to be followed by a large table, yet the query is not slow, so print the SQL as rewritten by the optimizer:
/* select#1 */ select * from `test_bai`.`erp_travel` where <in_optimizer>(`test_bai`.`erp_travel`.`project_no_form`,<exists>(/* select#2 */ select 1 from `test_bai`.`erp_travel_cost` where ((`test_bai`.`erp_travel_cost`.`travel_no` = `test_bai`.`erp_travel`.`travel_no`) and (`test_bai`.`erp_travel_cost`.`creater` <> '00010413')) group by `test_bai`.`erp_travel_cost`.`bearer`,`test_bai`.`erp_travel_cost`.`project_no` having (<cache>(`test_bai`.`erp_travel`.`project_no_form`) = <ref_null_helper>(max(`test_bai`.`erp_travel_cost`.`project_no`)))))
You can see that in was rewritten as exists.
In my tests on MySQL 5.7, MySQL already optimizes in very well: where appropriate, it converts the query to exists or to an inner join to improve efficiency.
So "in for a small subquery table, exists for a large one" is also an inaccurate rule. In the end you have to analyze the execution plan, although the rule is fine as a rough guideline.
Likewise, even a not in subquery may still use an index once the SQL has been optimized. In the following case, however, not in does not use the index:
explain select * from erp_travel where user_no not in( '00022139','0010413');
count Query optimization
Many articles online say you should use count(id) or count(1) rather than count(*). Is that really so? Let's test it.
explain select count(id) from erp_travel; -- uses an index
explain select count(*) from erp_travel; -- uses an index
explain select count(1) from erp_travel; -- uses an index
explain select count(uuid) from erp_travel; -- full table scan
As you can see, apart from count on a non-indexed column, all of these behave the same.
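One caveat: the forms are the same in cost here, but not in meaning. count(*) and count(1) count rows, while count(column) counts only the rows where that column is not NULL, so count(id) matches them only because a primary key is never NULL. A minimal Python model of those semantics, with made-up rows and None playing the role of NULL:

```python
# Made-up rows; None plays the role of SQL NULL.
rows = [
    {"id": 1, "uuid": "a1"},
    {"id": 2, "uuid": None},
    {"id": 3, "uuid": "c3"},
]

def count_rows(rows):
    """count(*) and count(1): every row is counted."""
    return len(rows)

def count_column(rows, column):
    """count(column): only rows where the column is not NULL are counted."""
    return sum(1 for row in rows if row[column] is not None)


print(count_rows(rows), count_column(rows, "id"), count_column(rows, "uuid"))  # 3 3 2
```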
order by and group by optimization
Sorting and grouping are optimized in much the same way: grouping essentially sorts first and then groups, and both follow the leftmost-prefix rule of the index column order. So sorting is used as the example below.
Full-field sort (the sort field has no index)
- When is full-field sort used?
  - When there are few fields and little data, so the sort can be done in memory. Most sorts in MySQL that cannot use an index use full-field sort.
- Full-field sort process (for a query that filters on city = 'Hangzhou', sorts by name, and returns the first 1000 rows):
  - Initialize sort_buffer, which will hold the three fields name, city, and age.
  - From the index on city, find the primary key id of the first row that satisfies city = 'Hangzhou'.
  - Use id to fetch the whole row from the primary key index, and put the values of name, city, and age into sort_buffer.
  - Take the primary key id of the next record from the index on city.
  - Repeat steps 3 and 4 until the city value no longer satisfies the condition.
  - Quick-sort the data in sort_buffer by name.
  - Take the first 1000 rows of the sorted result and return them to the client.
- Process details
  - The whole sort may be done in memory or may need an external sort, depending on how much memory the sort needs and on the parameter sort_buffer_size.
  - sort_buffer_size is the size of the memory (sort_buffer) that MySQL allocates for sorting.
  - If the data to be sorted is smaller than sort_buffer_size, the sort is done in memory.
  - If the data is too large to fit in memory, temporary disk files are used to help sort. External sorting generally uses a merge sort algorithm.
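The full-field sort steps above, including the spill to temporary files, can be sketched as a toy Python model. Assumptions, all hypothetical: Python lists and dicts stand in for the city index and the primary key index, sort_buffer capacity is counted in rows rather than bytes, each full buffer becomes one sorted "temporary file", and `heapq.merge` plays the merge phase of the external sort.

```python
import heapq
from bisect import bisect_left
from itertools import islice

def full_field_sort(city_index, primary, city, limit, sort_buffer_rows):
    """Toy full-field sort for: select city, name, age ... where city = ?
    order by name limit ?

    city_index:       sorted list of (city, id), the secondary index on city
    primary:          dict id -> row dict, the primary key index
    sort_buffer_rows: rows that fit in sort_buffer (hypothetical row-count
                      stand-in for sort_buffer_size, which is in bytes)
    Returns (result rows, number of sorted runs); more than one run means
    the sort spilled to "temporary files" and needed a merge.
    """
    runs, buffer = [], []
    i = bisect_left(city_index, (city,))  # first index entry for this city
    while i < len(city_index) and city_index[i][0] == city:
        row = primary[city_index[i][1]]  # fetch the whole row by id
        buffer.append((row["name"], row["city"], row["age"]))
        if len(buffer) == sort_buffer_rows:  # buffer full: spill a sorted run
            runs.append(sorted(buffer))
            buffer = []
        i += 1
    if buffer:
        runs.append(sorted(buffer))
    merged = heapq.merge(*runs)  # merge phase of the external sort
    return list(islice(merged, limit)), len(runs)


primary = {
    1: {"name": "bob", "city": "Hangzhou", "age": 30},
    2: {"name": "amy", "city": "Beijing", "age": 25},
    3: {"name": "cat", "city": "Hangzhou", "age": 22},
    4: {"name": "ann", "city": "Hangzhou", "age": 41},
}
city_index = sorted((row["city"], row_id) for row_id, row in primary.items())

print(full_field_sort(city_index, primary, "Hangzhou", 2, sort_buffer_rows=10))  # 1 run
print(full_field_sort(city_index, primary, "Hangzhou", 2, sort_buffer_rows=2))   # 2 runs, same rows
```

Both calls return ann and bob; the second only differs in having had to merge two runs.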
rowid sort (the sort field has no index)
- When is rowid sort used?
  - In full-field sort, the original table is read only once; everything else happens in sort_buffer and temporary files.
  - But there is a problem: if the query returns many fields, each row in sort_buffer is large, so few rows fit in memory at once. The data has to be split into many temporary files, and sorting performance is poor.
  - When MySQL judges full-field sort too expensive because the rows are too long (in MySQL 5.7 the row length is compared with max_length_for_sort_data), it uses the rowid algorithm instead.
- rowid sort process
  - Initialize sort_buffer, which will hold two fields: name and id.
  - From the index on city, find the primary key id of the first row that satisfies city = 'Hangzhou'.
  - Use id to fetch the whole row from the primary key index, and put the name and id fields into sort_buffer.
  - Take the primary key id of the next record from the index on city.
  - Repeat steps 3 and 4 until the condition city = 'Hangzhou' is no longer satisfied.
  - Sort the data in sort_buffer by name.
  - Traverse the sorted result, take the first 1000 rows, and use each id to go back to the original table for the city, name, and age fields, which are returned to the client.
- Process details
  - Compared with the full-field sort process, rowid sort accesses the table's primary key index one more time.
Full-field sort vs. rowid sort
- MySQL adopts rowid sort only when it is worried that the sort memory is too small and would hurt sorting efficiency. That way more rows can be sorted at a time, but the data has to be fetched from the original table afterwards.
- For InnoDB tables, rowid sort means going back to the table, which causes more disk reads, so it is not the preferred choice.
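The extra trip back to the table is easy to count in a toy Python model of the rowid sort steps. As before, the dict and sorted list are hypothetical stand-ins for the primary key index and the index on city, and the rows are made up. With three matching rows and limit 2, the first pass makes three primary key lookups and the final step makes two more, whereas full-field sort of the same query stops after the first three.

```python
from bisect import bisect_left

def rowid_sort(city_index, primary, city, limit):
    """Toy rowid sort: sort_buffer holds only (name, id), and the remaining
    columns are fetched from the primary key index after sorting.
    Returns (result rows, number of primary key lookups)."""
    lookups = 0
    buffer = []
    i = bisect_left(city_index, (city,))
    while i < len(city_index) and city_index[i][0] == city:
        row_id = city_index[i][1]
        buffer.append((primary[row_id]["name"], row_id))  # sort key + id only
        lookups += 1
        i += 1
    buffer.sort()
    result = []
    for _name, row_id in buffer[:limit]:
        row = primary[row_id]  # the extra trip back to the table
        lookups += 1
        result.append((row["city"], row["name"], row["age"]))
    return result, lookups


primary = {
    1: {"name": "bob", "city": "Hangzhou", "age": 30},
    2: {"name": "amy", "city": "Beijing", "age": 25},
    3: {"name": "cat", "city": "Hangzhou", "age": 22},
    4: {"name": "ann", "city": "Hangzhou", "age": 41},
}
city_index = sorted((row["city"], row_id) for row_id, row in primary.items())

print(rowid_sort(city_index, primary, "Hangzhou", 2))
# ([('Hangzhou', 'ann', 41), ('Hangzhou', 'bob', 30)], 5)
```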
Advantages of indexing the sort field
- When the sort field has an index, the query needs no temporary table and no sort step.
- Also, it does not scan every matching row; it can return as soon as it has found enough rows that satisfy the conditions.
Other points to note about sorting
- With only order by create_time (no where and no limit), MySQL does not use the index even if create_time has one.
  - The optimizer estimates that walking the secondary index and going back to the table for every row costs more than a full table scan plus a sort, so it chooses a full table scan and then one of the two sort methods above.
- For a query with no condition but with order by create_time limit m, the index can be used if m is small.
  - The optimizer reckons that following the index order, going back to the table for each row, and stopping after m rows costs less than a full table scan, so it chooses the secondary index.
- Even without a secondary index, MySQL optimizes order by ... limit by using a heap sort.
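The heap sort idea for order by ... limit m can be sketched in Python: keep a bounded max-heap of the m best candidates seen so far, so memory stays at m rows no matter how large the table is. This is a generic top-k sketch, not MySQL's actual filesort code; plain numbers stand in for create_time values.

```python
import heapq
import random

def order_by_limit(values, m):
    """Return the m smallest values in ascending order while keeping at most
    m candidates in memory (a bounded max-heap)."""
    if m <= 0:
        return []
    heap = []  # negated values: heapq is a min-heap, so this acts as a max-heap
    for v in values:
        if len(heap) < m:
            heapq.heappush(heap, -v)
        elif v < -heap[0]:  # smaller than the largest value kept so far
            heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap)


print(order_by_limit([5, 1, 9, 3, 7, 2], 3))  # [1, 2, 3]
data = [random.randint(0, 10_000) for _ in range(1_000)]
print(order_by_limit(data, 10) == sorted(data)[:10])  # True
```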
Covering index
explain select travel_no from erp_travel where travel_no not like '%sai%';
explain select user_no,user_name,creater from erp_travel where user_name not like '%sai%';
The index covers every column the query needs, so even though the not like condition makes it look as if the index should be unusable, the index is in fact used.
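Whether a query is covered comes down to a set check: every column it reads (select list plus where columns) must be in the index. In InnoDB a secondary index entry also stores the primary key, so the key counts as covered too. A minimal Python sketch, assuming the primary key column is id (as the order by id example later in this article suggests) and using the composite index (user_no, user_name, creater) described in the index failure section:

```python
def needs_back_to_table(index_columns, query_columns, primary_key="id"):
    """True if a query reading `query_columns` through a secondary index on
    `index_columns` must still look up the primary key index."""
    covered = set(index_columns) | {primary_key}  # entries also carry the PK
    return not set(query_columns) <= covered


joint_index = ["user_no", "user_name", "creater"]
print(needs_back_to_table(joint_index, ["user_no", "user_name", "creater"]))  # False
print(needs_back_to_table(joint_index, ["user_no", "travel_reason"]))        # True
```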
Why an auto-increment primary key is recommended
Suppose a record with primary key value 9 is now inserted; its insertion position is shown in the figure:
But this data page is already full. What happens if we insert into it anyway? The current page has to be split into two, and some of its records moved to the newly created page. What do page splits and record moves mean? Performance loss! So to avoid this unnecessary loss as far as possible, it is best for inserted records to have increasing primary key values; then this loss does not occur.
So we recommend giving the primary key AUTO_INCREMENT and letting the storage engine generate the primary key for the table.
When records are inserted, the storage engine fills in the incrementing primary key value automatically. Such a primary key takes up little space, is written sequentially, and reduces page splits.
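The difference between increasing and random keys shows up even in a deliberately crude Python model of the leaf level: a list of sorted pages with a fixed capacity, split in half when full. This is an assumption-heavy sketch (no inner B+ tree levels, no fill factor, made-up keys) and InnoDB's real split logic differs in detail, but it shows why increasing keys never move existing records.

```python
import random
from bisect import bisect_right, insort

def records_moved(keys, page_capacity):
    """Insert keys into a much-simplified leaf level (sorted, fixed-capacity
    pages) and return how many existing records page splits had to move."""
    pages = [[]]
    moved = 0
    for k in keys:
        first_keys = [p[0] for p in pages[1:]]
        idx = bisect_right(first_keys, k)  # page whose key range should hold k
        page = pages[idx]
        if len(page) == page_capacity:
            if idx == len(pages) - 1 and k > page[-1]:
                pages.append([k])  # past the last key: open a fresh page, move nothing
                continue
            half = page_capacity // 2
            new_page = page[half:]  # split: move the upper half to a new page
            del page[half:]
            pages.insert(idx + 1, new_page)
            moved += len(new_page)
            if k >= new_page[0]:
                page = new_page
        insort(page, k)
    return moved


sequential = list(range(1, 1001))   # AUTO_INCREMENT-style keys
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # random keys, e.g. UUID-like
print(records_moved(sequential, 10))  # 0
print(records_moved(shuffled, 10))    # a positive number
```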
Index failure
- A like pattern that starts with % cannot use the index; when the pattern has no leading % and only a trailing %, the index is used.
The reason is simple: the B+ tree of an index is sorted by the index values, and strings are compared starting from their prefix; for example, the string "12" is less than the string "2". So in a fuzzy query the index can still be used as long as % is not at the front.
explain select * from erp_travel where travel_reason like 'test%merchant%'
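The prefix argument can be shown on a sorted list of strings, which a binary search can seek into just as a B+ tree range scan does. This Python sketch uses Python's code-point string order as a stand-in for a MySQL collation, and made-up keys: like 'prefix%' seeks and then walks forward, while like '%suffix' has no starting point and must check every key.

```python
from bisect import bisect_left

def like_prefix(sorted_keys, prefix):
    """LIKE 'prefix%': seek to the first key >= prefix, then walk forward
    while keys still start with prefix (an index range scan)."""
    i = bisect_left(sorted_keys, prefix)
    found = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        found.append(sorted_keys[i])
        i += 1
    return found

def like_suffix(sorted_keys, suffix):
    """LIKE '%suffix': matches can be anywhere in the order, so every key
    has to be checked (a full scan)."""
    return [k for k in sorted_keys if k.endswith(suffix)]


keys = sorted(["2", "12", "123", "19", "3", "212"])
print(keys)                     # ['12', '123', '19', '2', '212', '3']
print(like_prefix(keys, "12"))  # ['12', '123']
print(like_suffix(keys, "2"))   # ['12', '2', '212']
```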
- If the columns on both sides of or are not all indexed, the index is not used.
explain select * from erp_travel where travel_reason like 'test%merchant%' or user_no= '00022139'
If both sides of or have indexes, each side uses its own index and the results are merged; type is index_merge.
If one side has no index, MySQL goes straight to a full table scan, since using the other index would not help.
explain select * from erp_travel where travel_reason like 'test%merchant%' or creater= '00022139'
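The index_merge case can be modeled in Python: each side of the or does its own range scan on its own index, and the two id lists are unioned before going back to the table. Assumptions: both travel_reason and user_no are indexed, the indexes are sorted lists of (value, id), the rows are made up, and only the part of the like pattern before the first % is used for the scan (the rest would be checked as a filter afterwards, which the sketch omits). If either side had no index, there would be nothing to scan for it, which is why MySQL falls back to a full table scan.

```python
from bisect import bisect_left

def range_ids(index, low, matches):
    """Scan a sorted list of (value, id) from the first value >= low while
    matches(value) holds, collecting ids: a toy index range scan."""
    ids = []
    i = bisect_left(index, (low,))
    while i < len(index) and matches(index[i][0]):
        ids.append(index[i][1])
        i += 1
    return ids

def or_via_index_merge(reason_index, user_index, prefix, user_no):
    """where travel_reason like 'prefix%' or user_no = ?: each side uses its
    own index, and the two id lists are merged (union)."""
    left = range_ids(reason_index, prefix, lambda v: v.startswith(prefix))
    right = range_ids(user_index, user_no, lambda v: v == user_no)
    return sorted(set(left) | set(right))  # these ids then go back to the table


rows = {  # id -> (travel_reason, user_no), made-up data
    1: ("test trip", "00010413"),
    2: ("training", "00022139"),
    3: ("test visit", "00099999"),
    4: ("meeting", "00010413"),
    5: ("meeting", "00055555"),
}
reason_index = sorted((reason, i) for i, (reason, _) in rows.items())
user_index = sorted((user, i) for i, (_, user) in rows.items())

print(or_via_index_merge(reason_index, user_index, "test", "00022139"))  # [1, 2, 3]
```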
- Composite index: if the first column of the index is not used, the index fails.
Composite index: (user_no, user_name, creater)
explain select * from erp_travel where creater = '00022139' and user_no= '00022139'
So for a composite index, the order in which the conditions are written does not have to follow the leftmost prefix; the columns that are present only determine how long a prefix of the composite index is used.
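The leftmost-prefix rule for equality conditions fits in a few lines of Python: walk the index columns in index order and stop at the first one that has no condition. This is a simplified model that only handles equality conditions and ignores refinements such as MySQL 8.0's index skip scan; the column names are the composite index above.

```python
def used_index_prefix(index_columns, equality_columns):
    """Leading columns of a composite index usable by equality conditions.
    The order in which conditions are written does not matter; matching stops
    at the first index column that has no condition."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break
        used.append(column)
    return used


joint_index = ["user_no", "user_name", "creater"]
print(used_index_prefix(joint_index, {"creater", "user_no"}))    # ['user_no']
print(used_index_prefix(joint_index, {"creater"}))               # []
print(used_index_prefix(joint_index, {"user_name", "user_no"}))  # ['user_no', 'user_name']
```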
- If the column is a string type, the value in the condition must be quoted; otherwise the index is not used.
explain select * from erp_travel where user_no= 00022139
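Why the unquoted number breaks the index: when a string column is compared with a number, MySQL compares them as floating-point numbers, converting the column value row by row. The string order stored in the index says nothing about numeric order, and many different strings map to the same number, so the B+ tree cannot be searched. A tiny Python illustration with made-up keys, where float() roughly stands in for MySQL's string-to-number conversion:

```python
# Made-up user_no values, ordered the way a string index would store them.
keys = sorted(["00022139", "1", "22139", "22139.0", "9"])
# float() roughly stands in for MySQL's string-to-number conversion.
as_numbers = [float(k) for k in keys]

print(keys)        # ['00022139', '1', '22139', '22139.0', '9']
print(as_numbers)  # [22139.0, 1.0, 22139.0, 22139.0, 9.0]
matches = [k for k in keys if float(k) == 22139]
print(matches)     # ['00022139', '22139', '22139.0']
```

The numeric values are not sorted and the matches are scattered, so every entry has to be converted and checked.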
- Using IS NULL or IS NOT NULL on an index column (the index does not necessarily fail)
A common claim: an index does not store null values, so this operation cannot use an index, and you should work around it, e.g., for numeric types test whether the value is greater than 0, and for string types set a default value and test for equality with it. (This claim is wrong!)
Test:
explain select * from erp_travel where user_no is null -- uses the index
This shows that the index is used.
Summary: using IS NULL or IS NOT NULL on an index column does not necessarily make the index fail!
- Using not, <>, or != on an index column.
The not-equal operator usually does not use an index, so it only produces a full table scan. One suggested optimization is to rewrite key<>0 as key>0 or key<0 (a clumsy approach, no better than a full table scan).
explain select * from erp_travel where user_no > '00022139' -- uses the index
explain select * from erp_travel where user_no > '00022139' or user_no < '00022139' -- does not use the index
explain select * from erp_travel where user_no != '00022139' -- does not use the index
- Computing on an index column or applying a function to it.
explain select max(create_time) from erp_travel where user_no like CONCAT('000122','%') -- the function is not applied to the index column, so the index can still be used
explain select max(create_time) from erp_travel where left(user_no, 2) = '00' -- does not use the index
- When MySQL estimates that a full table scan is faster than using the index, it scans the whole table and the index is not used.
- Paging that does not use the index
Paging queries are very common in any system. After reading this, it is worth quickly checking whether the paging features you are responsible for use an index, and if they do, whether they can still be optimized. Below are some examples of optimization methods.
select * from employees limit 10000, 10;
A direct query like this does not use an index.
explain select * from erp_travel order by travel_no limit 10000, 10; -- secondary index: the index is not used
explain select * from erp_travel order by id limit 10000, 10; -- primary key: the index is used
explain select * from erp_travel order by travel_no limit 10; -- the index is used
The SQL optimizer thinks that going through the secondary index and back to the table is less efficient than reading the whole table directly. In fact the index takes up less memory: when the limit offset is large, using it not only reduces the number of IOs but also saves memory. So the SQL optimizer is not necessarily right.
Here is a paging optimization method found online:
explain select e.* from erp_travel e inner join (select id from erp_travel order by `travel_no` desc limit 10000, 10) t on t.id = e.id;
This idea is clever. The secondary index stores only the indexed column and id, so one page holds many entries: skipping the limit offset over index entries needs fewer IOs and much less memory, which makes limit more efficient. The primary keys taken out are then joined back to erp_travel on the primary key, so only 10 rows are looked up in the leaf nodes of the clustered index. IO drops sharply, and the effect is very obvious when the data volume is large.
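The saving can be counted in a toy Python model that compares walking the secondary index and going back to the clustered index for every row up to the offset with the deferred join, which pages over index entries alone and fetches only the final 10 rows. Assumptions: a dict stands in for the clustered index, a sorted list of (travel_no, id) for the secondary index, the data is made up, order is ascending rather than desc, and only clustered-index row fetches are counted (MySQL may instead choose a full scan plus filesort for the plain query).

```python
def naive_page(sec_index, primary, offset, size):
    """Walk the secondary index in order and fetch every row from the
    clustered index, including the `offset` rows that are thrown away."""
    rows, fetched = [], 0
    for pos, (_, row_id) in enumerate(sec_index[: offset + size]):
        row = primary[row_id]  # back-to-table lookup, even for skipped rows
        fetched += 1
        if pos >= offset:
            rows.append(row)
    return rows, fetched

def deferred_join_page(sec_index, primary, offset, size):
    """Page over the secondary index alone (its entries already carry id),
    then join only the surviving ids back to the clustered index."""
    ids = [row_id for _, row_id in sec_index[offset: offset + size]]
    return [primary[i] for i in ids], len(ids)


primary = {i: {"id": i, "travel_no": f"T{i:05d}"} for i in range(1, 20_001)}
sec_index = sorted((row["travel_no"], row_id) for row_id, row in primary.items())

slow_rows, slow_fetches = naive_page(sec_index, primary, 10_000, 10)
fast_rows, fast_fetches = deferred_join_page(sec_index, primary, 10_000, 10)
print(slow_rows == fast_rows, slow_fetches, fast_fetches)  # True 10010 10
```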
As mentioned before, **the SQL optimizer is not necessarily right**; forcing an index can improve query efficiency:
select * from app_user_copy1 force index(`key`) order by app_user_copy1.key desc limit 100000, 10;
How to build indexes
A perennial topic that interviews often ask about; here is a summary.
When deciding how to build indexes, I think you should consider the following aspects:
What scenarios need to be indexed
Which fields should be selected for indexing , The size of the field , Type of field
Number of indexes
What scenarios need to be indexed
High-frequency queries on large tables, where an index can filter out a lot of data
Table joins
Statistics, sorting, grouping, and aggregation
Which fields should be selected for indexing, the size of the field, the type of the field
Fields that are queried frequently, updated rarely, and filter out a lot of data
Join columns used to associate tables
Fields used for sorting, grouping, statistics, and the like
Indexed fields should be as small as possible, which lowers the height of the tree; see Alibaba's development specifications for the detailed rules
Number of indexes
Keep the number of indexes as small as possible:
Indexes take up space;
When database records are updated, the indexes must be maintained too; the more indexes, the higher the maintenance cost;
If a table has too many indexes and several are usable for a condition, the optimizer usually picks the best-performing one, and the more candidates there are, the higher the cost of that choice.
Avoid building indexes on fields that filter out little data, such as gender.
When where and order by conflict (they would use different indexes), give priority to where.