当前位置:网站首页>Common skills and understanding of SQL optimization
Common skills and understanding of SQL optimization
2022-07-07 05:33:00 【Dying fish】
Small tables drive large tables
Why small tables drive large tables can be seen in this article
sql Optimized query optimizer
in and exsits
The principle is that small tables drive large tables
Widely spread words : in and exists The connection mode of the drive table is different
hypothesis A The watch is the left watch ,B A table is a table of subqueries . When A A watch is a big watch , B When the table is a small table , Use in.
select * from A where id in (select id from B)
When A A watch is a small watch , B When a watch is big , Use exsits.(exists The subquery after is driven )
– exists(subquery) Only return true or false, The official also said that the query column will be ignored during actual implementation . therefore ,select * and select 1 No difference .
– exists The actual execution process of subquery is optimized , It's not an itemized match as we understood before .
select * from A where exists (select 1 from B where B.id = A.id)
not in and not exists: If the query statement uses not in, Then the internal and external tables are scanned , No index is used ; and not extsts The subquery of can still use the index on the table . So no matter which watch is big , use not exists All ratio not in Be quick .
So is this the case
For example, we have two tables erp_travel and erp_travel_cost
explain select * from erp_travel where EXISTS (select travel_no from erp_travel_cost where bearer = erp_travel.user_no and
erp_travel_cost.travel_no = erp_travel.travel_no)
and erp_travel.user_no = '00010413';
Conditions are added to the external query erp_travel.user_no = ‘00010413’;, So relatively speaking Subqueries are big tables , External query is a small table
Then rewrite it into in Inquire about
explain select * from erp_travel where travel_no in (select travel_no from erp_travel_cost where erp_travel_cost.bearer = erp_travel.user_no)
and erp_travel.user_no = '00010413';
According to the above , Directly change to in,in The next subquery is a large table , It should be very slow . But the actual implementation speed is still very fast .
see explain result
You can see it ,in Query by sql Optimization becomes inner join query , And it is automatically converted into small table driven large table connection , So this efficiency is still very high , Thanks to the join The optimization of the , Even better than exists More efficient .
Look at another situation
explain select * from erp_travel where erp_travel.project_no_form in (select max(erp_travel_cost.project_no) from erp_travel_cost where erp_travel_cost.travel_no = erp_travel.travel_no
GROUP BY erp_travel_cost.bearer,erp_travel_cost.project_no
) and erp_travel.user_no = '00010413';
show WARNINGS;
Looks like in Followed by a large table , But the efficiency is not low , So print out sql Optimizer optimized sql
/* select#1 */ select * from `test_bai`.`erp_travel` where <in_optimizer>(`test_bai`.`erp_travel`.`project_no_form`,<exists>(/* select#2 */ select 1 from `test_bai`.`erp_travel_cost` where ((`test_bai`.`erp_travel_cost`.`travel_no` = `test_bai`.`erp_travel`.`travel_no`) and (`test_bai`.`erp_travel_cost`.`creater` <> '00010413')) group by `test_bai`.`erp_travel_cost`.`bearer`,`test_bai`.`erp_travel_cost`.`project_no` having (<cache>(`test_bai`.`erp_travel`.`project_no_form`) = <ref_null_helper>(max(`test_bai`.`erp_travel_cost`.`project_no`)))))
Find out in It's optimized to exists,
After my test in mysql5.7 Next .mysql Yes in The optimization of has been very good , In appropriate cases, convert to exists And internal connection prompt efficiency
therefore ,in Watch , exists Big watch is also an inaccurate statement , Finally, it is necessary to analyze through the implementation plan , But as a standard, it's ok .
in other words , Even after use not in Subquery , If you are sql After the optimization , Still use the index , But this is the case ,not in Then it's not index Do not use microcosm
explain select * from erp_travel where user_no not in( '00022139','0010413');
count Query optimization
There are many materials on the Internet that say , want count(id) perhaps count(1), Don't count(*), Is this the case ? Let's practice today .
explain select count(id) from erp_travel; -- Through the index
explain select count(*) from erp_travel; -- Through the index
explain select count(1) from erp_travel; -- Through the index
explain select count(uuid) from erp_travel; -- Full table
It can be seen that except count Specify non indexed fields , The effect is the same
order by and group by Optimize
The optimization of sorting and grouping is actually very similar , The essence is to sort first and then group , Follow the leftmost matching principle of index creation order . therefore , Take sorting as an example .
Full field sorting ( Sort field does not use index )
- When to use full field sorting ?
- Fewer fields , The amount of data is small , Sorting can be done in memory ,Mysql Most of the sorting without index uses Complete sorting of all fields .
- Full field index sorting process - initialization sort_buffer, Make sure to put in name、city、age These three fields .
- From the index city Find the first satisfaction city=' Hangzhou ’ The primary key of the condition id.
- To primary key id Index takes out the whole line , take name、city、age The values of the three fields , Deposit in sort_buffer in ;
- From the index city Take the primary key of the next record id;
- Repeat step 3、4 until city The value of does not meet the query conditions .
- Yes sort_buffer The data in is sorted by field name Do quick sort ;
- Sort the results by Take before 1000 That's ok Return to the client .
- Process details
- The whole sorting action , It could be done in memory , You may also need to use external sorting , It depends on the memory and parameters required for sorting sort_buffer_size.
- sort_buffer_size, Namely MySQL Memory opened up for sorting (sort_buffer) Size .
- If the amount of data to be sorted is less than sort_buffer_size, Sorting is done in memory .
- But if the sorting data is too large , There's no memory , You have to use temporary disk files to help sort . External sorting generally uses merge sorting algorithm .
rowid Sort ( Sort field does not use index )
- When to use rowid Sort ? - stay Full field sorting in , I only read the data of the original table once , The rest of the operation is in sort_buffer And temporary files .
- But there is a problem , If the query returns many fields ,sort_buffer Too many fields , In this way, the number of rows that can be put down at the same time in memory is very small , It has to be divided into many temporary files , Sorting performance will be poor .
- Mysql Think Full field sorting is too expensive , So using rowid Algorithmic sorting .
- rowid Sorting process
- initialization sort_buffer, Make sure to put in two fields , namely name and id.
- From the index city Find the first satisfaction city=' Hangzhou ’ The primary key of the condition id.
- To primary key id Index takes out the whole line , take name、id These two fields , Deposit in sort_buffer in .
- From the index city Take the primary key of the next record id.
- Repeat step 3、4 Until not satisfied city=' Hangzhou ’ Until the conditions are met .
- Yes sort_buffer The data in is sorted by field name Sort .
- Traversal sort results , Take before 1000 That's ok , And in accordance with the ** id Return the value of to the original table **city、name and age Three fields are returned to the client .
- Process details
- contrast Full field sorting process you will find ,rowid Sort visited the table more than once Primary key index of .
Full field sorting contrast rowid Sort ? - If MySQL I'm really worried that the sorting memory is too small , Will affect the sorting efficiency , To adopt rowid Sorting algorithm , In this way, you can sort more rows at a time in the sorting process , But you need to go back to the original table to get the data .
- about InnoDB Table for example ,rowid Sorting will require more disk reads to go back to the table , So it won't be a priority .
Advantages of sorting field indexing - When the sorting field has an index , The query process does not require temporary tables , There is no need to sort .
- meanwhile , It will not scan all qualified rows , Instead, finding suitable conditions will return data .
Other things that need attention in sorting .
- If there is only order by create_time, Even if create_time There's an index on , It will not use . - Because the optimizer thinks that it costs more to go through the secondary index and then go back to the table than to scan and sort the whole table . So choose to go full meter scan , Then choose one of the two ways to sort according to the teacher
- Unconditional query, but order by create_time limit m. If m Less value , It can be indexed .
- Because the optimizer thinks that according to the index order, it will go back to the table to look up the data , Then get m Data , You can end the cycle , So it costs less than a full table scan , Then choose the secondary index .
- Even if there is no secondary index ,mysql in the light of order by limit Also optimized , Use heap sort .
Index overlay
explain select travel_no from erp_travel where travel_no not like '%sai%';
explain select user_no,user_name,creater from erp_travel where user_name not like '%sai%';
Index overrides query results , So it seems that the index is invalid , In fact, indexes are also used
Why is it recommended that the primary key be self incremented
If another primary key is inserted at this time, the value is 9 The record of , The insertion position is shown as follows :
But this data page is full , What if you plug in again ? We need to put the current Page splitting In two pages , Move some records in this page to the newly created page . What does page splitting and record shifting mean ? signify : Performance loss ! So if we want to try Avoid such unnecessary performance loss , It's best to let the inserted record The primary key values are incremented , In this way, there will be no such performance loss .
So we suggest : Let the primary key have AUTO_INCREMENT , Let the storage engine generate the primary key for the table itself ,
When inserting records, the storage engine will automatically fill in the self increasing primary key value for us . Such a primary key takes up less space , Write in sequence , Reduce page splits .
Index failure
- like With % start , Invalid index ; When like No prefix %, The suffix is % when , Index is valid .
The reason is also simple Indexed B+ The tree is sorted by the value of the index , And strings It is also sorted according to the prefix weight for example character string “12” Less than character string “2” Therefore If fuzzy query ,% You can still use the index if you are not ahead .
explain select * from erp_travel where travel_reason like ' test % merchant %'
- or The index is not used before and after the statement .
explain select * from erp_travel where travel_reason like ' test % merchant %' or user_no= '00022139'
If or There are indexes on both sides , Then they will go through the corresponding indexes , Then merge together ,type by index_merge
If one is not an index , Direct full table , There is no need to go through another index
explain select * from erp_travel where travel_reason like ' test % merchant %' or creater= '00022139'
- Composite index , Instead of using the first column index , Index failure .
Joint index user_no
, user_name
, creater
explain select * from erp_travel where creater = '00022139' and user_no= '00022139'
explain select * from erp_travel where creater = ‘00022139’ and user_no= ‘00022139’
so , For federated indexes The order of writing is not required to match the leftmost prefix , But the length of using the index only determines the length of using the joint index
- If the column type is string , Be sure to quote the data in the condition , Otherwise, the index is not used
explain select * from erp_travel where user_no= 00022139
- Use on index columns IS NULL or IS NOT NULL operation ( Not necessarily ineffective )
The index does not index null values , So you can't use an index for this operation , You can deal with it in other ways , for example : Numeric type , The judgment is greater than 0, Set a default value for the string type , Judge whether it is equal to the default value .( Here is the wrong statement !)
test
explain select * from erp_travel where user_no is null -- Go to the index
From this, it can be found that the index is used
summary : Use on index columns IS NULL or IS NOT NULL operation , The index does not necessarily fail !!!
- Use... On index fields not,<>,!=.
The not equal operator will never use an index , So processing it will only produce a full table scan . An optimization method : key<>0 Change it to key>0 or key<0( A stupid way , It's better to scan the whole table ).
explain select * from erp_travel where user_no > '00022139' -- Go to the index
explain select * from erp_travel where user_no > '00022139' or user_no < '00022139' -- Don't walk index
explain select * from erp_travel where user_no != '00022139' -- Don't walk index
- Calculate the index field 、 Use functions on fields .( The index for emp(ename,empno,sal))
explain select max(create_time) from erp_travel where user_no like CONCAT('000122','%') -- The function can go through the index without being on the index
explain select max(create_time) from erp_travel where left(user_no ,2) ='00'-- Don't walk index
- When the whole table scanning speed is faster than the index speed ,mysql Can use full scan , At this time, the index fails .
If mysql It is estimated that full table scanning is faster than indexing , Index is not used
- Paging without index
Paging query , A very common query in the system , It is suggested that after learning , Quickly check whether the paging function you are responsible for is indexed , Or whether the index is gone but it can be optimized . following , Let's take an example of some optimization methods .
select * from employees limit 10000, 10;
Direct inquiry , Don't walk index
explain select * from erp_travel order by travel_no limit 10000, 10; -- The secondary index does not go through the index
explain select * from erp_travel order by id limit 10000, 10; -- Primary key Go to the index ..
explain select * from erp_travel order by travel_no limit 10; -- Go to the index
sql Optimizer thinks The speed of secondary index returning to the table is not as fast as that of direct full table io Efficient , actually , Index takes up less memory , stay limit When the amount of data is large , Not only reduce io frequency , It also saves memory , so ,sql The optimizer is not necessarily right
Look at the paging optimization method on the Internet
explain select e.* from erp_travel e inner join (select id from erp_travel order by `travel_no` desc limit 10000, 10) t on t.id = e.id;
This idea is very interesting , Use the index itself to store id, A page can store a lot of data , Less io frequency ,limit And the data also saves a lot of memory , such limit It saves a lot of memory , promote limit The efficiency of , then , Take out the primary key , Then connect with the primary key erp_travel, Look up the leaf nodes of the cluster index in this way , Only checked 10 strip . Greatly reduced io frequency , When the amount of data is large, the effect is very obvious
I said before. ,**sql The optimizer is not necessarily right ,** Mandatory index specification can improve query efficiency
select * from app_user_copy1 force index(`key`) order by app_user_copy1.key desc limit 100000, 10;
How to index
A cliche , Interviews often ask , Here's a summary .
How to build an index , Personally, I think we should think from the following perspectives :
What scenarios need to be indexed
Which fields should be selected for indexing , The size of the field , Type of field
Number of indexes
What scenarios need to be indexed
High frequency query , And there are many data , Can filter more data through index
Table correlation
Statistics , Sort , Group aggregation
Which fields should be selected for indexing , The size of the field , Type of field
High frequency query , Update low frequency , And it can filter fields with more data
Associated fields for table Association
Used to sort , grouping , Statistics and other fields
The fields used for indexing should be as small as possible , Can reduce the height of the tree , See the following Alibaba specifications for specific rules
Number of indexes
The number of indexes should be as small as possible .
Because the index will take up space ;
When the record updates the database record , There is the cost of maintaining the index , The more the number of , The higher the maintenance cost ;
There are too many indexes in a table , When a condition finds that multiple indexes are valid , The optimizer will usually choose the index with the best performance to use , A large number , The cost of selecting the optimizer will also rise .
Try not to build indexes in fields with little filtered data , Such as : Gender .
where And order by When the conflict , priority where.
边栏推荐
- Use, configuration and points for attention of network layer protocol (taking QoS as an example) when using OPNET for network simulation
- English语法_名词 - 所有格
- How does mapbox switch markup languages?
- Mysql database learning (7) -- a brief introduction to pymysql
- Disk monitoring related commands
- [opencv] image morphological operation opencv marks the positions of different connected domains
- NPDP产品经理认证,到底是何方神圣?
- Design, configuration and points for attention of network specified source multicast (SSM) simulation using OPNET
- 漏电继电器JELR-250FG
- Annotation初体验
猜你喜欢
JHOK-ZBG2漏电继电器
[论文阅读] A Multi-branch Hybrid Transformer Network for Corneal Endothelial Cell Segmentation
JHOK-ZBL1漏电继电器
Leetcode 1189 maximum number of "balloons" [map] the leetcode road of heroding
阿里云的神龙架构是怎么工作的 | 科普图解
K6EL-100漏电继电器
JD commodity details page API interface, JD commodity sales API interface, JD commodity list API interface, JD app details API interface, JD details API interface, JD SKU information interface
Complete code of C language neural network and its meaning
Leakage relay jelr-250fg
LabVIEW is opening a new reference, indicating that the memory is full
随机推荐
痛心啊 收到教训了
张平安:加快云上数字创新,共建产业智慧生态
The year of the tiger is coming. Come and make a wish. I heard that the wish will come true
实现网页内容可编辑
Cve-2021-3156 vulnerability recurrence notes
NPDP产品经理认证,到底是何方神圣?
Linkedblockingqueue source code analysis - initialization
5阶多项式轨迹
在米家、欧瑞博、苹果HomeKit趋势下,智汀如何从中脱颖而出?
Pinduoduo product details interface, pinduoduo product basic information, pinduoduo product attribute interface
[PM products] what is cognitive load? How to adjust cognitive load reasonably?
Leakage relay llj-100fs
做自媒体,有哪些免费下载视频剪辑素材的网站?
[Oracle] simple date and time formatting and sorting problem
Safe landing practice of software supply chain under salesforce containerized ISV scenario
Tablayout modification of customized tab title does not take effect
阿里云的神龙架构是怎么工作的 | 科普图解
【js组件】自定义select
Digital innovation driven guide
漏电继电器JD1-100