SQL optimization
2022-06-11 04:16:00 【Prince you】
I. Preface
Among the ways to improve system performance, SQL optimization has the lowest cost and the most visible effect. If your team handles SQL well, the availability of the whole system improves markedly, and it saves your boss real money.
- Optimization cost: hardware > system configuration > database table structure > SQL and indexes.
- Optimization effect: hardware < system configuration < database table structure < SQL and indexes.
Optimization at the MySQL layer generally follows five principles:
- Reduce data access: choose appropriate field types, enable compression, and use indexes to reduce disk I/O.
- Return less data: return only the required fields and paginate results to reduce disk I/O and network I/O.
- Reduce the number of interactions: batch DML operations and use stored functions to reduce the number of database round trips.
- Reduce server CPU overhead: minimize sorting and full table scans in the database to reduce CPU and memory usage.
- Use more resources: use table partitioning to enable more parallel operations and make fuller use of CPU resources.
In summary, SQL optimization comes down to three points:
- Maximize the use of indexes;
- Avoid full table scans as much as possible;
- Reduce queries for unneeded data.
II. SELECT statement: syntax order
1. SELECT
2. DISTINCT <select_list>
3. FROM <left_table>
4. <join_type> JOIN <right_table>
5. ON <join_condition>
6. WHERE <where_condition>
7. GROUP BY <group_by_list>
8. HAVING <having_condition>
9. ORDER BY <order_by_condition>
10. LIMIT <limit_number>
III. SELECT statement: execution order
- FROM: <table name> # Select the tables; the data of multiple tables is combined into one virtual table through a Cartesian product.
- ON: <join condition> # Filter the Cartesian-product virtual table.
- JOIN: <join, left join, right join...> <joined table> # Specify the join type, which adds rows to the virtual table produced by ON; for example, a left join adds the remaining rows of the left table.
- WHERE: <where condition> # Filter the virtual table above.
- GROUP BY: <grouping columns> # Group the rows; aggregate functions such as SUM() are computed here for use in the HAVING clause, which is where conditions on aggregates are written.
- HAVING: <group filter> # Filter the grouped, aggregated results.
- SELECT: <returned column list> # Every returned column must appear in the GROUP BY clause, except aggregate functions.
- DISTINCT: # Remove duplicate rows.
- ORDER BY: <sort columns> # Sort.
- LIMIT: <row limit>
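The logical order above can be traced by hand. The following is a minimal sketch, assuming Python with the standard sqlite3 module as a stand-in for MySQL; the orders table and its rows are hypothetical. It runs one query and then recomputes the same result by applying WHERE, GROUP BY, HAVING, SELECT, ORDER BY and LIMIT in that order.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, amount).
ROWS = [("ann", 50), ("ann", 70), ("bob", 5), ("bob", 200), ("cat", 30), ("cat", 40)]


def run_sql():
    """Let the database evaluate the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)
    result = conn.execute(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "WHERE amount >= 10 "
        "GROUP BY customer "
        "HAVING SUM(amount) > 100 "
        "ORDER BY total DESC "
        "LIMIT 1"
    ).fetchall()
    conn.close()
    return result


def run_by_hand():
    """Apply the clauses manually in their logical execution order."""
    filtered = [(c, a) for c, a in ROWS if a >= 10]        # WHERE
    groups = defaultdict(int)
    for customer, amount in filtered:                      # GROUP BY + SUM()
        groups[customer] += amount
    kept = {c: t for c, t in groups.items() if t > 100}    # HAVING
    selected = list(kept.items())                          # SELECT
    selected.sort(key=lambda row: row[1], reverse=True)    # ORDER BY total DESC
    return selected[:1]                                    # LIMIT 1


sql_result = run_sql()
manual_result = run_by_hand()
```

Because WHERE runs before the aggregation, the 5-unit order for bob never reaches SUM(), so his total is 200 rather than 205.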
IV. SQL optimization strategies
Note: the following SQL optimization strategies apply to large data volumes. If the data volume is small, there is no need to follow them to the letter; doing so would be superfluous.
(1) Try to avoid fuzzy queries with a leading wildcard; they cause the database engine to abandon the index and scan the full table. For example:
SELECT * FROM t WHERE username LIKE '%Chen%'
How to optimize: put the wildcard only at the end of the pattern where possible. For example:
SELECT * FROM t WHERE username LIKE 'Chen%'
If the requirement really needs a leading wildcard:
- Use the MySQL built-in function INSTR(str, substr) to match; it works like indexOf() in Java and returns the position of the substring within the string.
- Use a FULLTEXT index and search with MATCH ... AGAINST.
- For large data volumes, consider ElasticSearch or Solr, which can search hundreds of millions of records within seconds.
- When the table is small (a few thousand rows), don't bother; use LIKE '%xx%' directly.
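To illustrate the INSTR option, here is a small sketch, assuming Python's sqlite3 module as a stand-in (SQLite also ships an instr() function) and a hypothetical list of user names. It only checks that the INSTR form selects the same rows as the leading-wildcard LIKE; it says nothing about which one MySQL executes faster.

```python
import sqlite3

# Hypothetical user names for the demonstration.
NAMES = ["Chen Wei", "Li Chen", "Wang Fang", "Zhang Chenxi"]


def find_users(where_clause, params=()):
    """Return the usernames matching the given WHERE clause, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO t (username) VALUES (?)", [(n,) for n in NAMES])
    rows = conn.execute(
        "SELECT username FROM t WHERE " + where_clause + " ORDER BY id", params
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


contains_like = find_users("username LIKE ?", ("%Chen%",))        # leading wildcard
contains_instr = find_users("INSTR(username, ?) > 0", ("Chen",))  # same rows via INSTR
prefix_only = find_users("username LIKE ?", ("Chen%",))           # trailing wildcard only
```

Case handling can differ between the two forms depending on the engine and collation, so the sample data keeps a consistent case.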
(2) Avoid IN and NOT IN where possible; they can cause the engine to scan the whole table, particularly NOT IN and IN with a subquery. For example:
SELECT * FROM t WHERE id IN (2,3)
How to optimize: if the values are consecutive, use BETWEEN instead. For example:
SELECT * FROM t WHERE id BETWEEN 2 AND 3
If it is a subquery, use EXISTS instead. For example:
-- May not use the index
select * from A where A.id in (select id from B);
-- Uses the index
select * from A where exists (select * from B where B.id = A.id);
(3) Avoid OR where possible; it can cause the database engine to abandon the index and scan the full table. For example:
SELECT * FROM t WHERE id = 1 OR id = 3
How to optimize: use UNION instead of OR. For example:
SELECT * FROM t WHERE id = 1
UNION
SELECT * FROM t WHERE id = 3
(4) Try to avoid NULL checks, which can cause the database engine to abandon the index and scan the full table. For example:
SELECT * FROM t WHERE score IS NULL
How to optimize: give the field a default value of 0 and test for 0 instead. For example:
SELECT * FROM t WHERE score = 0
(5) Try to avoid expressions and function operations on the left side of the equals sign in WHERE conditions; they cause the database engine to abandon the index and scan the full table.
Move the expression or function operation to the right side of the equals sign. For example:
-- Full table scan
SELECT * FROM T WHERE score/10 = 9
-- Uses the index
SELECT * FROM T WHERE score = 10*9
(6) When the data volume is large, avoid the WHERE 1=1 condition.
This condition is often added by default to make it easier to assemble query conditions, but it may cause the database engine to abandon the index and scan the full table. For example:
SELECT username, age, sex FROM T WHERE 1=1
How to optimize: decide in code when assembling the SQL. If there are no conditions, drop the WHERE keyword; if there are conditions, start with WHERE and join the rest with AND.
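One way to assemble the statement in code without the 1=1 placeholder is sketched below in Python. The function name build_user_query and its age and sex filters are hypothetical, and the ? placeholder style is the one used by sqlite3; MySQL drivers typically use %s instead.

```python
from typing import Any, List, Optional, Tuple


def build_user_query(age: Optional[int] = None,
                     sex: Optional[str] = None) -> Tuple[str, List[Any]]:
    """Build the SELECT and its parameters, adding WHERE only when needed."""
    conditions: List[str] = []
    params: List[Any] = []
    if age is not None:
        conditions.append("age = ?")
        params.append(age)
    if sex is not None:
        conditions.append("sex = ?")
        params.append(sex)

    sql = "SELECT username, age, sex FROM T"
    if conditions:
        # The first condition follows WHERE; the rest are joined with AND.
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


no_filter = build_user_query()
two_filters = build_user_query(age=30, sex="F")
```

Collecting the conditions in a list and joining them once avoids both the 1=1 trick and any special handling of the first AND.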
(7) Avoid <> and != in query conditions
When an indexed column is used as a query condition, avoid operators such as <> and !=. If the business really requires a not-equal comparison, re-evaluate the index design: avoid building the index on this field and rely on other indexed fields in the query conditions instead.
(8) The WHERE condition contains only non-leading columns of a composite index
For example, a composite (multi-column) index contains the three columns key_part1, key_part2 and key_part3, but the SQL statement does not reference the leading column key_part1. According to the leftmost-prefix matching rule of MySQL composite indexes, the composite index will not be used.
select col1 from table where key_part2=1 and key_part3=2
(9) Implicit type conversion prevents the index from being used
In the following SQL statement the indexed column is of type varchar but the given value is a number, so an implicit type conversion takes place and the index cannot be used properly.
select col1 from table where col_varchar=123;
(10) The ORDER BY columns should be consistent with the WHERE conditions; otherwise ORDER BY will not sort by the index
-- Does not use the age index
SELECT * FROM t order by age;
-- Uses the age index
SELECT * FROM t where age > 0 order by age;
For the statements above, the database processes the query in this order:
- Step 1: generate an execution plan from the WHERE conditions and statistics, and fetch the data.
- Step 2: sort the fetched data. When it reaches the sort (ORDER BY), the database first checks the execution plan from step 1 to see whether the ORDER BY fields were read through an index. If so, it can take the already ordered data directly in index order; if not, it performs a separate sort operation.
- Step 3: return the sorted data.
When the ORDER BY fields also appear in the WHERE condition, the index is used and no second sort is needed. More precisely, no sort operation is performed when the ORDER BY fields are read through an index in the execution plan.
This conclusion holds not only for ORDER BY but also for other operations that require sorting, such as GROUP BY, UNION and DISTINCT.
(11) Use hints correctly to optimize statements
In MySQL you can use hints to make the optimizer choose or ignore specific indexes at execution time. Generally speaking, because table structures and indexes change between versions, it is better to avoid hints and instead collect more statistics with ANALYZE TABLE. On certain occasions, however, specifying a hint can exclude interference from other indexes and yield a better execution plan.
- USE INDEX: add USE INDEX after the table name in the query to give MySQL the list of indexes you want it to consider, so that it no longer considers other available indexes. Example: SELECT col1 FROM table USE INDEX (mod_time, name)...
- IGNORE INDEX: if you only want MySQL to ignore one or more indexes, use IGNORE INDEX as the hint. Example: SELECT col1 FROM table IGNORE INDEX (priority) ...
- FORCE INDEX: to force MySQL to use a specific index, use FORCE INDEX as the hint in the query. Example: SELECT col1 FROM table FORCE INDEX (mod_time) ...
At query time the database system analyzes the statement automatically and chooses the most appropriate index. However, the query optimizer may not always pick the optimal one. If we know which index is best, we can use FORCE INDEX to make the query use it.
For example:
SELECT * FROM students FORCE INDEX (idx_class_id) WHERE class_id = 1 ORDER BY id DESC;
V. Other SELECT statement optimizations
(1) Avoid SELECT *
First of all, SELECT * is not a good SQL writing habit in any type of database.
Using SELECT * to fetch all columns prevents the optimizer from using a covering index scan, affects its choice of execution plan, increases network bandwidth consumption, and brings extra I/O, memory and CPU consumption.
It is recommended to list the columns the business actually needs by name instead of using SELECT *.
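The covering-index point can be observed in a query plan. The sketch below assumes Python's sqlite3 module as a stand-in, so the plan text comes from SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN (where a covering index shows up as "Using index" in the Extra column). The users table and idx_age_username index are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, age INTEGER, bio TEXT)"
)
conn.execute("CREATE INDEX idx_age_username ON users (age, username)")

# Only indexed columns are requested, so the index alone can answer the query.
narrow_plan = query_plan(conn, "SELECT username, age FROM users WHERE age = 30")

# SELECT * also needs the bio column, which forces a lookup in the table itself.
star_plan = query_plan(conn, "SELECT * FROM users WHERE age = 30")
```

Printing narrow_plan and star_plan side by side shows that only the first query is reported as using a covering index.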
(2) Avoid functions with nondeterministic results
This applies specifically to business scenarios such as master-slave replication. In principle the slave replays the statements executed on the master, so nondeterministic functions such as now(), rand(), sysdate() and current_user() can easily cause data inconsistency between the master and the slave. In addition, SQL statements that use nondeterministic functions cannot use the query cache.
(3) In multi-table join queries, put the small table first and the large table last.
In MySQL, the tables after FROM in a join query are processed from left to right (Oracle is the opposite). The first table involves a full table scan, so put the small table first: scanning it is fast and efficient. When the large table is scanned afterwards, perhaps only its first 100 rows need to be scanned before the return condition is met.
For example: table 1 has 50 rows and table 2 has 30 billion rows. If table 2 gets a full table scan, you might as well go and have a meal first.
(4) Use table aliases
When joining multiple tables in an SQL statement, use table aliases and prefix each column name with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.
(5) Replace the HAVING clause with a WHERE clause
Avoid the HAVING clause where possible, because HAVING filters the result set only after all records have been retrieved, whereas WHERE filters records before aggregation. Limiting the number of records with a WHERE clause reduces this overhead. Conditions in HAVING should generally be used only to filter on aggregate functions; all other conditions belong in the WHERE clause.
The difference between WHERE and HAVING: aggregate (group) functions cannot be used in WHERE.
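As a small illustration of the rule, the sketch below (Python with sqlite3 as a stand-in, on a hypothetical sales table) moves a non-aggregate condition from HAVING to WHERE and checks that the result is unchanged, while the condition on SUM() stays in HAVING.

```python
import sqlite3

# Hypothetical sample data: (region, amount).
SALES = [("north", 10), ("north", 20), ("south", 60), ("south", 70), ("east", 5)]


def run(sql):
    """Run a query against a fresh copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", SALES)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


# Non-aggregate filter in HAVING: every group is built first, then one is discarded.
with_having = run(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING region <> 'north' ORDER BY region"
)

# Same filter in WHERE: the 'north' rows are dropped before grouping.
with_where = run(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE region <> 'north' GROUP BY region ORDER BY region"
)

# A condition on an aggregate has to stay in HAVING.
big_regions = run(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE region <> 'north' GROUP BY region HAVING SUM(amount) > 100 ORDER BY region"
)
```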
(6) Adjust the order of conditions in the WHERE clause
MySQL parses the WHERE clause from left to right and top to bottom. Following this rule, put the conditions that filter out the most data first, so the result set shrinks as quickly as possible.
VI. Optimizing insert, update and delete (DML) statements
(1) Bulk-insert data
If you need to insert a large number of rows at once, use a single INSERT statement with multiple value lists (method 2). It is faster than using separate INSERT statements (method 1); in general the efficiency of batch insertion differs by several times.
Method 1:
insert into T values(1,2);
insert into T values(1,3);
insert into T values(1,4);
Method 2:
Insert into T values(1,2),(1,3),(1,4);
There are three reasons to choose the latter method:
- It reduces SQL statement parsing. MySQL has nothing like Oracle's shared pool; with method 2 the statement is parsed only once to insert the data.
- In certain scenarios it reduces the number of DB connections.
- The SQL statement is shorter, which reduces network transmission I/O.
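Method 2 can be generated from application code. The following is a rough sketch in Python, using sqlite3 as a stand-in for a MySQL connection; the helper name insert_in_batches and the batch size of 100 are assumptions. It builds one multi-row INSERT per batch so that a very long list does not end up in a single oversized statement.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=100):
    """Insert two-column rows with one multi-row INSERT per batch.

    Returns the number of INSERT statements that were executed.
    """
    statements = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        placeholders = ", ".join(["(?, ?)"] * len(batch))        # (?, ?), (?, ?), ...
        flat_values = [value for row in batch for value in row]  # flatten for binding
        conn.execute("INSERT INTO T VALUES " + placeholders, flat_values)
        statements += 1
    conn.commit()
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (a INTEGER, b INTEGER)")
statements = insert_in_batches(conn, [(1, n) for n in range(250)])
row_count = conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]
```

Here 250 rows are written with 3 statements instead of 250; the right batch size depends on the driver's parameter limits and the server's maximum packet size.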
(2) Use commit appropriately
Committing at appropriate points releases the resources held by a transaction and reduces consumption. The resources released after a commit are:
- undo data blocks held by the transaction;
- data blocks the transaction recorded in the redo log;
- locks held by the transaction, which reduces lock contention and its effect on performance. In particular, when you need to delete a large amount of data with DELETE, break the deletion into batches and commit regularly.
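Breaking a large delete into committed batches might look like the sketch below. It assumes Python with sqlite3 as a stand-in and a hypothetical logs table; the IN (SELECT ... LIMIT ?) form is used because it runs on SQLite, whereas in MySQL a plain DELETE ... LIMIT n would serve the same purpose.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows with created < cutoff in small transactions.

    Returns the number of batches that actually removed rows.
    """
    batches = 0
    while True:
        cursor = conn.execute(
            "DELETE FROM logs WHERE id IN "
            "(SELECT id FROM logs WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # end the transaction after every batch
        if cursor.rowcount == 0:
            break
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany("INSERT INTO logs (created) VALUES (?)", [(i,) for i in range(2500)])
conn.commit()

batches = delete_in_batches(conn, cutoff=2000, batch_size=500)
remaining = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
```

Each pass holds its locks only for one batch, which is the point of the advice above.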
(3) Avoid re-querying data that was just updated
Businesses often need to update a row and then read that row's information. MySQL does not support an UPDATE ... RETURNING syntax like PostgreSQL's, but the same effect can be achieved in MySQL with variables.
For example, to update the timestamp of a row and also find out which timestamp is now stored in it, the simple approach is:
Update t1 set time=now() where col1=1;
Select time from t1 where col1=1;
Using a variable, it can be rewritten as follows:
Update t1 set time=now() where col1=1 and @now := now();
Select @now;
Both approaches need two round trips, but using a variable avoids accessing the table again. Especially when t1 holds a large amount of data, the latter is much faster than the former.
(4) Prioritize queries or updates (insert, update, delete)
MySQL also allows you to change the scheduling priority of statements. This lets queries from multiple clients cooperate better, so that a single client does not wait a long time because of locking. Changing priorities can also ensure that certain types of queries are processed faster. First determine the type of application: decide whether it is query-oriented or update-oriented, and whether query efficiency or update efficiency must be guaranteed, and then decide whether queries or updates take priority.
The methods for changing the scheduling policy described below mainly target storage engines that only have table locks, such as MyISAM, MEMORY and MERGE. For the InnoDB storage engine, statement execution is determined by the order in which row locks are acquired. MySQL's default scheduling policy can be summarized as follows:
1) Write operations take precedence over read operations.
2) Only one write operation on a table can occur at any given time, and write requests are processed in the order they arrive.
3) Multiple read operations on a table can be performed simultaneously.
MySQL provides several statement modifiers that let you modify its scheduling policy:
- The LOW_PRIORITY keyword applies to DELETE, INSERT, LOAD DATA, REPLACE and UPDATE;
- The HIGH_PRIORITY keyword applies to SELECT and INSERT statements;
- The DELAYED keyword applies to INSERT and REPLACE statements.
If a write operation is a LOW_PRIORITY request, the system no longer treats it as having higher priority than read operations. In that case, if a second reader arrives while the writer is waiting, the second reader is allowed to go ahead of the writer. The writer may start only when there are no other readers. With this scheduling change, a LOW_PRIORITY write operation may be blocked indefinitely.
The HIGH_PRIORITY keyword for SELECT queries works similarly. It allows the SELECT to jump ahead of a pending write operation, even though the write would normally have higher priority. Another effect is that a high-priority SELECT executes before normal SELECT statements, because those statements are blocked by the write operation. If you want all statements that support the LOW_PRIORITY option to be processed at low priority by default, start the server with the --low-priority-updates option. Using INSERT HIGH_PRIORITY raises an INSERT statement back to normal write priority, which cancels the effect of this option for that individual INSERT statement.
VII. Query condition optimization
(1) For complex queries, you can use intermediate temporary tables to hold data temporarily
(2) Optimize GROUP BY statements
By default, MySQL sorts all the values of a GROUP BY; a query with "GROUP BY col1, col2, ...;" behaves as if "ORDER BY col1, col2, ...;" had been specified. If you explicitly include an ORDER BY clause with the same columns, MySQL optimizes it away without any loss of speed, even though the sort still takes place. (This implicit sorting applies to MySQL 5.7 and earlier; MySQL 8.0 no longer sorts GROUP BY results implicitly.)
Therefore, if the query includes GROUP BY but you do not want the group values sorted, you can specify ORDER BY NULL to suppress the sort. For example:
SELECT col1, col2, COUNT(*) FROM table GROUP BY col1, col2 ORDER BY NULL ;
(3) Optimize JOIN statements
MySQL lets you use a subquery: a SELECT statement creates a single-column result, which is then used as a filter condition in another query. Subqueries can complete in one step many SQL operations that logically require several steps, they can avoid transaction or table locks, and they are easy to write. In some cases, however, a subquery can be replaced by a more efficient join (JOIN).
Example: suppose you want to retrieve all customers without any order records. You can use the following query:
SELECT col1 FROM customerinfo WHERE CustomerID NOT in (SELECT CustomerID FROM salesinfo )
If a join (JOIN) is used to complete this query, it will be faster, especially when the salesinfo table has an index on CustomerID. The query is as follows:
SELECT col1 FROM customerinfo
LEFT JOIN salesinfo ON customerinfo.CustomerID=salesinfo.CustomerID
WHERE salesinfo.CustomerID IS NULL
The join (JOIN) is more efficient because MySQL does not need to create a temporary table in memory to complete this logically two-step query.
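The two forms can be checked against each other on a small data set. The sketch below assumes Python with sqlite3 as a stand-in; the sample rows, the SaleID column and the index name are hypothetical. Note that the rewrite is only equivalent when salesinfo.CustomerID contains no NULLs, which the NOT NULL constraint guarantees here; with a NULL in the subquery result, NOT IN would return no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customerinfo (CustomerID INTEGER PRIMARY KEY, col1 TEXT);
CREATE TABLE salesinfo (SaleID INTEGER PRIMARY KEY, CustomerID INTEGER NOT NULL);
CREATE INDEX idx_sales_customer ON salesinfo (CustomerID);
INSERT INTO customerinfo VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
INSERT INTO salesinfo VALUES (10, 1), (11, 1), (12, 3);
""")

# Subquery form from the text.
not_in_rows = conn.execute(
    "SELECT col1 FROM customerinfo "
    "WHERE CustomerID NOT IN (SELECT CustomerID FROM salesinfo) "
    "ORDER BY col1"
).fetchall()

# Join form: keep only customers for which the LEFT JOIN found no sale.
left_join_rows = conn.execute(
    "SELECT col1 FROM customerinfo "
    "LEFT JOIN salesinfo ON customerinfo.CustomerID = salesinfo.CustomerID "
    "WHERE salesinfo.CustomerID IS NULL "
    "ORDER BY col1"
).fetchall()
```

Both queries return only bob, the one customer without a sale.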
(4) Optimize UNION queries
MySQL executes UNION queries by creating and populating a temporary table. Unless you really need to eliminate duplicate rows, use UNION ALL. Without the ALL keyword, MySQL adds the DISTINCT option to the temporary table, which triggers a uniqueness check on all of its data, and that is quite expensive.
Efficient:
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL1 = 10
UNION ALL
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL3= 'TEST';
Inefficient:
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL1 = 10
UNION
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL3= 'TEST';
(5) Split complex SQL into multiple small SQL statements and avoid large transactions
- Simple SQL can easily use MySQL's QUERY CACHE;
- It reduces lock time, especially for tables that use the MyISAM storage engine;
- It can make use of multi-core CPUs.
(6) Use TRUNCATE instead of DELETE
When deleting all records in a table with DELETE, the operations are recorded in undo blocks and the deleted records are also written to the binlog. When you really do want to delete the whole table, this produces a large amount of binlog and occupies many undo data blocks, which is inefficient and takes up a lot of resources.
With TRUNCATE, no recoverable information is recorded and the data cannot be restored, so a TRUNCATE operation uses very few resources and takes very little time. In addition, TRUNCATE reclaims the table's high-water mark and resets the value of the auto-increment field.
(7) Use a sensible paging method to improve paging efficiency
For display and other paging requirements, a suitable paging method can improve paging efficiency.
Case 1:
select * from t where thread_id = 10000 and deleted = 0
order by gmt_create asc limit 0, 15;
The example above fetches all fields that match the filter conditions in one go, sorts them and returns the page. Data access cost = index I/O + table data I/O for all records matched by the index. Therefore, with this approach, the further back you page, the worse the execution efficiency and the longer it takes, especially when the table holds a large amount of data.
Applicable scenario: when the intermediate result set is small (under 10,000 rows) or the query conditions are complex (multiple query fields or multi-table joins).
Case 2:
select t.* from (select id from t where thread_id = 10000 and deleted = 0
order by gmt_create asc limit 0, 15) a, t
where a.id = t.id;
The example above requires that the primary key of table t is the id column and that a covering secondary index exists on (thread_id, deleted, gmt_create). It first uses the covering index to fetch and sort the primary key ids that match the filter conditions, and then joins back to fetch the other fields. Data access cost = index I/O + table data I/O for the paged index results (15 rows in this example). Therefore, each page turn with this approach consumes roughly the same resources and time as turning to the first page.
Applicable scenario: when the query and sort fields (that is, the fields in the WHERE and ORDER BY clauses) have a corresponding covering index and the intermediate result set is large.
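The two paging styles should return the same page. The sketch below checks this in Python with sqlite3 as a stand-in; the table contents, the body column and the index name idx_thread are hypothetical. One assumption beyond the original statement: the outer query repeats the ORDER BY, because joining back to the table does not by itself guarantee that the rows come out in the order of the subquery.

```python
import sqlite3


def make_db():
    """Create table t with 200 hypothetical rows and the covering index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, thread_id INTEGER, "
        "deleted INTEGER, gmt_create INTEGER, body TEXT)"
    )
    conn.execute("CREATE INDEX idx_thread ON t (thread_id, deleted, gmt_create)")
    conn.executemany(
        "INSERT INTO t (thread_id, deleted, gmt_create, body) VALUES (10000, ?, ?, ?)",
        [(1 if i % 5 == 0 else 0, 1000 - i, "post %d" % i) for i in range(200)],
    )
    return conn


def page_direct(conn, offset, size):
    """Case 1: fetch the full rows and page over them directly."""
    return conn.execute(
        "SELECT * FROM t WHERE thread_id = 10000 AND deleted = 0 "
        "ORDER BY gmt_create ASC LIMIT ?, ?",
        (offset, size),
    ).fetchall()


def page_deferred(conn, offset, size):
    """Case 2: page over the ids first, then join back for the other columns."""
    return conn.execute(
        "SELECT t.* FROM (SELECT id FROM t WHERE thread_id = 10000 AND deleted = 0 "
        "ORDER BY gmt_create ASC LIMIT ?, ?) a JOIN t ON a.id = t.id "
        "ORDER BY t.gmt_create ASC",
        (offset, size),
    ).fetchall()


conn = make_db()
direct_page = page_direct(conn, 30, 15)
deferred_page = page_deferred(conn, 30, 15)
```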
VIII. Table design optimization
(1) When indexing a table, give priority to the fields used in WHERE and ORDER BY.
(2) Use numeric fields where possible (for example gender, male: 1, female: 2). If a field holds only numeric information, try not to design it as a character type; doing so reduces the performance of queries and joins and increases storage overhead.
This is because the engine compares a string character by character when processing queries and joins, whereas a numeric type needs only one comparison.
(3) Querying tables with large amounts of data causes slow queries, mainly because too many rows are scanned. In this case, have the program fetch the data page by page in a loop and combine the results for display. To query rows 100000 to 100050:
SELECT * FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID ASC) AS rowid, infoTab.*
FROM infoTab)t WHERE t.rowid > 100000 AND t.rowid <= 100050
(4) Use varchar/nvarchar instead of char/nchar
Use varchar/nvarchar instead of char/nchar where possible. First, variable-length fields take less storage space; second, for queries, searching within a relatively small field is clearly more efficient.
Do not assume that NULL needs no space. For a char(100) field, the space is fixed when the field is created: whether or not a value is inserted (NULL included), it occupies 100 characters of space. For a variable-length field such as varchar, NULL takes no space.