SQL optimization

2022-06-11 04:16:00 【Prince you】

1. Preface

Among the ways to improve system performance, SQL optimization has the lowest cost and the most visible effect. If your team handles SQL well, the usability of your whole system takes a qualitative leap, and it saves your boss real money.

Optimization cost: hardware > system configuration > database table structure > SQL and indexes.
Optimization effect: hardware < system configuration < database table structure < SQL and indexes.

First, optimization at the MySQL layer generally follows five principles:

Reduce data access: choose reasonable field types, enable compression, and use indexes to reduce disk I/O
Return less data: return only the required fields and paginate results, reducing disk I/O and network I/O
Reduce the number of interactions: batch DML operations and use stored functions and procedures to reduce round trips to the database
Reduce server CPU overhead: minimize database sorting and full table queries, reducing CPU and memory usage
Use more resources: use table partitioning so that more operations can run in parallel and CPU resources are used more fully

In summary, SQL optimization comes down to three points:

Maximize the use of indexes;
Avoid full table scans as much as possible;
Reduce queries for data that is not needed;

2. SELECT statement: syntax order

1. SELECT 
2. DISTINCT <select_list>
3. FROM <left_table>
4. <join_type> JOIN <right_table>
5. ON <join_condition>
6. WHERE <where_condition>
7. GROUP BY <group_by_list>
8. HAVING <having_condition>
9. ORDER BY <order_by_condition>
10. LIMIT <limit_number>

3. SELECT statement: execution order

FROM: <table name> # Select the tables; data from multiple tables is combined into one virtual table through a Cartesian product.
ON: <filter condition> # Filter the Cartesian-product virtual table
JOIN: <join, left join, right join...> <joined table> # Specify the join type, which adds rows to the virtual table produced by ON; for example, a left join adds the remaining rows of the left table to the virtual table
WHERE: <where condition> # Filter the virtual table above
GROUP BY: <grouping condition> # Group the rows; aggregate functions such as SUM() are computed here for use in the HAVING clause, which is where conditions on aggregates are written
HAVING: <group filter> # Filter the grouped results on their aggregate values
SELECT: <returned column list> # Each plain column returned must appear in the GROUP BY clause, unless it is inside an aggregate function
DISTINCT: # Remove duplicate rows
ORDER BY: <sort condition> # Sort
LIMIT: <row limit>
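One practical consequence of this logical order can be sketched as follows: WHERE runs before SELECT, so it cannot see a column alias, while HAVING and ORDER BY run later and can work with aggregates and aliases. The table `orders_demo` and its columns are hypothetical names used only for illustration.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE orders_demo (
  id INT PRIMARY KEY,
  customer_id INT NOT NULL,
  amount DECIMAL(10, 2) NOT NULL
);

-- WHERE is evaluated before GROUP BY and SELECT: it filters individual rows.
-- HAVING is evaluated after GROUP BY: it filters whole groups on the aggregate.
-- ORDER BY is evaluated after SELECT, so it may use the alias total_amount.
SELECT customer_id, SUM(amount) AS total_amount
FROM orders_demo
WHERE amount > 0
GROUP BY customer_id
HAVING SUM(amount) > 100
ORDER BY total_amount DESC
LIMIT 10;

-- This fails with an "unknown column" error, because the alias does not
-- exist yet when WHERE is evaluated:
-- SELECT amount * 2 AS doubled FROM orders_demo WHERE doubled > 10;
```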

4. SQL optimization strategies

Note: the following SQL optimization strategies are meant for large data volumes. If the data volume is small, there is no need to follow them strictly; doing so only adds unnecessary complexity.

(1) Avoid fuzzy matching with a leading wildcard, which causes the database engine to abandon the index and scan the whole table. For example:

SELECT * FROM t WHERE username LIKE '%Chen%'

How to optimize: put the wildcard only at the end of the pattern where possible. For example:

SELECT * FROM t WHERE username LIKE 'Chen%'

If the requirement really calls for a leading wildcard:

Use MySQL's built-in function INSTR(str, substr) to match; it works like indexOf() in Java and returns the position of the substring
Use a FULLTEXT index and search with MATCH ... AGAINST
For large data volumes, consider ElasticSearch or Solr, which can search hundreds of millions of rows within seconds
When the table is small (a few thousand rows), don't overthink it; just use LIKE '%xx%'.
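The first two alternatives might look like the sketch below. The table `user_demo` and the index name `ft_username` are hypothetical. Note that a full-text search matches whole words (tokens) rather than arbitrary substrings, so it is not a drop-in replacement for LIKE '%xx%' in every case.

```sql
-- Hypothetical table; FULLTEXT indexes are available on InnoDB and MyISAM.
CREATE TABLE user_demo (
  id INT PRIMARY KEY AUTO_INCREMENT,
  username VARCHAR(64) NOT NULL,
  FULLTEXT KEY ft_username (username)
) ENGINE = InnoDB;

-- Substring test with INSTR: returns the 1-based position, or 0 if absent.
SELECT id, username
FROM user_demo
WHERE INSTR(username, 'Chen') > 0;

-- Full-text search through the FULLTEXT index.
SELECT id, username
FROM user_demo
WHERE MATCH(username) AGAINST('Chen' IN NATURAL LANGUAGE MODE);
```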

(2) Avoid IN and NOT IN where possible; they can cause the engine to scan the whole table, depending on the MySQL version and the data. For example:

SELECT * FROM t WHERE id IN (2,3)

How to optimize: if the values are consecutive, BETWEEN can be used instead. For example:

SELECT * FROM t WHERE id BETWEEN 2 AND 3

If it is a subquery, EXISTS can be used instead. For example:

-- May not use the index
select * from A where A.id in (select id from B);
-- Uses the index
select * from A where exists (select * from B where B.id = A.id);

(3) Avoid OR where possible; it can cause the database engine to abandon the index and scan the whole table. For example:

SELECT * FROM t WHERE id = 1 OR id = 3

How to optimize: UNION can be used instead of OR. For example:

SELECT * FROM t WHERE id = 1
   UNION
SELECT * FROM t WHERE id = 3

(4) Avoid NULL checks where possible; they can cause the database engine to abandon the index and scan the whole table. For example:

SELECT * FROM t WHERE score IS NULL

How to optimize: give the field a default value of 0 and test against 0. For example:

SELECT * FROM t WHERE score = 0

(5) Avoid expressions and function calls on the left side of the equals sign in a WHERE condition; they cause the database engine to abandon the index and scan the whole table.

Move the expression or function call to the right side of the equals sign. For example:

-- Full table scan
SELECT * FROM T WHERE score/10 = 9
-- Uses the index
SELECT * FROM T WHERE score = 10*9
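To check this on your own data, EXPLAIN can be run on both forms. This is a minimal sketch with a hypothetical table `score_demo` and index `idx_score`; the plan actually chosen depends on the MySQL version and table statistics, so compare the `type` and `key` columns of the two outputs rather than expecting fixed values.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE score_demo (
  id INT PRIMARY KEY,
  score INT NOT NULL,
  KEY idx_score (score)
);

-- The column is wrapped in an expression, so idx_score cannot be used
-- to look up matching rows directly.
EXPLAIN SELECT id FROM score_demo WHERE score / 10 = 9;

-- The column stands alone; the constant side is computed once and
-- idx_score can be used for an equality lookup.
EXPLAIN SELECT id FROM score_demo WHERE score = 10 * 9;
```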

(6) With large data volumes, avoid the WHERE 1=1 condition.

This condition is often added by default to make assembling query conditions easier, but it may cause the database engine to abandon the index and scan the whole table. For example:

SELECT username, age, sex FROM T WHERE 1=1

How to optimize: when assembling the SQL in code, omit WHERE if there are no conditions, and join the conditions with AND if there are.

(7) Avoid <> or != in query conditions

When an indexed column is used as a query condition, avoid operators such as <> or !=. If the business really requires a not-equal comparison, re-evaluate the index design: avoid indexing this field and rely on other indexed fields in the query conditions instead.

(8) The WHERE condition contains only non-leading columns of a composite index

For example: a composite (multi-column) index contains the three columns key_part1, key_part2, and key_part3, but the SQL statement does not include the leading column key_part1. Under MySQL's leftmost-prefix matching rule for composite indexes, the composite index will not be used.

select col1 from table where key_part2=1 and key_part3=2
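The leftmost-prefix rule can be explored with EXPLAIN, as in this sketch. The table `composite_demo` and index `idx_parts` are hypothetical, and the plans shown by EXPLAIN depend on the MySQL version and the data.

```sql
-- Hypothetical table with a three-column composite index.
CREATE TABLE composite_demo (
  id INT PRIMARY KEY,
  key_part1 INT NOT NULL,
  key_part2 INT NOT NULL,
  key_part3 INT NOT NULL,
  col1 VARCHAR(32),
  KEY idx_parts (key_part1, key_part2, key_part3)
);

-- Leading column missing: idx_parts cannot be used for a lookup.
EXPLAIN SELECT col1 FROM composite_demo
WHERE key_part2 = 1 AND key_part3 = 2;

-- All three columns present: idx_parts can be used.
EXPLAIN SELECT col1 FROM composite_demo
WHERE key_part1 = 5 AND key_part2 = 1 AND key_part3 = 2;

-- A leftmost prefix (key_part1, key_part2) also qualifies.
EXPLAIN SELECT col1 FROM composite_demo
WHERE key_part1 = 5 AND key_part2 = 1;
```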

(9) Implicit type conversion prevents the index from being used

In the following SQL statement, the indexed column is of type varchar but the value supplied is a number. This involves an implicit type conversion, so the index cannot be used properly.

select col1 from table where col_varchar=123;
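The usual fix is to pass the value with the same type as the column. A minimal sketch, assuming a hypothetical table `conv_demo` with an index on the varchar column:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE conv_demo (
  id INT PRIMARY KEY,
  col1 VARCHAR(32),
  col_varchar VARCHAR(32) NOT NULL,
  KEY idx_col_varchar (col_varchar)
);

-- Number compared with a string column: the stored values have to be
-- converted for the comparison, so the index is not used for a lookup.
EXPLAIN SELECT col1 FROM conv_demo WHERE col_varchar = 123;

-- String literal: the types match and idx_col_varchar can be used.
EXPLAIN SELECT col1 FROM conv_demo WHERE col_varchar = '123';
```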

(10) The ORDER BY columns should match the WHERE conditions; otherwise ORDER BY cannot sort through the index

-- Does not use the age index
SELECT * FROM t order by age;

-- Uses the age index
SELECT * FROM t where age > 0 order by age;

For the statements above, the database processes them in this order:

Step 1: Generate an execution plan from the WHERE conditions and statistics, and fetch the data.
Step 2: Sort the fetched data. When it reaches the sort (ORDER BY), the database first looks at the execution plan from step 1 to see whether the ORDER BY columns use an index in that plan. If so, it can read the data in index order and get it already sorted. If not, it performs a separate sort.
Step 3: Return the sorted data.

When the ORDER BY columns appear in the WHERE condition, the index is used and no second sort is needed. More precisely, when the ORDER BY columns use an index in the execution plan, no sort operation is performed.

This conclusion applies not only to ORDER BY but also to other operations that need sorting, such as GROUP BY, UNION, and DISTINCT.
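Whether a separate sort happens can be seen in the `Extra` column of EXPLAIN, where MySQL reports `Using filesort` when it sorts the rows itself instead of reading them in index order. The sketch below uses a hypothetical table `person_demo`; which plan the optimizer picks depends on the version and on table statistics.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE person_demo (
  id INT PRIMARY KEY,
  name VARCHAR(32) NOT NULL,
  age INT NOT NULL,
  KEY idx_age (age)
);

-- No WHERE condition on age: check Extra for "Using filesort".
EXPLAIN SELECT * FROM person_demo ORDER BY age;

-- WHERE and ORDER BY both use age: check whether key shows idx_age
-- and whether "Using filesort" has disappeared from Extra.
EXPLAIN SELECT * FROM person_demo WHERE age > 0 ORDER BY age;
```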

(11) Use hints correctly to optimize statements

In MySQL, hints can tell the optimizer to choose or ignore specific indexes at execution time. In general, because table structures and indexes change between versions, it is better to avoid hints and instead collect fresh statistics with ANALYZE TABLE. In certain situations, however, a hint can exclude interference from other indexes and select a better execution plan.

USE INDEX: add USE INDEX after the table name in the query to give MySQL a list of indexes to consider, so that it no longer considers other available indexes. Example: SELECT col1 FROM table USE INDEX (mod_time, name)...
IGNORE INDEX: if you only want MySQL to ignore one or more indexes, use IGNORE INDEX as the hint. Example: SELECT col1 FROM table IGNORE INDEX (priority) ...
FORCE INDEX: to force MySQL to use a specific index, use FORCE INDEX as the hint in the query. Example: SELECT col1 FROM table FORCE INDEX (mod_time) ...

At query time, the database analyzes the statement automatically and chooses the most suitable index. However, the query optimizer does not always pick the optimal index. If you know which index is best, you can use FORCE INDEX to make the query use it.

For example:

SELECT * FROM students FORCE INDEX (idx_class_id) WHERE class_id = 1 ORDER BY id DESC;

5. Other SELECT statement optimizations

(1) Avoid SELECT *

First, SELECT * is not a good SQL habit in any type of database.

Using SELECT * fetches all columns, which prevents the optimizer from using covering-index scans, affects its choice of execution plan, increases network bandwidth consumption, and adds extra I/O, memory, and CPU cost.

Specify by name the columns the business actually needs instead of using SELECT *.

(2) Avoid functions with non-deterministic results

This applies to scenarios such as master-slave replication. In principle, the slave replays the statements executed on the master, so functions such as now(), rand(), sysdate(), and current_user() can easily cause data inconsistency between master and slave. In addition, SQL statements that use non-deterministic functions cannot use the query cache.

(3) In multi-table join queries, put the small table first and the large table last.

In MySQL, the tables after FROM in a join query are processed from left to right (the opposite of Oracle). The first table involves a full table scan, so put the small table first and scan it first, which is fast and efficient. When the large table is scanned afterwards, perhaps only its first 100 rows need to be read before the return condition is met.

For example: table 1 has 50 rows and table 2 has 3 billion rows. If table 2 is scanned in full, you may as well go and have a meal first.

(4) Use table aliases

When joining multiple tables in one SQL statement, use table aliases and prefix every column name with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.

(5) Replace HAVING clauses with WHERE clauses

Avoid HAVING where possible, because HAVING filters the result set only after all records have been retrieved, whereas WHERE filters records before aggregation. If WHERE can limit the number of records, the cost is reduced. Conditions in HAVING are generally used to filter on aggregate functions; all other conditions should be written in the WHERE clause.

The difference between WHERE and HAVING: aggregate (group) functions cannot be used in WHERE.
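The difference can be illustrated with the following sketch, which uses a hypothetical table `sales_demo`. The first two queries return the same rows, but the second discards unwanted rows before grouping.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE sales_demo (
  id INT PRIMARY KEY,
  region VARCHAR(16) NOT NULL,
  amount DECIMAL(10, 2) NOT NULL
);

-- Less efficient: every region is grouped and aggregated first,
-- then all groups except 'east' are thrown away.
SELECT region, SUM(amount) AS total
FROM sales_demo
GROUP BY region
HAVING region = 'east';

-- Preferred: rows of other regions are filtered out before aggregation.
SELECT region, SUM(amount) AS total
FROM sales_demo
WHERE region = 'east'
GROUP BY region;

-- HAVING remains the right place for conditions on aggregates.
SELECT region, SUM(amount) AS total
FROM sales_demo
GROUP BY region
HAVING SUM(amount) > 1000;
```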

(6) Adjust the order of conditions in the WHERE clause

MySQL parses the WHERE clause from left to right and top to bottom. Following this principle, put the conditions that filter out the most data first, so that the result set shrinks as quickly as possible.

6. Optimizing DML statements (insert, update, delete)

(1) Bulk-inserting data

If you are inserting a large number of rows at once, use a single INSERT statement with multiple value lists (method 2). It is faster than separate INSERT statements (method 1); batch insertion is typically several times more efficient.

Method 1:

insert into T values(1,2);
insert into T values(1,3);
insert into T values(1,4);

Method 2:

Insert into T values(1,2),(1,3),(1,4);

There are three reasons to choose the latter method:

It reduces SQL statement parsing. MySQL has nothing like Oracle's shared pool; with method 2, the statement is parsed only once to insert the data;
In certain scenarios it reduces the number of DB connections;
The SQL statement is shorter, which reduces network transfer I/O.

(2) Use COMMIT appropriately

Using COMMIT appropriately releases the resources held by a transaction and reduces consumption. The resources released after a COMMIT are:

Undo data blocks held by the transaction;
Data blocks the transaction recorded in the redo log;
Locks imposed by the transaction, which reduces the lock contention that hurts performance. In particular, when you need to delete a large amount of data with DELETE, break the deletion into batches and commit regularly.
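Breaking a large deletion into batches might look like the sketch below. The table `log_demo`, the cutoff date, and the batch size of 1000 are all assumptions chosen for illustration; the loop itself would live in application code or a scheduled job.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE log_demo (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  created_at DATETIME NOT NULL,
  KEY idx_created_at (created_at)
);

-- Run this block repeatedly until rows_deleted comes back as 0.
-- Each pass deletes at most 1000 rows and commits, so locks and undo
-- space are released between batches.
START TRANSACTION;
DELETE FROM log_demo WHERE created_at < '2022-01-01' LIMIT 1000;
SELECT ROW_COUNT() AS rows_deleted;
COMMIT;
```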

(3) Avoid re-querying data you have just updated

Businesses often need to update a row and also read back information from that row. MySQL does not support the UPDATE ... RETURNING syntax that PostgreSQL has, but the same effect can be achieved in MySQL with variables.

For example, to update the timestamp of a row and also read the timestamp now stored in that row, the simple approach is:

Update t1 set time=now() where col1=1;
Select time from t1 where col1=1;

Using a variable, this can be rewritten as:

Update t1 set time=now() where col1=1 and @now := now();
Select @now;

Both versions need two round trips, but using a variable avoids accessing the table a second time. Especially when t1 holds a large amount of data, the latter is much faster than the former.

(4) Query-first or update-first (insert, update, delete)

MySQL also lets you change the scheduling priority of statements, so that queries from multiple clients cooperate better and no single client waits a long time because of locking. Changing priorities can also ensure that certain types of queries are processed faster. First determine the type of application: whether it is query-heavy or update-heavy, and whether query efficiency or update efficiency must be guaranteed. Then decide whether queries or updates take priority.

The methods for changing the scheduling strategy described below mainly target storage engines that only have table locks, such as MyISAM, MEMORY, and MERGE. For the InnoDB storage engine, the execution order of statements is determined by the order in which row locks are acquired. MySQL's default scheduling strategy can be summarized as follows:

1) Writes take priority over reads.

2) Only one write to a table can occur at a time, and write requests are processed in the order they arrive.

3) Multiple reads on a table can run at the same time.

MySQL provides several statement modifiers that let you change its scheduling strategy:

The LOW_PRIORITY keyword applies to DELETE, INSERT, LOAD DATA, REPLACE, and UPDATE;
The HIGH_PRIORITY keyword applies to SELECT and INSERT statements;
The DELAYED keyword applies to INSERT and REPLACE statements.

If a write is a LOW_PRIORITY request, the system does not treat it as higher priority than reads. In that case, if a second reader arrives while the writer is waiting, the second reader is allowed to go ahead of the writer. The writer may start only when there are no other readers. With this scheduling change, a LOW_PRIORITY write could be blocked indefinitely.

The HIGH_PRIORITY keyword for SELECT queries is similar. It lets the SELECT jump ahead of a pending write, even though the write would normally have higher priority. Another effect is that a high-priority SELECT runs before normal SELECT statements, because those statements are blocked by the write. If you want all statements that support the LOW_PRIORITY option to be handled at low priority by default, start the server with the --low-priority-updates option. Using INSERT HIGH_PRIORITY raises an individual INSERT statement back to normal write priority, cancelling the effect of that option for that statement.
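In a statement, the modifiers are written directly after the verb. The sketch below uses a hypothetical MyISAM table `counter_demo`, since these modifiers only affect engines that use table-level locking.

```sql
-- Hypothetical table using a table-locking engine.
CREATE TABLE counter_demo (
  id INT PRIMARY KEY,
  hits INT NOT NULL
) ENGINE = MyISAM;

-- Let pending reads go first; this write waits until no readers are queued.
UPDATE LOW_PRIORITY counter_demo SET hits = hits + 1 WHERE id = 1;

-- Let this read go ahead of writes that are waiting for the table lock.
SELECT HIGH_PRIORITY hits FROM counter_demo WHERE id = 1;

-- Keep normal write priority for one INSERT even when the server was
-- started with --low-priority-updates.
INSERT HIGH_PRIORITY INTO counter_demo (id, hits) VALUES (2, 0);
```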

7. Query condition optimization

(1) For complex queries, use intermediate temporary tables to hold interim data
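One way to read this advice, sketched with hypothetical names (`order_demo`, `tmp_customer_totals`): compute an expensive intermediate result once, store it in a temporary table, and run the remaining, simpler queries against that smaller table.

```sql
-- Hypothetical source table, for illustration only.
CREATE TABLE order_demo (
  id INT PRIMARY KEY,
  customer_id INT NOT NULL,
  amount DECIMAL(10, 2) NOT NULL,
  created_at DATETIME NOT NULL
);

-- Step 1: materialize the intermediate result once.
CREATE TEMPORARY TABLE tmp_customer_totals AS
SELECT customer_id, SUM(amount) AS total_amount
FROM order_demo
WHERE created_at >= '2022-01-01'
GROUP BY customer_id;

-- Step 2: query the small intermediate table.
SELECT customer_id, total_amount
FROM tmp_customer_totals
WHERE total_amount > 1000
ORDER BY total_amount DESC;

-- The temporary table is visible only to this session; drop it when done.
DROP TEMPORARY TABLE tmp_customer_totals;
```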

(2) Optimize GROUP BY statements

By default (in MySQL 5.7 and earlier), MySQL sorts all GROUP BY values, so a query with "GROUP BY col1, col2, ...;" behaves as if "ORDER BY col1, col2, ...;" had been specified. If you explicitly include an ORDER BY clause with the same columns, MySQL optimizes it without any slowdown, although the sort still happens.

Therefore, if the query includes GROUP BY but you do not need the grouped values sorted, specify ORDER BY NULL to suppress sorting. For example:

SELECT col1, col2, COUNT(*) FROM table GROUP BY col1, col2 ORDER BY NULL ;

(3) Optimize JOIN statements

In MySQL, a subquery can use a SELECT statement to produce a single-column result that is then used as a filter condition in another query. Subqueries can complete in one step SQL operations that logically need several steps, avoid transaction or table locks, and are easy to write. In some cases, however, a subquery can be replaced by a more efficient JOIN.

Example: to retrieve all customers with no order records, you could use the following query:

SELECT col1 FROM customerinfo WHERE CustomerID NOT in (SELECT CustomerID FROM salesinfo )

Completing this query with a JOIN is faster, especially when the salesinfo table has an index on CustomerID. The query is:

SELECT col1 FROM customerinfo
   LEFT JOIN salesinfo ON customerinfo.CustomerID=salesinfo.CustomerID
      WHERE salesinfo.CustomerID IS NULL

A JOIN is more efficient because MySQL does not need to create a temporary table in memory to carry out the two logical steps of the query.

(4) Optimize UNION queries

MySQL executes UNION queries by creating and populating a temporary table. Unless you really need to eliminate duplicate rows, use UNION ALL. Without the ALL keyword, MySQL adds the DISTINCT option to the temporary table, which forces a uniqueness check over all of its data, and that is quite expensive.

Efficient:

SELECT COL1, COL2, COL3 FROM TABLE WHERE COL1 = 10 
UNION ALL 
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL3= 'TEST';

Inefficient:

SELECT COL1, COL2, COL3 FROM TABLE WHERE COL1 = 10 
UNION 
SELECT COL1, COL2, COL3 FROM TABLE WHERE COL3= 'TEST';

(5) Split complex SQL into several small SQL statements and avoid large transactions

Simple SQL can easily use MySQL's QUERY CACHE;
It shortens lock time, especially for tables using the MyISAM storage engine;
It can make use of multi-core CPUs.

(6) Use TRUNCATE instead of DELETE

When deleting all records in a table, a DELETE statement's operations are recorded in undo blocks, and the deleted records are also written to the binlog. When you are sure you want to clear the whole table, this generates a large amount of binlog and occupies many undo data blocks, which is inefficient and resource-hungry.

With TRUNCATE instead, no recoverable information is recorded and the data cannot be restored. TRUNCATE therefore uses very few resources and takes very little time. In addition, TRUNCATE resets the table's high-water mark and resets the auto-increment counter.

(7) Use a sensible pagination method to improve paging efficiency

For display and other paging needs, a suitable pagination method improves paging efficiency.

Case 1:

select * from t where thread_id = 10000 and deleted = 0
   order by gmt_create asc limit 0, 15;

The example above fetches all fields matching the filter conditions in one pass and sorts them before returning. Data access cost = index I/O + table-data I/O for all records the index matches. So the further back you page with this form, the worse the efficiency and the longer the time, especially with large tables.

Applicable scenario: when the intermediate result set is small (under 10,000 rows) or the query conditions are complex (multiple query fields or multi-table joins).

Case 2:

select t.* from (select id from t where thread_id = 10000 and deleted = 0
   order by gmt_create asc limit 0, 15) a, t
      where a.id = t.id;

The example above requires that the primary key of table t is the id column and that a covering secondary index (thread_id, deleted, gmt_create) exists. It first uses the covering index to fetch and sort the primary key ids that match the filter conditions, then joins back to the table to fetch the other fields. Data access cost = index I/O + table-data I/O for the rows of the requested page (15 rows in this example). So each page turn costs about the same resources and time, just like fetching the first page.

Applicable scenario: when a covering index exists for the query and sort fields (the fields in the WHERE and ORDER BY clauses) and the intermediate result set is large.
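Case 2 for a later page might be written as in the sketch below, which also spells out the index it relies on. The table `post_demo`, the index name, and the page number are hypothetical. The inner query still walks the skipped index entries, but it touches only the index, and only the 15 rows of the page are read from the table.

```sql
-- Hypothetical table with the covering secondary index described above.
CREATE TABLE post_demo (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  thread_id INT NOT NULL,
  deleted TINYINT NOT NULL DEFAULT 0,
  gmt_create DATETIME NOT NULL,
  body TEXT,
  KEY idx_thread_deleted_create (thread_id, deleted, gmt_create)
);

-- Page 101 at 15 rows per page: offset (101 - 1) * 15 = 1500.
SELECT p.*
FROM (
  SELECT id
  FROM post_demo
  WHERE thread_id = 10000 AND deleted = 0
  ORDER BY gmt_create ASC
  LIMIT 1500, 15
) AS page
JOIN post_demo AS p ON p.id = page.id
-- The join does not guarantee order, so sort the 15 rows again.
ORDER BY p.gmt_create ASC;
```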

8. Table design optimization

(1) When indexing a table, give priority to the fields used in WHERE and ORDER BY.

(2) Use numeric fields where possible (for example gender, male: 1, female: 2). A field that contains only numeric information should not be designed as a character type, which reduces query and join performance and increases storage overhead.

This is because the engine compares strings character by character when processing queries and joins, whereas a numeric type needs only one comparison.

(3) Querying tables with large data volumes causes slow queries, mainly because too many rows are scanned. In that case, have the program page through the data in segments, loop over the pages, and merge the results for display. To query rows 100001 to 100050:

SELECT * FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID ASC) AS rowid, infoTab.*
   FROM infoTab) t WHERE t.rowid > 100000 AND t.rowid <= 100050

(4) Use varchar/nvarchar instead of char/nchar

Use varchar/nvarchar instead of char/nchar wherever possible. First, variable-length fields take less storage, which saves space. Second, searching within a smaller field is clearly more efficient for queries.

Do not assume that NULL takes no space. For example, with a char(100) column the space is fixed when the field is created, and whether or not a value is inserted (NULL included) it occupies 100 characters. With a variable-length field such as varchar, NULL takes up no space.

Copyright notice
This article was written by [Prince you]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/162/202206110405099650.html