Hands-On: How to Solve the MySQL Deep Pagination Problem
2022-07-29 06:51:00 【Ma Xiaoxiao】
When we implement paging requirements in daily work, we tend to reach for limit. But when the offset gets very large, query efficiency drops sharply. This article walks through four schemes for optimizing deep pagination over millions of rows of MySQL data, and ends with a hands-on SQL case from a recent production optimization.
Why does limit deep pagination get slow?
First, look at the structure of the table below:
CREATE TABLE account (
  id int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  name varchar(255) DEFAULT NULL COMMENT 'account name',
  balance int(11) DEFAULT NULL COMMENT 'balance',
  create_time datetime NOT NULL COMMENT 'creation time',
  update_time datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
  PRIMARY KEY (id),
  KEY idx_name (name),
  KEY idx_update_time (update_time) -- secondary index
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT COMMENT='account table';
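If you want to reproduce the timings that follow, you need data at a comparable scale. The original does not show how its rows were generated; a throwaway stored procedure along these lines will do (a sketch — fill_account and its values are made up for illustration; batch the inserts or wrap them in a transaction if a million rows takes too long):

-- Disposable data generator for the account table above (illustrative only).
DELIMITER //
CREATE PROCEDURE fill_account(IN n INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
        INSERT INTO account (name, balance, create_time, update_time)
        VALUES (CONCAT('user_', i), FLOOR(RAND() * 10000), NOW(), NOW());
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;
CALL fill_account(1000000);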
Suppose the deep-pagination SQL to be executed is as follows:
select id,name,balance from account where update_time> '2020-09-19' limit 100000,10;
This SQL takes 0.742 seconds to run. Why does deep pagination get slow? If we change it to limit 0,10, it takes only 0.006 seconds.
Let's look at how this SQL executes:
1. Through the ordinary secondary index tree idx_update_time, filter on the update_time condition and find the primary key IDs of the matching records.
2. With those IDs, go back to the primary key index tree, locate the full rows, and fetch the selected columns (the table lookup, or "back to the table").
3. Scan the 100010 rows that satisfy the condition, throw away the first 100000, and return the last 10.
The execution plan is as follows:
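You can check the plan on your own copy of the table; both the deep and the shallow page typically show the same range scan on idx_update_time, because the rows discarded by limit are a pure execution-time cost that never appears in the plan:

EXPLAIN select id, name, balance
from account
where update_time > '2020-09-19'
limit 100000, 10;
-- Typically: type=range, key=idx_update_time -- identical to limit 0,10.
-- The 100000 discarded rows only show up in the execution time.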
There are two reasons why this SQL is slow:
A limit statement scans offset+n rows, discards the first offset rows, and returns the last n rows. In other words, limit 100000,10 scans 100010 rows, while limit 0,10 scans only 10. And limit 100000,10 not only scans more rows, it also performs more table lookups.
Optimizing with a subquery
The SQL above goes back to the table 100010 times, yet we only need 10 rows of data, which means only 10 table lookups are actually necessary. So we can optimize by reducing the number of table lookups.
A review of the B+ tree structure
So, how do we reduce the number of table lookups? Let's first review the B+ tree index structure:
In InnoDB, indexes are divided into the primary key index (clustered index) and secondary indexes.
Primary key index: the leaf nodes store the entire row.
Secondary index: the leaf nodes store the primary key value.
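This difference is exactly what the optimization below exploits: because a secondary index leaf already carries the primary key, a query that selects only id can be answered from idx_update_time alone, with no table lookups at all (EXPLAIN shows "Using index"):

select id
from account
where update_time >= '2020-09-19'
limit 100000, 10;
-- Index-only scan: all 100010 scanned entries stay inside idx_update_time,
-- and none of them triggers a lookup on the primary key tree.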
Moving the condition onto the primary key index tree
If we move the query condition back onto the primary key index tree, we can cut down the number of table lookups. But on the primary key index tree, the condition has to become the primary key id. So what happens to the original update_time condition? It moves into a subquery. How? Since the leaf nodes of the secondary index contain the primary key ID, we can look up the primary key IDs directly through update_time, and move the limit 100000 condition into the subquery as well. The complete SQL is:
select id,name,balance FROM account where id >= (select a.id from account a where a.update_time >= '2020-09-19' limit 100000, 1) LIMIT 10;
(The update_time condition is dropped from the outer query here; you can add it back outside.)
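For completeness, a version with the time condition repeated outside (my reading of that note, not shown in the original) would look like this:

-- Sketch: the same subquery trick, with the update_time filter
-- also applied in the outer query.
select id, name, balance
FROM account
where id >= (select a.id
             from account a
             where a.update_time >= '2020-09-19'
             limit 100000, 1)
and update_time >= '2020-09-19'
LIMIT 10;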
The result is the same, and the execution time drops to just 0.038 seconds!
Let's look at the execution plan:
From the execution plan: the subquery on table a uses the idx_update_time index, obtaining the primary key IDs directly on the secondary index, which skips the table lookups; the outer query then simply reads 10 rows onward from the ID found by the subquery.
So this scheme works.
Delayed association with INNER JOIN
The idea behind delayed association is the same as that of the subquery: move the condition onto the primary key index tree, then reduce the table lookups. The difference is that delayed association uses an inner join instead of a subquery.
The optimized SQL is as follows:
SELECT acct1.id, acct1.name, acct1.balance
FROM account acct1
INNER JOIN (SELECT a.id FROM account a
            WHERE a.update_time >= '2020-09-19'
            ORDER BY a.update_time
            LIMIT 100000, 10) AS acct2
ON acct1.id = acct2.id;
This query is just as fast, taking only 0.034 seconds.
The execution plan is as follows:
The idea: first query the qualifying primary key IDs through the idx_update_time secondary index tree, then inner-join those IDs back to the original table on the primary key, so the outer query goes straight through the primary key index, again reducing the table lookups.
The tag-recording method
The root cause of the limit deep-pagination problem is that the larger the offset, the more rows MySQL scans and then abandons, which degrades query performance.
Instead, we can use the tag-recording method: record where the last query ended, and on the next query, continue scanning from that mark. It's like reading a book: instead of starting over, you fold the corner or leave a bookmark where you stopped, and next time you open straight to that spot.
Suppose the last record fetched had id 100000; the SQL can then be changed to:
select id,name,balance FROM account where id > 100000 order by id limit 10;
This way, no matter how many pages deep we go, performance stays good, because the query hits the id index. But this approach has a limitation: it requires a column similar to a continuous auto-increment, one that is unique and ordered so we can seek forward on it.
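If no such column exists, or you need to page by a non-unique column such as update_time, a common variant (my sketch, not from the original article) is to seek on the composite (update_time, id), with id as the tiebreaker; the boundary values below are hypothetical, remembered from the last row of the previous page:

select id, name, balance
from account
where update_time >= '2020-09-19 00:00:00'
and (update_time > '2020-09-19 00:00:00'
     or (update_time = '2020-09-19 00:00:00' and id > 100000))
order by update_time, id
limit 10;
-- The redundant update_time >= guard keeps the query inside the
-- idx_update_time range; the OR handles ties on update_time.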
Using between...and...
In many cases, a limit query can be converted into a query over known positions, so that MySQL can do a between...and range scan and return the corresponding result directly.
If we know the boundary values are 100000 and 100010, we can optimize it as:
select id,name,balance FROM account where id between 100000 and 100010 order by id;
A hands-on production case
Let's look at a practical case. Suppose a table with the following structure holds 2 million rows of data.
CREATE TABLE account (
  id varchar(32) COLLATE utf8_bin NOT NULL COMMENT 'primary key',
  account_no varchar(64) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'account number',
  amount decimal(20,2) DEFAULT NULL COMMENT 'amount',
  type varchar(10) COLLATE utf8_bin DEFAULT NULL COMMENT 'type: A or B',
  create_time datetime DEFAULT NULL COMMENT 'creation time',
  update_time datetime DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (id),
  KEY idx_account_no (account_no),
  KEY idx_create_time (create_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='account table';
The business requirement: fetch all type-A account data from 2021 and report it to the big data platform.
The straightforward implementation
Many developers, on receiving such a requirement, will implement it directly like this:
// Query the total number of records to report
Integer total = accountDAO.countAccount();

// SQL behind countAccount
<select id='countAccount' resultType="java.lang.Integer">
    select count(1)
    from account
    where create_time >= '2021-01-01 00:00:00'
    and type = 'A'
</select>

// Compute the number of pages
int pageNo = total % pageSize == 0 ? total / pageSize : (total / pageSize + 1);

// Query page by page and report
for (int i = 0; i < pageNo; i++) {
    int startRow = i * pageSize;
    List<AccountPO> list = accountDAO.listAccountByPage(startRow, pageSize);
    // Report to the big data platform
    postBigData(list);
}

// Paging SQL (a potential limit deep-pagination problem, because the account table holds millions of rows)
<select id='listAccountByPage'>
    select *
    from account
    where create_time >= '2021-01-01 00:00:00'
    and type = 'A'
    limit #{startRow},#{pageSize}
</select>
The optimized solution in practice
The implementation above suffers from the limit deep-pagination problem, because the account table holds millions of rows. How do we optimize it?
We can use the tag-recording method. Some readers may have doubts: the id primary key here is not continuous, can tag recording really be used?
It can. id does not need to be continuous, only ordered, and order by gives us that order. The optimized scheme is as follows:
// Query the minimum ID
String lastId = accountDAO.queryMinId();

// SQL behind queryMinId
<select id="queryMinId" resultType="java.lang.String">
    select MIN(id)
    from account
    where create_time >= '2021-01-01 00:00:00'
    and type = 'A'
</select>

// Page size
Integer pageSize = 100;
List<AccountPO> list;
do {
    list = listAccountByPage(lastId, pageSize);
    if (CollectionUtils.isEmpty(list)) {
        break;
    }
    // Tag recording: remember the last ID returned by this page
    lastId = list.get(list.size() - 1).getId();
    // Report to the big data platform
    postBigData(list);
} while (CollectionUtils.isNotEmpty(list));

<select id="listAccountByPage">
    select *
    from account
    where create_time >= '2021-01-01 00:00:00'
    and id > #{lastId}
    and type = 'A'
    order by id asc
    limit #{pageSize}
</select>
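One caveat (my observation, not from the original): the page query filters with id > #{lastId}, and lastId is seeded with MIN(id), so the row holding the minimum id itself is never reported. A minimal fix, as a sketch, is to make only the first page inclusive:

-- Sketch of a first-page query that includes the seeded minimum id;
-- subsequent pages keep the exclusive 'id > #{lastId}' form above.
select *
from account
where create_time >= '2021-01-01 00:00:00'
and id >= #{lastId}
and type = 'A'
order by id asc
limit #{pageSize}

Alternatively, since id is a varchar under utf8_bin, seeding lastId with an empty string (which sorts before every real id) lets you keep the exclusive form throughout and drop the queryMinId step.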