当前位置：网站首页>Remember SQL optimization once

Remember SQL optimization once

2022-07-26 03:06:00 【JavaShark】

yesterday （2022-7-22） One of my functions has been launched , The amount of test environment data is small , No big problem , But after production , It's stuck , Then it started like this SQL Optimize , Here is a record of .

It is not convenient to disclose the table structure of the company , Here I have built several tables by myself , Just simulate it .

There must be a bar essence. To say that the watch can not be designed like this , But in fact, this is how the system is designed , If you want to change the table design , The impact is too large （ We are eager to go online ）. Of course , Later in this article, the scheme of modifying the design will also be given , To achieve a better solution .

1. Create table

Purchase order form ：

CREATE TABLE `purchase_order` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT ' The primary key increases automatically id',
  `purchase_time` varchar(255) DEFAULT NULL COMMENT ' Purchase time ',
  `purchase_pre_unit_price` decimal(10,2) unsigned zerofill NOT NULL COMMENT ' Purchase reservation unit price ( element /kg)',
  `purchase_weight` decimal(10,2) unsigned zerofill NOT NULL COMMENT ' Incoming weight (kg)',
  `purchase_bill_no` varchar(255) NOT NULL COMMENT ' Purchase order No ',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=62181 DEFAULT CHARSET=utf8 COMMENT=' Purchase order ';
 Purchase statement ：

CREATE TABLE `settlement_voucher` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT ' Self increasing ID',
  `purchase_bill_no` varchar(512) DEFAULT NULL COMMENT ' Purchase order No ',
  `settlement_bill_no` varchar(64) NOT NULL COMMENT ' Settlement doc No ',
  `unit_price` decimal(10,2) unsigned zerofill NOT NULL COMMENT ' Actual settlement unit price （ element /kg）',
  `settlement_weight` decimal(10,2) unsigned zerofill NOT NULL COMMENT ' Actual settlement weight （kg）',
  `cut_off_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT ' Settlement time ',
  PRIMARY KEY (`id`),
  KEY `idx_settlement_bill_no` (`settlement_bill_no`)
) ENGINE=InnoDB AUTO_INCREMENT=63288 DEFAULT CHARSET=utf8 COMMENT=' Purchase statement ';

Invoice form ：

CREATE TABLE `invoice` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT ' Primary key ID',
  `invoice_code` varchar(255) NOT NULL COMMENT ' Invoice code ',
  `invoice_number` varchar(255) NOT NULL COMMENT ' Invoice number ',
  `pay_amount` decimal(10,2) DEFAULT NULL COMMENT ' Invoice amount ',
  PRIMARY KEY (`id`),
  KEY `idx_invoice_number` (`invoice_number`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT=' Invoice form ';

invoice - Statement related table ：

CREATE TABLE `settlement_invoice_relation` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT ' Primary key ID',
  `invoice_code` varchar(255) DEFAULT NULL COMMENT ' Invoice code ',
  `invoice_number` varchar(255) DEFAULT NULL COMMENT ' Invoice number ',
  `settlement_bill_no` varchar(64) DEFAULT NULL COMMENT ' Settlement doc No ',
  PRIMARY KEY (`id`),
  KEY `idx_settlement_bill_no` (`settlement_bill_no`),
  KEY `idx_invoice_number` (`invoice_number`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT=' invoice - Statement related table ';

These are the tables I created myself , First, let's introduce the relationship between these tables ：

Purchase order form ( purchase_order ) And purchase statement ( settlement_voucher ) Through the purchase order number ( purchase_bill_no ) relation . Here's what's interesting ：
- One purchase order can correspond to multiple purchase settlement documents , adopt purchase_bill_no relation , As shown in the following data ：
- One purchase settlement document can correspond to multiple purchase orders , adopt purchase_bill_no relation , settlement_voucher In the table purchase_bill_no The field stores multiple purchase orders , Use English commas to separate . As shown in the following data ：
Invoice form ( invoice ) And statement of accounts ( settlement_voucher ) There is an association table ( settlement_invoice_relation )
- Invoice table and association table use invoice_code and invoice_number relation
- The settlement document table and association table use settlement_bill_no relation
- Invoices and settlement documents are many to many

2. demand

Now you need to use the purchase statement ( settlement_voucher ) Query out a list ：

The list field has 【 Purchase time ( Multiple commas are used to separate )、 Average purchase unit price 、 Total scheduled purchase amount , Settlement doc No , Average unit price of settlement , Settlement amount , Settlement time , Invoice number ( Multiple commas are used to separate ), Invoice code ( Multiple commas are used to separate )】
The query conditions are ： Purchase time ( When one purchase settlement document corresponds to multiple purchase documents , As long as there is a purchase order within the time range , Just find out ), Settlement time , Invoice number ( When a settlement document corresponds to multiple invoices , As long as there is an invoice that can be associated , Just find out )
Sort according to the settlement time

Of course , The actual demand at that time , There are more list fields , There are more query conditions than this ......

3. Insert data into a table

First, give the bill of lading ( purchase_order ) And purchase statement ( settlement_voucher ) Insert each 10 Ten thousand data , I use stored procedures here ：

begin
declare i int;		
declare purchase_weight decimal(10,2);
declare unit_price decimal(10,2);
declare purchase_bill_no varchar(255);
declare settlement_bill_no varchar(255);
set i=0;
while i<100000 do		
	select ROUND(RAND()*100,2) into purchase_weight from dual;
	select ROUND(RAND()*10,2) into unit_price from dual;
	select CONCAT('purchase-',LPAD(i,8,'0')) into purchase_bill_no from dual;
	select CONCAT('settlement-',LPAD(i,8,'0')) into settlement_bill_no from dual;

	--  Insert the purchase order table , The purchase time is randomly generated 
	insert into purchase_order(purchase_time,purchase_pre_unit_price,purchase_weight,purchase_bill_no) 
	select (DATE_ADD(NOW(),  INTERVAL  FLOOR(1 - (RAND() * 864000))   SECOND )),
		unit_price,purchase_weight,purchase_bill_no from dual;
	--  Insert statement , The settlement time is randomly generated 
	insert into settlement_voucher(purchase_bill_no,settlement_bill_no,unit_price,settlement_weight,cut_off_time)
	select purchase_bill_no,settlement_bill_no,unit_price,purchase_weight,
	(DATE_ADD(NOW(),  INTERVAL  FLOOR(1 - (RAND() * 864000))   SECOND )) from dual;
set i=i+1;		
end while;
end

Call stored procedure to generate data ：

call pre();

After generation, you need to change several pieces of data randomly , Simulate that one purchase order can correspond to multiple purchase settlement documents , And one purchase settlement document can correspond to multiple purchase orders （ In this way, the data is more realistic ）.

The situation that one purchase order can correspond to multiple purchase settlement documents is not simulated , In fact, this situation has little impact on this query .

One purchase settlement document can correspond to multiple purchase orders ：

Create some invoice data and settlement documents - Invoice related data , Need to reflect the many to many relationship ：

insert into invoice(invoice_code,invoice_number,pay_amount)
VALUES
('111111','1111100','1000'),
('111112','1111101','1001'),
('111113','1111102','1002'),
('111114','1111103','1003'),
('111115','1111104','1004'),
('111116','1111105','1005'),
('111117','1111106','1006'),
('111118','1111107','1007'),
('111119','1111108','1008'),
('111110','1111109','1009');

INSERT into settlement_invoice_relation(invoice_code,invoice_number,settlement_bill_no)
VALUES
('111111','1111100','settlement-00000000'),
('111112','1111101','settlement-00000000'),
('111113','1111102','settlement-00000000'),
('111114','1111103','settlement-00000004'),
('111114','1111103','settlement-00000006'),
('111114','1111103','settlement-00000030'),
('111116','1111105','settlement-00000041'),
('111117','1111106','settlement-00000041'),
('111118','1111107','settlement-00000043');

4. Start writing according to your needs SQL

The first step of optimization , Of course, I want the product manager to remove some query conditions , Avoid associating the purchase order table with the purchase settlement table , But you know ......

Here, take the purchase time as an example （ Because it is mainly the association between the purchase order and the purchase settlement order that leads to slow query ）, Remember the demand , One purchase settlement may correspond to multiple purchase orders , As long as one of the purchase orders is within the time range , You need to find out this purchase settlement

also ： The index in the table I created above also simulates the index before optimization at that time ......

4.1 The first edition

select 
	GROUP_CONCAT(po.purchase_time) as  Purchase time ,
	AVG(IFNULL(po.purchase_pre_unit_price,0)) as  Average purchase price ,
	t.settlement_bill_no as  Settlement doc No ,
	AVG(IFNULL(t.unit_price,0)) as  Average settlement price ,
	any_value(t.cut_off_time) as  Settlement time ,
	any_value(invoice_tmp.invoice_code) as  Invoice code ,
	any_value(invoice_tmp.invoice_number) as  Invoice number 
from settlement_voucher t
left join purchase_order po on FIND_IN_SET(po.purchase_bill_no,t.purchase_bill_no)>0
left join (
	select sir.settlement_bill_no,
				 GROUP_CONCAT(i.invoice_number) invoice_number,
				 GROUP_CONCAT(i.invoice_code) invoice_code 
	from settlement_invoice_relation sir, invoice i 
	where sir.invoice_code = i.invoice_code and sir.invoice_number = i.invoice_number 
	group by sir.settlement_bill_no
) invoice_tmp on invoice_tmp.settlement_bill_no = t.settlement_bill_no
where 1=1 
-- and t.settlement_bill_no='settlement-00000000'
and EXISTS(select 1 from purchase_order po1 where FIND_IN_SET(po1.purchase_bill_no,t.purchase_bill_no)>0
	and po1.purchase_time >='2022-07-01 00:00:00' 
)
and EXISTS(select 1 from purchase_order po1 where FIND_IN_SET(po1.purchase_bill_no,t.purchase_bill_no)>0
	and po1.purchase_time <='2022-07-23 23:59:59' 
)
group by t.settlement_bill_no;

The first edition SQL At that time, it was implemented in the local environment with 5 About seconds , At this time, I have realized the problem , Let alone production , I have to hang up in the test environment .

But look at my own garbage server （ Dual core 4G） Run on this SQL Well , It can't be implemented at all （ Although the company server is better , But the production environment is really stuck ）：

I haven't seen the implementation plan at that time , In a glance , This SQL Used in the FIND_IN_SET , I'm sure I won't go to the index , It's useless to build an index , That is, mainly the purchase order table ( purchase_order ) And purchase statement settlement_voucher Correlation will be slow , After all, they have a many to many relationship , Plus this disgusting need . So now think about how to avoid FIND_IN_SET .

Yes , During the meal , A whim ： I should be able to split the purchase statement into a temporary table , If the purchase settlement sheet corresponds to 5 A purchase order , I'll split the purchase statement into 5 Data , These five data are different except for the purchase order number , Other fields equally , So that you don't have to FIND_IN_SET 了 .

Do as you say , So there is the second edition below SQL.

4.2 The second edition

Split the purchase statement into the above-mentioned temporary table , You need to add a table ：

CREATE TABLE `incre_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT=' Used to split purchase settlement sheet ';
--  Be careful ： How many purchase orders does a purchase settlement correspond to , Here is how many pieces of data to insert in turn , I'm here 10 strip   That's enough. 
insert into incre_table(id) VALUES(1);
insert into incre_table(id) VALUES(2);
insert into incre_table(id) VALUES(3);
insert into incre_table(id) VALUES(4);
insert into incre_table(id) VALUES(5);
insert into incre_table(id) VALUES(6);
insert into incre_table(id) VALUES(7);
insert into incre_table(id) VALUES(8);
insert into incre_table(id) VALUES(9);
insert into incre_table(id) VALUES(10);

First, let's see how to split a purchase settlement into multiple pieces ：

select 
	sv.cut_off_time,
	sv.settlement_bill_no,
	sv.unit_price,
	sv.settlement_weight,
	SUBSTRING_INDEX(SUBSTRING_INDEX(sv.purchase_bill_no,',',it.id),',',-1) purchase_bill_no
from settlement_voucher sv
RIGHT JOIN incre_table it on it.id<=(LENGTH(sv.purchase_bill_no) - LENGTH(REPLACE(sv.purchase_bill_no,',','')) + 1)
where sv.settlement_bill_no='settlement-00000000';

To explain this operation ：

First of all, I created a system with only id Table of incre_table , Ten pieces of data are inserted , And these ten data must be 1-10.
And then I use settlement_voucher Right connected incre_table , And only take incre_table in id Data less than or equal to the quantity of the purchase order . This will control this one SQL How many pieces of data should be queried （ It just happens to be the number of purchase orders corresponding to a purchase settlement ）.
And then use SUBSTRING_INDEX To split one by one settlement_voucher The purchase order number in the table

This set of SQL The result of execution is ：

combined , I finished the Second Edition SQL：

select 
	GROUP_CONCAT(po.purchase_time) as  Purchase time ,
	AVG(IFNULL(po.purchase_pre_unit_price,0)) as  Average purchase price ,
	t.settlement_bill_no as  Settlement doc No ,
	AVG(IFNULL(t.unit_price,0)) as  Average settlement price ,
	any_value(t.cut_off_time) as  Settlement time ,
	any_value(invoice_tmp.invoice_code) as  Invoice code ,
	any_value(invoice_tmp.invoice_number) as  Invoice number 
from (
	select 
		sv.cut_off_time,
		sv.settlement_bill_no,
		sv.unit_price,
		sv.settlement_weight,
		SUBSTRING_INDEX(SUBSTRING_INDEX(sv.purchase_bill_no,',',it.id),',',-1) purchase_bill_no
	from settlement_voucher sv
	RIGHT JOIN incre_table it on it.id<=(LENGTH(sv.purchase_bill_no) - LENGTH(REPLACE(sv.purchase_bill_no,',','')) + 1)
) t
left join purchase_order po on po.purchase_bill_no = t.purchase_bill_no
left join (
	select sir.settlement_bill_no,
				 GROUP_CONCAT(i.invoice_number) invoice_number,
				 GROUP_CONCAT(i.invoice_code) invoice_code 
	from settlement_invoice_relation sir, invoice i 
	where sir.invoice_code = i.invoice_code and sir.invoice_number = i.invoice_number 
	group by sir.settlement_bill_no
) invoice_tmp on invoice_tmp.settlement_bill_no = t.settlement_bill_no
where 1=1 
 -- and t.settlement_bill_no='settlement-00000000'
and po.purchase_time >='2022-07-01 00:00:00' 
and po.purchase_time <='2022-07-23 23:59:59' 
group by t.settlement_bill_no;

There must be no problem with the test query data results ！！！

well , Here we finally use everything FIND_IN_SET The place of , At this time, it is meaningful to look at the index ！

Look at the implementation plan ：

Asiba , A bunch of full table scans , Look at the second edition above SQL, Find the purchase list ( purchase_order ) Of purchase_bill_no Fields should be indexed , In principle, this field should have been quoted when designing tables , But I thought it was just that I thought , It's really not indexed , well , Then add an index to it ：

create index idx_purchase_bill_no on purchase_order(purchase_bill_no);

After adding this index , Let's look at the implementation plan ：

purchase_order Tabular purchase_bill_no Already gone , however settlement_invoice_relation Why not go to the index , It has two indexes ......

Take another look at the execution on my garbage server , See if it can be implemented ：

Okay , In order to make settlement_invoice_relation The query of the table also goes through the index , Start the next round of SQL Optimize

4.3 The third edition

Don't aggregate and obtain below invoice_code and invoice_number 了 , Aggregate on it , As for using these two fields as query criteria , Then you can put this one below SQL Another layer of , Query again as a temporary table , I'm not going to do that here

select 
	GROUP_CONCAT(po.purchase_time) as  Purchase time ,
	AVG(IFNULL(po.purchase_pre_unit_price,0)) as  Average purchase price ,
	t.settlement_bill_no as  Settlement doc No ,
	AVG(IFNULL(t.unit_price,0)) as  Average settlement price ,
	any_value(t.cut_off_time) as  Settlement time ,
	GROUP_CONCAT(DISTINCT invoice_tmp.invoice_code) as  Invoice code ,
	GROUP_CONCAT(DISTINCT invoice_tmp.invoice_number) as  Invoice number 
from (
	select 
		sv.cut_off_time,
		sv.settlement_bill_no,
		sv.unit_price,
		sv.settlement_weight,
		SUBSTRING_INDEX(SUBSTRING_INDEX(sv.purchase_bill_no,',',it.id),',',-1) purchase_bill_no
	from settlement_voucher sv
	RIGHT JOIN incre_table it on it.id<=(LENGTH(sv.purchase_bill_no) - LENGTH(REPLACE(sv.purchase_bill_no,',','')) + 1)
) t
left join purchase_order po on po.purchase_bill_no = t.purchase_bill_no
left join (
	select sir.settlement_bill_no,
				 i.invoice_number,
				 i.invoice_code 
	from settlement_invoice_relation sir, invoice i 
	where sir.invoice_code = i.invoice_code and sir.invoice_number = i.invoice_number 
) invoice_tmp on invoice_tmp.settlement_bill_no = t.settlement_bill_no
where 1=1 
 -- and t.settlement_bill_no='settlement-00000000'
and po.purchase_time >='2022-07-01 00:00:00' 
and po.purchase_time <='2022-07-23 23:59:59' 
group by t.settlement_bill_no;

Let's look at the implementation plan ：

At this time , Basic optimization is over , Look at the results on my spam server ：

Come here , Basically, the production can be found in less than three seconds , This time SQL Optimization is over ！！！

however , In fact, we can continue to optimize , But there are many changes in the design to the system , The impact is relatively large , Let's talk about the train of thought , Temporarily unable to practice ：

You can list the purchase order purchase_order And purchase statement settlement_voucher Between , Create a middle table , Implement many to many relationships , Then index , It should be faster , And it can be done once and for all , In the future, this connection will be more convenient ！

原网站

版权声明
本文为[JavaShark]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/207/202207260303163869.html