A small case with a 666x performance improvement illustrates the importance of using indexes correctly in TiDB
2022-06-12 18:53:00 【HOHO】
Background
I've recently been running a TiDB POC test for a logistics system. The system was originally developed on MySQL, and the business data brought into this test spans roughly 10 databases and 900 tables, with the largest single table holding more than 60 million rows.
The scale is not large. The test data and table schemas were exported from MySQL with Dumpling and then imported into TiDB with Lightning, and the whole process went smoothly.
After the system was up and running on TiDB, the Dashboard showed one SQL statement appearing in the slow query page at very regular intervals. At first glance it is just a single-table query and not complicated at all, so something strange must be going on.
The problem
Below are the original SQL statement and its execution plan captured from the Dashboard. It takes 1.2s in total, and most of that time is spent in the Coprocessor scanning data:
SELECT {31 fields}
FROM
job_cm_data
WHERE
(
group_id = 'GROUP_MATERIAL'
AND cur_thread = 1
AND pre_excutetime < '2022-04-27 11:55:00.018'
AND ynflag = 1
AND flag = 0
)
ORDER BY
id
LIMIT
  200;

id  task  estRows  operator info  actRows  execution info  memory  disk
Projection_7 root 200 test_ba.job_cm_data.id, test_ba.job_cm_data.common_job_type, test_ba.job_cm_data.org_code, test_ba.job_cm_data.key_one, test_ba.job_cm_data.key_two, test_ba.job_cm_data.key_three, test_ba.job_cm_data.key_four, test_ba.job_cm_data.key_five, test_ba.job_cm_data.key_six, test_ba.job_cm_data.key_seven, test_ba.job_cm_data.key_eight, test_ba.job_cm_data.permission_one, test_ba.job_cm_data.permission_two, test_ba.job_cm_data.permission_three, test_ba.job_cm_data.cur_thread, test_ba.job_cm_data.group_id, test_ba.job_cm_data.max_execute_count, test_ba.job_cm_data.remain_execute_count, test_ba.job_cm_data.total_execute_count, test_ba.job_cm_data.pre_excutetime, test_ba.job_cm_data.related_data, test_ba.job_cm_data.delay_time, test_ba.job_cm_data.error_message, test_ba.job_cm_data.flag, test_ba.job_cm_data.ynflag, test_ba.job_cm_data.create_time, test_ba.job_cm_data.update_time, test_ba.job_cm_data.create_user, test_ba.job_cm_data.update_user, test_ba.job_cm_data.ip, test_ba.job_cm_data.version_num 0 time:1.17s, loops:1, Concurrency:OFF 83.8 KB N/A
└─Limit_14 root 200 offset:0, count:200 0 time:1.17s, loops:1 N/A N/A
└─Selection_31 root 200 eq(test_ba.job_cm_data.ynflag, 1) 0 time:1.17s, loops:1 16.3 KB N/A
└─IndexLookUp_41 root 200 0 time:1.17s, loops:1, index_task: {total_time: 864.6ms, fetch_handle: 26.1ms, build: 53.3ms, wait: 785.2ms}, table_task: {total_time: 4.88s, num: 17, concurrency: 5} 4.06 MB N/A
├─IndexRangeScan_38 cop[tikv] 7577.15 table:job_cm_data, index:idx_group_id(group_id), range:["GROUP_MATERIAL","GROUP_MATERIAL"], keep order:true 258733 time:3.34ms, loops:255, cop_task: {num: 1, max: 2.45ms, proc_keys: 0, rpc_num: 1, rpc_time: 2.43ms, copr_cache_hit_ratio: 1.00}, tikv_task:{time:146ms, loops:257} N/A N/A
└─Selection_40 cop[tikv] 200 eq(test_ba.job_cm_data.cur_thread, 1), eq(test_ba.job_cm_data.flag, 0), lt(test_ba.job_cm_data.pre_excutetime, 2022-04-27 11:55:00.018000) 0 time:4.68s, loops:17, cop_task: {num: 18, max: 411.4ms, min: 15.1ms, avg: 263ms, p95: 411.4ms, max_proc_keys: 20480, p95_proc_keys: 20480, tot_proc: 4.41s, tot_wait: 6ms, rpc_num: 18, rpc_time: 4.73s, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:382ms, min:12ms, p80:376ms, p95:382ms, iters:341, tasks:18}, scan_detail: {total_process_keys: 258733, total_process_keys_size: 100627600, total_keys: 517466, rocksdb: {delete_skipped_count: 0, key_skipped_count: 258733, block: {cache_hit_count: 1296941, read_count: 0, read_byte: 0 Bytes}}} N/A N/A
└─TableRowIDScan_39 cop[tikv] 7577.15 table:job_cm_data, keep order:false 258733 tikv_task:{proc max:381ms, min:12ms, p80:375ms, p95:381ms, iters:341, tasks:18} N/A N/A

The execution plan is fairly simple; a little analysis reveals how it runs:
- First, the IndexRangeScan operator scans the idx_group_id index and gets 258733 rowids that match the range.
- Those rowids are then fed to TableRowIDScan, which reads each row and applies the remaining filters, ending up with 0 rows.
- The two steps above make up an IndexLookUp (table lookup) operation; its result is returned to the TiDB node for the Limit, still 0 rows.
- Finally, a Projection of the selected fields produces the final result.
From the execution info we can see that most of the time is spent in the Selection_40 step, so the preliminary judgment is that the huge number of table lookups is what hurts performance.
Tip: the loops value in IndexRangeScan is particularly worth paying attention to.
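For reference, a plan with the same actRows and execution info columns can also be reproduced from any client session (a minimal sketch; the field list is abbreviated):

-- EXPLAIN ANALYZE actually runs the statement and reports actRows, execution info,
-- memory and disk for every operator, matching what the Dashboard plan above shows.
EXPLAIN ANALYZE
SELECT id  -- abbreviated: the real query selects 31 fields
FROM job_cm_data
WHERE group_id = 'GROUP_MATERIAL'
  AND cur_thread = 1
  AND pre_excutetime < '2022-04-27 11:55:00.018'
  AND ynflag = 1
  AND flag = 0
ORDER BY id
LIMIT 200;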
In-depth analysis
Experience says that this many table lookups means the index is not doing its job. Let's first check the total number of rows in the table:
mysql> select count(1) from job_cm_data;
+----------+
| count(1) |
+----------+
| 311994 |
+----------+
1 row in set (0.05 sec)

Judging from the number of table lookups, the selectivity of this index column is clearly poor. Let's verify that further:
mysql> select group_id,count(1) from job_cm_data group by group_id;
+------------------------------+----------+
| group_id | count(1) |
+------------------------------+----------+
| GROUP_HOUSELINK | 20 |
| GROUP_LMSMATER | 37667 |
| GROUP_MATERIAL | 258733 |
| GROUP_MATERISYNC | 15555 |
| GROUP_WAREHOUSE_CONTRACT | 7 |
| GROUP_WAREHOUSE_CONTRACT_ADD | 12 |
+------------------------------+----------+
6 rows in set (0.01 sec)

From the two results above we can conclude that the idx_group_id index has the following problems:
- Very poor selectivity: only 6 distinct values
- Very uneven data distribution: the value GROUP_MATERIAL alone accounts for more than 80% of the rows
In short, it is a badly designed index.
For the SQL in question, first scanning 258733 rowids out of the index and then using those 258733 rowids to fetch the actual rows not only fails to improve query efficiency, it makes the query slower.
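Before actually dropping the index, a non-destructive way to test this claim is a MySQL-style index hint, which TiDB also accepts (a minimal sketch; the field list is abbreviated):

-- IGNORE INDEX tells the optimizer not to consider idx_group_id for this statement,
-- so the index-lookup plan and the full-table-scan plan can be compared side by side.
SELECT id  -- abbreviated: the real query selects 31 fields
FROM job_cm_data IGNORE INDEX (idx_group_id)
WHERE group_id = 'GROUP_MATERIAL'
  AND cur_thread = 1
  AND pre_excutetime < '2022-04-27 11:55:00.018'
  AND ynflag = 1
  AND flag = 0
ORDER BY id
LIMIT 200;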
Don't believe it? Let's drop the index and run the SQL again.
mysql> alter table job_cm_data drop index idx_group_id;
Query OK, 0 rows affected (0.52 sec)

The new execution plan shows that the query has turned into a full table scan, yet the execution time drops to less than half of what it was, and it gets even faster once the Coprocessor Cache is hit.
Just when I thought dropping the index had settled the matter, the Duration p99 line in the monitoring suddenly climbed to 200-odd ms. With a question mark on my face I went straight to the slow log, and found that although this SQL had indeed become faster, the number of slow SQL statements had suddenly grown.
Comparing the slow SQL carefully, I found they were queries for the 6 different group_id values, issued at high frequency. In other words, apart from the query posted above getting faster, queries on all the other group_id values had slowed down.
This is actually expected: for the group_id values with little data, an index lookup needs only a handful of table lookups, which is still far faster than a full table scan.
So simply deleting the index is not enough. Not only did more queries become slow and the overall duration rise, the full table scans also put very heavy read pressure on the TiKV nodes.
Initially this table had only 2 regions, with both leaders on the same store, so that node's CPU usage spiked and the read hotspot was very obvious.
After manually splitting the regions and spreading the requests across the 3 TiKV nodes, the Unified Readpool CPU still reached about 80%, and the heatmap showed peak traffic of 6 GB per minute.
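For reference, a manual split along these lines can be done with TiDB's SPLIT TABLE statement (a sketch only; the key range and region count below are illustrative, not the values actually used in this test):

-- Pre-split the table into 3 regions over an assumed integer id range,
-- then check how the regions and their leaders are distributed.
SPLIT TABLE job_cm_data BETWEEN (0) AND (400000) REGIONS 3;
SHOW TABLE job_cm_data REGIONS;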
Time to keep digging.
Solutions
Since a full table scan is not an option, the solution is to find a way to use an index properly.
After talking to the business side, we learned that this table stores scheduled-task metadata: it is queried frequently, but each query returns only a tiny result set, because in real business there are never that many tasks waiting to be processed.
Given that background, the idea is that if the index lookup itself returns only the qualifying rowids, taking such a small result set back to the table should improve performance dramatically.
So clearly we need a composite index (also called a compound or multi-column index), that is, an index built on several fields. For the case in this article, the candidate fields are the ones in the WHERE clause.
But how the fields are combined really matters. Many people might simply throw all 5 conditions into one index:
ALTER TABLE `test`.`job_cm_data`
ADD INDEX `idx_muti`(`group_id`, `cur_thread`,`pre_excutetime`,`ynflag`,`flag`);

Sure enough, the new execution plan shows a big improvement, about 10 times faster than the full table scan. Can we call it a day? Not yet.
There are two problems with this index:
- 5 index fields is a bit too many, which means high maintenance cost
- Scanning 50000-odd index entries is also a bit too many (because only 3 of the fields actually take effect)
Going back to the table statistics above and the usual index design principles, the index fields must have high selectivity. Among these 5 query fields, pre_excutetime has 35068 distinct values and is the most suitable for indexing; group_id was ruled out from the start; cur_thread has only 6 distinct values, evenly distributed, so it is not suitable either; the ynflag column is 1 in every row, so it can simply be dropped; that leaves flag, which deserves a closer look.
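Distinct-value counts like these can be checked directly (a minimal sketch; it scans the whole table, which is acceptable at this table's size):

-- Compare the selectivity of the candidate columns in one pass.
SELECT COUNT(DISTINCT pre_excutetime) AS pre_excutetime_vals,
       COUNT(DISTINCT group_id)       AS group_id_vals,
       COUNT(DISTINCT cur_thread)     AS cur_thread_vals,
       COUNT(DISTINCT ynflag)         AS ynflag_vals,
       COUNT(DISTINCT flag)           AS flag_vals
FROM job_cm_data;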
mysql> select flag,count(1) from job_cm_data group by flag;
+------+----------+
| flag | count(1) |
+------+----------+
| 2 | 277832 |
| 4 | 30 |
| 1 | 34132 |
+------+----------+
3 rows in set (0.06 sec)

From the output above, flag does not look like a good index field either. But as it happens, the actual business only queries rows with flag = 0, which means an index on flag can rule out more than 99% of the data. That's interesting, so let's try building an index on it.
ALTER TABLE `test`.`job_cm_data`
ADD INDEX `idx_muti`(`pre_excutetime`,`flag`);

The result is not quite what we expected though: why is it scanning 310,000-odd index rows?
Don't forget that a composite index follows the leftmost-prefix matching principle, and pre_excutetime happens to be a range condition, so only the pre_excutetime part of the index gets used. Since the whole table falls inside that time range, this is effectively an IndexFullScan. Fine then, let's change the field order:
ALTER TABLE `test`.`job_cm_data`
ADD INDEX `idx_muti`(`flag`,`pre_excutetime`);

Seeing the execution time, I'm satisfied: without even hitting the Coprocessor Cache, the query now takes only 1.8ms. One small index adjustment, a 666x performance improvement.
In fact, there is another principle for building composite indexes: put the fields with the highest selectivity first. Because a composite index is compared from left to right, placing a highly selective field at the front greatly narrows the range that the following fields have to be compared against, getting the most out of the index.
Think of it as layers of filters: we want each layer to screen out as much irrelevant data as possible. We don't want 100,000 rows to come in and still be 100,000 rows at the last layer, which would make all the earlier filtering pointless. In this case flag is the strongest filter, so putting it first is ideal.
That said, it also depends on the actual scenario: when the queried flag value is not 0, a certain amount of table lookup is unavoidable. Compare flag = 4 (30 rows) with flag = 1 (34132 rows).
In real business, the flag = 0 data never exceeds 50 rows; judging from the results above, 50 table lookups complete within 10ms, so the performance is perfectly acceptable. If the application layer is also allowed to adjust the SQL and add a lower bound on pre_excutetime (see the sketch below), that would be about as good as it gets.
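A sketch of what that adjusted query might look like, assuming the application can supply a reasonable lower bound for the time window (the timestamps below are illustrative):

-- With a lower bound on pre_excutetime, the (flag, pre_excutetime) index range
-- stays tight even as historical task data accumulates.
SELECT id  -- abbreviated: the real query selects 31 fields
FROM job_cm_data
WHERE flag = 0
  AND pre_excutetime >= '2022-04-27 11:00:00'
  AND pre_excutetime <  '2022-04-27 11:55:00.018'
  AND group_id = 'GROUP_MATERIAL'
  AND cur_thread = 1
  AND ynflag = 1
ORDER BY id
LIMIT 200;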
Finally, let's take a look at the comparison before and after optimization.
nice~
Summary
This case is a reminder that indexes are a good thing but not a silver bullet; a badly designed index will inevitably backfire.
The index-related points covered in this article:
- The selectivity of an index field should be high enough; a unique index is the extreme example
- Querying through an index is not necessarily faster than a full table scan
- Make full use of index characteristics to cut down the number of table lookups
- The leftmost-prefix matching principle of composite indexes
- Put the most selective fields first in a composite index
When problems come up, analyze the specific situation. Most people can probably recite the principles of index usage by heart, but applying them well still takes real thinking.
Sloppy indexing, a DBA in tears. Cherish the DBAs around you who help tune your SQL.