
Issue 42: Does MySQL need multi-column partitioning?

2022-06-29 17:44:00 【ActionTech】

The previous chapters covered tables partitioned on a single column. Is it ever necessary to partition a table on multiple columns? Is the data in such a table evenly distributed? Are there particular scenarios where it applies, and particular optimization strategies? This article works through these questions.

MySQL supports partitioning not only on a single column but also on multiple columns, for example a table partitioned on the columns (f1,f2,f3). How and where it is used is broadly similar to a composite index. The following query, for example, filters on all three columns (f1,f2,f3) at once.

select * from p1 where f1 = 2 and f2 = 2 and f3 = 2;

Multi-column partitioning presupposes that the partitioning columns are queried about equally often. If they are not, there is no need for a multi-column partition.

Let's use concrete examples to examine the advantages, disadvantages, and applicable scenarios of multi-column partitioning, so as to understand it more thoroughly.

Create table p1, whose columns r1, r2, r3 take values in the ranges 1-8, 1-5, and 1-5 respectively.

create table p1(r1 int,r2 int,r3 int,log_date datetime);
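The article does not show how the test data was loaded. The following is only one possible way to fill p1 with 5 million rows in the stated value ranges. It assumes MySQL 8.0 (recursive CTEs) and uniformly random values; neither assumption comes from the original text.

```sql
-- Hypothetical data load, not taken from the original article.
-- The recursion limit defaults to 1000, so raise it for this session.
SET SESSION cte_max_recursion_depth = 5000000;

INSERT INTO p1 (r1, r2, r3, log_date)
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 5000000
)
SELECT FLOOR(1 + RAND() * 8),   -- r1 in 1-8
       FLOOR(1 + RAND() * 5),   -- r2 in 1-5
       FLOOR(1 + RAND() * 5),   -- r3 in 1-5
       NOW()
FROM seq;
```

On a small test instance it may be gentler to run the insert in several smaller batches.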

Based on the value ranges of the columns (r1,r2,r3), the following stored procedure converts table p1 into a partitioned table:

DELIMITER $$

USE `ytt_new`$$

DROP PROCEDURE IF EXISTS `sp_add_partition_ytt_new_p1`$$

CREATE DEFINER=`root`@`%` PROCEDURE `sp_add_partition_ytt_new_p1`()
BEGIN
    -- Loop counters: i walks r1 (1-8), j walks r2 (1-5), k walks r3 (1-5).
    DECLARE i,j,k INT UNSIGNED DEFAULT 1;
    SET @stmt = '';
    SET @stmt_begin = 'ALTER TABLE p1 PARTITION BY RANGE COLUMNS (r1,r2,r3)(';
    WHILE i <= 8 DO
        SET j = 1;
        WHILE j <= 5 DO
            SET k = 1;
            WHILE k <= 5 DO
                -- Append one clause per (i,j,k), e.g. PARTITION p111 VALUES LESS THAN (1,1,1),
                SET @stmt = CONCAT(@stmt,' PARTITION p',i,j,k,' VALUES LESS THAN (',i,',',j,',',k,'),');
                SET k = k + 1;
            END WHILE;
            SET j = j + 1;
        END WHILE;
        SET i = i + 1;
    END WHILE;
    -- Catch-all partition for every tuple at or above the last bound.
    SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue,maxvalue,maxvalue))';
    SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
    SET @stmt = NULL;
    SET @stmt_begin = NULL;
    SET @stmt_end = NULL;
END$$

DELIMITER ;
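To make the shape of the generated statement easier to see, here is a hand-written, scaled-down equivalent. The table p_demo and its two-column, 2 x 2 layout are hypothetical and used only for illustration; the procedure above builds the same kind of clause list for three columns and 8 x 5 x 5 bounds.

```sql
-- Hypothetical miniature of the DDL the procedure generates.
CREATE TABLE p_demo (r1 int, r2 int)
PARTITION BY RANGE COLUMNS (r1, r2) (
  PARTITION p11   VALUES LESS THAN (1,1),
  PARTITION p12   VALUES LESS THAN (1,2),
  PARTITION p21   VALUES LESS THAN (2,1),
  PARTITION p22   VALUES LESS THAN (2,2),
  PARTITION p_max VALUES LESS THAN (MAXVALUE, MAXVALUE)
);
```

Each bound is a tuple, and the bounds must be listed in strictly increasing tuple order, which is why the procedure nests its loops with r1 outermost and r3 innermost.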

Call the stored procedure to convert table p1 into a multi-column partitioned table. Table p1 now has 201 partitions and holds 5 million rows.


mysql> call sp_add_partition_ytt_new_p1;
Query OK, 0 rows affected (14.89 sec)

mysql> select count(partition_name) as partition_count  from information_schema.partitions where table_schema = 'ytt_new' and table_name ='p1';
+-----------------+
| partition_count |
+-----------------+
|             201 |
+-----------------+
1 row in set (0.00 sec)

mysql> select count(*) from p1;
+----------+
| count(*) |
+----------+
|  5000000 |
+----------+
1 row in set (12.01 sec)
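The introduction asks whether the data ends up evenly distributed. One way to check, sketched here rather than taken from the article, is to read the per-partition row counts from information_schema.partitions. For InnoDB the table_rows column is an estimate, not an exact count.

```sql
-- Per-partition bounds and approximate row counts for p1.
SELECT partition_name,
       partition_description,   -- the VALUES LESS THAN tuple
       table_rows               -- estimated for InnoDB
FROM information_schema.partitions
WHERE table_schema = 'ytt_new'
  AND table_name = 'p1'
ORDER BY partition_ordinal_position;
```

Because the bounds are exclusive, and given the value ranges stated above, partition p111 (LESS THAN (1,1,1)) should stay empty, while rows equal to (8,5,5) fall through to p_max.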

Create a second partitioned table, p2, to compare single-column and multi-column partitioning in several scenarios:

Table p2 is partitioned on column r1 alone, into only 9 partitions.

mysql> CREATE TABLE `p2` (
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE  COLUMNS(r1)
(PARTITION p1 VALUES LESS THAN (1) ,
 PARTITION p2 VALUES LESS THAN (2) ,
 PARTITION p3 VALUES LESS THAN (3) ,
 PARTITION p4 VALUES LESS THAN (4) ,
 PARTITION p5 VALUES LESS THAN (5) ,
 PARTITION p6 VALUES LESS THAN (6) ,
 PARTITION p7 VALUES LESS THAN (7) ,
 PARTITION p8 VALUES LESS THAN (8) ,
 PARTITION p_max VALUES LESS THAN (MAXVALUE) 
)
1 row in set (0.00 sec)

mysql> insert into p2 select * from p1;
Query OK, 5000000 rows affected (1 min 37.92 sec)
Records: 5000000  Duplicates: 0  Warnings: 0

Equality filtering on all three columns: for the same query condition, table p1 (0.02 seconds) is more than 20 times faster than table p2 (0.49 seconds).

mysql> select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
|    24992 |
+----------+
1 row in set (0.02 sec)

mysql> select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
|    24992 |
+----------+
1 row in set (0.49 sec)

Compare the two execution plans: for the same query, table p1 scans only about 25,000 rows in a single partition (p223), while table p2 scans about 620,000 rows. The difference is large.

mysql> explain select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p1
   partitions: p223
         type: ALL
...
         rows: 24711
     filtered: 0.10
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p2
   partitions: p3
         type: ALL
...
         rows: 623239
     filtered: 0.10
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
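Why p223 rather than p222? RANGE COLUMNS compares whole tuples column by column, and each VALUES LESS THAN bound is exclusive, so a row goes to the first partition whose bound is strictly greater than it. MySQL row comparisons follow the same ordering, so the placement can be illustrated directly:

```sql
-- Row (2,2,2) against the bounds of two neighbouring partitions of p1.
SELECT (2,2,2) < (2,2,2) AS below_p222_bound,  -- 0: the bound is exclusive
       (2,2,2) < (2,2,3) AS below_p223_bound;  -- 1: first bound above the row
```

The same reasoning puts r1 = 2 into partition p3 of table p2, whose bound is LESS THAN (3).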

What if the filter does not cover all the partitioning columns? Dropping the last column and comparing again, table p1 (0.10 seconds) still takes about a fifth of the time of table p2 (0.52 seconds).

mysql> select count(*) from p1 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
|   124649 |
+----------+
1 row in set (0.10 sec)

mysql> select count(*) from p2 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
|   124649 |
+----------+
1 row in set (0.52 sec)

Filtering on the first column only: this time p1 and p2 take about the same time, with p2 slightly ahead.

mysql> select count(*) from p1 where r1 = 2 ;
+----------+
| count(*) |
+----------+
|   624599 |
+----------+
1 row in set (0.56 sec)

mysql> select count(*) from p2 where r1 = 2 ;
+----------+
| count(*) |
+----------+
|   624599 |
+----------+
1 row in set (0.45 sec)

Compare the execution plans: table p1 scans 26 partitions, while table p2 scans only 1, so p2 touches far fewer partitions.

mysql> explain select count(*) from p1 where r1 = 2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p1
   partitions: p211,p212,p213,p214,p215,p221,p222,p223,p224,p225,p231,p232,p233,p234,p235,p241,p242,p243,p244,p245,p251,p252,p253,p254,p255,p311
         type: ALL
...
         rows: 648074
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from p2 where r1 = 2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p2
   partitions: p3
         type: ALL
...
         rows: 623239
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
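The count of 26 follows from the tuple bounds. With only r1 = 2 known, any value of r2 and r3 has to be considered possible, so every partition that could hold a tuple starting with 2 is kept. A small sketch using row comparisons (the tuple (2,0,0) is a made-up value, not data from the table):

```sql
SELECT (2,0,0) < (2,1,1) AS could_fall_in_p211,  -- 1: p211 covers tuples from (1,5,5) up to (2,1,1)
       (2,5,5) < (2,5,5) AS fits_p255,           -- 0: the bound is exclusive
       (2,5,5) < (3,1,1) AS fits_p311;           -- 1: so (2,5,5) is stored in p311
```

That gives the 25 partitions p211 through p255 plus p311, matching the partitions column in the plan above.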

What if column r1 is left out of the filter? The execution times are almost identical, because both p1 and p2 have to scan all partitions.

mysql> select count(*) from p1 where  r2 = 2;
+----------+
| count(*) |
+----------+
|   998700 |
+----------+
1 row in set (3.87 sec)

mysql> select count(*) from p2 where  r2 = 2;
+----------+
| count(*) |
+----------+
|   998700 |
+----------+
1 row in set (3.75 sec)
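The article gives only the timings for this case. To confirm that no partition is pruned, the execution plans can be checked in the same way as before; this is a suggested check, and its output is not reproduced here.

```sql
-- The partitions column should list every partition of each table
-- when the leading partitioning column r1 is missing from the filter.
EXPLAIN SELECT count(*) FROM p1 WHERE r2 = 2\G
EXPLAIN SELECT count(*) FROM p2 WHERE r2 = 2\G
```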

This raises another question: for multi-column partitioning, does the order of the columns matter?

The answer depends on the filter conditions of the queries being run. Consider the following three kinds of SQL:

SQL 1: select * from p1 where r1 = 2 and r2 = 2 and r3 = 2;

For SQL 1, the order does not matter, because the query filters on all three columns.

SQL 2: select * from p1 where r1 = 2 and r2 = 2;

For SQL 2, either (r1,r2,r3) or (r2,r1,r3) works.

SQL 3: select * from p1 where r2 = 2 and r3 = 2;

For SQL 3, either (r2,r3,r1) or (r3,r2,r1) works.

Create partitioned table p3 in the same way, with the partitioning columns in the order (r2,r3,r1):

mysql> show create table p3\G
*************************** 1. row ***************************
       Table: p3
Create Table: CREATE TABLE `p3` (
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `log_date` datetime DEFAULT NULL
) ENGINE=InnoDB 
/*!50500 PARTITION BY RANGE  COLUMNS(r2,r3,r1)
(PARTITION p111 VALUES LESS THAN (1,1,1) ENGINE = InnoDB,
...
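The article does not show how p3 was built. A plausible adaptation of the earlier stored procedure is sketched below. The procedure name sp_add_partition_ytt_new_p3 is hypothetical, and the sketch assumes p3 already exists with the same columns as p1 and has been loaded from it.

```sql
DELIMITER $$

DROP PROCEDURE IF EXISTS `sp_add_partition_ytt_new_p3`$$

CREATE PROCEDURE `sp_add_partition_ytt_new_p3`()
BEGIN
  -- Loop order follows the new column order: i = r2, j = r3, k = r1.
  DECLARE i, j, k INT UNSIGNED DEFAULT 1;
  SET @stmt = '';
  WHILE i <= 5 DO          -- r2 takes values 1-5
    SET j = 1;
    WHILE j <= 5 DO        -- r3 takes values 1-5
      SET k = 1;
      WHILE k <= 8 DO      -- r1 takes values 1-8
        SET @stmt = CONCAT(@stmt, ' PARTITION p', i, j, k,
                           ' VALUES LESS THAN (', i, ',', j, ',', k, '),');
        SET k = k + 1;
      END WHILE;
      SET j = j + 1;
    END WHILE;
    SET i = i + 1;
  END WHILE;
  SET @stmt = CONCAT('ALTER TABLE p3 PARTITION BY RANGE COLUMNS (r2,r3,r1)(',
                     @stmt,
                     ' PARTITION p_max VALUES LESS THAN (maxvalue,maxvalue,maxvalue))');
  PREPARE s1 FROM @stmt;
  EXECUTE s1;
  DEALLOCATE PREPARE s1;
  SET @stmt = NULL;
END$$

DELIMITER ;
```

The only substantive changes from the p1 procedure are the column order in the PARTITION BY clause and the loop limits, which follow the value range of whichever column sits in each position.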

The following query runs more than 20 times faster on table p3 than on table p1. Because the partitioning columns are in a different order, table p1 has to scan all partitions to produce the result.

mysql> select count(*) from p3 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
|   199648 |
+----------+
1 row in set (0.22 sec)

mysql> select count(*) from p1 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
|   199648 |
+----------+
1 row in set (5.05 sec)

So, as noted at the beginning, a multi-column partitioned table resembles a composite index in how it is used, what to watch out for, and where it fits. In the right scenarios, multi-column partitioning can significantly improve query performance.

