当前位置：网站首页>第42期：MySQL 是否有必要多列分区

第42期：MySQL 是否有必要多列分区

2022-06-29 17:03:00 【爱可生开源社区】

之前的篇章我们讨论的都是基于单列的分区表，那有无必要建立基于多列的分区表？这种分区表数据分布是否均匀？有无特殊的应用场景？有无特殊的优化策略？本篇基于这些问题来进行重点解读。

MySQL 不仅支持基于单列分区，也支持基于多列分区。比如基于字段（f1,f2,f3)来建立分区表，使用方法和使用场景都有些类似于联合索引。比如下面查询语句，同时对列(f1,f2,f3) 进行过滤。

select * from p1 where f1 = 2 and f2 = 2 and f3 = 2;

多列分区表的前提是参与分区的列检索频率均等，如果不均等，就没有必要使用多列分区。

我们还是以具体实例来验证下多列分区的优缺点以及适用场景，这样理解起来更加透彻。

建立一张表p1，字段r1,r2,r3分别取值为1-8，1-5，1-5.

create table p1(r1 int,r2 int,r3 int,log_date datetime);

按照字段(r1,r2,r3) 的分布范围，我来写个存储过程处理下表p1，变为分区表。存储过程代码如下：

DELIMITER $$

USE `ytt_new`$$

DROP PROCEDURE IF EXISTS `sp_add_partition_ytt_new_p1`$$

CREATE DEFINER=`root`@`%` PROCEDURE `sp_add_partition_ytt_new_p1`()
BEGIN
	DECLARE i,j,k INT UNSIGNED DEFAULT 1;
	SET @stmt = '';
	SET @stmt_begin = 'ALTER TABLE p1 PARTITION BY RANGE COLUMNS (r1,r2,r3)(';
        WHILE i <= 8 DO
	   set j = 1;
	   while j <= 5 do
	     set k = 1;
	     while k <= 5 do
               SET @stmt = CONCAT(@stmt,' PARTITION p',i,j,k,' VALUES LESS THAN (',i,',',j,',',k,'),');
               set k = k + 1;
	     end while;
	     set j = j + 1;
	   end while;
	   SET i = i + 1;        
        END WHILE;	
	SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue,maxvalue,maxvalue))';
        SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
        PREPARE s1 FROM @stmt;
        EXECUTE s1;
        DROP PREPARE s1;
        SET @stmt = NULL;
        SET @stmt_begin = NULL;
        SET @stmt_end = NULL;	
	END$$

DELIMITER ;

调用存储过程，变更表p1为多列分区表，此时表p1有201个分区，记录数为500W条。

mysql> call sp_add_partition_ytt_new_p1;
Query OK, 0 rows affected (14.89 sec)

mysql> select count(partition_name) as partition_count  from information_schema.partitions where table_schema = 'ytt_new' and table_name ='p1';
+-----------------+
| partition_count |
+-----------------+
|             201 |
+-----------------+
1 row in set (0.00 sec)

mysql> select count(*) from p1;
+----------+
| count(*) |
+----------+
|  5000000 |
+----------+
1 row in set (12.01 sec)

用同样的方法建立一张分区表p2，来对单列分区表与多列分区表在一些场景下的性能做下对比：

分区表p2按照字段r1分区，仅仅分了9个。

mysql> CREATE TABLE `p2` (
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE  COLUMNS(r1)
(PARTITION p1 VALUES LESS THAN (1) ,
 PARTITION p2 VALUES LESS THAN (2) ,
 PARTITION p3 VALUES LESS THAN (3) ,
 PARTITION p4 VALUES LESS THAN (4) ,
 PARTITION p5 VALUES LESS THAN (5) ,
 PARTITION p6 VALUES LESS THAN (6) ,
 PARTITION p7 VALUES LESS THAN (7) ,
 PARTITION p8 VALUES LESS THAN (8) ,
 PARTITION p_max VALUES LESS THAN (MAXVALUE) 
)
1 row in set (0.00 sec)

mysql> insert into p2 select * from p1;
Query OK, 5000000 rows affected (1 min 37.92 sec)
Records: 5000000  Duplicates: 0  Warnings: 0

多个字段等值过滤的性能对比：同样的查询条件，表p1（执行时间0.02秒）比p2（执行时间0.49秒）要快几十倍。

mysql> select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
|    24992 |
+----------+
1 row in set (0.02 sec)

mysql> select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
|    24992 |
+----------+
1 row in set (0.49 sec)

查看两者执行计划对比：同样的查询，表p1扫描行数只有2W多，而表p2扫描行数有62W行，相差巨大。

mysql> explain select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p1
   partitions: p223
         type: ALL
...
         rows: 24711
     filtered: 0.10
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p2
   partitions: p3
         type: ALL
...
         rows: 623239
     filtered: 0.10
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

如果过滤字段不完整呢？比如不检索最后一列，再次做下对比：同样表p1（0.1秒）比表p2（0.52秒）执行时间要少几倍。

mysql> select count(*) from p1 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
|   124649 |
+----------+
1 row in set (0.10 sec)

mysql> select count(*) from p2 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
|   124649 |
+----------+
1 row in set (0.52 sec)

那只检索第一列呢：这次表p1和p2执行时间上差不多，p2稍微占优势。

mysql> select count(*) from p1 where r1 = 2 ;
+----------+
| count(*) |
+----------+
|   624599 |
+----------+
1 row in set (0.56 sec)

mysql> select count(*) from p2 where r1 = 2 ;
+----------+
| count(*) |
+----------+
|   624599 |
+----------+
1 row in set (0.45 sec)

看下执行计划对比：表p1扫描的分区数为26个，表p2仅扫描1个分区，分区数量上表p2相对少很多。

mysql> explain select count(*) from p1 where r1 = 2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p1
   partitions: p211,p212,p213,p214,p215,p221,p222,p223,p224,p225,p231,p232,p233,p234,p235,p241,p242,p243,p244,p245,p251,p252,p253,p254,p255,p311
         type: ALL
...
         rows: 648074
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from p2 where r1 = 2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p2
   partitions: p3
         type: ALL
...
         rows: 623239
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

如果把字段r1拿掉呢？执行时间也相差无几，表p1和表p2都会扫描所有分区。

mysql> select count(*) from p1 where  r2 = 2;
+----------+
| count(*) |
+----------+
|   998700 |
+----------+
1 row in set (3.87 sec)

mysql> select count(*) from p2 where  r2 = 2;
+----------+
| count(*) |
+----------+
|   998700 |
+----------+
1 row in set (3.75 sec)

那鉴于此，再来探讨一个问题：对于多列分区，字段的排列顺序是否重要？

关于这个顺序要和我们查询语句对应的过滤条件来一一说明。类似下面两类 SQL ：

SQL 1： select * from p1 where r1 = 2 and r2 = 2 and r3 = 2;

对于SQL 1，顺序无关紧要，因为三个列在查询时都已包含；

SQL 2: select * from p1 where r1 = 2 and r2 = 2;

对于SQL 2 ， (r1,r2,r3) 和 (r2,r1,r3) 都可以满足。

SQL 3： select * from p1 where r2 = 2 and r3 = 2;

对于SQL 3， (r2,r3,r1) 和 (r3,r2,r1) 也都可以满足。

用同样的方法来建立分区表p3，分区字段顺序为(r2,r3,r1):

mysql> show create table p3\G
*************************** 1. row ***************************
       Table: p3
Create Table: CREATE TABLE `p3` (
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `log_date` datetime DEFAULT NULL
) ENGINE=InnoDB 
/*!50500 PARTITION BY RANGE  COLUMNS(r2,r3,r1)
(PARTITION p111 VALUES LESS THAN (1,1,1) ENGINE = InnoDB,
...

对于表p3来讲：下面这条 SQL 执行时间比表p1要快几十倍，由于分区字段顺序不同，表p1要扫描所有分区才能出结果。

mysql> select count(*) from p3 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
|   199648 |
+----------+
1 row in set (0.22 sec)

mysql> select count(*) from p1 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
|   199648 |
+----------+
1 row in set (5.05 sec)

所以对于多列分区表，正如开头讲的一样，它和联合索引的使用方法、注意事项、使用场景也都很类似。对于某些特定的场景，使用多列分区能显著加快查询性能。

原网站

版权声明
本文为[爱可生开源社区]所创，转载请带上原文链接，感谢
https://cloud.tencent.com/developer/article/2034438

当前位置：网站首页>第42期：MySQL 是否有必要多列分区

第42期：MySQL 是否有必要多列分区

边栏推荐

猜你喜欢

随机推荐