Issue 42: Is Multi-Column Partitioning Worthwhile in MySQL?
2022-06-29 17:39:00 【ActionTech】

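The previous installments all discussed tables partitioned on a single column. Is it worth building tables partitioned on multiple columns? Is the data in such a table evenly distributed? Are there special use cases, or special optimization strategies? This installment focuses on these questions.
MySQL supports partitioning on multiple columns as well as on a single column. For example, a table can be partitioned on the columns (f1,f2,f3); both the usage and the use cases resemble those of a composite index. The query below, for instance, filters on all three columns (f1,f2,f3) at once: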
select * from p1 where f1 = 2 and f2 = 2 and f3 = 2;
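A prerequisite for multi-column partitioning is that the partitioning columns are filtered on with roughly equal frequency; if they are not, there is no need for multi-column partitioning.
As before, let's verify the strengths, weaknesses and suitable scenarios of multi-column partitioning with a concrete example, which makes them easier to understand.
Create a table p1 whose columns r1, r2 and r3 take values in the ranges 1-8, 1-5 and 1-5 respectively.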
create table p1(r1 int,r2 int,r3 int,log_date datetime);
Based on the value ranges of (r1,r2,r3), I wrote a stored procedure that turns p1 into a partitioned table. The code is as follows:
DELIMITER $$
USE `ytt_new`$$
DROP PROCEDURE IF EXISTS `sp_add_partition_ytt_new_p1`$$
CREATE DEFINER=`root`@`%` PROCEDURE `sp_add_partition_ytt_new_p1`()
BEGIN
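-- i, j, k walk the value ranges of r1 (1-8), r2 (1-5) and r3 (1-5)
DECLARE i,j,k INT UNSIGNED DEFAULT 1;
SET @stmt = '';
SET @stmt_begin = 'ALTER TABLE p1 PARTITION BY RANGE COLUMNS (r1,r2,r3)(';
WHILE i <= 8 DO
set j = 1;
while j <= 5 do
set k = 1;
while k <= 5 do
-- one partition per (i,j,k), e.g. " PARTITION p223 VALUES LESS THAN (2,2,3),"
SET @stmt = CONCAT(@stmt,' PARTITION p',i,j,k,' VALUES LESS THAN (',i,',',j,',',k,'),');
set k = k + 1;
end while;
set j = j + 1;
end while;
SET i = i + 1;
END WHILE;
-- catch-all partition for keys at or above (8,5,5)
SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue,maxvalue,maxvalue))';
-- assemble the full statement, then run it as a prepared statement
SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);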
PREPARE s1 FROM @stmt;
EXECUTE s1;
DROP PREPARE s1;
SET @stmt = NULL;
SET @stmt_begin = NULL;
SET @stmt_end = NULL;
END$$
DELIMITER ;
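To see what the procedure's dynamic SQL looks like without running it against a server, here is a hypothetical Python mirror of the string it assembles. The function name `build_partition_ddl` and its parameters are illustrative only. The separator between clauses differs slightly from the procedure, but the partition names and bounds are the same.

```python
# Hypothetical Python mirror of sp_add_partition_ytt_new_p1: it only builds
# the ALTER TABLE text (separators differ slightly), it does not talk to MySQL.

def build_partition_ddl(r1_max=8, r2_max=5, r3_max=5):
    """Return (ddl, partition_names) for RANGE COLUMNS (r1, r2, r3)."""
    names, clauses = [], []
    for i in range(1, r1_max + 1):
        for j in range(1, r2_max + 1):
            for k in range(1, r3_max + 1):
                name = f"p{i}{j}{k}"  # same naming scheme as the procedure
                names.append(name)
                clauses.append(f"PARTITION {name} VALUES LESS THAN ({i},{j},{k})")
    # Catch-all partition for every key >= (8,5,5).
    names.append("p_max")
    clauses.append("PARTITION p_max VALUES LESS THAN (MAXVALUE,MAXVALUE,MAXVALUE)")
    ddl = ("ALTER TABLE p1 PARTITION BY RANGE COLUMNS (r1,r2,r3)("
           + ",".join(clauses) + ")")
    return ddl, names


if __name__ == "__main__":
    ddl, names = build_partition_ddl()
    print(len(names))        # 8 * 5 * 5 range partitions + p_max
    print(ddl[:90] + " ...")
```

The naming scheme simply concatenates the digits, so names stay unique only while every value is a single digit. For example, (1,1,11) and (11,1,1) would both produce p1111.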
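Call the stored procedure to convert p1 into a multi-column partitioned table. p1 now has 201 partitions (8 × 5 × 5 = 200 range partitions plus the catch-all p_max) and holds 5 million rows.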
mysql> call sp_add_partition_ytt_new_p1;
Query OK, 0 rows affected (14.89 sec)
mysql> select count(partition_name) as partition_count from information_schema.partitions where table_schema = 'ytt_new' and table_name ='p1';
+-----------------+
| partition_count |
+-----------------+
| 201 |
+-----------------+
1 row in set (0.00 sec)
mysql> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 5000000 |
+----------+
1 row in set (12.01 sec)
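The opening question of whether the data is evenly distributed can be checked with a small model of how RANGE COLUMNS routes a row. A row goes to the first partition whose VALUES LESS THAN bound is strictly greater than the row's (r1, r2, r3) tuple. Tuples are compared column by column, as in a composite index.

The sketch below assumes the test data covers every combination uniformly; the article does not show how the 5 million rows were loaded. `INF` stands in for MAXVALUE. Under that assumption, each of the 8 × 5 × 5 = 200 value combinations lands in a partition of its own, and p111 stays empty because no row is smaller than (1,1,1).

```python
from bisect import bisect_right
from collections import Counter
from itertools import product

INF = float("inf")  # stands in for MAXVALUE

# p1's partition bounds in definition order, as built by the procedure.
BOUNDS = [(i, j, k) for i in range(1, 9) for j in range(1, 6) for k in range(1, 6)]
NAMES = [f"p{i}{j}{k}" for (i, j, k) in BOUNDS] + ["p_max"]
BOUNDS.append((INF, INF, INF))


def route(row):
    """Partition for an (r1, r2, r3) tuple: the first partition whose
    VALUES LESS THAN bound is strictly greater than the row. Python compares
    tuples lexicographically, like RANGE COLUMNS compares keys."""
    return NAMES[bisect_right(BOUNDS, row)]


# How many distinct (r1, r2, r3) combinations each partition receives.
combos = Counter(route(row) for row in product(range(1, 9), range(1, 6), range(1, 6)))

if __name__ == "__main__":
    print(route((2, 2, 2)))  # p223
    print(len(combos), "non-empty partitions")
    print("p111 empty:", "p111" not in combos)
```

With uniform data this gives about 5,000,000 / 200 = 25,000 rows per non-empty partition.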
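In the same way, create a partitioned table p2 to compare single-column and multi-column partitioning in several scenarios.
p2 is partitioned on r1 alone, into only 9 partitions: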
mysql> CREATE TABLE `p2` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS(r1)
(PARTITION p1 VALUES LESS THAN (1) ,
PARTITION p2 VALUES LESS THAN (2) ,
PARTITION p3 VALUES LESS THAN (3) ,
PARTITION p4 VALUES LESS THAN (4) ,
PARTITION p5 VALUES LESS THAN (5) ,
PARTITION p6 VALUES LESS THAN (6) ,
PARTITION p7 VALUES LESS THAN (7) ,
PARTITION p8 VALUES LESS THAN (8) ,
PARTITION p_max VALUES LESS THAN (MAXVALUE)
);
1 row in set (0.00 sec)
mysql> insert into p2 select * from p1;
Query OK, 5000000 rows affected (1 min 37.92 sec)
Records: 5000000 Duplicates: 0 Warnings: 0
Equality filters on multiple columns: with the same query conditions, p1 (0.02 s) is roughly 25 times faster than p2 (0.49 s).
mysql> select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.49 sec)
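Compare the two execution plans: for the same query, p1 examines only 24,711 rows (all in partition p223), whereas p2 examines 623,239 rows (in partition p3), a huge difference.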
mysql> explain select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p223
type: ALL
...
rows: 24711
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)
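The two row estimates follow from how much of the table each plan's single partition holds. Below is a back-of-the-envelope sketch. It assumes uniformly distributed data, which the article does not show. The names `route_p2`, `rows_read_p1` and `rows_read_p2` are illustrative.

```python
from bisect import bisect_right

INF = float("inf")
TOTAL_ROWS = 5_000_000

# p2 is RANGE COLUMNS(r1) with bounds 1..8 plus MAXVALUE (from the DDL above).
P2_BOUNDS = [1, 2, 3, 4, 5, 6, 7, 8, INF]
P2_NAMES = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p_max"]


def route_p2(r1):
    """First p2 partition whose bound is strictly greater than r1."""
    return P2_NAMES[bisect_right(P2_BOUNDS, r1)]


# Assuming uniform data: p1 holds one (r1, r2, r3) combination per non-empty
# partition (8 * 5 * 5 of them), while p2 puts all rows sharing an r1 value
# into a single partition (8 values of r1).
rows_read_p1 = TOTAL_ROWS // (8 * 5 * 5)
rows_read_p2 = TOTAL_ROWS // 8

if __name__ == "__main__":
    print(route_p2(2), rows_read_p1, rows_read_p2)
```

The model gives 25,000 rows against 625,000, a factor of 25. That is close to both the EXPLAIN estimates (24,711 vs 623,239) and the measured times (0.02 s vs 0.49 s).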
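What if the filter does not cover all the columns, for example leaving out the last one? Comparing again, p1 (0.10 s) still needs only about a fifth of p2's time (0.52 s).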
mysql> select count(*) from p1 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.10 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.52 sec)
What about filtering on the first column only? This time p1 and p2 take about the same time, with p2 slightly ahead.
mysql> select count(*) from p1 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.56 sec)
mysql> select count(*) from p2 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.45 sec)
Compare the execution plans: p1 scans 26 partitions, while p2 scans only one.
mysql> explain select count(*) from p1 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p211,p212,p213,p214,p215,p221,p222,p223,p224,p225,p231,p232,p233,p234,p235,p241,p242,p243,p244,p245,p251,p252,p253,p254,p255,p311
type: ALL
...
rows: 648074
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
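The 26-partition list can be reproduced with a simplified pruning model. Equality on a leading prefix of the partitioning columns pins all matching keys between two tuples: the prefix padded with minus infinity, and the prefix padded with plus infinity. Every partition whose range overlaps that interval must be scanned.

This is an illustrative model, not MySQL's pruning code. `prune` is a hypothetical helper, and only equality predicates on a leading prefix are handled.

```python
from bisect import bisect_right

INF = float("inf")  # stands in for MAXVALUE

BOUNDS = [(i, j, k) for i in range(1, 9) for j in range(1, 6) for k in range(1, 6)]
NAMES = [f"p{i}{j}{k}" for (i, j, k) in BOUNDS] + ["p_max"]
BOUNDS.append((INF, INF, INF))


def prune(prefix, width=3):
    """Partitions that can hold rows whose leading len(prefix) partition
    columns equal `prefix` (equality on a leading prefix only)."""
    pad = width - len(prefix)
    low = tuple(prefix) + (-INF,) * pad   # smallest key that can match
    high = tuple(prefix) + (INF,) * pad   # largest key that can match
    first = bisect_right(BOUNDS, low)
    # Clamp: an all-MAXVALUE key would fall past the last partition.
    last = min(bisect_right(BOUNDS, high), len(NAMES) - 1)
    return NAMES[first:last + 1]


if __name__ == "__main__":
    scanned = prune((2,))
    print(len(scanned), scanned[0], scanned[-1])   # r1 = 2
    print(prune((2, 2, 2)))                        # r1 = r2 = r3 = 2
    print(len(prune(())))                          # no predicate on r1
```

For r1 = 2 the model yields p211 through p255 plus p311, matching the EXPLAIN for p1. The trailing p311 covers keys from (2,5,5) up to but not including (3,1,1), so it can still contain r1 = 2 values. That is why 26 partitions are scanned rather than 25.

For r1 = r2 = 2 the model predicts six partitions (p221 to p225 and p231). With no predicate on r1, the interval is the whole key space, so all 201 partitions are scanned.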
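What if r1 is removed from the filter? The execution times are again nearly identical, because both p1 and p2 have to scan all of their partitions.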
mysql> select count(*) from p1 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.87 sec)
mysql> select count(*) from p2 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.75 sec)
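This leads to another question: for a multi-column partitioned table, does the order of the partitioning columns matter?
The answer depends on the filter conditions of the queries. Take the following three kinds of SQL:
SQL 1: select * from p1 where r1 = 2 and r2 = 2 and r3 = 2;
For SQL 1, the order does not matter, because the query filters on all three columns.
SQL 2: select * from p1 where r1 = 2 and r2 = 2;
For SQL 2, both (r1,r2,r3) and (r2,r1,r3) work.
SQL 3: select * from p1 where r2 = 2 and r3 = 2;
For SQL 3, both (r2,r3,r1) and (r3,r2,r1) work.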
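The rule behind the three cases is the leftmost-prefix rule familiar from composite indexes. Only equality predicates on a leading run of the partitioning columns can narrow the set of partitions.

Below is a minimal sketch, assuming equality predicates only (range predicates are not modeled). `pinned_prefix` is a hypothetical helper that returns how many leading partitioning columns a query pins.

```python
def pinned_prefix(partition_cols, equality_cols):
    """Number of leading partition columns fixed by equality predicates.
    Pruning can only narrow the partition range through this prefix;
    0 means every partition has to be scanned (simplified model)."""
    n = 0
    for col in partition_cols:
        if col not in equality_cols:
            break
        n += 1
    return n


if __name__ == "__main__":
    orders = [("r1", "r2", "r3"), ("r2", "r1", "r3"),
              ("r2", "r3", "r1"), ("r3", "r2", "r1")]
    queries = {"SQL 1": {"r1", "r2", "r3"},
               "SQL 2": {"r1", "r2"},
               "SQL 3": {"r2", "r3"}}
    for label, cols in queries.items():
        print(label, {",".join(o): pinned_prefix(o, cols) for o in orders})
```

SQL 3 pins two leading columns under (r2,r3,r1) but none under (r1,r2,r3). This is the situation the p3 versus p1 comparison measures.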
Create partitioned table p3 the same way, with the partitioning columns in the order (r2,r3,r1):
mysql> show create table p3\G
*************************** 1. row ***************************
Table: p3
Create Table: CREATE TABLE `p3` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
/*!50500 PARTITION BY RANGE COLUMNS(r2,r3,r1)
(PARTITION p111 VALUES LESS THAN (1,1,1) ENGINE = InnoDB,
...
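For p3, the following SQL runs more than 20 times faster than on p1 (0.22 s vs 5.05 s). Because p1's partitioning columns are in a different order, it has to scan every partition to produce the result.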
mysql> select count(*) from p3 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (0.22 sec)
mysql> select count(*) from p1 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (5.05 sec)
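So, as noted at the beginning, multi-column partitioned tables closely resemble composite indexes in how they are used, what to watch out for and where they fit. In certain scenarios, multi-column partitioning can speed up queries significantly.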
