当前位置:网站首页>第42期:MySQL 是否有必要多列分区
第42期:MySQL 是否有必要多列分区
2022-06-29 17:39:00 【ActionTech】

之前的篇章我们讨论的都是基于单列的分区表,那有无必要建立基于多列的分区表?这种分区表数据分布是否均匀?有无特殊的应用场景?有无特殊的优化策略?本篇基于这些问题来进行重点解读。
MySQL 不仅支持基于单列分区,也支持基于多列分区。比如基于字段(f1,f2,f3)来建立分区表,使用方法和使用场景都有些类似于联合索引。比如下面查询语句,同时对列(f1,f2,f3) 进行过滤。
select * from p1 where f1 = 2 and f2 = 2 and f3 = 2;
多列分区表的前提是参与分区的列检索频率均等,如果不均等,就没有必要使用多列分区。
我们还是以具体实例来验证下多列分区的优缺点以及适用场景,这样理解起来更加透彻。
建立一张表p1,字段r1,r2,r3分别取值为1-8,1-5,1-5.
create table p1(r1 int,r2 int,r3 int,log_date datetime);
按照字段(r1,r2,r3) 的分布范围,我来写个存储过程处理下表p1,变为分区表。存储过程代码如下:
DELIMITER $$
USE `ytt_new`$$
DROP PROCEDURE IF EXISTS `sp_add_partition_ytt_new_p1`$$
CREATE DEFINER=`root`@`%` PROCEDURE `sp_add_partition_ytt_new_p1`()
BEGIN
DECLARE i,j,k INT UNSIGNED DEFAULT 1;
SET @stmt = '';
SET @stmt_begin = 'ALTER TABLE p1 PARTITION BY RANGE COLUMNS (r1,r2,r3)(';
WHILE i <= 8 DO
set j = 1;
while j <= 5 do
set k = 1;
while k <= 5 do
SET @stmt = CONCAT(@stmt,' PARTITION p',i,j,k,' VALUES LESS THAN (',i,',',j,',',k,'),');
set k = k + 1;
end while;
set j = j + 1;
end while;
SET i = i + 1;
END WHILE;
SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue,maxvalue,maxvalue))';
SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
PREPARE s1 FROM @stmt;
EXECUTE s1;
DROP PREPARE s1;
SET @stmt = NULL;
SET @stmt_begin = NULL;
SET @stmt_end = NULL;
END$$
DELIMITER ;
调用存储过程,变更表p1为多列分区表,此时表p1有201个分区,记录数为500W条。
mysql> call sp_add_partition_ytt_new_p1;
Query OK, 0 rows affected (14.89 sec)
mysql> select count(partition_name) as partition_count from information_schema.partitions where table_schema = 'ytt_new' and table_name ='p1';
+-----------------+
| partition_count |
+-----------------+
| 201 |
+-----------------+
1 row in set (0.00 sec)
mysql> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 5000000 |
+----------+
1 row in set (12.01 sec)
用同样的方法建立一张分区表p2,来对单列分区表与多列分区表在一些场景下的性能做下对比:
分区表p2按照字段r1分区,仅仅分了9个。
mysql> CREATE TABLE `p2` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS(r1)
(PARTITION p1 VALUES LESS THAN (1) ,
PARTITION p2 VALUES LESS THAN (2) ,
PARTITION p3 VALUES LESS THAN (3) ,
PARTITION p4 VALUES LESS THAN (4) ,
PARTITION p5 VALUES LESS THAN (5) ,
PARTITION p6 VALUES LESS THAN (6) ,
PARTITION p7 VALUES LESS THAN (7) ,
PARTITION p8 VALUES LESS THAN (8) ,
PARTITION p_max VALUES LESS THAN (MAXVALUE)
)
1 row in set (0.00 sec)
mysql> insert into p2 select * from p1;
Query OK, 5000000 rows affected (1 min 37.92 sec)
Records: 5000000 Duplicates: 0 Warnings: 0
多个字段等值过滤的性能对比:同样的查询条件,表p1(执行时间0.02秒)比p2(执行时间0.49秒)要快几十倍。
mysql> select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.49 sec)
查看两者执行计划对比: 同样的查询,表p1扫描行数只有2W多,而表p2扫描行数有62W行,相差巨大。
mysql> explain select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p223
type: ALL
...
rows: 24711
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)
如果过滤字段不完整呢?比如不检索最后一列,再次做下对比:同样表p1(0.1秒)比表p2(0.52秒)执行时间要少几倍。
mysql> select count(*) from p1 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.10 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.52 sec)
那只检索第一列呢:这次表p1和p2执行时间上差不多,p2稍微占优势。
mysql> select count(*) from p1 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.56 sec)
mysql> select count(*) from p2 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.45 sec)
看下执行计划对比:表p1扫描的分区数为26个,表p2仅扫描1个分区,分区数量上表p2相对少很多。
mysql> explain select count(*) from p1 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p211,p212,p213,p214,p215,p221,p222,p223,p224,p225,p231,p232,p233,p234,p235,p241,p242,p243,p244,p245,p251,p252,p253,p254,p255,p311
type: ALL
...
rows: 648074
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
如果把字段r1拿掉呢?执行时间也相差无几,表p1和表p2都会扫描所有分区。
mysql> select count(*) from p1 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.87 sec)
mysql> select count(*) from p2 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.75 sec)
那鉴于此,再来探讨一个问题:对于多列分区,字段的排列顺序是否重要?
关于这个顺序要和我们查询语句对应的过滤条件来一一说明。 类似下面两类 SQL :
SQL 1: select * from p1 where r1 = 2 and r2 = 2 and r3 = 2;
对于SQL 1,顺序无关紧要,因为三个列在查询时都已包含;
SQL 2: select * from p1 where r1 = 2 and r2 = 2;
对于SQL 2 , (r1,r2,r3) 和 (r2,r1,r3) 都可以满足。
SQL 3: select * from p1 where r2 = 2 and r3 = 2;
对于SQL 3, (r2,r3,r1) 和 (r3,r2,r1) 也都可以满足。
用同样的方法来建立分区表p3,分区字段顺序为(r2,r3,r1):
mysql> show create table p3\G
*************************** 1. row ***************************
Table: p3
Create Table: CREATE TABLE `p3` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
/*!50500 PARTITION BY RANGE COLUMNS(r2,r3,r1)
(PARTITION p111 VALUES LESS THAN (1,1,1) ENGINE = InnoDB,
...
对于表p3来讲:下面这条 SQL 执行时间比表p1要快几十倍,由于分区字段顺序不同,表p1要扫描所有分区才能出结果。
mysql> select count(*) from p3 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (0.22 sec)
mysql> select count(*) from p1 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (5.05 sec)
所以对于多列分区表,正如开头讲的一样,它和联合索引的使用方法、注意事项、使用场景也都很类似。对于某些特定的场景,使用多列分区能显著加快查询性能。

边栏推荐
- 剖析下零拷贝机制的实现原理,适用场景和代码实现
- About Kali using xshell connection
- Bottom level internal skill cultivation
- Kali installation tutorial 2020
- Viewing splitchunks code segmentation from MPX resource construction optimization
- How MySQL queries character set codes of tables
- 【WebDriver】使用AutoIt上传文件
- C语言练习----指针字符串、链表
- Issue 42: is it necessary for MySQL to have multiple column partitions
- R语言使用MASS包的glm.nb函数建立负二项广义线性模型(negative binomial)、summary函数获取负二项广义线性模型模型汇总统计信息
猜你喜欢

如何使用B/S开发工具DevExtreme的图表控件 - 自定义轴位置?
![分割回文串[dp + dfs组合]](/img/7b/221b000984977508f849e19802c2c2.png)
分割回文串[dp + dfs组合]

OpenFeign使用步骤 轮询策略与权重 log4j使用 openFeign拦截器的配置

Viewing splitchunks code segmentation from MPX resource construction optimization

关于日期相加减问题

Selenium upload file

Online sql to CSV tool

Walk with love, educate and run poor families, and promote public welfare undertakings

What are the usage scenarios for locks in MySQL

Does MySQL support foreign keys
随机推荐
0 basic self-study STM32 (wildfire) - register lit LED
力扣今日题-535. TinyURL 的加密与解密
腾讯云发布自动化交付和运维产品Orbit,推动企业应用全面云原生化
Leetcode daily question - 535 Encryption and decryption of tinyurl
How to solve the 2003 error of MySQL in Linux
About Kali using xshell connection
在线SQL转CSV工具
R语言使用自定义函数编写深度学习Leaky ReLU激活函数、并可视化Leaky ReLU激活函数
剑指 Offer 13. 机器人的运动范围 (BFS)
自定义HandlerInterceptor拦截器实现用户鉴权
The dplyr package filter function of R language filters the data in dataframe data through combinatorial logic (and logic). The content of one field is equal to one of the specified vectors, and the v
mysql游标的作用是什么
剖析下零拷贝机制的实现原理,适用场景和代码实现
MySQL触发器如何创建与删除
selenium上传文件
Kali installation tutorial 2020
R language uses user-defined functions to write deep learning linear activation functions and visualize linear activation functions
What is the SRM system? How do I apply the SRM system?
DevCloud加持下的青软,让教育“智”上云端
R语言ggplot2可视化:使用patchwork包(直接使用加号+)将一个ggplot2可视化结果和一个plot函数可视化结果横向组合起来形成最终结果图