当前位置:网站首页>第42期:MySQL 是否有必要多列分区
第42期:MySQL 是否有必要多列分区
2022-06-29 17:03:00 【爱可生开源社区】
之前的篇章我们讨论的都是基于单列的分区表,那有无必要建立基于多列的分区表?这种分区表数据分布是否均匀?有无特殊的应用场景?有无特殊的优化策略?本篇基于这些问题来进行重点解读。
MySQL 不仅支持基于单列分区,也支持基于多列分区。比如基于字段(f1,f2,f3)来建立分区表,使用方法和使用场景都有些类似于联合索引。比如下面查询语句,同时对列(f1,f2,f3) 进行过滤。
select * from p1 where f1 = 2 and f2 = 2 and f3 = 2;多列分区表的前提是参与分区的列检索频率均等,如果不均等,就没有必要使用多列分区。
我们还是以具体实例来验证下多列分区的优缺点以及适用场景,这样理解起来更加透彻。
建立一张表p1,字段r1,r2,r3分别取值为1-8,1-5,1-5.
create table p1(r1 int,r2 int,r3 int,log_date datetime);按照字段(r1,r2,r3) 的分布范围,我来写个存储过程处理下表p1,变为分区表。存储过程代码如下:
DELIMITER $$
USE `ytt_new`$$
DROP PROCEDURE IF EXISTS `sp_add_partition_ytt_new_p1`$$
CREATE DEFINER=`root`@`%` PROCEDURE `sp_add_partition_ytt_new_p1`()
BEGIN
DECLARE i,j,k INT UNSIGNED DEFAULT 1;
SET @stmt = '';
SET @stmt_begin = 'ALTER TABLE p1 PARTITION BY RANGE COLUMNS (r1,r2,r3)(';
WHILE i <= 8 DO
set j = 1;
while j <= 5 do
set k = 1;
while k <= 5 do
SET @stmt = CONCAT(@stmt,' PARTITION p',i,j,k,' VALUES LESS THAN (',i,',',j,',',k,'),');
set k = k + 1;
end while;
set j = j + 1;
end while;
SET i = i + 1;
END WHILE;
SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue,maxvalue,maxvalue))';
SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
PREPARE s1 FROM @stmt;
EXECUTE s1;
DROP PREPARE s1;
SET @stmt = NULL;
SET @stmt_begin = NULL;
SET @stmt_end = NULL;
END$$
DELIMITER ;调用存储过程,变更表p1为多列分区表,此时表p1有201个分区,记录数为500W条。
mysql> call sp_add_partition_ytt_new_p1;
Query OK, 0 rows affected (14.89 sec)
mysql> select count(partition_name) as partition_count from information_schema.partitions where table_schema = 'ytt_new' and table_name ='p1';
+-----------------+
| partition_count |
+-----------------+
| 201 |
+-----------------+
1 row in set (0.00 sec)
mysql> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 5000000 |
+----------+
1 row in set (12.01 sec)用同样的方法建立一张分区表p2,来对单列分区表与多列分区表在一些场景下的性能做下对比:
分区表p2按照字段r1分区,仅仅分了9个。
mysql> CREATE TABLE `p2` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS(r1)
(PARTITION p1 VALUES LESS THAN (1) ,
PARTITION p2 VALUES LESS THAN (2) ,
PARTITION p3 VALUES LESS THAN (3) ,
PARTITION p4 VALUES LESS THAN (4) ,
PARTITION p5 VALUES LESS THAN (5) ,
PARTITION p6 VALUES LESS THAN (6) ,
PARTITION p7 VALUES LESS THAN (7) ,
PARTITION p8 VALUES LESS THAN (8) ,
PARTITION p_max VALUES LESS THAN (MAXVALUE)
)
1 row in set (0.00 sec)
mysql> insert into p2 select * from p1;
Query OK, 5000000 rows affected (1 min 37.92 sec)
Records: 5000000 Duplicates: 0 Warnings: 0多个字段等值过滤的性能对比:同样的查询条件,表p1(执行时间0.02秒)比p2(执行时间0.49秒)要快几十倍。
mysql> select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.49 sec)查看两者执行计划对比: 同样的查询,表p1扫描行数只有2W多,而表p2扫描行数有62W行,相差巨大。
mysql> explain select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p223
type: ALL
...
rows: 24711
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)如果过滤字段不完整呢?比如不检索最后一列,再次做下对比:同样表p1(0.1秒)比表p2(0.52秒)执行时间要少几倍。
mysql> select count(*) from p1 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.10 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.52 sec)那只检索第一列呢:这次表p1和p2执行时间上差不多,p2稍微占优势。
mysql> select count(*) from p1 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.56 sec)
mysql> select count(*) from p2 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.45 sec)看下执行计划对比:表p1扫描的分区数为26个,表p2仅扫描1个分区,分区数量上表p2相对少很多。
mysql> explain select count(*) from p1 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p211,p212,p213,p214,p215,p221,p222,p223,p224,p225,p231,p232,p233,p234,p235,p241,p242,p243,p244,p245,p251,p252,p253,p254,p255,p311
type: ALL
...
rows: 648074
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)如果把字段r1拿掉呢?执行时间也相差无几,表p1和表p2都会扫描所有分区。
mysql> select count(*) from p1 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.87 sec)
mysql> select count(*) from p2 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.75 sec)那鉴于此,再来探讨一个问题:对于多列分区,字段的排列顺序是否重要?
关于这个顺序要和我们查询语句对应的过滤条件来一一说明。 类似下面两类 SQL :
SQL 1: select * from p1 where r1 = 2 and r2 = 2 and r3 = 2;对于SQL 1,顺序无关紧要,因为三个列在查询时都已包含;
SQL 2: select * from p1 where r1 = 2 and r2 = 2;对于SQL 2 , (r1,r2,r3) 和 (r2,r1,r3) 都可以满足。
SQL 3: select * from p1 where r2 = 2 and r3 = 2;对于SQL 3, (r2,r3,r1) 和 (r3,r2,r1) 也都可以满足。
用同样的方法来建立分区表p3,分区字段顺序为(r2,r3,r1):
mysql> show create table p3\G
*************************** 1. row ***************************
Table: p3
Create Table: CREATE TABLE `p3` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
/*!50500 PARTITION BY RANGE COLUMNS(r2,r3,r1)
(PARTITION p111 VALUES LESS THAN (1,1,1) ENGINE = InnoDB,
...对于表p3来讲:下面这条 SQL 执行时间比表p1要快几十倍,由于分区字段顺序不同,表p1要扫描所有分区才能出结果。
mysql> select count(*) from p3 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (0.22 sec)
mysql> select count(*) from p1 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (5.05 sec)所以对于多列分区表,正如开头讲的一样,它和联合索引的使用方法、注意事项、使用场景也都很类似。对于某些特定的场景,使用多列分区能显著加快查询性能。
边栏推荐
- 使用kalibr标定工具进行单目相机和双目相机的标定
- “授权同意”落地压力大?隐私计算提供一种可能的合规“技术解”
- Mathematical knowledge: finding combinatorial number II - finding combinatorial number
- mysql数据库扫盲,你真的知道什么是数据库嘛
- 解题元宇宙,网络游戏中的多元通信方案
- Paper notes: e (n) equivariant graph neural networks
- What memory consuming data is stored in MySQL CDC jobmanager?
- Calibration of binocular camera based on OpenCV
- Subgraphs in slam
- NAACL 2022 | 机器翻译SOTA模型的蒸馏
猜你喜欢

知道创宇为能源行业资产管理助力,入选工信部2021物联网示范项目

About xampp unable to start MySQL database

如何配置 logback?30分鐘讓你徹底學會代碼熬夜敲

@Component与@Configuration区别

A simple but scalable feature normalization method

深圳内推 | 深圳计算科学研究院招聘机器学习助理工程师(校招)

STM32 key chattering elimination -- Thinking of entry state machine

机器学习8-人工神经网络

Function calculation asynchronous task capability introduction - task trigger de duplication

最高81.98%!超百所“双一流”高校本科深造率公布
随机推荐
机器学习8-人工神经网络
【无标题】
Subgraphs in slam
如何在 PowerPoint 中向幻灯片添加 SmartArt?
Gradle下载慢或无法下载
I, a tester from a large factory, went to a state-owned enterprise with a 50% pay cut. I regret it
curl: (56) Recv failure: Connection reset by peer
基于C语言开发实现的一个用户级线程库
使用kalibr標定工具進行單目相機和雙目相機的標定
Fluent的msh格式网格学习
我的远程办公初体验 | 社区征文
mysql数据库扫盲,你真的知道什么是数据库嘛
关于onReachButton 不触发可能原因
Greedy Apple plans to raise the price of iphone14, which will provide opportunities for Chinese mobile phones
C language microblog user management system
微博评论高性能高可用架构设计(架构实战营 模块五作业)
What's the difference between isempty and isblank? Half of the people can't answer it?
Inheritablethreadlocal resolves message loss during message transmission between parent and child threads in the thread pool
When MySQL RDS is collected using Flink CDC, the datetime type field will be compared with the source table after collection
卷妹带你学jdbc—2天冲刺Day1