Issue 42: Is Multi-Column Partitioning Necessary in MySQL?
2022-06-29 17:03:00 【爱可生开源社区】
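The previous installments all dealt with partitioned tables based on a single column. Is there any need to partition on multiple columns? Is the data in such a table evenly distributed? Are there particular use cases or optimization strategies? This installment works through these questions.
MySQL supports partitioning on multiple columns as well as on a single column. For example, a table can be partitioned on the columns (f1,f2,f3); both the usage and the suitable scenarios resemble those of a composite index. The query below, for instance, filters on all three columns (f1,f2,f3) at once.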
select * from p1 where f1 = 2 and f2 = 2 and f3 = 2;
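Multi-column partitioning presupposes that the partitioning columns are queried about equally often; if they are not, multi-column partitioning is unnecessary.
Let's verify the pros, cons, and suitable scenarios of multi-column partitioning with a concrete example, which makes them easier to understand.
Create a table p1 whose columns r1, r2, and r3 take values in 1-8, 1-5, and 1-5 respectively.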
create table p1(r1 int,r2 int,r3 int,log_date datetime);
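The article does not show how p1 was filled with its 5 million rows. One possible way, sketched here under the assumption of MySQL 8.0 or later (for recursive CTEs) and uniformly random values, is to generate rows in batches. The batch size of 100,000 and the random log_date are illustrative choices, not the author's method.

```sql
-- Assumption: MySQL 8.0+. Raise the recursion limit (default 1000) for this session.
SET SESSION cte_max_recursion_depth = 100000;

-- Insert one batch of 100,000 rows; repeat 50 times to reach 5 million rows.
INSERT INTO p1 (r1, r2, r3, log_date)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 100000
)
SELECT
    FLOOR(1 + RAND() * 8),                     -- r1 in 1-8
    FLOOR(1 + RAND() * 5),                     -- r2 in 1-5
    FLOOR(1 + RAND() * 5),                     -- r3 in 1-5
    NOW() - INTERVAL FLOOR(RAND() * 365) DAY   -- arbitrary timestamp within the past year
FROM seq;
```

Uniform values are at least consistent with the counts reported later: with 8 * 5 * 5 = 200 combinations, each combination should hold roughly 25,000 of the 5 million rows.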
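Based on the value ranges of the columns (r1,r2,r3), I'll write a stored procedure that converts table p1 into a partitioned table. The stored procedure code is as follows: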
DELIMITER $$
USE `ytt_new`$$
DROP PROCEDURE IF EXISTS `sp_add_partition_ytt_new_p1`$$
CREATE DEFINER=`root`@`%` PROCEDURE `sp_add_partition_ytt_new_p1`()
BEGIN
    -- Loop counters for r1 (1-8), r2 (1-5) and r3 (1-5).
    DECLARE i,j,k INT UNSIGNED DEFAULT 1;
    SET @stmt = '';
    SET @stmt_begin = 'ALTER TABLE p1 PARTITION BY RANGE COLUMNS (r1,r2,r3)(';
    -- Emit one partition per (i,j,k) combination: 8 * 5 * 5 = 200 partitions.
    WHILE i <= 8 DO
        SET j = 1;
        WHILE j <= 5 DO
            SET k = 1;
            WHILE k <= 5 DO
                SET @stmt = CONCAT(@stmt,' PARTITION p',i,j,k,' VALUES LESS THAN (',i,',',j,',',k,'),');
                SET k = k + 1;
            END WHILE;
            SET j = j + 1;
        END WHILE;
        SET i = i + 1;
    END WHILE;
    -- Catch-all partition for rows at or above the last bound (8,5,5).
    SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue,maxvalue,maxvalue))';
    SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
    -- Run the generated ALTER TABLE as a prepared statement.
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
    -- Release the user variables.
    SET @stmt = NULL;
    SET @stmt_begin = NULL;
    SET @stmt_end = NULL;
END$$
DELIMITER ;
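To make the generated statement easier to picture, here is a hand-written, scaled-down equivalent. It assumes a hypothetical table p_demo in which r1, r2, and r3 each take only the values 1-2, so the statement has 8 partitions plus p_max instead of 201. The bounds have to be strictly increasing when compared as tuples, which the nested loop order of the procedure guarantees.

```sql
-- Hypothetical scaled-down table, not part of the original test.
CREATE TABLE p_demo (r1 INT, r2 INT, r3 INT, log_date DATETIME);

-- Same shape as the statement built by the stored procedure, with smaller ranges.
ALTER TABLE p_demo PARTITION BY RANGE COLUMNS (r1, r2, r3) (
    PARTITION p111 VALUES LESS THAN (1,1,1),
    PARTITION p112 VALUES LESS THAN (1,1,2),
    PARTITION p121 VALUES LESS THAN (1,2,1),
    PARTITION p122 VALUES LESS THAN (1,2,2),
    PARTITION p211 VALUES LESS THAN (2,1,1),
    PARTITION p212 VALUES LESS THAN (2,1,2),
    PARTITION p221 VALUES LESS THAN (2,2,1),
    PARTITION p222 VALUES LESS THAN (2,2,2),
    PARTITION p_max VALUES LESS THAN (MAXVALUE,MAXVALUE,MAXVALUE)
);
```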
Call the stored procedure to convert p1 into a multi-column partitioned table. Table p1 now has 201 partitions and holds 5 million rows.
mysql> call sp_add_partition_ytt_new_p1;
Query OK, 0 rows affected (14.89 sec)
mysql> select count(partition_name) as partition_count from information_schema.partitions where table_schema = 'ytt_new' and table_name ='p1';
+-----------------+
| partition_count |
+-----------------+
| 201 |
+-----------------+
1 row in set (0.00 sec)
mysql> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 5000000 |
+----------+
1 row in set (12.01 sec)
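One of the opening questions was whether the data in such a table is evenly distributed. A quick way to check, sketched here, is to read the per-partition row counts from information_schema.partitions. Note that for InnoDB the table_rows column is an estimate rather than an exact count.

```sql
-- List each partition of p1 with its upper bound and estimated row count.
SELECT partition_name, partition_description, table_rows
FROM information_schema.partitions
WHERE table_schema = 'ytt_new'
  AND table_name = 'p1'
ORDER BY partition_ordinal_position;
```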
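Create a second partitioned table, p2, in the same way, to compare single-column and multi-column partitioning in a few scenarios:
Table p2 is partitioned on column r1 only, into just 9 partitions.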
mysql> CREATE TABLE `p2` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS(r1)
(PARTITION p1 VALUES LESS THAN (1) ,
PARTITION p2 VALUES LESS THAN (2) ,
PARTITION p3 VALUES LESS THAN (3) ,
PARTITION p4 VALUES LESS THAN (4) ,
PARTITION p5 VALUES LESS THAN (5) ,
PARTITION p6 VALUES LESS THAN (6) ,
PARTITION p7 VALUES LESS THAN (7) ,
PARTITION p8 VALUES LESS THAN (8) ,
PARTITION p_max VALUES LESS THAN (MAXVALUE)
)
1 row in set (0.00 sec)
mysql> insert into p2 select * from p1;
Query OK, 5000000 rows affected (1 min 37.92 sec)
Records: 5000000 Duplicates: 0 Warnings: 0
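Performance with equality filters on all three columns: for the same query conditions, table p1 (0.02 seconds) is roughly 25 times faster than p2 (0.49 seconds).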
mysql> select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
| 24992 |
+----------+
1 row in set (0.49 sec)
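Compare the two execution plans: for the same query, p1 is estimated to scan only about 24,700 rows, while p2 is estimated to scan about 623,000 rows, a huge difference.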
mysql> explain select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p223
type: ALL
...
rows: 24711
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 0.10
Extra: Using where
1 row in set, 1 warning (0.00 sec)
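The partitions column shows that rows with (r1,r2,r3) = (2,2,2) live in p223, not p222. This follows from how RANGE COLUMNS compares bounds: as whole tuples, left to right, with a strict less-than. MySQL's row comparison can be used to check this directly:

```sql
-- (2,2,2) is not strictly less than p222's bound (2,2,2), so it does not fit there;
-- it is strictly less than p223's bound (2,2,3), so it lands in p223.
SELECT (2,2,2) < (2,2,2) AS fits_p222,
       (2,2,2) < (2,2,3) AS fits_p223;
-- fits_p222 = 0, fits_p223 = 1
```

The same strict less-than explains why rows with r1 = 2 land in partition p3 of table p2, whose bound is VALUES LESS THAN (3).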
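What if the filter does not cover all the partitioning columns? Dropping the last column and comparing again, table p1 (0.10 seconds) is still about 5 times faster than p2 (0.52 seconds).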
mysql> select count(*) from p1 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.10 sec)
mysql> select count(*) from p2 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
| 124649 |
+----------+
1 row in set (0.52 sec)
What about filtering on the first column only? This time p1 and p2 take about the same time, with p2 slightly ahead.
mysql> select count(*) from p1 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.56 sec)
mysql> select count(*) from p2 where r1 = 2 ;
+----------+
| count(*) |
+----------+
| 624599 |
+----------+
1 row in set (0.45 sec)
Compare the execution plans: p1 scans 26 partitions, while p2 scans only 1, so p2 touches far fewer partitions.
mysql> explain select count(*) from p1 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p1
partitions: p211,p212,p213,p214,p215,p221,p222,p223,p224,p225,p231,p232,p233,p234,p235,p241,p242,p243,p244,p245,p251,p252,p253,p254,p255,p311
type: ALL
...
rows: 648074
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select count(*) from p2 where r1 = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p2
partitions: p3
type: ALL
...
rows: 623239
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
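Of the 26 partitions listed for p1, the first (p211) and the last (p311) are included because their ranges can, in principle, contain rows with r1 = 2: p211 covers tuples from (1,5,5) up to but not including (2,1,1), and p311 covers tuples from (2,5,5) up to but not including (3,1,1). A sketch of how to see what these two edge partitions actually hold, using explicit partition selection:

```sql
-- Read only the two edge partitions and group by the partitioning columns.
SELECT r1, r2, r3, COUNT(*) AS row_count
FROM p1 PARTITION (p211, p311)
GROUP BY r1, r2, r3;
```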
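What if column r1 is dropped from the filter? The execution times are again nearly identical, because both p1 and p2 have to scan all partitions.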
mysql> select count(*) from p1 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.87 sec)
mysql> select count(*) from p2 where r2 = 2;
+----------+
| count(*) |
+----------+
| 998700 |
+----------+
1 row in set (3.75 sec)
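This raises another question: for multi-column partitioning, does the order of the columns matter?
The answer depends on the filter conditions of the queries, so let's go through them one by one using the following three kinds of SQL:
SQL 1: select * from p1 where r1 = 2 and r2 = 2 and r3 = 2;
For SQL 1 the order does not matter, because the query filters on all three columns.
SQL 2: select * from p1 where r1 = 2 and r2 = 2;
For SQL 2, both (r1,r2,r3) and (r2,r1,r3) work.
SQL 3: select * from p1 where r2 = 2 and r3 = 2;
For SQL 3, both (r2,r3,r1) and (r3,r2,r1) work.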
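These cases follow the same leftmost-prefix behavior seen in the tests above: pruning worked when the filter included the leading partitioning column, and the query fell back to scanning all partitions when it did not. A minimal sketch for checking this on a small scale, using a hypothetical table order_demo with two partitioning columns (a,b):

```sql
-- Hypothetical demo table, not part of the original test.
CREATE TABLE order_demo (a INT, b INT)
PARTITION BY RANGE COLUMNS (a, b) (
    PARTITION p11 VALUES LESS THAN (1,1),
    PARTITION p12 VALUES LESS THAN (1,2),
    PARTITION p21 VALUES LESS THAN (2,1),
    PARTITION p22 VALUES LESS THAN (2,2),
    PARTITION p_max VALUES LESS THAN (MAXVALUE,MAXVALUE)
);

-- Filter on the leading column: compare the partitions column of this plan ...
EXPLAIN SELECT COUNT(*) FROM order_demo WHERE a = 1;

-- ... with the one for a filter on the second column only.
EXPLAIN SELECT COUNT(*) FROM order_demo WHERE b = 1;
```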
Create a partitioned table p3 in the same way, with the partitioning columns ordered (r2,r3,r1):
mysql> show create table p3\G
*************************** 1. row ***************************
Table: p3
Create Table: CREATE TABLE `p3` (
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`r3` int DEFAULT NULL,
`log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
/*!50500 PARTITION BY RANGE COLUMNS(r2,r3,r1)
(PARTITION p111 VALUES LESS THAN (1,1,1) ENGINE = InnoDB,
...
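For table p3, the SQL below runs more than 20 times faster than on table p1: because the partitioning column order differs, p1 has to scan all partitions to produce the result.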
mysql> select count(*) from p3 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (0.22 sec)
mysql> select count(*) from p1 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
| 199648 |
+----------+
1 row in set (5.05 sec)
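The timings above come without execution plans. To confirm that the difference is due to partition pruning, the partitions column of the two plans can be compared. This is a suggested check; its output is not reproduced from the original test.

```sql
-- p3 is partitioned on (r2,r3,r1), so the filter covers its two leading columns.
EXPLAIN SELECT COUNT(*) FROM p3 WHERE r2 = 1 AND r3 = 4;

-- p1 is partitioned on (r1,r2,r3), so the filter skips its leading column.
EXPLAIN SELECT COUNT(*) FROM p1 WHERE r2 = 1 AND r3 = 4;
```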
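So, as noted at the start, multi-column partitioned tables are much like composite indexes in how they are used, what to watch out for, and where they fit. In certain scenarios, multi-column partitioning can significantly speed up queries.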