PG's SQL execution plan
2022-08-02 22:36:00 【Ink Sky Wheel】
Hello everyone. Today I'd like to share some aspects of PG execution plans.
Like other well-known databases such as ORACLE and MYSQL, PG has a cost-based optimizer (CBO). It calculates costs in order to generate the theoretically optimal execution plan.
Across different databases, the same explain command gives you detailed output of the execution plan:
```
=# create table tab (id int, name varchar(200));
CREATE TABLE
=# insert into tab values (generate_series(1,10000),'PG execution plan');
INSERT 0 10000
=# \timing
Timing is on.
=# explain select * from tab;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on tab (cost=0.00..164.00 rows=10000 width=22)
(1 row)

Time: 0.566 ms
```
The explain command can take several options. Their meanings are as follows:
EXPLAIN [ ( option [, ...] ) ] statement
ANALYZE [ boolean ]: actually executes the SQL to obtain the real execution plan. The elapsed time and the number of rows returned by each step are real measurements.
```
=# explain analyze select * from tab;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on tab (cost=0.00..164.00 rows=10000 width=22) (actual time=0.008..0.565 rows=10000 loops=1)
 Planning Time: 0.035 ms
 Execution Time: 0.870 ms
(3 rows)

Time: 1.221 ms
```
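VERBOSE [ boolean ]: outputs more detailed information, for example the important Query Identifier attribute (PG 14 and later, when compute_query_id is enabled), which is similar to MySQL's SQL digest or Oracle's SQL_ID.
This Query Identifier is the same value as the queryid used by the pg_stat_statements extension.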
```
=# explain analyze verbose select * from tab;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on public.tab (cost=0.00..164.00 rows=10000 width=22) (actual time=0.009..0.731 rows=10000 loops=1)
   Output: id, name
 Query Identifier: 4997534032644374154
 Planning Time: 0.038 ms
 Execution Time: 1.123 ms
(5 rows)

Time: 1.499 ms
```
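COSTS [ boolean ]: shows the estimated cost. It is enabled by default.
BUFFERS [ boolean ]: shows buffer (memory) and disk read/write activity. Buffers: shared hit=64 means that all 64 pages were found in shared buffers, so nothing had to be read from disk (pages read from outside shared buffers would be reported as read=N).
(select pg_size_pretty(pg_relation_size('tab')); returns 512 kB, and 512 kB / 8 kB = 64 pages)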
```
=# explain (analyze true, verbose true, buffers true) select * from tab;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on public.tab (cost=0.00..164.00 rows=10000 width=22) (actual time=0.017..2.605 rows=10000 loops=1)
   Output: id, name
   Buffers: shared hit=64
 Query Identifier: 4997534032644374154
 Planning Time: 0.087 ms
 Execution Time: 4.117 ms
(6 rows)

Time: 4.695 ms
=# select pg_size_pretty(pg_relation_size('tab'));
 pg_size_pretty
----------------
 512 kB
(1 row)

Time: 0.315 ms
```
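WAL [ boolean ]: reports statistics on WAL record generation. It is usually combined with ANALYZE, because the statement must actually run for the WAL figures to be accurate. This option was introduced in PG 13.
Let's test how much WAL is generated by inserting 1 million rows: WAL: records=1000000 bytes=69000000, which is roughly 65.8 MB (the integer division in the query below truncates it to 65).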
```
=# select 69000000/1024/1024 as "WAL size(MB)";
 WAL size(MB)
--------------
           65
(1 row)
```
```
=# create table tab2(id int, name varchar(200));
CREATE TABLE
=# explain (analyze, wal) insert into tab2 values (generate_series(1,1000000),'hello PG!');
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Insert on tab2 (cost=0.00..5000.02 rows=0 width=0) (actual time=939.011..939.012 rows=0 loops=1)
   WAL: records=1000000 bytes=69000000
   -> ProjectSet (cost=0.00..5000.02 rows=1000000 width=422) (actual time=0.003..93.902 rows=1000000 loops=1)
         -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Planning Time: 0.046 ms
 Execution Time: 939.047 ms
(6 rows)
```
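To double-check the arithmetic, here is a small Python sketch that turns the `WAL:` line into megabytes and an average record size. The helper name `wal_summary` is purely illustrative and not part of any PostgreSQL tooling; unlike the SQL query above, it uses floating-point division, so nothing is truncated.

```python
# Convert the figures from "WAL: records=1000000 bytes=69000000" into
# friendlier numbers. wal_summary is a hypothetical helper for illustration.
def wal_summary(records: int, wal_bytes: int) -> dict:
    """Return the total WAL size in MiB and the average bytes per WAL record."""
    return {
        "size_mib": wal_bytes / 1024 / 1024,      # float division, no truncation
        "bytes_per_record": wal_bytes / records,  # average record size
    }

summary = wal_summary(records=1_000_000, wal_bytes=69_000_000)
print(f"{summary['size_mib']:.1f} MiB, {summary['bytes_per_record']:.0f} bytes/record")
```

In this test each inserted row produced exactly one WAL record of about 69 bytes on average.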
FORMAT { TEXT | XML | JSON | YAML }: selects the output format. It is worth trying them all; the YAML support is really nice.
```
=# explain (analyze true, verbose true, buffers true, format YAML) select * from tab;
                QUERY PLAN
------------------------------------------
 Plan:
   Node Type: "Seq Scan"
   Parallel Aware: false
   Async Capable: false
   Relation Name: "tab"
   Schema: "public"
   Alias: "tab"
   Startup Cost: 0.00
   Total Cost: 228.00
   Plan Rows: 10000
   Plan Width: 17
   Actual Startup Time: 0.054
   Actual Total Time: 0.862
   Actual Rows: 10000
   Actual Loops: 1
   Output:
     - "id"
     - "name"
 Execution Time: 1.392
(1 row)
```
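Of the machine-readable formats, JSON is the easiest to consume from scripts with only the standard library. The sketch below assumes the usual shape of EXPLAIN (FORMAT JSON) output: a one-element list whose `Plan` object nests child nodes under a `Plans` key. The sample string is hand-written to mirror the Limit / Index Scan plan shown later in this article; it is not captured from a real run.

```python
import json

# Hand-written sample shaped like EXPLAIN (FORMAT JSON) output (not a real run).
SAMPLE_PLAN = """
[
  {
    "Plan": {
      "Node Type": "Limit",
      "Startup Cost": 0.42,
      "Total Cost": 4.44,
      "Plan Rows": 1,
      "Plans": [
        {
          "Node Type": "Index Scan",
          "Index Name": "idx_name_tab2",
          "Relation Name": "tab2",
          "Startup Cost": 0.42,
          "Total Cost": 4.44,
          "Plan Rows": 1
        }
      ]
    }
  }
]
"""

def walk(node: dict, depth: int = 0):
    """Yield (depth, node) for a plan node and all of its children, depth-first."""
    yield depth, node
    for child in node.get("Plans", []):  # child nodes live under "Plans"
        yield from walk(child, depth + 1)

def summarize(plan_json: str) -> list:
    """Render each node as an indented one-line summary."""
    top = json.loads(plan_json)[0]["Plan"]  # the output is a one-element list
    return [
        f"{'  ' * d}{n['Node Type']} (cost={n['Startup Cost']:.2f}..{n['Total Cost']:.2f} rows={n['Plan Rows']})"
        for d, n in walk(top)
    ]

for line in summarize(SAMPLE_PLAN):
    print(line)
```

The same recursive walk works for deeper trees such as the join plans later in the article, since every node keeps its children in `Plans`.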
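That is a brief introduction to the EXPLAIN options. Next, let's look at what the output means.
cost: split into the startup cost and the total cost: cost=0.00..164.00
Estimated number of rows returned: rows=10000
width: the estimated average width of the returned rows, in bytes: width=17
actual time=0.073..1.135: 0.073 ms is the time until the node returned its first row, and 1.135 ms is the time until it returned its last row.
Actual number of rows returned: rows=10000
Number of loops: loops=1. This is a single-table query, so the node is executed once. When loops is greater than 1 (for example, the inner side of a nested loop), actual time and rows are averages per loop.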
```
=# explain analyze select * from tab;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on tab (cost=0.00..164.00 rows=10000 width=22) (actual time=0.008..0.641 rows=10000 loops=1)
 Planning Time: 0.036 ms
 Execution Time: 0.946 ms
(3 rows)
```
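The fields described above can be pulled out of a text-format plan line with a regular expression. This is a sketch that assumes the text layout shown in this article; `parse_plan_line` is a hypothetical helper. Because actual time and rows are per-loop averages, it multiplies them by loops to get per-node totals.

```python
import re

# Matches "cost=S..T rows=R width=W)" and, when present (EXPLAIN ANALYZE),
# "(actual time=A..B rows=N loops=L)". Layout assumed from this article's output.
PLAN_LINE = re.compile(
    r"cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+)"
    r" rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
    r"(?:\s*\(actual time=(?P<a_start>\d+\.\d+)\.\.(?P<a_total>\d+\.\d+)"
    r" rows=(?P<a_rows>\d+(?:\.\d+)?) loops=(?P<loops>\d+)\))?"
)

def parse_plan_line(line: str) -> dict:
    """Return the estimated (and, if present, actual) figures from one plan line."""
    m = PLAN_LINE.search(line)
    if m is None:
        raise ValueError(f"no cost information in: {line!r}")
    g = m.groupdict()
    info = {
        "startup_cost": float(g["startup"]),
        "total_cost": float(g["total"]),
        "est_rows": int(g["rows"]),
        "width": int(g["width"]),
    }
    if g["loops"] is not None:  # only present with EXPLAIN ANALYZE
        loops = int(g["loops"])
        rows_per_loop = float(g["a_rows"])
        info.update(
            first_row_ms=float(g["a_start"]),  # time until the node's first row
            last_row_ms=float(g["a_total"]),   # time until the node's last row
            actual_rows=rows_per_loop,
            loops=loops,
            # actual time and rows are per-loop averages; multiply for totals
            node_total_ms=float(g["a_total"]) * loops,
            total_rows=rows_per_loop * loops,
        )
    return info

line = ("Seq Scan on tab (cost=0.00..164.00 rows=10000 width=22) "
        "(actual time=0.008..0.641 rows=10000 loops=1)")
info = parse_plan_line(line)
print(info["total_cost"], info["actual_rows"], info["loops"])
```

A line ending in `(never executed)` has no actual part, so only the estimated fields are returned for it.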
Next, let's see how the cost is calculated.
The formula comes from the official documentation: (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost).
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tab'; -- returns 64 pages and 10000 tuples
seq_page_cost = 1, cpu_tuple_cost = 0.01
(disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost) = 64 * 1 + 10000 * 0.01 = 164
This matches the plan output (cost=0.00..164.00 rows=10000 width=22).
```
=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'tab';
 relpages | reltuples
----------+-----------
       64 |     10000
(1 row)

=# show seq_page_cost;
 seq_page_cost
---------------
 1
(1 row)

=# show cpu_tuple_cost;
 cpu_tuple_cost
----------------
 0.01
(1 row)
```
What about a query with a WHERE filter? The cost is the original cost plus the cost of the filter (cpu_operator_cost * rows):
cpu_operator_cost: defaults to 0.0025
rows: the reltuples value for the table in pg_class, 10000
Original cost: (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost) = 64 * 1 + 10000 * 0.01 = 164
Filter cost: cpu_operator_cost * rows = 10000 * 0.0025 = 25
So the total cost is 164 + 25 = 189
```
=# explain analyze select * from tab where name ~ 'test%';
                                             QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on tab (cost=0.00..189.00 rows=1 width=22) (actual time=10.776..10.777 rows=0 loops=1)
   Filter: ((name)::text ~ 'test%'::text)
   Rows Removed by Filter: 10000
 Planning Time: 0.132 ms
 Execution Time: 10.797 ms
(5 rows)

=# show cpu_operator_cost
dbtest-# ;
 cpu_operator_cost
-------------------
 0.0025
(1 row)
```
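Both calculations can be reproduced with a small Python sketch of the simplified formula. It assumes PostgreSQL's default planner settings and that the filter evaluates exactly one operator per row (as the single `~` does here). The real planner adds further terms (startup costs, parallelism, per-function cost multipliers), so treat this as an approximation, not the planner's actual code.

```python
# Simplified single-table Seq Scan cost model:
#   total = pages * seq_page_cost
#         + tuples * cpu_tuple_cost
#         + tuples * cpu_operator_cost * (operators evaluated per row)
# Defaults are PostgreSQL's default planner settings.
def seq_scan_cost(relpages: int, reltuples: float,
                  filter_ops: int = 0,
                  seq_page_cost: float = 1.0,
                  cpu_tuple_cost: float = 0.01,
                  cpu_operator_cost: float = 0.0025) -> float:
    io_cost = relpages * seq_page_cost                          # reading every page
    cpu_cost = reltuples * cpu_tuple_cost                       # processing every tuple
    filter_cost = reltuples * cpu_operator_cost * filter_ops    # evaluating WHERE
    return io_cost + cpu_cost + filter_cost

print(seq_scan_cost(64, 10000))                # select * from tab
print(seq_scan_cost(64, 10000, filter_ops=1))  # ... where name ~ 'test%'
```

Changing `seq_page_cost` or `cpu_tuple_cost` in the call shows how the planner's settings shift the estimate, which is handy when experimenting with `set seq_page_cost = ...` in a session.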
Now that we have a basic idea of how cost is calculated, let's look at how tables are accessed and how they are joined in execution plans.
Table access methods:
1) Sequential Scan: full table scan
2) Index Scan
3) Index Only Scan: covering-index scan
4) Bitmap Heap Scan: bitmap scan driven by indexes
Table join methods:
1) Nested Loop: nested loop join
2) Merge Join
3) Hash Join
Sequential Scan (full table scan): usually happens when no index can be used (or the available index is not selective enough).
It is generally suitable for very small tables, or for OLAP analysis that has to scan large amounts of data.
In the execution plan it appears as: Seq Scan on <table name>
```
=# explain analyze select * from tab2;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on tab2 (cost=0.00..15406.00 rows=1000000 width=14) (actual time=0.007..71.424 rows=1000000 loops=1)
 Planning Time: 0.145 ms
 Execution Time: 109.381 ms
(3 rows)
```
Index Scan: usually happens when an index can be used and it is highly selective (the query returns only a small fraction of the rows, generally under 5%).
It suits high-concurrency OLTP scenarios where data must be returned within milliseconds.
```
=# create index concurrently idx_name_tab2 on tab2(name);
CREATE INDEX
=# explain analyze select * from tab2 where name = 'jason' limit 10;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Limit (cost=0.42..4.44 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
   -> Index Scan using idx_name_tab2 on tab2 (cost=0.42..4.44 rows=1 width=14) (actual time=0.014..0.014 rows=0 loops=1)
         Index Cond: ((name)::text = 'jason'::text)
 Planning Time: 0.182 ms
 Execution Time: 0.030 ms
(5 rows)
```
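An index scan can be pictured as: descend to the first matching key, walk forward while the key still matches, and fetch each row from the table. The sketch below models the index as a sorted Python list of `(key, tid)` pairs and the table as a dict keyed by tid; both are simplifications for illustration, not PostgreSQL's on-disk B-tree or heap. It also shows why `LIMIT` keeps the scan cheap: it can stop early.

```python
from bisect import bisect_left

def index_scan(index, heap, key, limit=None):
    """Yield table rows whose indexed value equals key.
    index: sorted list of (key, tid); heap: dict tid -> row (hypothetical model)."""
    i = bisect_left(index, (key,))  # "descend" to the first entry >= key
    returned = 0
    while i < len(index) and index[i][0] == key:
        if limit is not None and returned >= limit:
            break                   # LIMIT stops the scan early
        tid = index[i][1]
        yield heap[tid]             # heap fetch: visit the table for the full row
        returned += 1
        i += 1

# Hypothetical table contents; tids are (page, offset) pairs.
heap = {(0, 1): (1, "hello PG!"), (0, 2): (2, "jason"), (1, 1): (3, "hello PG!")}
index = sorted((row[1], tid) for tid, row in heap.items())

print(list(index_scan(index, heap, "hello PG!", limit=10)))
print(list(index_scan(index, heap, "nobody")))
```

Because `(key,)` sorts before every `(key, tid)` tuple, `bisect_left` lands on the first entry for that key.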
Index Only Scan (covering-index scan): generally happens when all the columns in the SELECT list are contained in the index. Note that, unlike MySQL, because of PG's MVCC mechanism, if VACUUM has not run recently, a covering-index scan may still have to go back to the table to check row visibility.
Whether a true index-only scan happens depends on the bits in the visibility map.
For more on the visibility map, see the previous article: https://cdn.modb.pro/db/447177
Heap Fetches: 0 in the plan means that no data was fetched from the table. There are two possible cases:
1) The index shows that there really is no matching data, so there is no need to visit the table.
2) The index shows that there is matching data, and the VM shows that the pages holding it are all-visible, so the latest data can be returned from the index without visiting the table.
```
=# explain analyze select name from tab2 where name = 'hello PG!!' limit 10;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=0.42..4.44 rows=1 width=10) (actual time=0.028..0.029 rows=0 loops=1)
   -> Index Only Scan using idx_name_tab2 on tab2 (cost=0.42..4.44 rows=1 width=10) (actual time=0.027..0.028 rows=0 loops=1)
         Index Cond: (name = 'hello PG!!'::text)
         Heap Fetches: 0
 Planning Time: 0.070 ms
 Execution Time: 0.044 ms
(6 rows)
```
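The two cases above can be modeled in a few lines. This is a hypothetical sketch: `all_visible_pages` stands in for the visibility-map bits, and `visible_tids` stands in for what a heap visit would report about each tuple. It counts how many heap fetches an index-only scan would need.

```python
from bisect import bisect_left

def index_only_scan(index, key, all_visible_pages, visible_tids):
    """Return (values, heap_fetches) for WHERE col = key (hypothetical model).
      index:             sorted list of (key, (page, offset))
      all_visible_pages: pages whose visibility-map bit is set
      visible_tids:      tids visible to our snapshot (what a heap visit reveals)
    """
    values, heap_fetches = [], 0
    i = bisect_left(index, (key,))
    while i < len(index) and index[i][0] == key:
        tid = index[i][1]
        page = tid[0]
        if page in all_visible_pages:
            values.append(index[i][0])   # trust the index entry: no heap visit
        else:
            heap_fetches += 1            # must check visibility in the table
            if tid in visible_tids:
                values.append(index[i][0])
        i += 1
    return values, heap_fetches

index = [("hello PG!", (0, 1)), ("hello PG!", (5, 3))]
print(index_only_scan(index, "hello PG!!", {0, 5}, set()))   # case 1: no match
print(index_only_scan(index, "hello PG!", {0, 5}, set()))    # case 2: all-visible
print(index_only_scan(index, "hello PG!", {0}, {(0, 1)}))    # page 5 not yet vacuumed
```

The last call shows the situation the text warns about: without an up-to-date visibility map, the "covering" index still has to visit the table.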
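Bitmap Heap Scan: generally happens when indexes can be used for conditions combined with OR. PG builds a bitmap of matching row locations from each index scan, combines them (BitmapOr), and then visits the table to fetch the desired records.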
```
=# explain analyze select name from tab2 where name = 'hello oracle' or name = 'hello mysql';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tab2 (cost=8.87..12.88 rows=1 width=10) (actual time=0.046..0.047 rows=0 loops=1)
   Recheck Cond: (((name)::text = 'hello oracle'::text) OR ((name)::text = 'hello mysql'::text))
   -> BitmapOr (cost=8.87..8.87 rows=1 width=0) (actual time=0.045..0.045 rows=0 loops=1)
         -> Bitmap Index Scan on idx_name_tab2 (cost=0.00..4.43 rows=1 width=0) (actual time=0.035..0.035 rows=0 loops=1)
               Index Cond: ((name)::text = 'hello oracle'::text)
         -> Bitmap Index Scan on idx_name_tab2 (cost=0.00..4.43 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1)
               Index Cond: ((name)::text = 'hello mysql'::text)
 Planning Time: 0.081 ms
 Execution Time: 0.086 ms
(9 rows)
```
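Conceptually, each Bitmap Index Scan produces a set of row locations, BitmapOr takes their union, and the Bitmap Heap Scan visits the table in physical page order while rechecking the condition. The sketch below models this with Python sets; the index dict and the `(id, name)` row layout are assumptions for illustration. In PostgreSQL the recheck mainly matters when the bitmap becomes lossy (page-level), but the sketch always rechecks for simplicity.

```python
def bitmap_heap_scan(index, heap, wanted):
    """Hypothetical model of BitmapOr + Bitmap Heap Scan for name = k1 OR name = k2.
    index: dict name -> list of (page, offset); heap: dict tid -> (id, name)."""
    bitmap = set()
    for key in wanted:                       # one Bitmap Index Scan per OR branch
        bitmap |= set(index.get(key, []))    # BitmapOr: union of row locations
    rows = []
    for tid in sorted(bitmap):               # visit the table in (page, offset) order
        row = heap[tid]
        if row[1] in wanted:                 # Recheck Cond on the name column
            rows.append(row)
    return rows

heap = {(0, 1): (1, "hello oracle"), (2, 4): (2, "hello pg"),
        (1, 7): (3, "hello mysql"), (0, 3): (4, "hello mysql")}
index = {"hello oracle": [(0, 1)], "hello pg": [(2, 4)],
         "hello mysql": [(1, 7), (0, 3)]}
print(bitmap_heap_scan(index, heap, ("hello oracle", "hello mysql")))
```

Sorting the bitmap before touching the table is what turns many random index lookups into a mostly sequential pass over the pages.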
Now let's look at how tables are joined to each other.
Nested Loop: essentially the same as Oracle's nested loop. It suits joins between a large table and a small table: the small table is the driving (outer) table, and the large table is accessed through an index.
Of course, the "small table" here does not have to be a small table itself; it may also be a relatively small result set filtered through an index.
It suits OLTP scenarios that return small amounts of data within milliseconds.
```
=# create table tt1 (id int, name varchar(200), pid int);
CREATE TABLE
=# create table tt2 (id int, name varchar(200));
CREATE TABLE
=# insert into tt1 values (generate_series(1,1000),'hello pg',generate_series(1,1000));
INSERT 0 1000
=# insert into tt2 values (generate_series(1,100000),'hello pg fans');
INSERT 0 100000
=# create index concurrently idx_tt1_name on tt1 (name);
CREATE INDEX
=# create index concurrently idx_tt2_id on tt2 (id);
CREATE INDEX
=# explain analyze select * from tt1, tt2 where tt1.pid = tt2.id and tt1.name = 'hello mysql';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=0.44..12.49 rows=1 width=35) (actual time=0.007..0.007 rows=0 loops=1)
   -> Index Scan using idx_tt1_name on tt1 (cost=0.15..4.17 rows=1 width=17) (actual time=0.006..0.006 rows=0 loops=1)
         Index Cond: ((name)::text = 'hello mysql'::text)
   -> Index Scan using idx_tt2_id on tt2 (cost=0.29..8.31 rows=1 width=18) (never executed)
         Index Cond: (id = tt1.pid)
 Planning Time: 0.262 ms
 Execution Time: 0.030 ms
(7 rows)
```
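A nested loop with an inner index scan can be sketched as follows: for each outer row, probe the inner side's index once. This is a hypothetical model (a dict stands in for `idx_tt2_id`). It also explains the `(never executed)` in the plan above: when the outer side returns no rows, the inner side is never probed.

```python
def nested_loop_join(outer_rows, inner_index, outer_key):
    """For each outer row, look up matching inner rows through an index.
    Returns (joined_pairs, number_of_inner_probes). Hypothetical model."""
    pairs, probes = [], 0
    for o in outer_rows:                 # outer (driving) side, e.g. tt1
        probes += 1                      # one inner Index Scan per outer row
        for i in inner_index.get(outer_key(o), []):
            pairs.append((o, i))
    return pairs, probes

# Stand-in for tt2 reached through idx_tt2_id: id -> matching rows (hypothetical data)
tt2_index = {n: [(n, "hello pg fans")] for n in range(1, 6)}

# Outer side filtered down to nothing (like name = 'hello mysql'): no probes at all
print(nested_loop_join([], tt2_index, lambda r: r[2]))
# Two outer rows (id, name, pid): one probe each, only pid=1 finds a match
print(nested_loop_join([(1, "hello pg", 1), (2, "hello pg", 9)], tt2_index, lambda r: r[2]))
```

The number of probes corresponds to the `loops` value you would see on the inner scan in EXPLAIN ANALYZE.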
Merge Join: usually happens when both inputs are sorted on the join key, either because an explicit sort is performed (for example, the query specifies ORDER BY) or because the join columns are indexed (the index returns rows in sorted order).
The two inputs are scanned side by side in sorted order and merged in a single pass.
Case 1: the indexes provide the sort order
```
=# create index concurrently idx_id_tt1 on tt1(id);
CREATE INDEX
=# create index concurrently idx_id_tt2 on tt2(id);
CREATE INDEX
=# explain analyze select * from tt1, tt2 where tt1.id = tt2.id;
                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Merge Join (cost=0.66..94.79 rows=1000 width=35) (actual time=0.019..0.595 rows=1000 loops=1)
   Merge Cond: (tt1.id = tt2.id)
   -> Index Scan using idx_id_tt1 on tt1 (cost=0.28..45.27 rows=1000 width=17) (actual time=0.006..0.147 rows=1000 loops=1)
   -> Index Scan using idx_id_tt2 on tt2 (cost=0.29..3244.29 rows=100000 width=18) (actual time=0.007..0.188 rows=1001 loops=1)
 Planning Time: 0.308 ms
 Execution Time: 0.690 ms
(6 rows)
```
Case 2: no index; triggered by an explicit ORDER BY
Here we also need to turn off hash joins: set enable_hashjoin = off;
```
=# set enable_hashjoin = off;
SET
=# explain analyze select * from tt1, tt2 where tt1.id = tt2.id order by tt1.name desc, tt2.name desc;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Sort (cost=11307.98..11310.48 rows=1000 width=35) (actual time=25.473..25.574 rows=1000 loops=1)
   Sort Key: tt1.name DESC, tt2.name DESC
   Sort Method: quicksort  Memory: 103kB
   -> Merge Join (cost=11262.73..11282.98 rows=1000 width=35) (actual time=24.525..25.155 rows=1000 loops=1)
         Merge Cond: (tt2.id = tt1.id)
         -> Sort (cost=11992.82..12242.82 rows=100000 width=18) (actual time=24.242..24.400 rows=1001 loops=1)
               Sort Key: tt2.id
               Sort Method: external merge  Disk: 2752kB
               -> Seq Scan on tt2 (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.017..8.195 rows=100000 loops=1)
         -> Sort (cost=66.83..69.33 rows=1000 width=17) (actual time=0.273..0.338 rows=1000 loops=1)
               Sort Key: tt1.id
               Sort Method: quicksort  Memory: 87kB
               -> Seq Scan on tt1 (cost=0.00..17.00 rows=1000 width=17) (actual time=0.015..0.149 rows=1000 loops=1)
 Planning Time: 0.110 ms
 Execution Time: 26.113 ms
(15 rows)
```
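The merge step itself can be sketched as follows, assuming both inputs are already sorted on the join key (the role played by the Sort nodes or index scans above). This is an illustrative model of an inner merge join that also handles runs of duplicate keys; it is not PostgreSQL's implementation.

```python
def merge_join(left, right, key):
    """Inner equi-join of two inputs already sorted by key.
    Each input is read once, side by side."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1                      # left is behind: advance it
        elif lk > rk:
            j += 1                      # right is behind: advance it
        else:
            # find the run of equal keys on the right side
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # pair every left row with this key against that run
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out

by_id = lambda row: row[0]
left = sorted([(4, "d"), (2, "b"), (1, "a"), (2, "c")], key=by_id)   # the "Sort" step
right = sorted([(3, "z"), (2, "x"), (4, "w"), (2, "y")], key=by_id)
print(merge_join(left, right, by_id))
```

When an index already returns rows in key order (Case 1), the `sorted()` calls are unnecessary, which is why that plan has no Sort nodes.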
Hash Join: friends familiar with Oracle will know it well, while MySQL users can only look on with envy (MySQL 8.0.18 already supports hash join, but most people in my circle are still on 5.7).
It is one of the basic standard features of an HTAP hybrid database. The trigger conditions should be familiar: an equi-join, where the smaller table is used as the build side to create a hash table (in memory or on disk), which is then probed with the larger table. It suits joins between two large tables and OLAP-style analytical scenarios.
We can see that tt1, the relatively small table, was built into an in-memory hash table with 1024 buckets (Memory Usage: 59kB). This plan was produced with enable_hashjoin turned back on.
```
=# explain analyze select * from tt1, tt2 where tt1.id = tt2.id;
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Hash Join (cost=29.50..2051.50 rows=1000 width=35) (actual time=0.241..18.570 rows=1000 loops=1)
   Hash Cond: (tt2.id = tt1.id)
   -> Seq Scan on tt2 (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.013..7.489 rows=100000 loops=1)
   -> Hash (cost=17.00..17.00 rows=1000 width=17) (actual time=0.216..0.218 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 59kB
         -> Seq Scan on tt1 (cost=0.00..17.00 rows=1000 width=17) (actual time=0.012..0.105 rows=1000 loops=1)
 Planning Time: 0.148 ms
 Execution Time: 18.657 ms
(8 rows)
```
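The build/probe idea can be sketched with a Python dict standing in for the hash table. This is a simplified in-memory model: PostgreSQL's real hash join also splits the work into batches spilled to disk when the hash table exceeds work_mem (the `Batches` figure above), which the sketch does not model. The data mirrors the tt1/tt2 contents created earlier.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join: hash the smaller input (build side, like the Hash node
    over tt1), then stream the larger input (probe side, tt2) through it."""
    table = defaultdict(list)
    for b in build_rows:                     # build phase
        table[build_key(b)].append(b)
    out = []
    for p in probe_rows:                     # probe phase: one lookup per row
        for b in table.get(probe_key(p), ()):
            out.append((b, p))
    return out

tt1 = [(n, "hello pg", n) for n in range(1, 1001)]        # (id, name, pid)
tt2 = [(n, "hello pg fans") for n in range(1, 100001)]    # (id, name)
joined = hash_join(tt1, tt2, build_key=lambda r: r[0], probe_key=lambda r: r[0])
print(len(joined), joined[0])
```

Building on the smaller input keeps the hash table small (59kB in the plan above), while the large input is read only once.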
Finally, here is a website that visualizes execution plans: https://explain.depesz.com/
Copy the text output of explain, paste it into the site, and click submit to get a tabular, graphical view.
Have fun!