当前位置:网站首页>Master the use of auto analyze in data warehouse
Master the use of auto analyze in data warehouse
2022-07-04 17:29:00 【Huawei cloud developer Alliance】
Abstract :analyze Whether the implementation is timely , To some extent, it directly determines SQL Speed of execution .
This article is shared from Huawei cloud community 《 Article to read autoanalyze Use 【 Gauss is not a mathematician this time 】》, author : leapdb.
analyze Whether the implementation is timely , To some extent, it directly determines SQL Speed of execution . therefore ,GaussDB(DWS) Automatic statistical information collection is introduced , It can make users no longer worry about whether the statistical information is expired .
1. Automatically collect scenes
There are usually five scenarios where automatic statistical information collection is required : Batch DML At the end , The incremental DML At the end ,DDL At the end , Query start and background scheduled tasks .
therefore , In order to avoid being against DML,DDL Unnecessary performance overhead and deadlock risk , We chose to trigger before the query started analzye.
2. Automatic collection principle
GaussDB(DWS) stay SQL In the process of execution , It will record the addition, deletion, modification and query of relevant runtime Statistics , And record the shared memory after the transaction is committed or rolled back .
This information can be obtained through “pg_stat_all_tables View ” Inquire about , You can also query through the following functions .
pg_stat_get_tuples_inserted -- Table accumulation insert Number of pieces
pg_stat_get_tuples_updated -- Table accumulation update Number of pieces
pg_stat_get_tuples_deleted -- Table accumulation delete Number of pieces
pg_stat_get_tuples_changed -- Table since last analyze since , Number of changes
pg_stat_get_last_analyze_time -- Query the last analyze Time
therefore , Based on shared memory " Table since last analyze Number of entries modified since " Whether a certain threshold is exceeded , You can decide whether you need to do analyze 了 .
3. Automatically collect thresholds
3.1 Global threshold
autovacuum_analyze_threshold # The table triggers analyze Minimum modification of
autovacuum_analyze_scale_factor # The table triggers analyze Percentage of changes when
When " Table since last analyze Number of entries modified since " >= autovacuum_analyze_threshold + Table estimated size * autovacuum_analyze_scale_factor when , It needs to be triggered automatically analyze.
3.2 Table level threshold
-- Set table level threshold
ALTER TABLE item SET (autovacuum_analyze_threshold=50);
ALTER TABLE item SET (autovacuum_analyze_scale_factor=0.1);
-- Query threshold
postgres=# select pg_options_to_table(reloptions) from pg_class where relname='item';
pg_options_to_table
---------------------------------------
(autovacuum_analyze_threshold,50)
(autovacuum_analyze_scale_factor,0.1)
(2 rows)
-- Reset threshold
ALTER TABLE item RESET (autovacuum_analyze_threshold);
ALTER TABLE item RESET (autovacuum_analyze_scale_factor);
The data characteristics of different tables are different , Need to trigger analyze The threshold may have different requirements . The table level threshold priority is higher than the global threshold .
3.3 Check whether the modification amount of the table exceeds the threshold ( Only the current CN)
postgres=# select pg_stat_get_local_analyze_status('t_analyze'::regclass);
pg_stat_get_local_analyze_status
----------------------------------
Analyze not needed
(1 row)
4. Automatic collection method
GaussDB(DWS) Automatic analysis of the following table in three scenarios is provided .
- When there is “ Statistics are completely missing ” or “ The modification amount reaches analyze threshold ” Table of , And the implementation plan does not take FQS (Fast Query Shipping) Execution time , Through autoanalyze Control the automatic collection of statistical information in the following table in this scenario . here , The query statement will wait for the statistics to be collected successfully , Generate a better execution plan , Then execute the original query statement .
- When autovacuum Set to on when , The system will start regularly autovacuum Threads , Yes “ The modification amount reaches analyze threshold ” The table automatically collects statistical information in the background .
5. Freeze Statistics
5.1 Freeze table distinct value
When a watch distinct It's always inaccurate , for example : Data pile up and repeat the scene . If the watch distinct Fixed value , You can freeze the table in the following ways distinct value .
postgres=# alter table lineitem alter l_orderkey set (n_distinct=0.9);
ALTER TABLE
postgres=# select relname,attname,attoptions from pg_attribute a,pg_class c where c.oid=a.attrelid and attname='l_orderkey';
relname | attname | attoptions
----------+------------+------------------
lineitem | l_orderkey | {n_distinct=0.9}
(1 row)
postgres=# alter table lineitem alter l_orderkey reset (n_distinct);
ALTER TABLE
postgres=# select relname,attname,attoptions from pg_attribute a,pg_class c where c.oid=a.attrelid and attname='l_orderkey';
relname | attname | attoptions
----------+------------+------------
lineitem | l_orderkey |
(1 row)
5.2. Freeze all statistics of the table
If the data characteristics of the table are basically unchanged , You can also freeze the statistics of the table , To avoid repeating analyze.
alter table table_name set frozen_stats=true;
6. Manually check whether the table needs to be done analyze
a. I don't want to trigger the database background task during the business peak , So I don't want to open autovacuum To trigger analyze, What do I do ?
b. The business has modified a number of tables , I want to do these watches right away analyze, I don't know what watches are there , What do I do ?
c. Before the business peak comes, I want to do a test on the tables near the threshold analyze, What do I do ?
We will autovacuum Check the threshold to determine whether analyze Logic , Extraction becomes a function , Help users flexibly and proactively check which tables need to be done analyze.
6.1 Determine whether the table needs analyze( Serial version , Applicable to all historical versions )
-- the function for get all pg_stat_activity information in all CN of current cluster.
CREATE OR REPLACE FUNCTION pg_catalog.pgxc_stat_table_need_analyze(in table_name text)
RETURNS BOOl
AS $$
DECLARE
row_data record;
coor_name record;
fet_active text;
fetch_coor text;
relTuples int4;
changedTuples int4:= 0;
rel_anl_threshold int4;
rel_anl_scale_factor float4;
sys_anl_threshold int4;
sys_anl_scale_factor float4;
anl_threshold int4;
anl_scale_factor float4;
need_analyze bool := false;
BEGIN
--Get all the node names
fetch_coor := 'SELECT node_name FROM pgxc_node WHERE node_type=''C''';
FOR coor_name IN EXECUTE(fetch_coor) LOOP
fet_active := 'EXECUTE DIRECT ON (' || coor_name.node_name || ') ''SELECT pg_stat_get_tuples_changed(oid) from pg_class where relname = ''''|| table_name ||'''';''';
FOR row_data IN EXECUTE(fet_active) LOOP
changedTuples = changedTuples + row_data.pg_stat_get_tuples_changed;
END LOOP;
END LOOP;
EXECUTE 'select pg_stat_get_live_tuples(oid) from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into relTuples;
EXECUTE 'show autovacuum_analyze_threshold;' into sys_anl_threshold;
EXECUTE 'show autovacuum_analyze_scale_factor;' into sys_anl_scale_factor;
EXECUTE 'select (select option_value from pg_options_to_table(c.reloptions) where option_name = ''autovacuum_analyze_threshold'') as value
from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into rel_anl_threshold;
EXECUTE 'select (select option_value from pg_options_to_table(c.reloptions) where option_name = ''autovacuum_analyze_scale_factor'') as value
from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into rel_anl_scale_factor;
--dbms_output.put_line('relTuples='||relTuples||'; sys_anl_threshold='||sys_anl_threshold||'; sys_anl_scale_factor='||sys_anl_scale_factor||'; rel_anl_threshold='||rel_anl_threshold||'; rel_anl_scale_factor='||rel_anl_scale_factor||';');
if rel_anl_threshold IS NOT NULL then
anl_threshold = rel_anl_threshold;
else
anl_threshold = sys_anl_threshold;
end if;
if rel_anl_scale_factor IS NOT NULL then
anl_scale_factor = rel_anl_scale_factor;
else
anl_scale_factor = sys_anl_scale_factor;
end if;
if changedTuples > anl_threshold + anl_scale_factor * relTuples then
need_analyze := true;
end if;
return need_analyze;
END; $$
LANGUAGE 'plpgsql';
6.2 Determine whether the table needs analyze( Parallel Edition , For versions that support parallel execution frameworks )
-- the function for get all pg_stat_activity information in all CN of current cluster.
--SELECT sum(a) FROM pg_catalog.pgxc_parallel_query('cn', 'SELECT 1::int FROM pg_class LIMIT 10') AS (a int); Using concurrent execution framework
CREATE OR REPLACE FUNCTION pg_catalog.pgxc_stat_table_need_analyze(in table_name text)
RETURNS BOOl
AS $$
DECLARE
relTuples int4;
changedTuples int4:= 0;
rel_anl_threshold int4;
rel_anl_scale_factor float4;
sys_anl_threshold int4;
sys_anl_scale_factor float4;
anl_threshold int4;
anl_scale_factor float4;
need_analyze bool := false;
BEGIN
--Get all the node names
EXECUTE 'SELECT sum(a) FROM pg_catalog.pgxc_parallel_query(''cn'', ''SELECT pg_stat_get_tuples_changed(oid)::int4 from pg_class where relname = ''''|| table_name ||'''';'') AS (a int4);' into changedTuples;
EXECUTE 'select pg_stat_get_live_tuples(oid) from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into relTuples;
EXECUTE 'show autovacuum_analyze_threshold;' into sys_anl_threshold;
EXECUTE 'show autovacuum_analyze_scale_factor;' into sys_anl_scale_factor;
EXECUTE 'select (select option_value from pg_options_to_table(c.reloptions) where option_name = ''autovacuum_analyze_threshold'') as value
from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into rel_anl_threshold;
EXECUTE 'select (select option_value from pg_options_to_table(c.reloptions) where option_name = ''autovacuum_analyze_scale_factor'') as value
from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into rel_anl_scale_factor;
dbms_output.put_line('relTuples='||relTuples||'; sys_anl_threshold='||sys_anl_threshold||'; sys_anl_scale_factor='||sys_anl_scale_factor||'; rel_anl_threshold='||rel_anl_threshold||'; rel_anl_scale_factor='||rel_anl_scale_factor||';');
if rel_anl_threshold IS NOT NULL then
anl_threshold = rel_anl_threshold;
else
anl_threshold = sys_anl_threshold;
end if;
if rel_anl_scale_factor IS NOT NULL then
anl_scale_factor = rel_anl_scale_factor;
else
anl_scale_factor = sys_anl_scale_factor;
end if;
if changedTuples > anl_threshold + anl_scale_factor * relTuples then
need_analyze := true;
end if;
return need_analyze;
END; $$
LANGUAGE 'plpgsql';
6.3 Determine whether the table needs analyze( Custom threshold )
-- the function for get all pg_stat_activity information in all CN of current cluster.
CREATE OR REPLACE FUNCTION pg_catalog.pgxc_stat_table_need_analyze(in table_name text, int anl_threshold, float anl_scale_factor)
RETURNS BOOl
AS $$
DECLARE
relTuples int4;
changedTuples int4:= 0;
need_analyze bool := false;
BEGIN
--Get all the node names
EXECUTE 'SELECT sum(a) FROM pg_catalog.pgxc_parallel_query(''cn'', ''SELECT pg_stat_get_tuples_changed(oid)::int4 from pg_class where relname = ''''|| table_name ||'''';'') AS (a int4);' into changedTuples;
EXECUTE 'select pg_stat_get_live_tuples(oid) from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into relTuples;
if changedTuples > anl_threshold + anl_scale_factor * relTuples then
need_analyze := true;
end if;
return need_analyze;
END; $$
LANGUAGE 'plpgsql';
through “ Optimizer triggered real-time analyze” and “ backstage autovacuum Triggered polling analyze”,GaussDB(DWS) It has been possible to make users no longer care about whether the table needs analyze. It is recommended to try in the latest version .
Click to follow , The first time to learn about Huawei's new cloud technology ~
边栏推荐
- How to implement a delay queue?
- Perfectly integrated into win11 style, Microsoft's new onedrive client is the first to see
- 整理混乱的头文件,我用include what you use
- 世界环境日 | 周大福用心服务推动减碳环保
- VMware Tools和open-vm-tools的安装与使用:解决虚拟机不全屏和无法传输文件的问题
- Capvision Rongying's prospectus in Hong Kong was "invalid": it was strictly questioned by the CSRC and required supplementary disclosure
- Firewall basic transparent mode deployment and dual machine hot standby
- egg. JS learning notes
- OPPO小布推出预训练大模型OBERT,晋升KgCLUE榜首
- C# 更加优质的操作MongoDB数据库
猜你喜欢
Go development: how to use go singleton mode to ensure the security of high concurrency of streaming media?
第十八届IET交直流輸電國際會議(ACDC2022)於線上成功舉辦
建筑建材行业经销商协同系统解决方案:赋能企业构建核心竞争力
周大福践行「百周年承诺」,真诚服务推动绿色环保
Go micro tutorial - Chapter 2 go micro V3 using gin and etcd
Zebras are recognized as dogs, and the reason for AI's mistakes is found by Stanford
世界环境日 | 周大福用心服务推动减碳环保
To sort out messy header files, I use include what you use
The company needs to be monitored. How do ZABBIX and Prometheus choose? That's the right choice!
一加10 Pro和iPhone 13怎么选?
随机推荐
照明行业S2B2B解决方案:高效赋能产业供应链,提升企业经济效益
2022PMP考试基本情况详情了解
C# 服务器日志模块
The Ministry of human resources and Social Security announced the new construction occupation
Learn more about the basic situation of 2022pmp examination
Can you really use MySQL explain?
[glide] cache implementation - memory and disk cache
NFT liquidity market security issues occur frequently - Analysis of the black incident of NFT trading platform quixotic
Developers, MySQL column finish, help you easily from installation to entry
中信证券网上开户安全吗 开户收费吗
[acwing] 58 weeks 4489 Longest subsequence
利用win10计划任务程序定时自动运行jar包
Vb无法访问数据库stocks
51 single chip microcomputer temperature alarm based on WiFi control
Go development: how to use go singleton mode to ensure the security of high concurrency of streaming media?
PyTorch深度学习快速入门教程
CANN算子:利用迭代器高效实现Tensor数据切割分块处理
VMware Tools和open-vm-tools的安装与使用:解决虚拟机不全屏和无法传输文件的问题
Zhijieyun - meta universe comprehensive solution service provider
长城证券开户安全吗 证券账户怎么开通