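Mastering auto analyze in the data warehouse in one article
2022-07-04 20:36:00 【Huawei Cloud Developer Alliance】
This article is shared from the Huawei Cloud community post 《一文读懂autoanalyze使用【这次高斯不是数学家】》 ("Understanding autoanalyze usage in one article [This time Gauss is not a mathematician]"), author: leapdb.
1. Automatic collection scenarios
2. How automatic collection works
This information can be queried through the "pg_stat_all_tables" view, or through the functions below.
pg_stat_get_tuples_inserted   -- cumulative number of rows inserted into the table
pg_stat_get_tuples_updated    -- cumulative number of rows updated in the table
pg_stat_get_tuples_deleted    -- cumulative number of rows deleted from the table
pg_stat_get_tuples_changed    -- number of rows modified since the table was last analyzed
pg_stat_get_last_analyze_time -- time of the most recent analyze
So whether a table needs analyze can be decided by checking whether the "number of rows modified since the last analyze", kept in shared memory, exceeds a given threshold.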
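As a minimal sketch of how these counters can be read for one table, the query below calls each of the functions listed above. The table name item is only a placeholder, and passing the table as a regclass value is an assumption about how the functions accept their argument. Note that the counters are read from the node the session is connected to.

```sql
-- 'item' is a placeholder table name; replace it with a real table.
SELECT pg_stat_get_tuples_inserted('item'::regclass)   AS inserted_total,
       pg_stat_get_tuples_updated('item'::regclass)    AS updated_total,
       pg_stat_get_tuples_deleted('item'::regclass)    AS deleted_total,
       pg_stat_get_tuples_changed('item'::regclass)    AS changed_since_analyze,
       pg_stat_get_last_analyze_time('item'::regclass) AS last_analyze_time;
```

The changed_since_analyze column is the value that gets compared against the threshold described in the next section.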
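3. Automatic collection thresholds
3.1 Global thresholds
autovacuum_analyze_threshold    # minimum number of modified rows before analyze is triggered for a table
autovacuum_analyze_scale_factor # fraction of the table that must be modified before analyze is triggered
Analyze is triggered automatically when "rows modified since the last analyze" >= autovacuum_analyze_threshold + estimated table size * autovacuum_analyze_scale_factor.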
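To make the formula concrete, here is a small worked example. The numbers in the first statement are hypothetical (threshold 50, scale factor 0.1, a table estimated at 1,000,000 rows), which gives 50 + 1,000,000 * 0.1 = 100,050 modified rows before analyze is triggered. The second statement is a sketch that plugs in the current global settings for a placeholder table item; it assumes the "estimated table size" corresponds to pg_class.reltuples and ignores any table-level overrides.

```sql
-- Hypothetical values: threshold = 50, scale factor = 0.1, estimated size = 1,000,000 rows.
SELECT 50 + 1000000 * 0.1 AS rows_needed_to_trigger_analyze;

-- Same formula with the current global settings.
-- Assumption: the estimated table size is taken from pg_class.reltuples.
SELECT c.relname,
       c.reltuples,
       current_setting('autovacuum_analyze_threshold')::int
         + c.reltuples * current_setting('autovacuum_analyze_scale_factor')::float8
         AS rows_needed_to_trigger_analyze
FROM pg_class c
WHERE c.relname = 'item';
```

Because of the scale factor, large tables need proportionally more changes before analyze is triggered, while the fixed threshold keeps very small tables from being analyzed after every few changes.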
3.2 Table-level thresholds
-- Set table-level thresholds
ALTER TABLE item SET (autovacuum_analyze_threshold=50);
ALTER TABLE item SET (autovacuum_analyze_scale_factor=0.1);

-- Query the thresholds
postgres=# select pg_options_to_table(reloptions) from pg_class where relname='item';
          pg_options_to_table
---------------------------------------
 (autovacuum_analyze_threshold,50)
 (autovacuum_analyze_scale_factor,0.1)
(2 rows)

-- Reset the thresholds
ALTER TABLE item RESET (autovacuum_analyze_threshold);
ALTER TABLE item RESET (autovacuum_analyze_scale_factor);
3.3 Check whether a table's modification count has exceeded the threshold (current CN only)
postgres=# select pg_stat_get_local_analyze_status('t_analyze'::regclass);
 pg_stat_get_local_analyze_status
----------------------------------
 Analyze not needed
(1 row)
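4. Automatic collection methods
- When a query involves a table whose statistics are completely missing or whose modification count has reached the analyze threshold, and the execution plan does not use FQS (Fast Query Shipping), the autoanalyze parameter controls automatic statistics collection for that table. In this case the query waits until statistics collection succeeds, a better execution plan is generated, and only then does the original query run.
- When autovacuum is set to on, the system periodically starts autovacuum threads, which collect statistics in the background for tables whose modification count has reached the analyze threshold.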
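To see which of the two collection methods is active in the current session, the switches named above can be inspected. This is only a sketch: it assumes both are exposed as parameters under exactly these names, as the text suggests, and how they are changed (session level or cluster configuration) depends on the version in use.

```sql
-- Query-triggered collection (first bullet above).
SHOW autoanalyze;

-- Background collection by autovacuum threads (second bullet above).
SHOW autovacuum;
```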
5.1 Freezing a column's distinct value
postgres=# alter table lineitem alter l_orderkey set (n_distinct=0.9);
ALTER TABLE
postgres=# select relname,attname,attoptions from pg_attribute a,pg_class c where c.oid=a.attrelid and attname='l_orderkey';
 relname  |  attname   |    attoptions
----------+------------+------------------
 lineitem | l_orderkey | {n_distinct=0.9}
(1 row)

postgres=# alter table lineitem alter l_orderkey reset (n_distinct);
ALTER TABLE
postgres=# select relname,attname,attoptions from pg_attribute a,pg_class c where c.oid=a.attrelid and attname='l_orderkey';
 relname  |  attname   | attoptions
----------+------------+------------
 lineitem | l_orderkey |
(1 row)
5.2 Freezing all statistics of a table
alter table table_name set frozen_stats=true;
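6. Manually checking whether a table needs analyze
a. You do not want background database tasks to be triggered during business peak hours, so you are unwilling to enable autovacuum to trigger analyze. What can you do?
b. The business has modified a batch of tables and you want to analyze them right away, but you do not know which tables they are. What can you do?
c. Before a business peak arrives, you want to analyze every table that is close to its threshold. What can you do?
6.1 Determine whether a table needs analyze (serial version, works on all historical versions)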
-- Returns true when the changes made to a table, summed over all CNs of the
-- current cluster, exceed the table's analyze threshold.
CREATE OR REPLACE FUNCTION pg_catalog.pgxc_stat_table_need_analyze(in table_name text)
RETURNS BOOL
AS $$
DECLARE
    row_data record;
    coor_name record;
    fet_active text;
    fetch_coor text;
    relTuples int4;
    changedTuples int4 := 0;
    rel_anl_threshold int4;
    rel_anl_scale_factor float4;
    sys_anl_threshold int4;
    sys_anl_scale_factor float4;
    anl_threshold int4;
    anl_scale_factor float4;
    need_analyze bool := false;
BEGIN
    -- Get the names of all CN nodes
    fetch_coor := 'SELECT node_name FROM pgxc_node WHERE node_type=''C''';
    FOR coor_name IN EXECUTE(fetch_coor) LOOP
        -- Sum the rows changed since the last analyze, as recorded on each CN
        fet_active := 'EXECUTE DIRECT ON (' || coor_name.node_name || ') ''SELECT pg_stat_get_tuples_changed(oid) from pg_class where relname = ''''' || table_name || ''''';''';
        FOR row_data IN EXECUTE(fet_active) LOOP
            changedTuples = changedTuples + row_data.pg_stat_get_tuples_changed;
        END LOOP;
    END LOOP;

    -- Current live row count of the table
    EXECUTE 'select pg_stat_get_live_tuples(oid) from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into relTuples;

    -- Global thresholds
    EXECUTE 'show autovacuum_analyze_threshold;' into sys_anl_threshold;
    EXECUTE 'show autovacuum_analyze_scale_factor;' into sys_anl_scale_factor;

    -- Table-level thresholds (NULL when not set)
    EXECUTE 'select (select option_value from pg_options_to_table(c.reloptions) where option_name = ''autovacuum_analyze_threshold'') as value from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into rel_anl_threshold;
    EXECUTE 'select (select option_value from pg_options_to_table(c.reloptions) where option_name = ''autovacuum_analyze_scale_factor'') as value from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into rel_anl_scale_factor;

    --dbms_output.put_line('relTuples='||relTuples||'; sys_anl_threshold='||sys_anl_threshold||'; sys_anl_scale_factor='||sys_anl_scale_factor||'; rel_anl_threshold='||rel_anl_threshold||'; rel_anl_scale_factor='||rel_anl_scale_factor||';');

    -- Table-level settings take precedence over the global ones
    if rel_anl_threshold IS NOT NULL then
        anl_threshold = rel_anl_threshold;
    else
        anl_threshold = sys_anl_threshold;
    end if;

    if rel_anl_scale_factor IS NOT NULL then
        anl_scale_factor = rel_anl_scale_factor;
    else
        anl_scale_factor = sys_anl_scale_factor;
    end if;

    if changedTuples > anl_threshold + anl_scale_factor * relTuples then
        need_analyze := true;
    end if;

    return need_analyze;
END; $$
LANGUAGE 'plpgsql';
6.2 Determine whether a table needs analyze (parallel version, for versions that support the parallel execution framework)
-- Returns true when the changes made to a table, summed over all CNs of the
-- current cluster, exceed the table's analyze threshold.
-- Uses the parallel execution framework, for example:
-- SELECT sum(a) FROM pg_catalog.pgxc_parallel_query('cn', 'SELECT 1::int FROM pg_class LIMIT 10') AS (a int);
CREATE OR REPLACE FUNCTION pg_catalog.pgxc_stat_table_need_analyze(in table_name text)
RETURNS BOOL
AS $$
DECLARE
    relTuples int4;
    changedTuples int4 := 0;
    rel_anl_threshold int4;
    rel_anl_scale_factor float4;
    sys_anl_threshold int4;
    sys_anl_scale_factor float4;
    anl_threshold int4;
    anl_scale_factor float4;
    need_analyze bool := false;
BEGIN
    -- Sum the rows changed since the last analyze across all CNs in parallel
    EXECUTE 'SELECT sum(a) FROM pg_catalog.pgxc_parallel_query(''cn'', ''SELECT pg_stat_get_tuples_changed(oid)::int4 from pg_class where relname = ''''' || table_name || ''''';'') AS (a int4);' into changedTuples;

    -- Current live row count of the table
    EXECUTE 'select pg_stat_get_live_tuples(oid) from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into relTuples;

    -- Global thresholds
    EXECUTE 'show autovacuum_analyze_threshold;' into sys_anl_threshold;
    EXECUTE 'show autovacuum_analyze_scale_factor;' into sys_anl_scale_factor;

    -- Table-level thresholds (NULL when not set)
    EXECUTE 'select (select option_value from pg_options_to_table(c.reloptions) where option_name = ''autovacuum_analyze_threshold'') as value from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into rel_anl_threshold;
    EXECUTE 'select (select option_value from pg_options_to_table(c.reloptions) where option_name = ''autovacuum_analyze_scale_factor'') as value from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into rel_anl_scale_factor;

    dbms_output.put_line('relTuples='||relTuples||'; sys_anl_threshold='||sys_anl_threshold||'; sys_anl_scale_factor='||sys_anl_scale_factor||'; rel_anl_threshold='||rel_anl_threshold||'; rel_anl_scale_factor='||rel_anl_scale_factor||';');

    -- Table-level settings take precedence over the global ones
    if rel_anl_threshold IS NOT NULL then
        anl_threshold = rel_anl_threshold;
    else
        anl_threshold = sys_anl_threshold;
    end if;

    if rel_anl_scale_factor IS NOT NULL then
        anl_scale_factor = rel_anl_scale_factor;
    else
        anl_scale_factor = sys_anl_scale_factor;
    end if;

    if changedTuples > anl_threshold + anl_scale_factor * relTuples then
        need_analyze := true;
    end if;

    return need_analyze;
END; $$
LANGUAGE 'plpgsql';
6.3 Determine whether a table needs analyze (custom thresholds)
-- Returns true when the changes made to a table, summed over all CNs of the
-- current cluster, exceed the thresholds supplied by the caller.
CREATE OR REPLACE FUNCTION pg_catalog.pgxc_stat_table_need_analyze(in table_name text, in anl_threshold int, in anl_scale_factor float)
RETURNS BOOL
AS $$
DECLARE
    relTuples int4;
    changedTuples int4 := 0;
    need_analyze bool := false;
BEGIN
    -- Sum the rows changed since the last analyze across all CNs in parallel
    EXECUTE 'SELECT sum(a) FROM pg_catalog.pgxc_parallel_query(''cn'', ''SELECT pg_stat_get_tuples_changed(oid)::int4 from pg_class where relname = ''''' || table_name || ''''';'') AS (a int4);' into changedTuples;

    -- Current live row count of the table
    EXECUTE 'select pg_stat_get_live_tuples(oid) from pg_class c where c.oid = '''|| table_name ||'''::REGCLASS;' into relTuples;

    if changedTuples > anl_threshold + anl_scale_factor * relTuples then
        need_analyze := true;
    end if;

    return need_analyze;
END; $$
LANGUAGE 'plpgsql';
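The following is a hedged sketch of how these functions might be used for scenarios b and c. The table name item, the schema public, and the tighter thresholds 40 and 0.08 are placeholders, and the three-argument call assumes the function in 6.3 is created with the parameters (table_name text, anl_threshold int, anl_scale_factor float). Because the functions look tables up by bare name, the sketch restricts itself to tables visible on the current search path. Whether such a function can be called once per row of pg_class in a single statement may depend on the cluster version, so treat this as a starting point rather than a tested procedure.

```sql
-- Single table, using table-level or global thresholds (6.1 / 6.2).
SELECT pg_catalog.pgxc_stat_table_need_analyze('item');

-- Scenario b: list ordinary tables in schema 'public' that currently need analyze.
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
  AND pg_table_is_visible(c.oid)
  AND pg_catalog.pgxc_stat_table_need_analyze(c.relname::text);

-- Scenario c: catch tables that are close to the threshold by passing
-- tighter, caller-chosen values (6.3). 40 and 0.08 are illustrative only.
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
  AND pg_table_is_visible(c.oid)
  AND pg_catalog.pgxc_stat_table_need_analyze(c.relname::text, 40, 0.08);
```

The tables returned by either list can then be analyzed manually at a time of your choosing, which also covers scenario a, where autovacuum is left off during peak hours.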