An analysis of lazyagg query rewrite optimization in the data warehouse
2022-06-25 16:32:00 【51CTO】
Abstract: This article explains Lazy Agg query rewrite optimization and the Lazy Agg rewrite rule provided by GaussDB(DWS).
This article is shared from the Huawei Cloud community post "GaussDB(DWS) lazyagg query rewrite optimization explained [Gauss is not a mathematician this time]", author: OreoreO.
An aggregation operation groups the query results by the values of one or more columns, so that rows with equal values form one group. Aggregation is a common operation and is widely used by customers in the financial sector.
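As a minimal sketch of such a grouping, the snippet below runs a GROUP BY in SQLite through Python's sqlite3 module instead of GaussDB(DWS). The table t1, its columns, and the sample rows are assumptions chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 10, 5), (2, 10, 7), (3, 20, 1), (4, 20, 2), (5, 30, 9)],
)

# Rows with equal values of b form one group; sum(c) is computed per group.
agg_rows = conn.execute(
    "SELECT b, sum(c) AS cc FROM t1 GROUP BY b ORDER BY b"
).fetchall()
print(agg_rows)
```

Five input rows collapse into one output row per distinct value of b.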
I. Lazy Agg rewrite rule
When the data volume is large, intermediate data spills to disk and the execution time of the aggregation becomes a performance bottleneck, so the whole query runs very slowly. Consider a statement in which both a subquery and the outer query aggregate.
The subquery groups by column t1.b and sums t1.c. The outer query also aggregates: it sums the subquery's aggregate column cc. For such statements, when the aggregation in the subquery is expensive, a query rewrite rule can eliminate it and let the aggregate function of the outer query complete the whole aggregation. Eliminating the subquery aggregation can increase the number of rows the subquery returns. However, when t1.b has a high number of distinct values, aggregating in the subquery would not have reduced the row count much compared with the base table, so the outer JOIN does not have to do significantly more work. The statement can therefore be rewritten so that the subquery returns t1.b and t1.c without grouping, and the outer sum is applied to t1.c directly.
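The rewrite described above can be sketched as follows. This is an illustration run on SQLite through Python's sqlite3 module, not on GaussDB(DWS). Only t1.b, t1.c, and cc come from the text; the table t2, its columns b and d, the join condition, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (b INTEGER, c INTEGER);
    CREATE TABLE t2 (b INTEGER, d INTEGER);  -- hypothetical join partner
    INSERT INTO t1 VALUES (10, 5), (10, 7), (20, 1), (20, 2), (30, 9);
    INSERT INTO t2 VALUES (10, 1), (10, 2), (20, 1), (40, 3);
""")

# Original shape: the subquery aggregates, then the outer query aggregates again.
original = """
    SELECT t2.d, sum(s.cc)
    FROM (SELECT b, sum(c) AS cc FROM t1 GROUP BY b) AS s
    JOIN t2 ON s.b = t2.b
    GROUP BY t2.d ORDER BY t2.d
"""

# Lazy Agg shape: the subquery no longer groups; the outer sum does all the work.
rewritten = """
    SELECT t2.d, sum(s.cc)
    FROM (SELECT b, c AS cc FROM t1) AS s
    JOIN t2 ON s.b = t2.b
    GROUP BY t2.d ORDER BY t2.d
"""

original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
print(original_rows, rewritten_rows)
```

Both forms return the same groups on this data. The difference is where the work happens: the rewritten form skips the inner GROUP BY but feeds more rows into the join.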
This rewrite rule is called Lazy Agg. It suits scenarios where the base table is large and the grouping column has a high number of distinct values. If the column has few distinct values (many duplicates), eliminating the aggregation makes the number of rows after the Join surge and Join performance suffers. In that case the Agg should instead be pushed down below the Join, so that aggregating early reduces the number of rows the Join produces. That rewrite rule is called Eager Agg.
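One rough way to see this trade-off is to compare how many rows the subquery hands to the Join with and without its GROUP BY. The helper below, `subquery_rows`, is a hypothetical function written for this sketch, and it uses SQLite via Python's sqlite3 module rather than GaussDB(DWS).

```python
import sqlite3


def subquery_rows(b_values):
    """Return (rows without aggregation, rows with GROUP BY b) for a subquery on t1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, 1)", [(b,) for b in b_values])
    raw = conn.execute("SELECT count(*) FROM (SELECT b, c FROM t1)").fetchone()[0]
    grouped = conn.execute(
        "SELECT count(*) FROM (SELECT b, sum(c) FROM t1 GROUP BY b)"
    ).fetchone()[0]
    conn.close()
    return raw, grouped


# Every b is unique: grouping removes nothing, so Lazy Agg costs the Join nothing extra.
high_distinct = subquery_rows(range(1000))
# Only 10 distinct b values: grouping shrinks the Join input a lot, favouring Eager Agg.
low_distinct = subquery_rows(i % 10 for i in range(1000))
print(high_distinct, low_distinct)
```

The closer the two numbers are, the less the inner aggregation helps the Join, and the more attractive it is to drop it.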
II. GaussDB(DWS) lazyagg optimization
To make tuning easier and improve ease of use, GaussDB(DWS) provides the lazyagg query rewrite optimization rule. Lazy Agg query rewriting is enabled by setting the GUC parameter rewrite_rule so that it includes 'lazyagg'. Once lazyagg is enabled, the aggregation in the subquery is optimized away in scenarios that meet the conditions. The original plan is as follows:

The plan after lazyagg rewrite optimization is as follows:

Compared with the original plan, the lazyagg rewrite eliminates the aggregation that appeared in the original plan, namely operator 7 (Subquery Scan) and operator 8 (HashAggregate).
III. lazyagg optimization specifications
- The subquery can be a single aggregate query or a set operation whose branches contain aggregate subqueries. The only set operation supported is UNION ALL, and the aggregation may be eliminated in just some of the branch subqueries. The subquery must be one of the tables being joined (it cannot appear in the TargetList, the WHERE clause, and so on).
- When the argument columns of all Agg functions in the outer query are contained in the Agg function columns of one of its subqueries, the aggregation in that subquery can be eliminated.
- All aggregate function types that still produce correct results after the subquery aggregation is eliminated are supported. See the following table for the correctness of results by aggregate function type:
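To illustrate the third point outside GaussDB(DWS), the sketch below checks on SQLite (via Python's sqlite3 module) whether a two-level aggregation gives the same answer as a single aggregation over the base rows. The chosen function pairs, the table, and the sample rows are assumptions for this example; they are not the product's support list, and agreement on one data set is only a sanity check, not a proof.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(10, 5), (10, 7), (20, 1), (20, 2), (30, 9)],
)

# (outer aggregate, subquery aggregate, single aggregate used after elimination)
pairs = [
    ("sum", "sum", "sum"),
    ("max", "max", "max"),
    ("min", "min", "min"),
    ("sum", "count", "count"),
    ("avg", "avg", "avg"),
]


def same_result(outer, inner, merged):
    """Compare outer(inner(c) per group of b) with merged(c) over all rows."""
    two_level = conn.execute(
        f"SELECT {outer}(x) FROM (SELECT {inner}(c) AS x FROM t1 GROUP BY b)"
    ).fetchone()[0]
    one_level = conn.execute(f"SELECT {merged}(c) FROM t1").fetchone()[0]
    return two_level == one_level


safe = {(o, i): same_result(o, i, m) for o, i, m in pairs}
print(safe)
```

On this data, sum of sum, max of max, min of min, and sum of count all match the single-level result, while the average of per-group averages does not, because groups of different sizes are weighted equally.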

4. Scenario constraints
Beyond the supported scenarios above, the query is not rewritten in cases where rewriting could produce incorrect results, including but not limited to:
- The Agg function type is one for which elimination is not supported.
- The subquery contains other conditions or operators that would make the result wrong after rewriting, for example HAVING, window agg, LIMIT, OFFSET, AP function, DISTINCT, or recursive.
- The outer Agg argument columns, GROUP BY columns, or JOIN columns contain volatile functions such as random or timeofday.
- Other expressions or function operations appear outside the subquery's Agg function or inside the outer query's Agg function, for example a subquery Agg function column of sum+1 or max+max(d), or an outer-query Agg function column of sum(cc+1).
- The outer query's JOIN columns, GROUP BY columns, or other conditions contain the subquery's Agg function column.
- The subquery is on the inner side of a LEFT JOIN or RIGHT JOIN, or takes part in a FULL JOIN, and the subquery Agg function is count while the outer-query Agg function is sum.
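The last constraint can be seen in a small experiment. The sketch below uses SQLite through Python's sqlite3 module, not GaussDB(DWS), and assumes that eliminating the subquery aggregation would turn the outer sum over the subquery's count into a count over the base column. The tables, columns, and rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (b INTEGER, c INTEGER);
    CREATE TABLE t2 (b INTEGER, d INTEGER);
    INSERT INTO t1 VALUES (10, 5), (10, 7), (20, 1);
    INSERT INTO t2 VALUES (10, 1), (40, 2);  -- b = 40 has no match in t1
""")

# Subquery with count on the inner (nullable) side of a LEFT JOIN, outer sum.
with_subquery_agg = conn.execute("""
    SELECT t2.d, sum(s.cc)
    FROM t2 LEFT JOIN (SELECT b, count(c) AS cc FROM t1 GROUP BY b) AS s
      ON t2.b = s.b
    GROUP BY t2.d ORDER BY t2.d
""").fetchall()

# Naive elimination (assumed form): no inner GROUP BY, outer count on the base column.
without_subquery_agg = conn.execute("""
    SELECT t2.d, count(s.c)
    FROM t2 LEFT JOIN (SELECT b, c FROM t1) AS s
      ON t2.b = s.b
    GROUP BY t2.d ORDER BY t2.d
""").fetchall()

print(with_subquery_agg, without_subquery_agg)
```

For the unmatched t2 row, the original query sums only a NULL and returns NULL, while the naively rewritten query counts zero non-NULL values and returns 0, so the two forms disagree.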
IV. Conclusion
After reading this analysis, readers should have a clear picture of the scenarios where Lazy Agg rewrite optimization applies and of how GaussDB(DWS) implements lazyagg. We hope it encourages users to learn more about GaussDB(DWS), take a strong interest in it, and get deeply involved in its performance tuning.
Reference: GaussDB(DWS) Performance Tuning Series 4: One of the eighteen martial arts, SQL rewrite