Cloud native database query optimization - statistics and row count estimation
2022-06-29 21:41:00 【Gauss squirrel Club】
The SQL engine executes a query in four main stages: lexical and syntactic parsing, query rewriting, query planning, and plan execution. During query planning, the optimizer first generates candidate paths; because many different paths can produce the same result, the inferior ones must then be pruned to arrive at the best executable plan. Path selection is currently driven mainly by estimated cost, which is why this kind of optimizer is called a cost-based optimizer (Cost Based Optimization, CBO). In contrast to logical optimization, this is physical optimization: the optimizer uses the data distribution (statistics) to evaluate each candidate execution path and picks the one with the lowest estimated cost. Typical decisions include whether to use an index at all (SeqScan vs. IndexScan), which index to use, in what order to join tables, and which specific join algorithm to use.
Cost estimation needs the number of rows in each base table or joined relation. In many cases the optimizer cannot know the exact row count, so it must estimate it (cardinality estimation) before it can calculate the cost.
Statistics
Statistics are the basis of physical optimization and are derived from the table data. They describe characteristics of the base-table data, such as the number of distinct values and the most common values (MCV), and are used for row count estimation.
Table-level statistics are stored in the system table pg_class.
reltuples (total tuples): the number of tuples in the table.
relpages (total pages): the number of disk pages the table occupies.
Column-level statistics are stored in the system table pg_statistic; they can also be viewed through the pg_stats view.
starelid: OID of the table.
staattnum: attribute (column) number within the table.
stadistinct: the number of distinct non-NULL values in the column. It is generally used to estimate the size of the result set after grouping or after a join.
stanullfrac: the fraction of values in the column that are NULL.
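As a rough illustration of how stadistinct feeds a group-count estimate, the Python sketch below converts the stored value into an absolute number of distinct values. It assumes the PostgreSQL convention that a positive value is the count itself, a negative value is minus the ratio of distinct values to rows, and zero means unknown. The function name `distinct_count` is hypothetical, and this is a simplified model rather than kernel code.

```python
def distinct_count(stadistinct: float, reltuples: float) -> float:
    """Convert a stored stadistinct value into an absolute distinct-value count."""
    if stadistinct > 0:
        # Positive: the value already is the number of distinct non-NULL values.
        return float(stadistinct)
    if stadistinct < 0:
        # Negative (assumed convention): minus the ratio of distinct values
        # to rows, so scale by the table's row count.
        return max(1.0, float(round(-stadistinct * reltuples)))
    raise ValueError("stadistinct == 0: number of distinct values is unknown")


# A column with 4 distinct values in a 10000-row table gives about 4 groups.
print(distinct_count(4, 10000))
# Assumed example: a column stored as -1 is unique, one group per row.
print(distinct_count(-1, 10000))
```

The negative form lets the estimate scale with the table as it grows, without re-collecting statistics.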
The attribute group {stakind1, stanumbers1, stavalues1} forms one slot in the PG_STATISTIC table, and PG_STATISTIC has 5 such slots. In general, the first slot stores MCV (Most Common Value) information: the set of values whose frequency exceeds a certain threshold, sorted by frequency, which typically shows which values are skewed. The second slot stores histogram information, which describes the distribution of the values other than NULLs and MCVs and is generally used to estimate selectivity.
Taking the MCV slot as an example, the attribute stakind1 identifies the slot type as MCV, where 1 is the value of the STATISTIC_KIND_MCV constant. The attributes stanumbers1 and stavalues1 hold the MCV content itself: stavalues1 records the key values and stanumbers1 records the frequency of each key.
The system table pg_statistic is defined in the file pg_statistic.h.
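The slot layout can be pictured with a small Python sketch. It models one PG_STATISTIC row as a plain dictionary and scans the five slots for a given kind. The dictionary representation and the helper name `find_slot` are assumptions made for illustration; the kernel reads these attributes from the catalog tuple instead.

```python
STATISTIC_KIND_MCV = 1
STATISTIC_KIND_HISTOGRAM = 2
NUM_SLOTS = 5  # PG_STATISTIC has 5 slots


def find_slot(stat_row: dict, kind: int):
    """Return (stavalues, stanumbers) of the first slot of the given kind, or None."""
    for n in range(1, NUM_SLOTS + 1):
        if stat_row.get(f"stakind{n}") == kind:
            return stat_row.get(f"stavalues{n}"), stat_row.get(f"stanumbers{n}")
    return None


# Hypothetical row for a column with four equally frequent values.
string4_row = {
    "stakind1": STATISTIC_KIND_MCV,
    "stavalues1": ["AAAAxx", "HHHHxx", "OOOOxx", "VVVVxx"],
    "stanumbers1": [0.25, 0.25, 0.25, 0.25],
}

values, freqs = find_slot(string4_row, STATISTIC_KIND_MCV)
print(dict(zip(values, freqs)))                             # key -> frequency
print(find_slot(string4_row, STATISTIC_KIND_HISTOGRAM))     # no histogram slot
```

Searching by kind rather than by fixed position keeps the lookup correct even when a column has no MCV slot and the histogram lands in the first slot.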
#define STATISTIC_KIND_MCV 1
#define STATISTIC_KIND_HISTOGRAM 2
#define STATISTIC_KIND_CORRELATION 3
#define STATISTIC_KIND_MCELEM 4
#define STATISTIC_KIND_DECHIST 5
Statistics are collected with the ANALYZE command.



In this example, table tt has OID 40960 and holds 10000 rows in 345 pages. The distribution of column 1, unique1, is described by histogram information: the histogram has 100 buckets, and the column has no NULLs and no MCVs. The distribution of column 16, string4, is described by MCV information: the column has 4 distinct values, 'AAAAxx', 'HHHHxx', 'OOOOxx' and 'VVVVxx', each with a frequency of 0.25.
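To show how a 100-bucket histogram can yield a selectivity, here is a minimal Python sketch for a condition of the form `unique1 < c`. It assumes an equi-depth histogram (each bucket holds the same share of rows) and linear interpolation inside the bucket that contains the constant. The bucket bounds are hypothetical, since the article does not list the real ones, and the function name `lt_selectivity` is invented for this example.

```python
from bisect import bisect_right


def lt_selectivity(bounds, const):
    """Estimate the fraction of rows with value < const from an equi-depth histogram."""
    nbuckets = len(bounds) - 1
    if const <= bounds[0]:
        return 0.0
    if const >= bounds[-1]:
        return 1.0
    # Index of the bucket with bounds[i] <= const < bounds[i + 1].
    i = bisect_right(bounds, const) - 1
    lo, hi = bounds[i], bounds[i + 1]
    # Buckets below i qualify in full; bucket i qualifies in proportion.
    frac = (const - lo) / (hi - lo)
    return (i + frac) / nbuckets


# Hypothetical bounds: 101 boundaries describing 100 buckets.
bounds = [i * 100 for i in range(101)]

sel = lt_selectivity(bounds, 2550)
print(sel)            # estimated fraction of rows
print(10000 * sel)    # estimated rows for a 10000-row table
```

Because this column has no NULLs and no MCVs, the histogram covers every row. Otherwise the result would have to be scaled by the fraction of rows the histogram actually represents.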
Row count estimation
Row count estimation is the basis of cost estimation. Starting from the base-table statistics, it estimates the size of base-table results (baserel), intermediate join results (joinrel), and aggregation results, in preparation for cost estimation.
SQL queries often carry WHERE constraints (filter conditions), for example SELECT * FROM tt WHERE string4 = 'AAAAxx'. Once the selectivity of a constraint is known, we know what proportion of tuples a scan path returns or what proportion a join operation produces. From this proportion we can calculate the size of the intermediate and final results, and these sizes are then used to calculate the cost.
Here we focus on a simple query against a base table: selectivity calculation for an OpExpr. The function that handles this is clause_selectivity. If the clause is a filter condition, it calls restriction_selectivity to get the selectivity of the OpExpr; if it is a join condition, it calls join_selectivity.
In SELECT * FROM tt WHERE string4 = 'AAAAxx', the clause string4 = 'AAAAxx' is a filter condition, so restriction_selectivity is called to estimate its selectivity.



restriction_selectivity recognizes string4 = 'AAAAxx' as an equality constraint of the form Var = Const. The selectivity estimation function for each operator is recorded in the system table PG_OPERATOR; for opno = 93 it is eqsel, and eqsel calls var_eq_const to estimate the selectivity. In doing so, var_eq_const reads the distribution information for column string4 from PG_STATISTIC and, using the MCV information, directly returns a selectivity of 0.25.
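The MCV lookup described here can be sketched in a few lines of Python. This is a simplified model, not the kernel's var_eq_const: the MCV branch follows the description above, while the fallback for constants that are not in the MCV list (spreading the remaining non-NULL fraction evenly over the remaining distinct values) is an assumption modelled on PostgreSQL's general approach. The parameter names are chosen for this example.

```python
def var_eq_const(const, mcv_values, mcv_freqs, stanullfrac, ndistinct):
    """Estimate the selectivity of `column = const` (simplified model)."""
    # Case 1: the constant is a most common value, so its recorded
    # frequency is the selectivity.
    for value, freq in zip(mcv_values, mcv_freqs):
        if value == const:
            return freq

    # Case 2 (assumption): spread the rows that are neither NULL nor MCV
    # evenly over the distinct values that are not in the MCV list.
    remaining_distinct = ndistinct - len(mcv_values)
    if remaining_distinct <= 0:
        return 0.0  # every distinct value is an MCV, so nothing can match
    remaining_fraction = max(0.0, 1.0 - stanullfrac - sum(mcv_freqs))
    return remaining_fraction / remaining_distinct


mcv_values = ["AAAAxx", "HHHHxx", "OOOOxx", "VVVVxx"]
mcv_freqs = [0.25, 0.25, 0.25, 0.25]

# string4 = 'AAAAxx': found in the MCV list.
print(var_eq_const("AAAAxx", mcv_values, mcv_freqs, 0.0, 4))
# A value that is not among the 4 distinct values matches nothing.
print(var_eq_const("ZZZZxx", mcv_values, mcv_freqs, 0.0, 4))
```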

The function set_baserel_size_estimates then calculates the estimated number of rows.
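The arithmetic behind this step is small enough to show directly. The Python sketch below multiplies the table's tuple count by the selectivity of the filter and clamps the result to at least one row, as PostgreSQL's clamp_row_est does. The function name `estimate_rows` is hypothetical, and the real function takes planner structures rather than plain numbers.

```python
def estimate_rows(reltuples: float, selectivity: float) -> float:
    """Estimated rows = tuples in the table * selectivity, rounded, at least 1."""
    nrows = reltuples * selectivity
    # Round to a whole number of rows and never estimate fewer than one row,
    # so later cost formulas do not multiply by zero.
    return max(1.0, float(round(nrows)))


# tt has 10000 rows and string4 = 'AAAAxx' has selectivity 0.25.
print(estimate_rows(10000, 0.25))
# A very selective filter is still estimated at one row.
print(estimate_rows(10000, 0.00001))
```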

Function call chain: standard_planner -> subquery_planner -> grouping_planner -> query_planner -> make_one_rel -> set_base_rel_sizes -> set_rel_size -> set_plain_rel_size -> set_baserel_size_estimates -> clauselist_selectivity -> clause_selectivity -> restriction_selectivity -> OidFunctionCall4Coll -> eqsel -> var_eq_const