SQL Tuning Guide Notes 14: Managing Extended Statistics
2022-06-12 21:41:00 【dingdingfish】
These are notes on Chapter 14, “Managing Extended Statistics”, of the SQL Tuning Guide.
Important basic concepts
- column group statistics
Extended statistics gathered on a group of columns treated as a unit.
DBMS_STATS lets you collect extended statistics, which can improve cardinality estimates when there are multiple predicates on different columns of a table, or when predicates use expressions.
An extension is either a column group or an expression. When multiple columns from the same table appear together in a SQL statement, column group statistics can improve cardinality estimates. When a predicate uses an expression (for example, a built-in or user-defined function), expression statistics improve the optimizer's estimates.
Note: You cannot create extended statistics on virtual columns.
14.1 Managing Column Group Statistics
A column group is a set of columns that is treated as a unit.
Essentially, a column group is a virtual column. By gathering statistics on a column group, the optimizer can more accurately determine the cardinality estimate when a query groups these columns together.
The following sections provide an overview of column group statistics and explain how to manage them manually.
14.1.1 About Statistics on Column Groups
Individual column statistics are useful for determining the selectivity of a single predicate in a WHERE clause.
When the WHERE clause includes multiple predicates on different columns of the same table, individual column statistics do not show the relationship between the columns. This is the problem that column groups solve.
The optimizer calculates the selectivity of each predicate independently, then combines them. However, if the columns are correlated, the optimizer cannot take the correlation into account when determining the cardinality estimate, which it creates by multiplying the selectivity of each table predicate by the number of rows.
The figure below contrasts two ways of gathering statistics on the cust_state_province and country_id columns of the sh.customers table. It shows DBMS_STATS collecting statistics for each column individually and for the group. The column group has a system-generated name.
Note: The optimizer uses column group statistics for equality predicates, inlist predicates, and for estimating the GROUP BY cardinality.
14.1.1.1 Why Column Group Statistics Are Needed: Example
This example demonstrates how column group statistics enable the optimizer to give a more accurate cardinality estimate.
The following query of the DBA_TAB_COL_STATISTICS view shows the statistics gathered for the cust_state_province and country_id columns of the sh.customers table:
COL COLUMN_NAME FORMAT a20
COL NDV FORMAT 999
SELECT COLUMN_NAME, NUM_DISTINCT AS "NDV", HISTOGRAM
FROM DBA_TAB_COL_STATISTICS
WHERE OWNER = 'SH'
AND TABLE_NAME = 'CUSTOMERS'
AND COLUMN_NAME IN ('CUST_STATE_PROVINCE', 'COUNTRY_ID');
COLUMN_NAME NDV HISTOGRAM
-------------------- ---- ---------------
CUST_STATE_PROVINCE 145 FREQUENCY
COUNTRY_ID 19 FREQUENCY
-- 3341 customers live in California:
SELECT COUNT(*)
FROM sh.customers
WHERE cust_state_province = 'CA';
COUNT(*)
----------
3341
EXPLAIN PLAN FOR
SELECT *
FROM sh.customers
WHERE cust_state_province = 'CA'
AND country_id=52790;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1115 | 205K| 445 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CUSTOMERS | 1115 | 205K| 445 (1)| 00:00:01 |
-------------------------------------------------------------------------------
Based on the single-column statistics for the country_id and cust_state_province columns, the optimizer estimates that the query for California customers in the USA returns 1115 rows. In fact, 3341 customers live in California, but the optimizer does not know that California is in the USA, so it greatly underestimates the cardinality by assuming that both predicates reduce the number of returned rows.
You can make the optimizer aware of the real-world relationship between the values in country_id and cust_state_province by gathering column group statistics. These statistics enable the optimizer to give a much more accurate cardinality estimate.
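The underestimate can be reproduced by hand. The sketch below, assuming the SH sample schema is installed and readable, counts the rows matching each predicate and then applies the independence assumption (row count times the product of the two selectivities). The column aliases are illustrative, and the optimizer's own figure comes from its histograms, so it may not match this calculation exactly.

```sql
-- Compare the actual row count with an estimate that treats the two
-- predicates as independent: N * (ca/N) * (us/N) = ca * us / N.
SELECT total_rows,
       ca_rows,
       us_rows,
       both_rows,
       ROUND(ca_rows * us_rows / total_rows) AS independent_estimate
FROM  (SELECT COUNT(*) AS total_rows,
              COUNT(CASE WHEN cust_state_province = 'CA' THEN 1 END) AS ca_rows,
              COUNT(CASE WHEN country_id = 52790 THEN 1 END) AS us_rows,
              COUNT(CASE WHEN cust_state_province = 'CA'
                          AND country_id = 52790 THEN 1 END) AS both_rows
       FROM   sh.customers);
```

If both_rows is close to ca_rows while independent_estimate is much smaller, the two columns are correlated and a column group is likely to help.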
14.1.1.2 Automatic and Manual Column Group Statistics
Oracle Database can create column group statistics either automatically or manually.
The optimizer can use SQL plan directives to generate a more optimal plan. If the DBMS_STATS preference AUTO_STAT_EXTENSIONS is set to ON (the default is OFF), then a SQL plan directive can automatically trigger the creation of column group statistics based on the usage of predicates in the workload. You can set AUTO_STAT_EXTENSIONS with the SET_TABLE_PREFS, SET_GLOBAL_PREFS, or SET_SCHEMA_PREFS procedures.
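As a minimal sketch of the preference described above, the following checks the current value for one table and then enables automatic extension creation for it. The choice of sh.customers is only an example; GET_PREFS and SET_TABLE_PREFS are standard DBMS_STATS program units, but whether to enable this at table, schema, or global level depends on your environment.

```sql
-- Show the value currently in effect for the table (the default is OFF).
SELECT DBMS_STATS.GET_PREFS('AUTO_STAT_EXTENSIONS', 'SH', 'CUSTOMERS') AS auto_stat_extensions
FROM   DUAL;

-- Allow SQL plan directives to trigger column group creation for this table only.
EXEC DBMS_STATS.SET_TABLE_PREFS('SH', 'CUSTOMERS', 'AUTO_STAT_EXTENSIONS', 'ON');
```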
When you want to manage column group statistics manually, use DBMS_STATS as follows:
- Detect column groups
- Create previously detected column groups
- Manually create column groups and gather column group statistics
14.1.1.3 User Interface for Column Group Statistics
Several DBMS_STATS program units and one preference are relevant to column groups.
Table 14-1 DBMS_STATS APIs relevant to column groups:
Program unit or preference | Description |
---|---|
SEED_COL_USAGE procedure | Iterates over the SQL statements in the specified workload, compiles them, and then seeds column usage information for the columns that appear in these statements. To determine the appropriate column groups, the database must observe a representative workload. You do not need to run the queries themselves during the monitoring period. Instead, you can run EXPLAIN PLAN for some of the longer-running queries in your workload to ensure that the database records column group information for these queries. |
REPORT_COL_USAGE function | Generates a report that lists the columns seen in filter predicates, join predicates, and GROUP BY clauses in the workload. You can use this function to review the column usage information recorded for a specific table. |
CREATE_EXTENDED_STATS function | Creates extensions, which are either column groups or expressions. The database gathers statistics for the extension when either a user-generated or an automatic statistics gathering job gathers statistics for the table. |
AUTO_STAT_EXTENSIONS preference | Controls the automatic creation of extensions, including column groups, when optimizer statistics are gathered. Set this preference with SET_TABLE_PREFS, SET_SCHEMA_PREFS, or SET_GLOBAL_PREFS. When AUTO_STAT_EXTENSIONS is set to OFF (the default), the database does not create column group statistics automatically. To create extensions, you must execute the CREATE_EXTENDED_STATS function or specify extended statistics explicitly in the METHOD_OPT parameter of the DBMS_STATS API. When set to ON, a SQL plan directive can automatically trigger the creation of column group statistics based on the usage of columns in the predicates in the workload. |
14.1.2 Detecting Useful Column Groups for a Specific Workload
You can use DBMS_STATS.SEED_COL_USAGE and REPORT_COL_USAGE to determine which column groups a table needs, based on a specified workload.
This technique is useful when you do not know which extended statistics to create. It does not work for expression statistics.
This tutorial assumes the following:
- Cardinality estimates have been incorrect for queries of the sh.customers_test table (created from the customers table) that use predicates referencing the country_id and cust_state_province columns.
- You want the database to monitor your workload for 5 minutes (300 seconds).
- You want the database to determine automatically which column groups are needed.
-- Create the test table; note the use of USER
DROP TABLE customers_test;
CREATE TABLE customers_test AS SELECT * FROM customers;
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'customers_test');
-- Enable workload monitoring
exec DBMS_STATS.SEED_COL_USAGE(null,null,300);
-- Explain the plans for two queries on the customers_test table
EXPLAIN PLAN FOR
SELECT *
FROM customers_test
WHERE cust_city = 'Los Angeles'
AND cust_state_province = 'CA'
AND country_id = 52790;
SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));
----------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | TABLE ACCESS FULL| CUSTOMERS_TEST | 1 |
----------------------------------------------------
EXPLAIN PLAN FOR
SELECT country_id, cust_state_province, count(cust_city)
FROM customers_test
GROUP BY country_id, cust_state_province;
SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));
-----------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------
| 0 | SELECT STATEMENT | | 1949 |
| 1 | HASH GROUP BY | | 1949 |
| 2 | TABLE ACCESS FULL| CUSTOMERS_TEST | 55500 |
-----------------------------------------------------
The first plan shows a cardinality of 1 row for a query that returns 932 rows. The second plan shows a cardinality of 1949 rows for a query that returns 145 rows.
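To confirm the actual row counts that these estimates are compared against, you can run the queries themselves. This is a sketch that assumes customers_test still exists in the current schema; the notes above report 932 and 145 rows for the two queries.

```sql
-- Actual rows for the three-predicate filter query.
SELECT COUNT(*) AS actual_filter_rows
FROM   customers_test
WHERE  cust_city = 'Los Angeles'
AND    cust_state_province = 'CA'
AND    country_id = 52790;

-- Actual number of groups produced by the GROUP BY query.
SELECT COUNT(*) AS actual_groups
FROM  (SELECT country_id, cust_state_province
       FROM   customers_test
       GROUP  BY country_id, cust_state_province);
```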
Review the column usage information recorded for the table; all of the column usage has been captured.
SET LONG 100000
SET LINES 120
SET PAGES 0
SELECT DBMS_STATS.REPORT_COL_USAGE(user, 'customers_test')
FROM DUAL;
LEGEND:
.......
EQ : Used in single table EQuality predicate
RANGE : Used in single table RANGE predicate
LIKE : Used in single table LIKE predicate
NULL : Used in single table is (not) NULL predicate
EQ_JOIN : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER : Used in single table FILTER predicate
JOIN : Used in JOIN predicate
GROUP_BY : Used in GROUP BY expression
...............................................................................
###############################################################################
COLUMN USAGE REPORT FOR SH.CUSTOMERS_TEST
.........................................
1. COUNTRY_ID : EQ
2. CUST_CITY : EQ
3. CUST_STATE_PROVINCE : EQ
4. (CUST_CITY, CUST_STATE_PROVINCE,
COUNTRY_ID) : FILTER
5. (CUST_STATE_PROVINCE, COUNTRY_ID) : GROUP_BY
###############################################################################
All three columns appear in the same WHERE clause, so the report shows them as a group with the FILTER usage. In the second query, two columns appear in the GROUP BY clause, so the report labels them GROUP_BY. The sets of columns in the FILTER and GROUP_BY report entries are candidates for column groups.
14.1.3 Creating Column Groups Detected During Workload Monitoring
You can use the DBMS_STATS.CREATE_EXTENDED_STATS function to create the column groups previously detected by DBMS_STATS.SEED_COL_USAGE.
-- Create column groups for the customers_test table based on the usage information captured during the monitoring window.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(user, 'customers_test') FROM DUAL;
###############################################################################
EXTENSIONS FOR SH.CUSTOMERS_TEST
................................
1. (CUST_CITY, CUST_STATE_PROVINCE,
COUNTRY_ID) : SYS_STUMZ$C3AIHLPBROI#SKA58H_N created
2. (CUST_STATE_PROVINCE, COUNTRY_ID) : SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ created
###############################################################################
-- The database created two column groups for customers_test: one for the filter predicate and one for the GROUP BY operation.
-- Gather table statistics again.
EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'customers_test');
-- Query the USER_TAB_COL_STATISTICS view to see which additional statistics the database created; note the last two rows, whose names begin with SYS_:
COL COLUMN_NAME FOR A30
SET PAGES 9999
SELECT COLUMN_NAME, NUM_DISTINCT, HISTOGRAM
FROM USER_TAB_COL_STATISTICS
WHERE TABLE_NAME = 'CUSTOMERS_TEST'
ORDER BY 1;
COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
COUNTRY_ID 19 FREQUENCY
CUST_CITY 620 HYBRID
CUST_CITY_ID 620 NONE
CUST_CREDIT_LIMIT 8 NONE
CUST_EFF_FROM 1 NONE
CUST_EFF_TO 0 NONE
CUST_EMAIL 1699 NONE
CUST_FIRST_NAME 1300 NONE
CUST_GENDER 2 NONE
CUST_ID 55500 NONE
CUST_INCOME_LEVEL 12 NONE
CUST_LAST_NAME 908 NONE
CUST_MAIN_PHONE_NUMBER 51344 NONE
CUST_MARITAL_STATUS 11 NONE
CUST_POSTAL_CODE 623 NONE
CUST_SRC_ID 0 NONE
CUST_STATE_PROVINCE 145 FREQUENCY
CUST_STATE_PROVINCE_ID 145 NONE
CUST_STREET_ADDRESS 49900 NONE
CUST_TOTAL 1 NONE
CUST_TOTAL_ID 1 NONE
CUST_VALID 2 NONE
CUST_YEAR_OF_BIRTH 75 NONE
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ 145 NONE
SYS_STUMZ$C3AIHLPBROI#SKA58H_N 620 HYBRID
25 rows selected.
Explain the plans again. The new plans show more accurate cardinality estimates (in the first plan, 1013 is close to the actual 932 rows, and the second plan is exactly right).
EXPLAIN PLAN FOR
SELECT *
FROM customers_test
WHERE cust_city = 'Los Angeles'
AND cust_state_province = 'CA'
AND country_id = 52790;
SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));
----------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 1013 |
| 1 | TABLE ACCESS FULL| CUSTOMERS_TEST | 1013 |
----------------------------------------------------
EXPLAIN PLAN FOR
SELECT country_id, cust_state_province, count(cust_city)
FROM customers_test
GROUP BY country_id, cust_state_province;
SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));
-----------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------
| 0 | SELECT STATEMENT | | 145 |
| 1 | HASH GROUP BY | | 145 |
| 2 | TABLE ACCESS FULL| CUSTOMERS_TEST | 55500 |
-----------------------------------------------------
Clean up:
drop table customers_test;
14.1.4 Creating and Gathering Statistics on Column Groups Manually
In some cases, you may already know which column group you want to create.
The METHOD_OPT argument of the DBMS_STATS.GATHER_TABLE_STATS procedure can create a column group and gather statistics on it automatically. You create a new column group by specifying the group of columns with FOR COLUMNS.
This tutorial assumes the following:
- You want to create a column group for the cust_state_province and country_id columns of the customers table in the sh schema.
- You want to gather statistics (including histograms) on the entire table and the new column group.
BEGIN
DBMS_STATS.GATHER_TABLE_STATS( 'sh','customers',
METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY ' ||
'FOR COLUMNS SIZE SKEWONLY (cust_state_province,country_id)' );
END;
/
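An alternative, sketched here under the same assumptions, is to do this in two steps: create the column group explicitly with CREATE_EXTENDED_STATS, then gather table statistics so that the new extension is populated. The alias col_group_name is illustrative.

```sql
-- Step 1: create the column group; the function returns its system-generated name.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS('sh', 'customers',
         '(cust_state_province,country_id)') AS col_group_name
FROM   DUAL;

-- Step 2: gather statistics; the column group is included as a hidden column.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS('sh', 'customers',
    METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY');
END;
/
```

If the column group already exists, the first statement is expected to raise an error, so use only one of the two approaches.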
14.1.5 Displaying Column Group Information
To obtain the name of a column group, use the DBMS_STATS.SHOW_EXTENDED_STATS_NAME function or a database view.
You can also use views to obtain information such as the number of distinct values and whether the column group has a histogram.
This tutorial assumes the following:
- You created a column group for the cust_state_province and country_id columns of the customers table in the sh schema.
- You want to determine the column group name, the number of distinct values, and whether a histogram has been created for the column group.
SELECT
sys.dbms_stats.show_extended_stats_name('sh', 'customers', '(cust_state_province,country_id)') col_group_name
FROM
dual;
COL_GROUP_NAME
-------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_
COL EXTENSION_NAME FOR A40
SELECT EXTENSION_NAME, EXTENSION
FROM USER_STAT_EXTENSIONS
WHERE TABLE_NAME='CUSTOMERS';
EXTENSION_NAME EXTENSION
---------------------------------------- -----------------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ ("CUST_STATE_PROVINCE","COUNTRY_ID")
SELECT e.EXTENSION col_group, t.NUM_DISTINCT, t.HISTOGRAM
FROM USER_STAT_EXTENSIONS e, USER_TAB_COL_STATISTICS t
WHERE e.EXTENSION_NAME=t.COLUMN_NAME
AND e.TABLE_NAME=t.TABLE_NAME
AND t.TABLE_NAME='CUSTOMERS';
COL_GROUP NUM_DISTINCT HISTOGRAM
-------------------------------------------------------------------
("COUNTRY_ID","CUST_STATE_PROVINCE") 145 FREQUENCY
14.1.6 Dropping a Column Group
Use the DBMS_STATS.DROP_EXTENDED_STATS procedure to delete a column group from a table.
This tutorial assumes the following:
- You created a column group for the cust_state_province and country_id columns of the customers table in the sh schema.
- You want to drop the column group.
EXEC DBMS_STATS.DROP_EXTENDED_STATS( 'sh', 'customers', '(cust_state_province, country_id)' );
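To check that the column group is gone, you can query the extensions view again. A minimal sketch, assuming you are connected as the sh user (otherwise DBA_STAT_EXTENSIONS with an OWNER filter would be the equivalent):

```sql
-- After the drop, the column group should no longer be listed.
SELECT EXTENSION_NAME, EXTENSION
FROM   USER_STAT_EXTENSIONS
WHERE  TABLE_NAME = 'CUSTOMERS';
```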
14.2 Managing Expression Statistics
When a WHERE clause has predicates that use expressions, a type of extended statistics called expression statistics can improve the optimizer's estimates.
14.2.1 About Expression Statistics
When a WHERE clause applies an expression to a column in the form (function(col) = constant), the optimizer has no way of knowing how the function affects predicate cardinality unless a function-based index exists. However, you can gather expression statistics on the expression function(col) itself.
The following figure shows the optimizer using statistics to generate a plan for a query that uses a function. The top shows the optimizer checking statistics for the column. The bottom shows the optimizer checking statistics that correspond to the expression used in the query. The expression statistics yield more accurate estimates.
As the figure shows, when expression statistics are not available, the optimizer can produce suboptimal plans.
14.2.1.1 When Expression Statistics Are Useful: Example
SELECT COUNT(*) FROM sh.customers WHERE cust_state_province='CA';
COUNT(*)
----------
3341
EXPLAIN PLAN FOR
SELECT * FROM sh.customers WHERE LOWER(cust_state_province)='ca';
SELECT * FROM TABLE(dbms_xplan.display);
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 555 | 108K| 445 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CUSTOMERS | 555 | 108K| 445 (1)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(LOWER("CUST_STATE_PROVINCE")='ca')
Because no expression statistics exist for LOWER(cust_state_province)='ca', the optimizer estimate is significantly off. You can use DBMS_STATS procedures to correct these estimates.
14.2.2 Creating Expression Statistics
You can use DBMS_STATS to create statistics for a user-specified expression.
You can use either of the following program units:
- The GATHER_TABLE_STATS procedure
- The CREATE_EXTENDED_STATS function followed by the GATHER_TABLE_STATS procedure
This tutorial assumes the following:
- Selectivity estimates are inaccurate for queries of sh.customers that use the LOWER(cust_state_province) function.
- You want to gather statistics on the LOWER(cust_state_province) expression.
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(
'sh'
, 'customers'
, method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY ' ||
'FOR COLUMNS (LOWER(cust_state_province)) SIZE SKEWONLY'
);
END;
/
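The second option in the list above (create the extension first, then gather statistics) might look like the following sketch. It assumes the expression statistics do not already exist; the alias extension_name is illustrative.

```sql
-- Step 1: create the expression extension; returns its system-generated name.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS('sh', 'customers',
         '(LOWER(cust_state_province))') AS extension_name
FROM   DUAL;

-- Step 2: gather table statistics so the new extension is populated.
EXEC DBMS_STATS.GATHER_TABLE_STATS('sh', 'customers');

-- Re-check the optimizer's estimate for the expression predicate.
EXPLAIN PLAN FOR
SELECT * FROM sh.customers WHERE LOWER(cust_state_province) = 'ca';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```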
14.2.3 Displaying Expression Statistics
To obtain information about expression statistics, use the database view DBA_STAT_EXTENSIONS and the DBMS_STATS.SHOW_EXTENDED_STATS_NAME function.
You can also use views to obtain information such as the number of distinct values and whether the extension has a histogram.
This tutorial assumes the following:
- You created extended statistics for the LOWER(cust_state_province) expression.
- You want to determine the extension name, the number of distinct values, and whether a histogram has been created for the expression.
To monitor expression statistics:
COL EXTENSION_NAME FORMAT a30
COL EXTENSION FORMAT a35
SELECT EXTENSION_NAME, EXTENSION
FROM USER_STAT_EXTENSIONS
WHERE TABLE_NAME='CUSTOMERS';
EXTENSION_NAME EXTENSION
------------------------------ -----------------------------------
SYS_STUBPHJSBRKOIK9O2YV3W8HOUE (LOWER("CUST_STATE_PROVINCE"))
SELECT e.EXTENSION expression, t.NUM_DISTINCT, t.HISTOGRAM
FROM USER_STAT_EXTENSIONS e, USER_TAB_COL_STATISTICS t
WHERE e.EXTENSION_NAME=t.COLUMN_NAME
AND e.TABLE_NAME=t.TABLE_NAME
AND t.TABLE_NAME='CUSTOMERS';
EXPRESSION NUM_DISTINCT HISTOGRAM
-------------------------------------------------------------------
(LOWER("CUST_STATE_PROVINCE")) 145 FREQUENCY
14.2.4 Dropping Expression Statistics
To delete expression statistics from a table, use the DBMS_STATS.DROP_EXTENDED_STATS procedure.
This tutorial assumes the following:
- You created extended statistics for the LOWER(cust_state_province) expression.
- You want to drop the expression statistics.
BEGIN
DBMS_STATS.DROP_EXTENDED_STATS(
'sh'
, 'customers'
, '(LOWER(cust_state_province))'
);
END;
/