SQL Tuning Guide Notes 14: Managing Extended Statistics

2022-06-12 21:41:00 dingdingfish

These are my notes on Chapter 14, "Managing Extended Statistics", of the SQL Tuning Guide.

Key concepts

  • column group statistics
    Extended statistics gathered on a group of columns treated as a unit.

DBMS_STATS enables you to collect extended statistics. These statistics can improve cardinality estimates when multiple predicates exist on different columns of a table, or when predicates use expressions.

An extension is either a column group or an expression. Column group statistics can improve cardinality estimates when multiple columns from the same table occur together in a SQL statement. Expression statistics improve optimizer estimates when predicates use expressions, for example, built-in or user-defined functions.

Note: You cannot create extended statistics on virtual columns.

14.1 Managing Column Group Statistics

A column group is a set of columns that is treated as a unit.

Essentially, a column group is a virtual column. By gathering statistics on a column group, the optimizer can more accurately determine the cardinality estimate when a query combines these columns.

The following sections provide an overview of column group statistics and explain how to manage them manually.

14.1.1 About Statistics on Column Groups

Individual column statistics are useful for determining the selectivity of a single predicate in a WHERE clause.

When the WHERE clause includes multiple predicates on different columns of the same table, individual column statistics do not show the relationship between the columns. This is the problem that column groups solve.

The optimizer evaluates the selectivity of each predicate independently and then combines them. However, if the columns are correlated, the optimizer cannot take the correlation into account when determining the cardinality estimate: it multiplies the selectivity of each predicate on the table by the number of rows.

The following figure contrasts two ways of gathering statistics on the cust_state_province and country_id columns of the sh.customers table. The figure shows DBMS_STATS gathering statistics on each column individually and on the column group. The column group has a system-generated name.

[Figure: individual column statistics and column group statistics on sh.customers]

Note: The optimizer uses column group statistics for equality predicates, in-list predicates, and for estimating the GROUP BY cardinality.

14.1.1.1 Why Column Group Statistics Are Needed: Example

This example demonstrates how column group statistics enable the optimizer to give a more accurate cardinality estimate.

The following query of the DBA_TAB_COL_STATISTICS view shows the statistics gathered on the cust_state_province and country_id columns of the sh.customers table:

COL COLUMN_NAME FORMAT a20
COL NDV FORMAT 999

SELECT COLUMN_NAME, NUM_DISTINCT AS "NDV", HISTOGRAM
FROM   DBA_TAB_COL_STATISTICS
WHERE  OWNER = 'SH'
AND    TABLE_NAME = 'CUSTOMERS'
AND    COLUMN_NAME IN ('CUST_STATE_PROVINCE', 'COUNTRY_ID');

COLUMN_NAME           NDV HISTOGRAM      
-------------------- ---- ---------------
CUST_STATE_PROVINCE   145 FREQUENCY      
COUNTRY_ID             19 FREQUENCY 

-- 3341 customers live in California:
SELECT COUNT(*)
FROM   sh.customers 
WHERE  cust_state_province = 'CA';

 COUNT(*)
----------
    3341

EXPLAIN PLAN FOR
  SELECT *
  FROM   sh.customers
  WHERE  cust_state_province = 'CA'
  AND    country_id=52790;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  1115 |   205K|   445   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |  1115 |   205K|   445   (1)| 00:00:01 |
-------------------------------------------------------------------------------

Based on the individual column statistics for country_id and cust_state_province, the optimizer estimates that the query for California customers in the United States returns 1115 rows. In fact, 3341 customers live in California, but the optimizer does not know that the state of California is in the country of the United States. It therefore greatly underestimates the cardinality by assuming that both predicates reduce the number of returned rows.
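The following query is a minimal sketch of the independence assumption described above. It multiplies the row count by the observed selectivity of each predicate, which is roughly what the optimizer can do when it has only frequency histograms on the individual columns. It assumes the sh sample schema is installed; the inline view aliases t, s, and c are illustrative names, not part of the original example.

```sql
-- Estimate under the independence assumption:
--   total rows * selectivity(cust_state_province) * selectivity(country_id)
SELECT ROUND(t.total * (s.cnt / t.total) * (c.cnt / t.total)) AS independent_estimate,
       s.cnt                                                 AS actual_ca_rows
FROM   (SELECT COUNT(*) AS total FROM sh.customers) t,
       (SELECT COUNT(*) AS cnt
        FROM   sh.customers
        WHERE  cust_state_province = 'CA') s,
       (SELECT COUNT(*) AS cnt
        FROM   sh.customers
        WHERE  country_id = 52790) c;
```

If the two columns were truly independent, the first column would be a reasonable estimate. Because every 'CA' row also has country_id 52790, the second predicate filters nothing and the product underestimates the result.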

You can make the optimizer aware of the real-world relationship between the values in country_id and cust_state_province by gathering column group statistics. These statistics enable the optimizer to give a much more accurate cardinality estimate.

14.1.1.2 Automatic and Manual Column Group Statistics

Oracle Database can create column group statistics either automatically or manually.

The optimizer can use SQL plan directives to generate a more optimal plan. If the DBMS_STATS preference AUTO_STAT_EXTENSIONS is set to ON (the default is OFF), SQL plan directives can automatically trigger the creation of column group statistics based on the usage of predicates in the workload. You can set AUTO_STAT_EXTENSIONS with the SET_TABLE_PREFS, SET_GLOBAL_PREFS, or SET_SCHEMA_PREFS procedure.
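As a small sketch of how the preference might be set and checked at the table level, the following uses DBMS_STATS.SET_TABLE_PREFS and DBMS_STATS.GET_PREFS. The choice of sh.customers as the target table is only an assumption for illustration; the preference exists only in releases that support AUTO_STAT_EXTENSIONS.

```sql
-- Allow SQL plan directives to create column groups for this one table.
EXEC DBMS_STATS.SET_TABLE_PREFS('SH', 'CUSTOMERS', 'AUTO_STAT_EXTENSIONS', 'ON');

-- Check the effective value for the table.
SELECT DBMS_STATS.GET_PREFS('AUTO_STAT_EXTENSIONS', 'SH', 'CUSTOMERS') AS auto_stat_extensions
FROM   dual;

-- Return to the default behavior.
EXEC DBMS_STATS.SET_TABLE_PREFS('SH', 'CUSTOMERS', 'AUTO_STAT_EXTENSIONS', 'OFF');
```

Setting the preference per table rather than globally keeps the change limited to the tables where correlated columns are a known problem.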

When you want to manage column group statistics manually, use DBMS_STATS as follows:

  • Detect column groups
  • Create previously detected column groups
  • Create column groups manually and gather column group statistics

14.1.1.3 User Interface for Column Group Statistics

Several DBMS_STATS program units and preferences are relevant to column groups.

Table 14-1 DBMS_STATS APIs relevant to column groups:

  • SEED_COL_USAGE procedure
    Iterates over the SQL statements in the specified workload, compiles them, and then seeds column usage information for the columns that appear in these statements.
    To determine the appropriate column groups, the database must observe a representative workload. You do not need to run the queries themselves during the monitoring period. Instead, you can run EXPLAIN PLAN for some of the longer-running queries in your workload to ensure that the database records column group information for these queries.
  • REPORT_COL_USAGE function
    Generates a report that lists the columns that were seen in filter predicates, join predicates, and GROUP BY clauses in the workload.
    You can use this function to review the column usage information recorded for a specific table.
  • CREATE_EXTENDED_STATS function
    Creates extensions, which are either column groups or expressions. The database gathers statistics for the extension when either a user-generated or an automatic statistics gathering job gathers statistics for the table.
  • AUTO_STAT_EXTENSIONS preference
    Controls the automatic creation of extensions, including column groups, when optimizer statistics are gathered. Set this preference with SET_TABLE_PREFS, SET_SCHEMA_PREFS, or SET_GLOBAL_PREFS.
    When AUTO_STAT_EXTENSIONS is set to OFF (the default), the database does not create column group statistics automatically. To create extensions, you must execute the CREATE_EXTENDED_STATS function or specify extended statistics explicitly in the METHOD_OPT parameter of the DBMS_STATS API.
    When set to ON, SQL plan directives can automatically trigger the creation of column group statistics based on the usage of columns in predicates in the workload.

14.1.2 Detecting Useful Column Groups for a Specific Workload

You can use DBMS_STATS.SEED_COL_USAGE and REPORT_COL_USAGE to determine which column groups a table requires, based on a specified workload.

This technique is useful when you do not know which extended statistics to create. It does not work for expression statistics.

This tutorial assumes the following:

  • Cardinality estimates have been incorrect for queries of the sh.customers_test table (created from the customers table) that use predicates referencing the country_id and cust_state_province columns.
  • You want the database to monitor your workload for 5 minutes (300 seconds).
  • You want the database to determine automatically which column groups are needed.

-- Create the test table. Note the use of USER as the schema name.
DROP TABLE customers_test;
CREATE TABLE customers_test AS SELECT * FROM customers;
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'customers_test');

--  Enable workload monitoring 
exec DBMS_STATS.SEED_COL_USAGE(null,null,300);

-- Explain the plans for two queries of the customers_test table
EXPLAIN PLAN FOR
  SELECT *
  FROM   customers_test
  WHERE  cust_city = 'Los Angeles'
  AND    cust_state_province = 'CA'
  AND    country_id = 52790;
 
SELECT PLAN_TABLE_OUTPUT 
FROM   TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));
 
----------------------------------------------------
| Id  | Operation         | Name           | Rows  |
----------------------------------------------------
|   0 | SELECT STATEMENT  |                |     1 |
|   1 |  TABLE ACCESS FULL| CUSTOMERS_TEST |     1 |
----------------------------------------------------

EXPLAIN PLAN FOR
  SELECT   country_id, cust_state_province, count(cust_city)
  FROM     customers_test
  GROUP BY country_id, cust_state_province;
 
SELECT PLAN_TABLE_OUTPUT 
FROM   TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

-----------------------------------------------------
| Id  | Operation          | Name           | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT   |                |  1949 |
|   1 |  HASH GROUP BY     |                |  1949 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS_TEST | 55500 |
-----------------------------------------------------

The first plan shows a cardinality of 1 row for a query that returns 932 rows. The second plan shows a cardinality of 1949 rows for a query that returns 145 rows.
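The actual row counts quoted above can be checked by running the two queries rather than only explaining them. This is a sketch that assumes the customers_test table created earlier still exists in the current schema; the alias g is an illustrative name.

```sql
-- Rows actually returned by the first query (filter on three correlated columns).
SELECT COUNT(*) AS actual_filter_rows
FROM   customers_test
WHERE  cust_city = 'Los Angeles'
AND    cust_state_province = 'CA'
AND    country_id = 52790;

-- Number of groups actually produced by the second query.
SELECT COUNT(*) AS actual_groups
FROM   (SELECT country_id, cust_state_province
        FROM   customers_test
        GROUP BY country_id, cust_state_province) g;
```

Comparing these counts with the Rows column of each plan shows how far the estimates are from reality before any column group exists.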

Review the column usage information recorded for the table. The report shows that usage was recorded for all of the columns involved.

SET LONG 100000
SET LINES 120
SET PAGES 0
SELECT DBMS_STATS.REPORT_COL_USAGE(user, 'customers_test')
FROM   DUAL;

LEGEND:
.......

EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
...............................................................................

###############################################################################

COLUMN USAGE REPORT FOR SH.CUSTOMERS_TEST
.........................................

1. COUNTRY_ID                          : EQ                                    
2. CUST_CITY                           : EQ                                    
3. CUST_STATE_PROVINCE                 : EQ                                    
4. (CUST_CITY, CUST_STATE_PROVINCE, 
    COUNTRY_ID)                        : FILTER 
5. (CUST_STATE_PROVINCE, COUNTRY_ID)   : GROUP_BY 
###############################################################################

All three columns appear in the same WHERE clause, so the report shows them as a group filter. In the second query, two columns appear in the GROUP BY clause, so the report labels them as GROUP_BY. The sets of columns in the FILTER and GROUP_BY entries of the report are candidates for column groups.

14.1.3 Creating Column Groups Detected During Workload Monitoring

You can use the DBMS_STATS.CREATE_EXTENDED_STATS function to create the column groups that were previously detected by DBMS_STATS.SEED_COL_USAGE.

-- Create column groups for the customers_test table based on the usage information captured during the monitoring window.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(user, 'customers_test') FROM DUAL;

###############################################################################

EXTENSIONS FOR SH.CUSTOMERS_TEST
................................

1. (CUST_CITY, CUST_STATE_PROVINCE, 
    COUNTRY_ID)                        : SYS_STUMZ$C3AIHLPBROI#SKA58H_N created
2. (CUST_STATE_PROVINCE, COUNTRY_ID)   : SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ created
###############################################################################

-- The database created two column groups for customers_test: one for the filter predicate and one for the GROUP BY operation.

-- Gather table statistics again.
EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'customers_test');

-- Query the USER_TAB_COL_STATISTICS view to see which additional statistics the database created. Note the last two rows, whose names start with SYS_:
COL COLUMN_NAME FOR A30
SET PAGES 9999
SELECT COLUMN_NAME, NUM_DISTINCT, HISTOGRAM
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'CUSTOMERS_TEST'
ORDER BY 1;

COLUMN_NAME                    NUM_DISTINCT HISTOGRAM      
------------------------------ ------------ ---------------
COUNTRY_ID                               19 FREQUENCY      
CUST_CITY                               620 HYBRID         
CUST_CITY_ID                            620 NONE           
CUST_CREDIT_LIMIT                         8 NONE           
CUST_EFF_FROM                             1 NONE           
CUST_EFF_TO                               0 NONE           
CUST_EMAIL                             1699 NONE           
CUST_FIRST_NAME                        1300 NONE           
CUST_GENDER                               2 NONE           
CUST_ID                               55500 NONE           
CUST_INCOME_LEVEL                        12 NONE           
CUST_LAST_NAME                          908 NONE           
CUST_MAIN_PHONE_NUMBER                51344 NONE           
CUST_MARITAL_STATUS                      11 NONE           
CUST_POSTAL_CODE                        623 NONE           
CUST_SRC_ID                               0 NONE           
CUST_STATE_PROVINCE                     145 FREQUENCY      
CUST_STATE_PROVINCE_ID                  145 NONE           
CUST_STREET_ADDRESS                   49900 NONE           
CUST_TOTAL                                1 NONE           
CUST_TOTAL_ID                             1 NONE           
CUST_VALID                                2 NONE           
CUST_YEAR_OF_BIRTH                       75 NONE           
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_          145 NONE           
SYS_STUMZ$C3AIHLPBROI#SKA58H_N          620 HYBRID         

25 rows selected. 

Explain the plans again. The new plans show more accurate cardinality estimates: in the first plan, the estimate of 1013 is close to the 932 rows actually returned, and the second plan is exactly right.

EXPLAIN PLAN FOR
  SELECT *
  FROM   customers_test
  WHERE  cust_city = 'Los Angeles'
  AND    cust_state_province = 'CA'
  AND    country_id = 52790;
 
SELECT PLAN_TABLE_OUTPUT 
FROM   TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

----------------------------------------------------
| Id  | Operation         | Name           | Rows  |
----------------------------------------------------
|   0 | SELECT STATEMENT  |                |  1013 |
|   1 |  TABLE ACCESS FULL| CUSTOMERS_TEST |  1013 |
----------------------------------------------------

EXPLAIN PLAN FOR
  SELECT   country_id, cust_state_province, count(cust_city)
  FROM     customers_test
  GROUP BY country_id, cust_state_province;
 
SELECT PLAN_TABLE_OUTPUT 
FROM   TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

-----------------------------------------------------
| Id  | Operation          | Name           | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT   |                |   145 |
|   1 |  HASH GROUP BY     |                |   145 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS_TEST | 55500 |
-----------------------------------------------------

Clean up:

drop table customers_test;

14.1.4 Creating and Gathering Statistics on Column Groups Manually

In some cases, you may already know which column group you want to create.

The METHOD_OPT argument of the DBMS_STATS.GATHER_TABLE_STATS procedure can create column groups and gather statistics on them automatically. You create a new column group by specifying the group of columns in the FOR COLUMNS clause.

This tutorial assumes the following:

  • You want to create a column group for the cust_state_province and country_id columns of the customers table in the sh schema.
  • You want to gather statistics (including histograms) on the entire table and on the new column group.

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS( 'sh','customers',
  METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY ' ||
                'FOR COLUMNS SIZE SKEWONLY (cust_state_province,country_id)' );
END;
/
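The same result can also be reached in two steps, using the CREATE_EXTENDED_STATS function listed in Table 14-1 and then gathering statistics. This is a sketch of that alternative, not a follow-up to the call above: creating an extension that already exists raises an error, so run one approach or the other.

```sql
-- Step 1: create the column group; the function returns its system-generated name.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS('sh', 'customers',
         '(cust_state_province,country_id)') AS col_group_name
FROM   dual;

-- Step 2: gather table statistics; the new column group is included.
EXEC DBMS_STATS.GATHER_TABLE_STATS('sh', 'customers');
```

The two-step form is convenient when the column group should exist before the next scheduled statistics job runs, since that job then picks it up without any change to METHOD_OPT.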

14.1.5 Displaying Column Group Information

To obtain the name of a column group, use the DBMS_STATS.SHOW_EXTENDED_STATS_NAME function or a database view.

You can also use views to obtain information such as the number of distinct values and whether the column group has a histogram.

This tutorial assumes the following:

  • You created a column group for the cust_state_province and country_id columns of the customers table in the sh schema.
  • You want to determine the column group name, the number of distinct values, and whether a histogram has been created for the column group.

SELECT
    sys.dbms_stats.show_extended_stats_name('sh', 'customers', '(cust_state_province,country_id)') col_group_name
FROM
    dual;

COL_GROUP_NAME
-------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_

COL EXTENSION_NAME FOR A40
SELECT EXTENSION_NAME, EXTENSION 
FROM   USER_STAT_EXTENSIONS 
WHERE  TABLE_NAME='CUSTOMERS';

EXTENSION_NAME                           EXTENSION
---------------------------------------- -----------------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_           ("CUST_STATE_PROVINCE","COUNTRY_ID")

SELECT e.EXTENSION col_group, t.NUM_DISTINCT, t.HISTOGRAM
FROM   USER_STAT_EXTENSIONS e, USER_TAB_COL_STATISTICS t
WHERE  e.EXTENSION_NAME=t.COLUMN_NAME
AND    e.TABLE_NAME=t.TABLE_NAME
AND    t.TABLE_NAME='CUSTOMERS';

COL_GROUP                             NUM_DISTINCT        HISTOGRAM
-------------------------------------------------------------------
("COUNTRY_ID","CUST_STATE_PROVINCE")  145                 FREQUENCY

14.1.6 Dropping a Column Group

Use the DBMS_STATS.DROP_EXTENDED_STATS procedure to delete a column group from a table.

This tutorial assumes the following:

  • You created a column group for the cust_state_province and country_id columns of the customers table in the sh schema.
  • You want to drop the column group.

EXEC DBMS_STATS.DROP_EXTENDED_STATS( 'sh', 'customers', '(cust_state_province, country_id)' );
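To confirm that the column group is gone, the extensions remaining on the table can be listed. This sketch queries DBA_STAT_EXTENSIONS, which requires access to the DBA views; when connected as sh, USER_STAT_EXTENSIONS without the OWNER filter serves the same purpose.

```sql
-- List the extensions still defined on sh.customers after the drop.
SELECT extension_name, extension, droppable
FROM   dba_stat_extensions
WHERE  owner = 'SH'
AND    table_name = 'CUSTOMERS';
```

If the drop succeeded, the (cust_state_province, country_id) group no longer appears in the result.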

14.2 Managing Expression Statistics

When a WHERE clause has predicates that use expressions, a type of extended statistics called expression statistics can improve optimizer estimates.

14.2.1 About Expression Statistics

For an expression of the form (function(col) = constant) applied to a WHERE clause column, the optimizer does not know how the function affects the predicate's cardinality unless a function-based index exists. However, you can gather expression statistics on the expression function(col) itself.

The following figure shows the optimizer using statistics to generate a plan for a query that uses a function. The top shows the optimizer checking statistics for the column. The bottom shows the optimizer checking the statistics that correspond to the expression used in the query. Expression statistics yield more accurate estimates.

[Figure: optimizer estimates with column statistics versus expression statistics]

As the figure illustrates, the optimizer can produce a suboptimal plan when expression statistics are not available.

14.2.1.1 When Expression Statistics Are Useful: Example

SELECT COUNT(*) FROM sh.customers WHERE cust_state_province='CA';
 
  COUNT(*)
----------
      3341

EXPLAIN PLAN FOR 
SELECT * FROM sh.customers WHERE LOWER(cust_state_province)='ca';

SELECT * FROM TABLE(dbms_xplan.display);

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   555 |   108K|   445   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |   555 |   108K|   445   (1)| 00:00:01 |
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(LOWER("CUST_STATE_PROVINCE")='ca')

Because no expression statistics exist for LOWER(cust_state_province)='ca', the optimizer estimate is significantly off: 555 rows instead of 3341. You can use DBMS_STATS procedures to correct these estimates.
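The estimate of 555 rows is exactly 1% of the 55500 rows in the table, which is consistent with the optimizer falling back on a fixed default selectivity when it has no statistics for the expression. Treat that 1% figure as an assumption inferred from the plan, not a documented guarantee. The sketch below compares the real count with such a guess:

```sql
-- Compare the real number of matching rows with a flat 1% guess.
SELECT COUNT(*)                                   AS total_rows,
       COUNT(CASE WHEN LOWER(cust_state_province) = 'ca'
                  THEN 1 END)                     AS actual_rows,
       ROUND(COUNT(*) * 0.01)                     AS one_percent_guess
FROM   sh.customers;
```

A guess that ignores the data distribution can be off in either direction, which is why statistics on the expression itself are worth gathering.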

14.2.2 Creating Expression Statistics

You can use DBMS_STATS to create statistics for a user-specified expression.

You can use either of the following program units:

  • The GATHER_TABLE_STATS procedure
  • The CREATE_EXTENDED_STATS function followed by the GATHER_TABLE_STATS procedure

This tutorial assumes the following:

  • Selectivity estimates are inaccurate for queries of sh.customers that use the LOWER(cust_state_province) function.
  • You want to gather statistics on the LOWER(cust_state_province) expression.

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS( 
    'sh'
,   'customers'
,   method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY ' || 
                  'FOR COLUMNS (LOWER(cust_state_province)) SIZE SKEWONLY' 
);
END;
/
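The second option in the list above, CREATE_EXTENDED_STATS followed by GATHER_TABLE_STATS, might look like the following sketch. It is an alternative to the block above, not an additional step: if the extension already exists, the CREATE_EXTENDED_STATS call raises an error.

```sql
-- Step 1: register the expression as an extension and return its generated name.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS('sh', 'customers',
         '(LOWER(cust_state_province))') AS expr_name
FROM   dual;

-- Step 2: gather statistics, including a histogram on the expression if it is skewed.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    'sh'
,   'customers'
,   method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY'
  );
END;
/
```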

14.2.3 Displaying Expression Statistics

To obtain information about expression statistics, use the database view DBA_STAT_EXTENSIONS and the DBMS_STATS.SHOW_EXTENDED_STATS_NAME function.

You can also use views to obtain information such as the number of distinct values and whether a histogram exists for the expression.

This tutorial assumes the following:

  • You created extended statistics for the LOWER(cust_state_province) expression.
  • You want to determine the extension name, the number of distinct values, and whether a histogram has been created for the expression.

To monitor expression statistics:

COL EXTENSION_NAME FORMAT a30
COL EXTENSION FORMAT a35

SELECT EXTENSION_NAME, EXTENSION
FROM   USER_STAT_EXTENSIONS
WHERE  TABLE_NAME='CUSTOMERS';

EXTENSION_NAME                 EXTENSION                          
------------------------------ -----------------------------------
SYS_STUBPHJSBRKOIK9O2YV3W8HOUE (LOWER("CUST_STATE_PROVINCE"))  

SELECT e.EXTENSION expression, t.NUM_DISTINCT, t.HISTOGRAM
FROM   USER_STAT_EXTENSIONS e, USER_TAB_COL_STATISTICS t
WHERE  e.EXTENSION_NAME=t.COLUMN_NAME
AND    e.TABLE_NAME=t.TABLE_NAME
AND    t.TABLE_NAME='CUSTOMERS';

EXPRESSION                            NUM_DISTINCT        HISTOGRAM
-------------------------------------------------------------------
(LOWER("CUST_STATE_PROVINCE"))        145                 FREQUENCY
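The DBMS_STATS.SHOW_EXTENDED_STATS_NAME function mentioned at the start of this section can return the generated name directly, in the same way it was used for the column group earlier. A short sketch, where the alias expr_name is an illustrative choice:

```sql
-- Look up the system-generated name of the expression extension.
SELECT DBMS_STATS.SHOW_EXTENDED_STATS_NAME('sh', 'customers',
         '(LOWER(cust_state_province))') AS expr_name
FROM   dual;
```

The returned name is the value to look for in the COLUMN_NAME column of USER_TAB_COL_STATISTICS.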

14.2.4 Dropping Expression Statistics

To delete expression statistics from a table, use the DBMS_STATS.DROP_EXTENDED_STATS procedure.

This tutorial assumes the following:

  • You created extended statistics for the LOWER(cust_state_province) expression.
  • You want to drop the expression statistics.

BEGIN
  DBMS_STATS.DROP_EXTENDED_STATS(
    'sh'
,   'customers'
,   '(LOWER(cust_state_province))'
);
END;
/