SQL Tuning Guide Notes 15: Controlling the Use of Optimizer Statistics

2022-06-12 21:41:00 【dingdingfish】

These are my notes on Chapter 15 of the SQL Tuning Guide, “Controlling the Use of Optimizer Statistics”.

Important basic concepts

pending statistics
Unpublished optimizer statistics. By default, the optimizer uses published statistics but does not use pending statistics.

Using DBMS_STATS, you can specify when and how the optimizer uses statistics.

15.1 Locking and Unlocking Optimizer Statistics

You can lock statistics to prevent them from changing.

After statistics are locked, you cannot modify them until they are unlocked. Locking is useful in a static environment when you want to guarantee that the statistics and the resulting plans never change. For example, you may want to prevent the DBMS_STATS_JOB process from gathering new statistics on a table or schema, such as a highly volatile table.

When you lock statistics on a table, all dependent statistics are locked. Locked statistics include table statistics, column statistics, histograms, and dependent index statistics. To overwrite statistics even while they are locked, set the FORCE parameter of the relevant DBMS_STATS procedures (for example, DELETE_*_STATS and RESTORE_*_STATS) to true.
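
As an illustration of the FORCE override, here is a minimal sketch. It assumes the statistics on oe.orders have already been locked (as in section 15.1.1) and uses GATHER_TABLE_STATS, which also accepts a force parameter. Choosing the gather procedure for the example is my own assumption; the guide names the delete and restore procedures.

```sql
-- Assumes the statistics on oe.orders are currently locked.
-- Without force => TRUE, this call fails with
-- ORA-20005 (object statistics are locked).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'OE'
  , tabname => 'ORDERS'
  , force   => TRUE );  -- overwrite the statistics although they are locked
END;
/
```

The lock itself stays in place after the call; only this one operation bypasses it.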

15.1.1 Locking Statistics

The DBMS_STATS package provides two procedures for locking statistics: LOCK_SCHEMA_STATS and LOCK_TABLE_STATS.

This tutorial assumes the following:

You have gathered statistics for the oe.orders table and the hr schema.
You want to prevent the oe.orders table statistics and the hr schema statistics from changing.

BEGIN
  DBMS_STATS.LOCK_TABLE_STATS('OE','ORDERS');
END;
/

BEGIN
  DBMS_STATS.LOCK_SCHEMA_STATS('HR');
END;
/
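
To confirm that the locks took effect, you can query the STATTYPE_LOCKED column of the statistics views. This is a sketch of one way to check; it assumes your user can see both schemas through ALL_TAB_STATISTICS.

```sql
-- STATTYPE_LOCKED shows the type of lock (for example ALL)
-- and is NULL when the statistics are not locked.
SELECT owner, table_name, stattype_locked
FROM   all_tab_statistics
WHERE  (owner = 'OE' AND table_name = 'ORDERS')
   OR  owner = 'HR';
```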

15.1.2 Unlocking Statistics

The DBMS_STATS package provides two procedures for unlocking statistics: UNLOCK_SCHEMA_STATS and UNLOCK_TABLE_STATS.

BEGIN
  DBMS_STATS.UNLOCK_TABLE_STATS('OE','ORDERS');
END;
/

BEGIN
  DBMS_STATS.UNLOCK_SCHEMA_STATS('HR');
END;
/

15.2 Publishing Pending Optimizer Statistics

By default, the database automatically publishes statistics when the statistics gathering finishes.

Alternatively, you can use pending statistics to save the statistics without publishing them immediately after gathering. This technique is useful for testing queries against the pending statistics in a single session. When the test results are satisfactory, you can publish the statistics to make them available to the entire database.

15.2.1 About Pending Optimizer Statistics

The database stores pending statistics in the data dictionary, just as it does published statistics.

By default, the optimizer uses published statistics. You can change this default behavior by setting the OPTIMIZER_USE_PENDING_STATISTICS initialization parameter to true (the default is false).

The top half of the figure in the guide shows the optimizer gathering statistics for the sh.customers table and storing them in the data dictionary with pending status. The bottom half shows the optimizer using only published statistics to process a query of sh.customers.

[Figure: pending statistics gathered for sh.customers while queries use published statistics]

In some cases, the optimizer can use a combination of published and pending statistics. For example, suppose the database stores both published and pending statistics for the customers table, but only published statistics for the orders table. If OPTIMIZER_USE_PENDING_STATISTICS = true, the optimizer uses pending statistics for customers and published statistics for orders. If OPTIMIZER_USE_PENDING_STATISTICS = false, the optimizer uses published statistics for both customers and orders.

15.2.2 User Interfaces for Publishing Optimizer Statistics

You can use the DBMS_STATS package to perform operations related to publishing statistics.

The following table lists the relevant program units.
Table 15-1 DBMS_STATS program units related to publishing optimizer statistics

Program unit	Description
GET_PREFS	Checks whether statistics are published automatically as soon as DBMS_STATS gathers them. For the PUBLISH parameter, true means the statistics must be published when the database gathers them, and false means the database must keep the statistics pending.
SET_TABLE_PREFS	Sets PUBLISH to true or false at the table level.
SET_SCHEMA_PREFS	Sets PUBLISH to true or false at the schema level.
PUBLISH_PENDING_STATS	Publishes valid pending statistics for all objects or only for specified objects.
DELETE_PENDING_STATS	Deletes pending statistics.
EXPORT_PENDING_STATS	Exports pending statistics.

The initialization parameter OPTIMIZER_USE_PENDING_STATISTICS determines whether the database uses pending statistics when they are available. The default value is false, which means the optimizer uses only published statistics. Set it to true to have the optimizer use any existing pending statistics instead. The best practice is to set this parameter at the session level, not at the database level.

You can use data dictionary views to access information about published and pending statistics. Table 15-2 lists the relevant views.
Table 15-2 Views related to publishing optimizer statistics

View	Description
USER_TAB_STATISTICS	Displays optimizer statistics for the tables accessible to the current user.
USER_TAB_COL_STATISTICS	Displays column statistics and histogram information extracted from ALL_TAB_COLUMNS.
USER_PART_COL_STATISTICS	Displays column statistics and histogram information for the table partitions owned by the current user.
USER_SUBPART_COL_STATISTICS	Describes column statistics and histogram information for subpartitions of partitioned objects owned by the current user.
USER_IND_STATISTICS	Displays optimizer statistics for the indexes accessible to the current user.
USER_TAB_PENDING_STATS	Describes pending statistics for tables, partitions, and subpartitions accessible to the current user.
USER_COL_PENDING_STATS	Describes pending statistics for columns accessible to the current user.
USER_IND_PENDING_STATS	Describes pending statistics for tables, partitions, and subpartitions accessible to the current user, gathered using the DBMS_STATS package.
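
These views can be joined to compare published and pending values before deciding whether to publish. The query below is a sketch of my own, not taken from the guide; it looks only at table-level rows and a few columns.

```sql
-- Compare published and pending table-level statistics side by side.
SELECT p.table_name,
       s.num_rows      AS published_rows,
       p.num_rows      AS pending_rows,
       s.last_analyzed AS published_at,
       p.last_analyzed AS pending_at
FROM   user_tab_pending_stats p
       LEFT JOIN user_tab_statistics s
         ON  s.table_name = p.table_name
         AND s.partition_name IS NULL
WHERE  p.partition_name IS NULL;  -- table-level rows only
```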

15.2.3 Managing Published and Pending Statistics

This section explains how to use DBMS_STATS program units to change the publishing behavior of optimizer statistics, and how to export and delete these statistics.

This tutorial assumes the following:

You want to change the preferences for the sh.customers and sh.sales tables so that newly gathered statistics have pending status.
You want the current session to use pending statistics.
You want to gather and publish pending statistics on the sh.customers table.
You gathered pending statistics on the sh.sales table, but decide to delete them without publishing them.
You want to change the preferences for the sh.customers and sh.sales tables so that newly gathered statistics are published.

When you call GET_PREFS, you can also specify the schema and table name. If they are set, the function returns the table preference; otherwise, it returns the global preference. The value true indicates that the database publishes statistics as it gathers them. Every table uses this value unless a specific table preference has been set.

SELECT DBMS_STATS.GET_PREFS('PUBLISH') PUBLISH FROM DUAL;

PUBLISH
-------
TRUE
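
For comparison, the table-level form of the same call passes the owner and table name as the second and third arguments. A short sketch for sh.customers:

```sql
-- With a schema and table name, GET_PREFS returns the table-level preference
-- (or the global value if no table preference has been set).
SELECT DBMS_STATS.GET_PREFS('PUBLISH', 'SH', 'CUSTOMERS') AS publish FROM DUAL;
```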

Query the pending statistics. This example shows that the database currently stores no pending statistics for the sh schema.

SELECT * FROM USER_TAB_PENDING_STATS;
 
no rows selected

Change the publishing preference for the sh.customers table.

EXEC DBMS_STATS.SET_TABLE_PREFS('sh', 'customers', 'publish', 'false');

From now on, when you gather statistics on the customers table, the database does not automatically publish them when the gathering job completes. Instead, it keeps the newly gathered statistics pending, where they are visible through the USER_TAB_PENDING_STATS view.

Gather statistics for sh.customers.

EXEC DBMS_STATS.GATHER_TABLE_STATS('sh','customers');

Query the pending statistics again.

SELECT TABLE_NAME, NUM_ROWS FROM USER_TAB_PENDING_STATS;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
CUSTOMERS                           55500

Instruct the optimizer to use pending statistics in this session.

ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS = true;
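
With the parameter set, you can check whether the pending statistics change a plan before publishing them. The following is a sketch only: the query and its predicate are illustrative assumptions, not part of the guide's tutorial.

```sql
-- Plan generated with pending statistics (parameter set to true above).
EXPLAIN PLAN FOR
  SELECT COUNT(*) FROM sh.customers WHERE cust_city = 'Tokyo';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Switch back to published statistics and explain the same query again.
ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS = false;
EXPLAIN PLAN FOR
  SELECT COUNT(*) FROM sh.customers WHERE cust_city = 'Tokyo';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Re-enable pending statistics for the rest of the session.
ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS = true;
```

Comparing the two plan outputs shows what publishing would change for this statement.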

Publish the pending statistics for sh.customers.

EXEC DBMS_STATS.PUBLISH_PENDING_STATS('SH','CUSTOMERS');

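Alternatively, delete pending statistics instead of publishing them. The call below targets sh.customers; for sh.sales, pass 'SALES' as the table name.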

EXEC DBMS_STATS.DELETE_PENDING_STATS('SH','CUSTOMERS');

Restore the default publishing preference:

EXEC DBMS_STATS.SET_TABLE_PREFS('sh', 'customers', 'publish', null);
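
Pending statistics can also be exported to a user statistics table with EXPORT_PENDING_STATS, as long as they still exist, that is, before they are published or deleted. A minimal sketch, assuming a hypothetical statistics table named MY_STATS_TAB in the sh schema:

```sql
BEGIN
  -- Create a user statistics table to hold the export.
  -- MY_STATS_TAB is a hypothetical name chosen for this sketch.
  DBMS_STATS.CREATE_STAT_TABLE(ownname => 'SH', stattab => 'MY_STATS_TAB');

  -- Copy the pending statistics of sh.customers into that table.
  DBMS_STATS.EXPORT_PENDING_STATS(
    ownname => 'SH'
  , tabname => 'CUSTOMERS'
  , stattab => 'MY_STATS_TAB' );
END;
/
```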

15.3 Creating Artificial Optimizer Statistics for Testing

To provide the optimizer with user-created statistics for testing, you can use the DBMS_STATS.SET_*_STATS procedures. These procedures supply the optimizer with manually specified statistic values.

15.3.1 About Artificial Optimizer Statistics

For testing purposes, you can use the DBMS_STATS.SET_*_STATS procedures to manually create artificial statistics for a table, an index, or the system.

When stattab is null, the DBMS_STATS.SET_*_STATS procedures insert the artificial statistics directly into the data dictionary. Alternatively, you can specify a user-created table.

Note: The DBMS_STATS.SET_*_STATS procedures are intended for development testing only. Do not use them in a production database. If you set statistics in the data dictionary, Oracle Database treats them as “real” statistics, which means that statistics gathering jobs may not re-gather the artificial statistics when they do not meet the criteria for staleness.
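
The user-created table option can be sketched as follows. The names TEST_STATS and BIG_TABLE are hypothetical, and the block assumes a table named CONTRACTORS already exists in the current schema (such as the one created in section 15.3.3).

```sql
BEGIN
  -- TEST_STATS and the statid 'BIG_TABLE' are hypothetical names.
  DBMS_STATS.CREATE_STAT_TABLE(ownname => USER, stattab => 'TEST_STATS');

  -- Because stattab is set, the values are written to TEST_STATS,
  -- not to the data dictionary, so the optimizer does not use them.
  DBMS_STATS.SET_TABLE_STATS(
    ownname => USER
  , tabname => 'CONTRACTORS'
  , stattab => 'TEST_STATS'
  , statid  => 'BIG_TABLE'
  , numrows => 2000
  , numblks => 10 );
END;
/
```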

Typical use cases for the DBMS_STATS.SET_*_STATS procedures are:

Showing how execution plans change as the number of rows or blocks in a table changes
For example, SET_TABLE_STATS can set the number of rows and blocks in a small or empty table to a large number. When you execute a query using the altered statistics, the optimizer may change the execution plan. For example, the increased row count may lead the optimizer to choose an index scan rather than a full table scan. By experimenting with different values, you can see how the optimizer will change its execution plan over time.
Creating realistic statistics for temporary tables
You may want to see what the optimizer does when a large temporary table is referenced in multiple SQL statements. You can create a regular table, load representative data, and then use GET_TABLE_STATS to retrieve its statistics. After you create the temporary table, you can “trick” the optimizer into using these statistics by calling SET_TABLE_STATS.

Optionally, you can specify a unique ID for statistics stored in a user-created table. The SET_*_STATS procedures have corresponding GET_*_STATS procedures.
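
The temporary table use case could look roughly like this. ORDERS_SAMPLE (a regular table with representative data and gathered statistics) and ORDERS_GTT (the temporary table) are hypothetical names used only for this sketch.

```sql
DECLARE
  l_numrows NUMBER;
  l_numblks NUMBER;
  l_avgrlen NUMBER;
BEGIN
  -- Read the statistics of the regular table loaded with representative data.
  DBMS_STATS.GET_TABLE_STATS(
    ownname => USER
  , tabname => 'ORDERS_SAMPLE'
  , numrows => l_numrows
  , numblks => l_numblks
  , avgrlen => l_avgrlen );

  -- Apply the same values to the temporary table.
  DBMS_STATS.SET_TABLE_STATS(
    ownname => USER
  , tabname => 'ORDERS_GTT'
  , numrows => l_numrows
  , numblks => l_numblks
  , avgrlen => l_avgrlen );
END;
/
```

Only the three table-level values are copied here; column and index statistics would need the corresponding GET_*_STATS and SET_*_STATS calls.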

DBMS_STATS procedure	Description
SET_TABLE_STATS	Sets table or partition statistics using parameters such as numrows, numblks, and avgrlen. If the database uses the In-Memory Column Store, you can set im_imcu_count to the number of IMCUs in the table or partition, and im_block_count to the number of blocks in the table or partition. For an external table, scanrate specifies the rate at which data is scanned in MB/second. The optimizer uses the cached data to estimate the number of cached blocks for index or statistics table access. The total cost is the I/O cost of reading data blocks from disk, the CPU cost of reading cached blocks from the buffer cache, and the CPU cost of processing the data.
SET_COLUMN_STATS	Sets column statistics using parameters such as distcnt, density, and nullcnt. In the version of this procedure that handles user-defined statistics, use stattypname to specify the type of statistics to store in the data dictionary.
SET_SYSTEM_STATS	Sets system statistics using parameters such as iotfrspeed, sreadtim, and cpuspeed.
SET_INDEX_STATS	Sets index statistics using parameters such as numrows, numlblks, avglblk, clstfct, and indlevel. In the version of this procedure that handles user-defined statistics, use stattypname to specify the type of statistics to store in the data dictionary.

15.3.2 Setting Artificial Optimizer Statistics for a Table

This topic explains how to set artificial statistics for a table using DBMS_STATS.SET_TABLE_STATS. The basic steps are the same for SET_INDEX_STATS and SET_SYSTEM_STATS.

Note the following prerequisites for this task:

For objects not owned by SYS, you must be the owner of the object or have the ANALYZE ANY privilege.
For objects owned by SYS, you must have either the ANALYZE ANY DICTIONARY privilege or the SYSDBA privilege.
When you call GET_*_STATS for a table, column, or index, the referenced object must exist.

This task assumes the following:

You have the privileges required to use DBMS_STATS.SET_TABLE_STATS on the specified table.
You plan to store the statistics in the data dictionary.

15.3.3 Setting Optimizer Statistics: Example

This example shows how to gather optimizer statistics for a table, set artificial statistics, and then compare the plans the optimizer chooses based on the different statistics.

This example assumes:

You are logged in to the database as a user with DBA privileges.
You want to test when the optimizer chooses an index scan.

CREATE TABLE contractors (
  con_id    NUMBER,
  last_name VARCHAR2(50),
  salary    NUMBER,
  CONSTRAINT cond_id_pk PRIMARY KEY(con_id) );

CREATE INDEX salary_ix ON contractors(salary);

INSERT INTO contractors VALUES (8, 'JONES',1000);
COMMIT;

EXECUTE DBMS_STATS.GATHER_TABLE_STATS( user, tabname => 'CONTRACTORS' );

--  Output is 1
SELECT NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME = 'CONTRACTORS'; 

--  Output is 1
SELECT NUM_ROWS FROM USER_INDEXES WHERE INDEX_NAME =  'SALARY_IX';

-- Query the contractors whose salary is 1000, using the dynamic_sampling hint to disable dynamic sampling:
SELECT /*+ dynamic_sampling(contractors 0) */ * 
FROM   contractors 
WHERE  salary = 1000;

-- The guide says: because the table contains only 1 row, the optimizer chooses a full table scan over an index range scan.
-- In my test, however, the optimizer chose the index range scan, as the plan below shows.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CONTRACTORS |     1 |    12 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SALARY_IX   |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

--  Set statistics manually 
BEGIN
  DBMS_STATS.SET_TABLE_STATS( 
    ownname => user
  , tabname => 'CONTRACTORS'
  , numrows => 2000
  , numblks => 10 );
END;
/

BEGIN 
  DBMS_STATS.SET_INDEX_STATS( 
    ownname => user
  , indname => 'SALARY_IX'
  , numrows => 2000 );
END;
/

--  The output is 2000
SELECT NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME = 'CONTRACTORS';
SELECT NUM_ROWS FROM USER_INDEXES WHERE INDEX_NAME =  'SALARY_IX'; 

-- Flush the shared pool to rule out plan reuse, then run the same query again
ALTER SYSTEM FLUSH SHARED_POOL;

SELECT /*+ dynamic_sampling(contractors 0) */ * 
FROM   contractors 
WHERE  salary = 1000;

-- Based on the artificially set row count and block statistics, the optimizer considers the index range scan more cost-effective.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CONTRACTORS |  2000 | 24000 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SALARY_IX   |  2000 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

-- Clean up
drop table contractors purge;