当前位置:网站首页>How to use DBA_ hist_ active_ sess_ History analysis database history performance problems
How to use DBA_ hist_ active_ sess_ History analysis database history performance problems
2022-06-21 18:35:00 【User 7185440】
How to use dba_hist_active_sess_history Analyze database history performance issues
background
In many cases , When there is a performance problem with the database , We don't have a chance to gather enough diagnostic information , such as system state dump perhaps hang analyze, Even when the problem happens DBA Not at all . This makes it very difficult for us to diagnose the problem . So in this case , Can we collect some information afterwards to analyze the cause of the problem ?
stay Oracle 10G Or later , The answer is yes . In this article, we will introduce a method through dba_hist_active_sess_history A method of analyzing problems based on data .Apply to
Oracle 10G Or later stand-alone or rac, This article applies to any platform .details
Active Session: on the CPU or was waiting for an event that didn't belong to the Idle wait class
stay Oracle 10G in , We introduced AWR and ASH Sampling mechanism , There is a view gv$active_session_history Every second, all nodes of the database will be Active Session Take a sample ,
and dba_hist_active_sess_history Will be gv$active_session_history Data in 10 Sample once per second and persist .
Based on this feature , We can analyze dba_hist_active_sess_history Of Session Sampling , To locate the exact time frame in which the problem occurred , And the of each sampling point can be observed top event and top holder. Here is an example to illustrate in detail .dba_hist_active_sess_history edition
1. Dump During the problem ASH data :
In order not to affect the production system , We can approximate the period of the problem ASH data export Come out and analyze on the tester .
be based on dba_hist_active_sess_history Create a new table t_ash, Then pass it through exp/imp Import to the tester . Execute... On the database where the problem occurred exp:
SQL> conn user/passwd
SQL> create table t_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');
$ exp user/passwd file=t_ash.dmp tables=(t_ash) log=t_ash.exp.log
Then import it into the tester :
$ imp user/passwd file=t_ash.dmp log=t_ash.imp.log2. Verify exported ASH Time range :
The proposal USES Oracle SQL Developer To prevent the output result from being broken into lines, which is not easy to observe .
set line 200 pages 1000
col sample_time for a25
col event for a40
alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select
t.dbid, t.instance_number, min(sample_time), max(sample_time), count(*) session_count
from t_ash t
group by t.dbid, t.instance_number
order by dbid, instance_number;
INSTANCE_NUMBER MIN(SAMPLE_TIME) MAX(SAMPLE_TIME) SESSION_COUNT
1 2015-03-26 21:00:04.278 2015-03-26 22:59:48.387 2171
2 2015-03-26 21:02:12.047 2015-03-26 22:59:42.584 36
From the above output, we can see that the database has a total of 2 Nodes , The sampling time is... In total 2 Hours , node 1 Sampling ratio of nodes 2 A lot more , The problem may occur at the node 1 On .3. Confirm the exact time range of the problem :
Refer to the following script :
col event for a35
set lines 222
set pages 9999
select
dbid, instance_number, sample_id, sample_time, count(*) session_count
from t_ash t
group by dbid, instance_number, sample_id, sample_time
order by dbid, instance_number, sample_time;
INSTANCE_NUMBER SAMPLE_ID SAMPLE_TIME SESSION_COUNT
1 36402900 2015-03-26 22:02:50.985 4
1 36402910 2015-03-26 22:03:01.095 1
1 36402920 2015-03-26 22:03:11.195 1
1 36402930 2015-03-26 22:03:21.966 21
1 36402940 2015-03-26 22:03:32.116 102
1 36402950 2015-03-26 22:03:42.226 181
1 36402960 2015-03-26 22:03:52.326 200
1 36402970 2015-03-26 22:04:02.446 227
1 36402980 2015-03-26 22:04:12.566 242
1 36402990 2015-03-26 22:04:22.666 259
1 36403000 2015-03-26 22:04:32.846 289
1 36403010 2015-03-26 22:04:42.966 147
1 36403020 2015-03-26 22:04:53.076 2
1 36403030 2015-03-26 22:05:03.186 4
1 36403040 2015-03-26 22:05:13.296 1
1 36403050 2015-03-26 22:05:23.398 1
Observe the of each sampling point of the above output active session The number of , A sudden increase in number often means that a problem has occurred . From the above output, it can be determined that the exact time of the problem is 2015-03-26 22:03:21 ~ 22:04:42, The problem lasted about 1.5 minute .
Be careful : Observe whether the above output is out of gear , For example, there is no sampling at some time .4. Determine the of each sampling point top n event:
What we specify here is top 2 event And the sampling time is marked out to observe the situation of all sampling points . If there's a lot of data , You can also open sample_time To observe the situation in a certain period of time . Pay attention to the last column session_count It refers to the waiting time at the sampling point event Of session Number .
col event for a35
set lines 1000
set pages 9999
select t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.event,
t.session_state,
t.c session_count
from (select t.*,
rank() over(partition by dbid, instance_number, sample_time order by c desc) r
from (select /*+ parallel 2 */
t.*,
count(*) over(partition by dbid, instance_number, sample_time, event) c,
row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
from t_ash t
/*where sample_time >
to_timestamp('2013-11-17 13:59:00',
'yyyy-mm-dd hh24:mi:ss')
and sample_time <
to_timestamp('2013-11-17 14:10:00',
'yyyy-mm-dd hh24:mi:ss')*/
) t
where r1 = 1) t
where r < 3
order by dbid, instance_number, sample_time, r;
SAMPLE_ID SAMPLE_TIME INSTANCE_NUMBER EVENT SESSION_STATE SESSION_COUNT
36402900 22:02:50.985 1 ON CPU 3
36402900 22:02:50.985 1 db file sequential read WAITING 1
36402910 22:03:01.095 1 ON CPU 1
36402920 22:03:11.195 1 db file parallel read WAITING 1
36402930 22:03:21.966 1 cursor: pin S wait on X WAITING 11
36402930 22:03:21.966 1 latch: shared pool WAITING 4
36402940 22:03:32.116 1 cursor: pin S wait on X WAITING 83
36402940 22:03:32.116 1 SGA: allocation forcing component growth WAITING 16
36402950 22:03:42.226 1 cursor: pin S wait on X WAITING 161
36402950 22:03:42.226 1 SGA: allocation forcing component growth WAITING 17
36402960 22:03:52.326 1 cursor: pin S wait on X WAITING 177
36402960 22:03:52.326 1 SGA: allocation forcing component growth WAITING 20
36402970 22:04:02.446 1 cursor: pin S wait on X WAITING 204
36402970 22:04:02.446 1 SGA: allocation forcing component growth WAITING 20
36402980 22:04:12.566 1 cursor: pin S wait on X WAITING 219
36402980 22:04:12.566 1 SGA: allocation forcing component growth WAITING 20
36402990 22:04:22.666 1 cursor: pin S wait on X WAITING 236
36402990 22:04:22.666 1 SGA: allocation forcing component growth WAITING 20
36403000 22:04:32.846 1 cursor: pin S wait on X WAITING 265
36403000 22:04:32.846 1 SGA: allocation forcing component growth WAITING 20
36403010 22:04:42.966 1 enq: US - contention WAITING 69
36403010 22:04:42.966 1 latch: row cache objects WAITING 56
36403020 22:04:53.076 1 db file scattered read WAITING 1
36403020 22:04:53.076 1 db file sequential read WAITING 1
From the above output, we can find that the most serious waiting during the problem is cursor: pin S wait on X, Wait for the rush hour event Of session The number has reached 265 individual , Next is SGA: allocation forcing component growth, Fastigium session by 20 individual .
*** Be careful :
1) Confirm again whether the above output is out of gear , Is there some time when there is no sampling .
2) Pay attention to those session_state by ON CPU Output , Compare ON CPU The number of processes is the same as your OS Physics CPU The number of , If it approaches or exceeds physical CPU Number , Then you need to check OS At that time CPU Resource status ,
such as OSWatcher/NMON Tools such as , high CPU Run Queue May cause the problem , Of course, it may also be the result of the problem , Need to combine OSWatcher and ASH Time order to verify .5. Observe the waiting chain of each sampling point : ( Show reference in more detail : attach 1)
The principle is through dba_hist_active_sess_history. blocking_session Records of the holder To pass connect by Cascade query , Find out the final holder. stay RAC Environment , For each node ASH The sampling time is not consistent in many cases ,
So you can use this SQL The second paragraph of the note sample_time Make a little modification to make the difference between different nodes 1 The sampling time of seconds can be compared ( Note that it is best to also partition by Medium sample_time Make corresponding changes ).
In this output isleaf=1 All is final holder, and iscycle=1 Represents deadlock ( That is, in the same sampling point a wait for b,b wait for c, and c Waiting again a, If this continues to happen , Then it is particularly noteworthy ). Using the following query, we can observe the blocking chain .
col event for a35
set lines 1000
set pages 9999
select /*+ parallel 8 */
level lv,
connect_by_isleaf isleaf,
connect_by_iscycle iscycle,
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.sql_id,
t.session_type,
t.event,
t.session_state,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_status
from t_ash t
/*where sample_time >
to_timestamp('2013-11-17 13:55:00',
'yyyy-mm-dd hh24:mi:ss')
and sample_time <
to_timestamp('2013-11-17 14:10:00',
'yyyy-mm-dd hh24:mi:ss')*/
start with blocking_session is not null
/*start with event = 'buffer busy waits'*/
connect by nocycle
prior dbid = dbid
and prior sample_time = sample_time
/*and ((prior sample_time) - sample_time between interval '-1'
second and interval '1' second)*/
and prior blocking_inst_id = instance_number
and prior blocking_session = session_id
and prior blocking_session_serial# = session_serial#
order siblings by dbid, sample_time;
LV ISLEAF ISCYCLE SAMPLE_TIME INSTANCE_NUMBER SESSION_ID SQL_ID EVENT SESSION_STATE BLOCKING_INST_ID BLOCKING_SESSION BLOCKING_SESSION_STATUS
1 0 0 22:04:32.846 1 1259 3ajt2htrmb83y cursor: WAITING 1 537 VALID
2 1 0 22:04:32.846 1 537 3ajt2htrmb83y SGA: WAITING UNKNOWN
*** Be careful :
In order to make the output easy to read , We will wait for event Abbreviated , The same below . As can be seen from the output above , At the same sampling point (22:04:32.846), node 1 session 1259 Waiting for the cursor: pin S wait on X, It is called node 1 session 537 Blocking ,
And node 1 session 537 Waiting again SGA: allocation forcing component growth, also ASH No samples were collected holder, So here cursor: pin S wait on X It's just a superficial phenomenon ,
Here's the problem SGA: allocation forcing component growth6. Based on the first 5 Step principle to find out the final value of each sampling point top holder:
Such as the following SQL Each sampling point is listed top 2 Of blocker session, The final blocking probability is calculated session Count ( Reference resources blocking_session_count)
col event for a35
set lines 1000
set pages 9999
select t.lv,
t.iscycle,
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.sql_id,
t.session_type,
t.event,
t.seq#,
t.session_state,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_status,
t.c blocking_session_count
from (select t.*,
row_number() over(partition by dbid, instance_number, sample_time order by c desc) r
from (select t.*,
count(*) over(partition by dbid, instance_number, sample_time, session_id) c,
row_number() over(partition by dbid, instance_number, sample_time, session_id order by 1) r1
from (select
level lv,
connect_by_isleaf isleaf,
connect_by_iscycle iscycle,
t.*
from t_ash t
/*where sample_time >
to_timestamp('2013-11-17 13:55:00',
'yyyy-mm-dd hh24:mi:ss')
and sample_time <
to_timestamp('2013-11-17 14:10:00',
'yyyy-mm-dd hh24:mi:ss')*/
start with blocking_session is not null
/*start with event = 'buffer busy waits'*/
connect by nocycle
prior dbid = dbid
and prior sample_time = sample_time
/*and ((prior sample_time) - sample_time between interval '-1'
second and interval '1' second)*/
and prior blocking_inst_id = instance_number
and prior blocking_session = session_id
and prior
blocking_session_serial# = session_serial#) t
where t.isleaf = 1) t
where r1 = 1) t
where r < 3
order by dbid, sample_time, r;
SAMPLE_TIME INSTANCE_NUMBER SESSION_ID SQL_ID EVENT SEQ# SESSION_STATE BLOCKING_SESSION_STATUS BLOCKING_SESSION_COUNT
22:03:32.116 1 1136 1p4vyw2jan43d SGA: 1140 WAITING UNKNOWN 82
22:03:32.116 1 413 9g51p4bt1n7kz SGA: 7646 WAITING UNKNOWN 2
22:03:42.226 1 1136 1p4vyw2jan43d SGA: 1645 WAITING UNKNOWN 154
22:03:42.226 1 537 3ajt2htrmb83y SGA: 48412 WAITING UNKNOWN 4
22:03:52.326 1 1136 1p4vyw2jan43d SGA: 2150 WAITING UNKNOWN 165
22:03:52.326 1 537 3ajt2htrmb83y SGA: 48917 WAITING UNKNOWN 8
22:04:02.446 1 1136 1p4vyw2jan43d SGA: 2656 WAITING UNKNOWN 184
22:04:02.446 1 537 3ajt2htrmb83y SGA: 49423 WAITING UNKNOWN 10
22:04:12.566 1 1136 1p4vyw2jan43d SGA: 3162 WAITING UNKNOWN 187
22:04:12.566 1 2472 SGA: 1421 WAITING UNKNOWN 15
22:04:22.666 1 1136 1p4vyw2jan43d SGA: 3667 WAITING UNKNOWN 193
22:04:22.666 1 2472 SGA: 1926 WAITING UNKNOWN 25
22:04:32.846 1 1136 1p4vyw2jan43d SGA: 4176 WAITING UNKNOWN 196
22:04:32.846 1 2472 SGA: 2434 WAITING UNKNOWN 48
Output from above , Like the first line , On behalf of the 22:03:32.116, node 1 Of session 1136 It finally blocked 82 individual session. Look down time , Visible nodes 1 Of session 1136 It was the most serious problem during the problem holder,
It's blocked at every sampling point 100 Multiple session, And it keeps waiting SGA: allocation forcing component growth,
Pay attention to its seq# You will find that event Of seq# Changing , Indicates that the session Not completely hang live , Because the time happens at night 22:00 about , This is obviously due to the automatic collection of Statistics job Lead to shared memory resize cause ,
So we can combine Scheduler/MMAN Of trace as well as dba_hist_memory_resize_ops The output of further determines the problem .
** Be careful :
1) blocking_session_count Refers to a holder Finally blocked session Count , such as a <- b<- c (a By b Blocking ,b Has been c Blocking ), Only calculate c blocked 1 individual session, Because in the middle b Repetition may occur in different blocking chains .
2) If the final holder Has not been ash sampling ( Usually because of holder At leisure ), such as a<- c also b<- c (a By c Blocking , also b Also by c Blocking ), however c No sampling , Then the above script cannot c Statistics to the end holder in , This may lead to some omissions .
3) Attention comparison blocking_session_count The number of is the same as 3 The total number of each sampling point queried in this step session_count Count , If the of each sampling point blocking_session_count The number is far less than the total session_count Count , That means most session There is no record of holder, Therefore, the results of this query do not represent .
4) stay Oracle 10g in ,ASH did not blocking_inst_id Column , In all the above scripts , You just need to remove the column , therefore 10g Of ASH Generally, it can only be used to diagnose the problem of single node .Other things about ASH Application
In addition to ASH Data to find holder outside , We can also use it to get a lot of information ( Based on the database version ):
Such as through PGA_ALLOCATED Column to calculate the maximum of each sampling point PGA, total PGA To analyze ora-4030/Memory Swap Related issues ;
adopt TEMP_SPACE_ALLOCATED Column to analyze temporary tablespace usage ;
adopt IN_PARSE/IN_HARD_PARSE/IN_SQL_EXECUTION Column to analyze SQL be in parse Or the execution stage ;
adopt CURRENT_OBJ#/CURRENT_FILE#/CURRENT_BLOCK# To make sure I/O Related to the object waiting to happen .
attach 1:
col event for a35
set lines 1000
set pages 9999
select /*+ parallel 8 */
level lv,
connect_by_isleaf isleaf,
connect_by_iscycle iscycle,
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.session_serial#,
t.session_type,
t.session_state,
t.SQL_OPNAME,
t.sql_id,
T.SQL_PLAN_HASH_VALUE,
t.SQL_PLAN_LINE_ID,
T.SQL_CHILD_NUMBER,
T.SQL_EXEC_ID,
T.SQL_EXEC_START,
T.IN_BIND,
t.IN_PARSE,
t.IN_HARD_PARSE,
t.IN_SQL_EXECUTION,
t.IN_PLSQL_COMPILATION,
t.IN_PLSQL_EXECUTION,
t.IN_SEQUENCE_LOAD,
T.IN_PLSQL_RPC,
t.PROGRAM,
t.MACHINE,
t.PGA_ALLOCATED,
t.TEMP_SPACE_ALLOCATED,
t.DELTA_READ_IO_BYTES,
t.DELTA_WRITE_IO_BYTES,
t.TM_DELTA_CPU_TIME,
t.event,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_serial#,
t.blocking_session_status,
t.WAIT_TIME,
t.TIME_WAITED,
t.CURRENT_OBJ#,
t.CURRENT_ROW#,
t.xid
from m_ash_202204162 t
start with event = 'buffer busy waits'
--start with blocking_session is not null
connect by nocycle prior dbid = dbid and prior blocking_session = session_id and prior blocking_session_serial# = session_serial#
and prior sample_time = sample_time and prior blocking_inst_id = instance_number order siblings by dbid, sample_time;attach 2: View a specific session Sample waiting information for
col event for a35
set lines 1000
set pages 9999
select /*+ parallel 4 */
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.session_serial#,
t.session_type,
t.session_state,
t.SQL_OPNAME,
t.sql_id,
T.SQL_PLAN_HASH_VALUE,
t.SQL_PLAN_LINE_ID,
T.SQL_CHILD_NUMBER,
T.SQL_EXEC_ID,
T.SQL_EXEC_START,
T.IN_BIND,
t.IN_PARSE,
t.IN_HARD_PARSE,
t.IN_SQL_EXECUTION,
t.IN_PLSQL_COMPILATION,
t.IN_PLSQL_EXECUTION,
t.IN_SEQUENCE_LOAD,
T.IN_PLSQL_RPC,
t.PROGRAM,
t.MACHINE,
t.PGA_ALLOCATED,
t.TEMP_SPACE_ALLOCATED,
t.DELTA_READ_IO_BYTES,
t.DELTA_WRITE_IO_BYTES,
t.TM_DELTA_CPU_TIME,
t.event,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_serial#,
t.blocking_session_status,
t.WAIT_TIME,
t.TIME_WAITED,
t.CURRENT_OBJ#,
t.CURRENT_ROW#,
t.xid
from m_ash_202204162 t
where session_id = 10501
order by dbid, instance_number,sample_time;attach 3: see Sql Text
select * from dba_hist_sqltext where sql_id='bxgrjr8wa7g7x';attach 4: see sql Refer to the documentation for execution details :
analysis sql Historical execution information .md
边栏推荐
- POSIX create terminate thread
- 高考后网上查询信息,注意防范没有 SSL证书的网站
- Typescript object type
- Laravel实现软删除
- Check information on the Internet after the college entrance examination, and pay attention to prevent websites without SSL certificates
- 如何通过 dba_hist_active_sess_history 分析数据库历史性能问题
- JSON parsing of node
- 剑指 Offer 37. 序列化二叉树
- 有哪些好用的工作汇报工具
- 缓存型数据库Redis的配置与优化
猜你喜欢

Basic file operation

WXML模板语法、WXSS模板样式、全局配置、页面配置和网络数据请求

Node的json解析

论文解读(USIB)《Towards Explanation for Unsupervised Graph-Level Representation Learning》

Node输出方式

案例驱动 :从入门到掌握Shell编程详细指南

Some basic features of typescript

TypeScript类对象的初始化

EtherCAT object dictionary analysis
![Leetcode 1108 IP address invalidation [violence] the leetcode path of heroding](/img/c6/d3eb6cee92b1c0848bf3d3b58b8b22.png)
Leetcode 1108 IP address invalidation [violence] the leetcode path of heroding
随机推荐
Day15Qt中字符串的常用操作2021-10-20
Node输出方式
Gartner 网络研讨会 “九问数字化转型” 会后感
会议聊天室--开发文档
ByteDance proposes a lightweight and efficient new network mocovit, which has better performance than GhostNet and mobilenetv3 in CV tasks such as classification and detection!
Day19QPushButton的使用2021-10-30
LeetCode 1108 IP地址无效化[暴力] HERODING的LeetCode之路
新一代稳定性测试利器Fastbot
Day18Qt信号与槽2021-10-29
Typescript的复合类型
科普大佬说 | 如何打造自己的AI创造力?
在国元期货开户做期货安全吗?
PHP的empty,isset和is_null区别
JZ59.按之字型顺序打印二叉树
Day16QtQLabel2021-10-22
TypeScript对象类型
如何把1000随机分配成10个数
【微服务|Nacos】快速实现nacos的配置中心功能,并完成配置更新和版本迭代
Typescript interface
2022低压电工考试题模拟考试题库及答案