当前位置：网站首页>How to use DBA_ hist_ active_ sess_ History analysis database history performance problems

How to use DBA_ hist_ active_ sess_ History analysis database history performance problems

2022-06-21 18:35:00 【User 7185440】

How to use dba_hist_active_sess_history Analyze database history performance issues

background

	 In many cases , When there is a performance problem with the database , We don't have a chance to gather enough diagnostic information , such as system state dump perhaps hang analyze, Even when the problem happens DBA Not at all . This makes it very difficult for us to diagnose the problem . So in this case , Can we collect some information afterwards to analyze the cause of the problem ？
	 stay Oracle 10G Or later , The answer is yes . In this article, we will introduce a method through dba_hist_active_sess_history A method of analyzing problems based on data .

Apply to

	Oracle 10G Or later stand-alone or rac, This article applies to any platform .

details

     Active Session: on the CPU or was waiting for an event that didn't belong to the Idle wait class
	 stay Oracle 10G in , We introduced AWR and ASH Sampling mechanism , There is a view gv$active_session_history Every second, all nodes of the database will be Active Session Take a sample ,
	 and dba_hist_active_sess_history Will be gv$active_session_history Data in 10 Sample once per second and persist .
	 Based on this feature , We can analyze dba_hist_active_sess_history Of Session Sampling , To locate the exact time frame in which the problem occurred , And the of each sampling point can be observed top event and top holder. Here is an example to illustrate in detail .

dba_hist_active_sess_history edition

1. Dump During the problem ASH data ：

	 In order not to affect the production system , We can approximate the period of the problem ASH data export Come out and analyze on the tester .
	 be based on dba_hist_active_sess_history Create a new table t_ash, Then pass it through exp/imp Import to the tester . Execute... On the database where the problem occurred exp:
	SQL> conn user/passwd
	SQL> create table t_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

	$ exp user/passwd file=t_ash.dmp tables=(t_ash) log=t_ash.exp.log

	 Then import it into the tester ：
	$ imp user/passwd file=t_ash.dmp log=t_ash.imp.log

2. Verify exported ASH Time range :

	 The proposal USES Oracle SQL Developer To prevent the output result from being broken into lines, which is not easy to observe .

	set line 200 pages 1000
	col sample_time for a25
	col event for a40
	alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';

	select
	 t.dbid, t.instance_number, min(sample_time), max(sample_time), count(*) session_count
	  from t_ash t
	 group by t.dbid, t.instance_number
	 order by dbid, instance_number;

	INSTANCE_NUMBER    MIN(SAMPLE_TIME)    MAX(SAMPLE_TIME)    SESSION_COUNT
	1    2015-03-26 21:00:04.278    2015-03-26 22:59:48.387    2171
	2    2015-03-26 21:02:12.047    2015-03-26 22:59:42.584    36

	 From the above output, we can see that the database has a total of 2 Nodes , The sampling time is... In total 2 Hours , node 1 Sampling ratio of nodes 2 A lot more , The problem may occur at the node 1 On .

3. Confirm the exact time range of the problem :

	 Refer to the following script ：
    col event for a35
	set lines 222
	set pages 9999
	select
	 dbid, instance_number, sample_id, sample_time, count(*) session_count
	  from t_ash t
	 group by dbid, instance_number, sample_id, sample_time
	 order by dbid, instance_number, sample_time;

	INSTANCE_NUMBER    SAMPLE_ID    SAMPLE_TIME    SESSION_COUNT
	1    36402900    2015-03-26 22:02:50.985    4
	1    36402910    2015-03-26 22:03:01.095    1
	1    36402920    2015-03-26 22:03:11.195    1
	1    36402930    2015-03-26 22:03:21.966    21
	1    36402940    2015-03-26 22:03:32.116    102
	1    36402950    2015-03-26 22:03:42.226    181
	1    36402960    2015-03-26 22:03:52.326    200
	1    36402970    2015-03-26 22:04:02.446    227
	1    36402980    2015-03-26 22:04:12.566    242
	1    36402990    2015-03-26 22:04:22.666    259
	1    36403000    2015-03-26 22:04:32.846    289
	1    36403010    2015-03-26 22:04:42.966    147
	1    36403020    2015-03-26 22:04:53.076    2
	1    36403030    2015-03-26 22:05:03.186    4
	1    36403040    2015-03-26 22:05:13.296    1
	1    36403050    2015-03-26 22:05:23.398    1

	 Observe the of each sampling point of the above output active session The number of , A sudden increase in number often means that a problem has occurred . From the above output, it can be determined that the exact time of the problem is 2015-03-26 22:03:21 ~ 22:04:42, The problem lasted about 1.5 minute .
	 Be careful :  Observe whether the above output is out of gear , For example, there is no sampling at some time .

4. Determine the of each sampling point top n event:

	 What we specify here is top 2 event  And the sampling time is marked out to observe the situation of all sampling points . If there's a lot of data , You can also open sample_time To observe the situation in a certain period of time . Pay attention to the last column session_count It refers to the waiting time at the sampling point event Of session Number .
    col event for a35
	set lines 1000
	set pages 9999
	select t.dbid,
		   t.sample_id,
		   t.sample_time,
		   t.instance_number,
		   t.event,
		   t.session_state,
		   t.c session_count
	  from (select t.*,
				   rank() over(partition by dbid, instance_number, sample_time order by c desc) r
			  from (select /*+ parallel 2 */
					 t.*,
					 count(*) over(partition by dbid, instance_number, sample_time, event) c,
					 row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
					  from t_ash t
					/*where sample_time >
						to_timestamp('2013-11-17 13:59:00',
									 'yyyy-mm-dd hh24:mi:ss')
					and sample_time <
						to_timestamp('2013-11-17 14:10:00',
									 'yyyy-mm-dd hh24:mi:ss')*/
					) t
			 where r1 = 1) t
	 where r < 3
	 order by dbid, instance_number, sample_time, r;

	SAMPLE_ID    SAMPLE_TIME    INSTANCE_NUMBER        	   EVENT   							 SESSION_STATE   	 SESSION_COUNT
	36402900    22:02:50.985    		1   		     ON CPU                     		      				     3
	36402900    22:02:50.985    		1   		db file sequential read  					  WAITING    		 1
	36402910    22:03:01.095    		1   		     ON CPU             					                     1
	36402920    22:03:11.195    		1   		  db file parallel read  					  WAITING 	         1
	36402930    22:03:21.966    		1   		cursor: pin S wait on X  					  WAITING           11
	36402930    22:03:21.966    		1   		     latch: shared pool   					  WAITING            4
	36402940    22:03:32.116    		1   		cursor: pin S wait on X    					  WAITING           83
	36402940    22:03:32.116    		1   		SGA: allocation forcing component growth   	  WAITING  		    16
	36402950    22:03:42.226    		1   		cursor: pin S wait on X    					  WAITING          161	 		          
	36402950    22:03:42.226    		1   		SGA: allocation forcing component growth   	  WAITING  		    17
	36402960    22:03:52.326    		1   		cursor: pin S wait on X   					  WAITING          177	 		          
	36402960    22:03:52.326    		1   		SGA: allocation forcing component growth   	  WAITING  		    20
	36402970    22:04:02.446    		1   		cursor: pin S wait on X  					  WAITING  		   204	 		          
	36402970    22:04:02.446    		1   		SGA: allocation forcing component growth   	  WAITING  		    20
	36402980    22:04:12.566    		1   		cursor: pin S wait on X  					  WAITING          219	 		          
	36402980    22:04:12.566    		1   		SGA: allocation forcing component growth   	  WAITING  		    20
	36402990    22:04:22.666    		1   		cursor: pin S wait on X   					  WAITING   	   236	 		          
	36402990    22:04:22.666    		1   		SGA: allocation forcing component growth   	  WAITING  		    20
	36403000    22:04:32.846    		1   		cursor: pin S wait on X  					  WAITING          265	 		          
	36403000    22:04:32.846    		1   		SGA: allocation forcing component growth   	  WAITING  		    20
	36403010    22:04:42.966    		1   		enq: US - contention  	 					  WAITING    		69
	36403010    22:04:42.966    		1   		latch: row cache objects 					  WAITING    		56
	36403020    22:04:53.076    		1   		db file scattered read   					  WAITING    		1
	36403020    22:04:53.076    		1   		db file sequential read  					  WAITING    		1

	 From the above output, we can find that the most serious waiting during the problem is cursor: pin S wait on X, Wait for the rush hour event Of session The number has reached 265 individual , Next is SGA: allocation forcing component growth, Fastigium session by 20 individual .

	*** Be careful :
		1)  Confirm again whether the above output is out of gear , Is there some time when there is no sampling .
		2)  Pay attention to those session_state by ON CPU Output , Compare ON CPU The number of processes is the same as your OS Physics CPU The number of , If it approaches or exceeds physical CPU Number , Then you need to check OS At that time CPU Resource status ,
		    such as OSWatcher/NMON Tools such as , high CPU Run Queue May cause the problem , Of course, it may also be the result of the problem , Need to combine OSWatcher and ASH Time order to verify .

5. Observe the waiting chain of each sampling point : ( Show reference in more detail ： attach 1)

	 The principle is through dba_hist_active_sess_history. blocking_session Records of the holder To pass connect by Cascade query , Find out the final holder.  stay RAC Environment , For each node ASH The sampling time is not consistent in many cases ,
	 So you can use this SQL The second paragraph of the note sample_time Make a little modification to make the difference between different nodes 1 The sampling time of seconds can be compared ( Note that it is best to also partition by Medium sample_time Make corresponding changes ).
	 In this output isleaf=1 All is final holder, and iscycle=1 Represents deadlock ( That is, in the same sampling point a wait for b,b wait for c, and c Waiting again a, If this continues to happen , Then it is particularly noteworthy ). Using the following query, we can observe the blocking chain .
    col event for a35
	set lines 1000
	set pages 9999
	select  /*+ parallel 8 */
	 level                     lv,
	 connect_by_isleaf         isleaf,
	 connect_by_iscycle        iscycle,
	 t.dbid,
	 t.sample_id,
	 t.sample_time,
	 t.instance_number,
	 t.session_id,
	 t.sql_id,
	 t.session_type,
	 t.event,
	 t.session_state,
	 t.blocking_inst_id,
	 t.blocking_session,
	 t.blocking_session_status
	  from t_ash t
	/*where sample_time >
		to_timestamp('2013-11-17 13:55:00',
					 'yyyy-mm-dd hh24:mi:ss')
	and sample_time <
		to_timestamp('2013-11-17 14:10:00',
					 'yyyy-mm-dd hh24:mi:ss')*/
	 start with blocking_session is not null
	 /*start with event = 'buffer busy waits'*/
	connect by nocycle
	 prior dbid = dbid
		   and prior sample_time = sample_time
			  /*and ((prior sample_time) - sample_time between interval '-1'
			  second and interval '1' second)*/
		   and prior blocking_inst_id = instance_number
		   and prior blocking_session = session_id
		   and prior blocking_session_serial# = session_serial#
	 order siblings by dbid, sample_time;

	LV    ISLEAF    ISCYCLE    SAMPLE_TIME    INSTANCE_NUMBER    SESSION_ID         SQL_ID        EVENT          SESSION_STATE    BLOCKING_INST_ID    BLOCKING_SESSION    BLOCKING_SESSION_STATUS
	1   	0   	 0   		22:04:32.846  		  1   			 1259  		3ajt2htrmb83y    cursor:           WAITING                1               537                    VALID
	2   	1   	 0   		22:04:32.846  		  1   			 537   		3ajt2htrmb83y    SGA:              WAITING                                                       UNKNOWN

	*** Be careful :
		 In order to make the output easy to read , We will wait for event Abbreviated , The same below . As can be seen from the output above , At the same sampling point (22:04:32.846), node 1 session 1259  Waiting for the cursor: pin S wait on X, It is called node 1 session 537  Blocking ,
		 And node 1 session 537  Waiting again SGA: allocation forcing component growth, also ASH No samples were collected holder, So here cursor: pin S wait on X It's just a superficial phenomenon ,
		 Here's the problem SGA: allocation forcing component growth

6. Based on the first 5 Step principle to find out the final value of each sampling point top holder:

	 Such as the following SQL Each sampling point is listed top 2 Of blocker session, The final blocking probability is calculated session Count ( Reference resources blocking_session_count)
    col event for a35
	set lines 1000
	set pages 9999
	select t.lv,
		   t.iscycle,
		   t.dbid,
		   t.sample_id,
		   t.sample_time,
		   t.instance_number,
		   t.session_id,
		   t.sql_id,
		   t.session_type,
		   t.event,
		   t.seq#,
		   t.session_state,
		   t.blocking_inst_id,
		   t.blocking_session,
		   t.blocking_session_status,
		   t.c blocking_session_count
	  from (select t.*,
				   row_number() over(partition by dbid, instance_number, sample_time order by c desc) r
			  from (select t.*,
						   count(*) over(partition by dbid, instance_number, sample_time, session_id) c,
						   row_number() over(partition by dbid, instance_number, sample_time, session_id order by 1) r1
					  from (select
							 level              lv,
							 connect_by_isleaf  isleaf,
							 connect_by_iscycle iscycle,
							 t.*
							  from t_ash t
							/*where sample_time >
								to_timestamp('2013-11-17 13:55:00',
											 'yyyy-mm-dd hh24:mi:ss')
							and sample_time <
								to_timestamp('2013-11-17 14:10:00',
											 'yyyy-mm-dd hh24:mi:ss')*/
							 start with blocking_session is not null
							 /*start with event = 'buffer busy waits'*/
							connect by nocycle
							 prior dbid = dbid
								   and prior sample_time = sample_time
									  /*and ((prior sample_time) - sample_time between interval '-1'
									  second and interval '1' second)*/
								   and prior blocking_inst_id = instance_number
								   and prior blocking_session = session_id
								   and prior
										blocking_session_serial# = session_serial#) t
					 where t.isleaf = 1) t
			 where r1 = 1) t
	 where r < 3
	 order by dbid, sample_time, r;

	SAMPLE_TIME    INSTANCE_NUMBER    SESSION_ID    SQL_ID   	  EVENT    SEQ#    SESSION_STATE    BLOCKING_SESSION_STATUS    BLOCKING_SESSION_COUNT
	22:03:32.116  		 1  			 1136  	  1p4vyw2jan43d    SGA:    1140  	 WAITING   			 UNKNOWN   					 82
	22:03:32.116  		 1  			 413   	  9g51p4bt1n7kz    SGA:    7646  	 WAITING   			 UNKNOWN   					 2
	22:03:42.226  		 1  			 1136  	  1p4vyw2jan43d    SGA:    1645  	 WAITING   			 UNKNOWN   					 154
	22:03:42.226  		 1  			 537   	  3ajt2htrmb83y    SGA:    48412 	 WAITING   			 UNKNOWN   					 4
	22:03:52.326  		 1  			 1136  	  1p4vyw2jan43d    SGA:    2150  	 WAITING   			 UNKNOWN   					 165
	22:03:52.326  		 1  			 537   	  3ajt2htrmb83y    SGA:    48917 	 WAITING   			 UNKNOWN   					 8
	22:04:02.446  		 1  			 1136  	  1p4vyw2jan43d    SGA:    2656  	 WAITING   			 UNKNOWN   					 184
	22:04:02.446  		 1  			 537   	  3ajt2htrmb83y    SGA:    49423 	 WAITING   			 UNKNOWN   					 10
	22:04:12.566  		 1  			 1136  	  1p4vyw2jan43d    SGA:    3162  	 WAITING   			 UNKNOWN   					 187
	22:04:12.566  		 1  			 2472  	             	   SGA:    1421  	 WAITING   			 UNKNOWN   					 15
	22:04:22.666  		 1  			 1136  	  1p4vyw2jan43d    SGA:    3667  	 WAITING   			 UNKNOWN   					 193
	22:04:22.666  		 1  			 2472  	     			   SGA:    1926  	 WAITING   			 UNKNOWN   					 25
	22:04:32.846  		 1	  			 1136  	  1p4vyw2jan43d    SGA:    4176  	 WAITING   			 UNKNOWN   					 196
	22:04:32.846  		 1  			 2472  	      			   SGA:    2434  	 WAITING   			 UNKNOWN   					 48

     Output from above , Like the first line , On behalf of the 22:03:32.116, node 1 Of session 1136  It finally blocked 82 individual session.   Look down time , Visible nodes 1 Of session 1136  It was the most serious problem during the problem holder,
	 It's blocked at every sampling point 100 Multiple session, And it keeps waiting SGA: allocation forcing component growth,
	 Pay attention to its seq# You will find that event Of seq# Changing , Indicates that the session Not completely hang live , Because the time happens at night 22:00 about , This is obviously due to the automatic collection of Statistics job Lead to shared memory resize cause ,
	 So we can combine Scheduler/MMAN Of trace as well as dba_hist_memory_resize_ops The output of further determines the problem .
	** Be careful ：
		1) blocking_session_count  Refers to a holder Finally blocked session Count , such as  a <- b<- c （a By b Blocking ,b Has been c Blocking ）, Only calculate c blocked 1 individual session, Because in the middle b Repetition may occur in different blocking chains .
		2)  If the final holder Has not been ash sampling ( Usually because of holder At leisure ), such as  a<- c  also b<- c (a By c Blocking , also b Also by c Blocking ), however c No sampling , Then the above script cannot c Statistics to the end holder in , This may lead to some omissions .
		3)  Attention comparison blocking_session_count The number of is the same as 3 The total number of each sampling point queried in this step session_count Count , If the of each sampling point blocking_session_count The number is far less than the total session_count Count , That means most session There is no record of holder, Therefore, the results of this query do not represent .
		4)  stay Oracle 10g in ,ASH did not blocking_inst_id Column , In all the above scripts , You just need to remove the column , therefore 10g Of ASH Generally, it can only be used to diagnose the problem of single node .

Other things about ASH Application

	 In addition to ASH Data to find holder outside , We can also use it to get a lot of information ( Based on the database version )：
	 Such as through PGA_ALLOCATED Column to calculate the maximum of each sampling point PGA, total PGA To analyze ora-4030/Memory Swap Related issues ;
	 adopt TEMP_SPACE_ALLOCATED Column to analyze temporary tablespace usage ;
	 adopt IN_PARSE/IN_HARD_PARSE/IN_SQL_EXECUTION Column to analyze SQL be in parse Or the execution stage ;
	 adopt CURRENT_OBJ#/CURRENT_FILE#/CURRENT_BLOCK# To make sure I/O Related to the object waiting to happen .

attach 1：

col event for a35
set lines 1000
set pages 9999
select /*+ parallel 8 */
level lv,
connect_by_isleaf isleaf,
connect_by_iscycle     iscycle,
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.session_serial#,
t.session_type,
t.session_state,
t.SQL_OPNAME,
t.sql_id,
T.SQL_PLAN_HASH_VALUE,
t.SQL_PLAN_LINE_ID,
T.SQL_CHILD_NUMBER,
T.SQL_EXEC_ID,
T.SQL_EXEC_START,
T.IN_BIND,
t.IN_PARSE,
t.IN_HARD_PARSE,
t.IN_SQL_EXECUTION,
t.IN_PLSQL_COMPILATION,
t.IN_PLSQL_EXECUTION,
t.IN_SEQUENCE_LOAD,
T.IN_PLSQL_RPC,
t.PROGRAM,
t.MACHINE,
t.PGA_ALLOCATED,
t.TEMP_SPACE_ALLOCATED,
t.DELTA_READ_IO_BYTES,
t.DELTA_WRITE_IO_BYTES,
t.TM_DELTA_CPU_TIME,
t.event,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_serial#,
t.blocking_session_status,
t.WAIT_TIME,
t.TIME_WAITED,
t.CURRENT_OBJ#,
t.CURRENT_ROW#,
t.xid
from m_ash_202204162 t
start with event = 'buffer busy waits'
--start with blocking_session is not null
connect by nocycle prior dbid = dbid and prior blocking_session = session_id and prior blocking_session_serial# = session_serial#
and prior sample_time = sample_time and prior blocking_inst_id = instance_number  order siblings by dbid, sample_time;

attach 2： View a specific session Sample waiting information for

col event for a35
set lines 1000
set pages 9999
select /*+ parallel 4 */
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.session_serial#,
t.session_type,
t.session_state,
t.SQL_OPNAME,
t.sql_id,
T.SQL_PLAN_HASH_VALUE,
t.SQL_PLAN_LINE_ID,
T.SQL_CHILD_NUMBER,
T.SQL_EXEC_ID,
T.SQL_EXEC_START,
T.IN_BIND,
t.IN_PARSE,
t.IN_HARD_PARSE,
t.IN_SQL_EXECUTION,
t.IN_PLSQL_COMPILATION,
t.IN_PLSQL_EXECUTION,
t.IN_SEQUENCE_LOAD,
T.IN_PLSQL_RPC,
t.PROGRAM,
t.MACHINE,
t.PGA_ALLOCATED,
t.TEMP_SPACE_ALLOCATED,
t.DELTA_READ_IO_BYTES,
t.DELTA_WRITE_IO_BYTES,
t.TM_DELTA_CPU_TIME,
t.event,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_serial#,
t.blocking_session_status,
t.WAIT_TIME,
t.TIME_WAITED,
t.CURRENT_OBJ#,
t.CURRENT_ROW#,
t.xid
from m_ash_202204162 t
where session_id = 10501
order by dbid, instance_number,sample_time;

attach 3： see Sql Text

select * from dba_hist_sqltext where sql_id='bxgrjr8wa7g7x';

attach 4： see sql Refer to the documentation for execution details ：

analysis sql Historical execution information .md

原网站

版权声明
本文为[User 7185440]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/172/202206211654559291.html

当前位置：网站首页>How to use DBA_ hist_ active_ sess_ History analysis database history performance problems

How to use DBA_ hist_ active_ sess_ History analysis database history performance problems

How to use dba_hist_active_sess_history Analyze database history performance issues

background

Apply to

details

dba_hist_active_sess_history edition

1. Dump During the problem ASH data ：

2. Verify exported ASH Time range :

3. Confirm the exact time range of the problem :

4. Determine the of each sampling point top n event:

5. Observe the waiting chain of each sampling point : ( Show reference in more detail ： attach 1)

6. Based on the first 5 Step principle to find out the final value of each sampling point top holder:

Other things about ASH Application

attach 1：

attach 2： View a specific session Sample waiting information for

attach 3： see Sql Text

attach 4： see sql Refer to the documentation for execution details ：

边栏推荐

猜你喜欢

随机推荐