Case study and in-depth analysis of a node VIP exception in a brokerage's XX system
2022-07-06 21:57:00 【Full stack programmer webmaster】
Hello everyone, we meet again. I'm the full-stack programmer.
System environment
Hardware platform: IBM 570
Operating system version: AIX 5.3
Physical memory: 32 GB
Oracle product and version: 10.2.0.5 RAC
Business type: OLTP
Background overview
On xx/xx (month/day), the VIP of node 2 of the trading system went offline abnormally, and node 2 lost its database service. We connected to the system as soon as we received the request. The fault occurred at around 3 a.m., and the AIX (errpt), Oracle DB (alert.log) and CRS (crsd.log, ocssd.log, vip.log, coredump) logs contained little useful information, which made the situation complicated.
Detailed diagnosis and analysis of the problem
1. Check the errpt log:
Node 2's VIP went offline abnormally, yet errpt on node 2 reports no errors.
2. Check the CRS logs:
Node 1:
2013-04-29 03:42:33.180: [ CRSRES][11376]32startRunnable: setting CLI values
Note: node 1 has only this one log line from the failure window, showing that node 1's CRS ran a command during that time.
Node 2:
2013-04-29 03:41:12.308: [ CRSAPP][11263]32CheckResource error for ora.xxxxdb02.vip error code = 1
2013-04-29 03:41:12.335: [ CRSRES][11263]32In stateChanged, ora.xxxxdb02.vip target is ONLINE
2013-04-29 03:41:12.335: [ CRSRES][11263]32ora.xxxxdb02.vip on xxxxdb02 went OFFLINE unexpectedly
2013-04-29 03:41:12.335: [ CRSRES][11263]32StopResource: setting CLI values
2013-04-29 03:41:12.340: [ CRSRES][11263]32Attempting to stop `ora.xxxxdb02.vip` on member `xxxxdb02`
2013-04-29 03:41:12.893: [ CRSRES][11269]32In stateChanged, ora.xxxxdb.xxxxdb1.xxxxdb2.srv target is ONLINE
2013-04-29 03:41:12.894: [ CRSRES][11269]32ora.xxxxdb.xxxxdb1.xxxxdb2.srv on xxxxdb02 went OFFLINE unexpectedly
2013-04-29 03:41:12.894: [ CRSRES][11269]32StopResource: setting CLI values
2013-04-29 03:41:12.899: [ CRSRES][11269]32Attempting to stop `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02`
2013-04-29 03:41:12.958: [ CRSRES][11263]32Stop of `ora.xxxxdb02.vip` on member `xxxxdb02` succeeded.
2013-04-29 03:41:12.971: [ CRSRES][11263]32ora.xxxxdb02.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2013-04-29 03:41:12.976: [ CRSRES][11263]32ora.xxxxdb02.vip failed on xxxxdb02 relocating.
2013-04-29 03:41:13.025: [ CRSRES][11263]32StopResource: setting CLI values
2013-04-29 03:41:13.029: [ CRSRES][11263]32Attempting to stop `ora.xxxxdb02.LISTENER_XXXXDB02.lsnr` on member `xxxxdb02`
2013-04-29 03:41:13.146: [ CRSRES][11269]32Stop of `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02` succeeded.
2013-04-29 03:41:13.146: [ CRSRES][11269]32ora.xxxxdb.xxxxdb1.xxxxdb2.srv RESTART_COUNT=0 RESTART_ATTEMPTS=1
2013-04-29 03:41:13.159: [ CRSRES][11269]32Restarting ora.xxxxdb.xxxxdb1.xxxxdb2.srv on xxxxdb02
2013-04-29 03:41:13.164: [ CRSRES][11269]32startRunnable: setting CLI values
2013-04-29 03:41:13.164: [ CRSRES][11269]32Attempting to start `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02`
2013-04-29 03:41:45.618: [ CRSAPP][11269]32StartResource error for ora.xxxxdb.xxxxdb1.xxxxdb2.srv error code = 1
2013-04-29 03:41:45.799: [ CRSRES][11269]32Start of `ora.xxxxdb.xxxxdb1.xxxxdb2.srv` on member `xxxxdb02` failed.
2013-04-29 03:41:45.820: [ CRSRES][11269]32ora.xxxxdb.xxxxdb1.xxxxdb2.srv failed on xxxxdb02 relocating.
2013-04-29 03:41:45.885: [ CRSRES][11269]32Cannot relocate ora.xxxxdb.xxxxdb1.xxxxdb2.srv. Stopping dependents
2013-04-29 03:41:45.897: [ CRSRES][11269]32StopResource: setting CLI values
2013-04-29 03:42:29.483: [ CRSRES][11263]32Stop of `ora.xxxxdb02.LISTENER_XXXXDB02.lsnr` on member `xxxxdb02` succeeded.
2013-04-29 03:42:29.496: [ CRSRES][11263]32Attempting to start `ora.xxxxdb02.vip` on member `xxdb01np5`
2013-04-29 03:42:32.036: [ CRSRES][11263]32Start of `ora.xxxxdb02.vip` on member `xxdb01np5` succeeded.
Note: node 2 is the failed node. The log above shows that an exception in the VIP check forced node 2's VIP OFFLINE. At the same time, following the resource dependencies on the VIP, CRS also took the database service OFFLINE and stopped the listener.
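When crsd.log is long, the cascade above can be pulled out quickly by looking for the "went OFFLINE unexpectedly" records. A minimal POSIX shell/awk sketch, assuming one record per line in the 10.2 crsd.log layout shown above (the line wrapping on this page is not in the real file); `offline_events` is a hypothetical helper name, not an Oracle tool.

```shell
#!/bin/sh
# List resources that crsd.log reports as having gone OFFLINE unexpectedly,
# each prefixed by the event timestamp.
# Assumption: one record per line, in the 10.2 layout shown above.
# Usage: offline_events /path/to/crsd.log   (or pipe an excerpt on stdin)
offline_events() {
  awk '/went OFFLINE unexpectedly/ {
    ts = $1 " " $2              # date and time fields
    sub(/:$/, "", ts)           # drop the colon that follows the time
    # the resource name is the first token starting with "ora."
    if (match($0, /ora\.[^ ]+/))
      print ts, substr($0, RSTART, RLENGTH)
  }' "$@"
}
```

On the node 2 log above this would print the VIP first and the service about half a second later, which makes the VIP-first ordering easy to see.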
3. Analyze the CRS status:
Because the customer preserved the on-site environment very well after the failure, we could analyze the CRS status on site.
Name              Type         Target   State     Host
--------------------------------------------------------------
ora.xxxxdb.db     application  ONLINE   ONLINE    xxdb01np5
ora….sdb1.cs      application  ONLINE   OFFLINE
ora….db1.srv      application  ONLINE   ONLINE    xxdb01np5
ora….db2.srv      application  ONLINE   ONLINE    xxxxdb02
ora….b1.inst      application  ONLINE   ONLINE    xxdb01np5
ora….b2.inst      application  ONLINE   ONLINE    xxxxdb02
ora….P5.lsnr      application  ONLINE   ONLINE    xxdb01np5
ora….np5.gsd      application  ONLINE   ONLINE    xxdb01np5
ora….np5.ons      application  ONLINE   ONLINE    xxdb01np5
ora….np5.vip      application  ONLINE   ONLINE    xxdb01np5
ora….P5.lsnr      application  ONLINE   OFFLINE
ora….np5.gsd      application  ONLINE   ONLINE    xxxxdb02
ora….np5.ons      application  ONLINE   ONLINE    xxxxdb02
ora….np5.vip      application  ONLINE   ONLINE    xxdb01np5
Note: during the failure, the database (instance) resources on node 2 were still running and the nodeapps resources were normal; the only resources OFFLINE were a service and a listener. We also noticed that node 2's VIP had been taken over by node 1, which shows that node 2's VIP address (xxx.xxx.xxx.4) itself was working at the time of troubleshooting.
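Resources in this state (Target ONLINE but State OFFLINE) can be picked out of `crs_stat -t` output mechanically. A minimal sketch, assuming the five-column layout with two header lines shown above; truncated names such as `ora….sdb1.cs` pass through unchanged, and `not_at_target` is a hypothetical name.

```shell
#!/bin/sh
# Print resources whose State differs from their Target in `crs_stat -t`
# style output read from stdin, as "name TARGET->STATE".
# Assumption: columns Name Type Target State Host, two header lines;
# rows without a Host column (OFFLINE resources) have only 4 fields.
# Usage: crs_stat -t | not_at_target
not_at_target() {
  awk 'NR > 2 && NF >= 4 && $3 != $4 { print $1, $3 "->" $4 }'
}
```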
4. Analyze the database alert log
Sun Apr 29 03:41:13 BEIST 2012
ALTER SYSTEM SET service_names='xxxxdb' SCOPE=MEMORY SID='xxxxdb2';
Sun Apr 29 03:41:55 BEIST 2012
ALTER SYSTEM SET service_names='' SCOPE=MEMORY SID='xxxxdb2';
Sun Apr 29 03:41:55 BEIST 2012
Immediate Kill Session#: 1659, Serial#: 2
Immediate Kill Session: sess:70000038f773bb8 OS pid: 1180972
Sun Apr 29 03:41:55 BEIST 2012
Process OS id : 1180972 alive after kill
Errors in file
Immediate Kill Session#: 1660, Serial#: 2
Immediate Kill Session: sess:7000003847623c0 OS pid: 1179690
Sun Apr 29 03:41:56 BEIST 2012
Process OS id : 1179690 alive after kill
Errors in file /oracle/product/10.2.0/admin/xxxxdb/udump/xxxxdb2_ora_110596.trc
Immediate Kill Session#: 1661, Serial#: 2
Immediate Kill Session: sess:7000003837620c0 OS pid: 1300264
Sun Apr 29 03:41:56 BEIST 2012
Process OS id : 1300264 alive after kill
Errors in file /oracle/product/10.2.0/admin/xxxxdb/udump/xxxxdb2_ora_110596.trc
Immediate Kill Session#: 1662, Serial#: 2
Immediate Kill Session: sess:70000038275bc80 OS pid: 1155624
Sun Apr 29 03:41:57 BEIST 2012
Process OS id : 1155624 alive after kill
Errors in file /oracle/product/10.2.0/admin/xxxxdb/udump/xxxxdb2_ora_110596.trc
Immediate Kill Session#: 1663, Serial#: 2
Immediate Kill Session: sess:700000387762ed0 OS pid: 1000484
Sun Apr 29 03:41:57 BEIST 2012
Process OS id : 1000484 alive after kill
Errors in file /oracle/product/10.2.0/admin/xxxxdb/udump/xxxxdb2_ora_110596.trc
Immediate Kill Session#: 1664, Serial#: 2
Immediate Kill Session: sess:7000003867923d8 OS pid: 1175586
Sun Apr 29 03:41:58 BEIST 2012
Process OS id : 1175586 alive after kill
Errors in file /oracle/product/10.2.0/admin/xxxxdb/udump/xxxxdb2_ora_110596.trc
Immediate Kill Session#: 1665, Serial#: 2
Immediate Kill Session: sess:7000003857a9d30 OS pid: 1296160
Sun Apr 29 03:41:58 BEIST 2012
Process OS id : 1296160 alive after kill
Errors in file /oracle/product/10.2.0/admin/xxxxdb/udump/xxxxdb2_ora_110596.trc
Immediate Kill Session#: 1666, Serial#: 3
Immediate Kill Session: sess:70000038f775130 OS pid: 1176862
Sun Apr 29 03:41:59 BEIST 2012
Process OS id : 1176862 alive after kill
Errors in file /oracle/product/10.2.0/admin/xxxxdb/udump/xxxxdb2_ora_110596.trc
Immediate Kill Session#: 1667, Serial#: 2
Immediate Kill Session: sess:700000384763938 OS pid: 1151516
Sun Apr 29 03:42:00 BEIST 2012
Process OS id : 1151516 alive after kill
Errors in file /oracle/product/10.2.0/admin/xxxxdb/udump/xxxxdb2_ora_110596.trc
Immediate Kill Session#: 1668, Serial#: 2
Immediate Kill Session: sess:700000383763638 OS pid: 693786
Sun Apr 29 03:42:01 BEIST 2012
Process OS id : 693786 alive after kill
Errors in file /oracle/product/10.2.0/admin/xxxxdb/udump/xxxxdb2_ora_110596.trc
Immediate Kill Session#: 1669, Serial#: 2
Immediate Kill Session: sess:70000038275d1f8 OS pid: 1292056
Note: during the failure, the Service on node 2 was taken OFFLINE, which is expected. After that, however, CRS forcibly killed all client connections on node 2. This behavior is related to Oracle Bug 6955040.
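To gauge how many client sessions were killed, and which processes survived the kill request, the alert log entries above can be summarized. A minimal sketch, assuming the `Immediate Kill Session#:` and `Process OS id : ... alive after kill` message formats shown above; both function names are hypothetical.

```shell
#!/bin/sh
# Summarize forced session kills recorded in an Oracle alert log excerpt.
# Assumption: the 10.2 message formats shown above. Reads the files given
# as arguments, or stdin when none are given.

# Print "SID SERIAL#" for every "Immediate Kill Session#:" line.
killed_sessions() {
  sed -n 's/^Immediate Kill Session#: \([0-9][0-9]*\), Serial#: \([0-9][0-9]*\).*/\1 \2/p' "$@"
}

# Print the OS pid of every process reported as still alive after the kill.
alive_after_kill() {
  sed -n 's/^Process OS id : \([0-9][0-9]*\) alive after kill.*/\1/p' "$@"
}
```

For example, `killed_sessions alert_xxxxdb2.log | wc -l` would count the killed sessions (the alert log file name here is only an illustration).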
5. Analyze racgvip
Since the problem was traced to the VIP resource, and the VIP failed over to node 1 normally, the most likely cause is that an exception occurred during the VIP check, which made CRS take the VIP OFFLINE. Oracle CRS implements the VIP check through the racgvip shell script, so we analyzed and tested racgvip.
We noticed the following section of the script:
  # Check the status of the interface thro' pinging gateway
  if [ -n "$DEFAULTGW" ]
  then
    _RET=1
    # get base IP address of the interface
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers (Ipkts, column 5 of AIX "netstat -n -I" output)
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    do
      # send one echo request to the default gateway, from the base IP if known
      if [ -n "$tmpIP" ]
      then
        $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      else
        $PING $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      fi
      _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
      if [ "$_O1" != "$_O2" ]
      then
        # RX packets numbers changed
        _RET=0
        break
      fi
      $SLEEP 1
      x=`$EXPR $x - 1`
    done
    if [ $_RET -ne 0 ]
    then
      logx "checkIf: RX packets checked if=$_IF failed"
    else
      logx "checkIf: RX packets checked if=$_IF OK"
    fi
  else
    logx "checkIf: Default gateway is not defined(host=$HOSTNAME)"
    if [ $FAIL_WHEN_DEFAULTGW_NO_FOUND -eq 1 ]
    then
      _RET=1
    else
      _RET=0
    fi
  fi
Note: this section shows how CRS performs the VIP check. It sends packets (ping) to the default gateway; if the received-packet (RX) count on the network interface differs before and after sending (the netstat output values differ), the network is considered normal and the VIP check passes. The packets are sent with an AIX command; with all variables expanded, the command is:
ping -S xxx.xxx.xxx.2 -c 1 -w 1 xxx.xxx.xxx.254
The parameters mean the following:
-S hostname/IP addr
    Uses the IP address as the source address in outgoing ping packets.
-c Count
    Specifies the number of echo requests, as indicated by the Count variable, to be sent (and received).
-w timeout
    This option works only with the -c option. It causes ping to wait for a maximum of 'timeout' seconds for a reply (after sending the last packet).
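The interface test in the script can be restated outside AIX to make its logic explicit: remember the RX counter, ping, and declare the interface alive as soon as the counter moves. A minimal POSIX shell sketch, not Oracle's code, assuming the RX counter is column 5 (Ipkts) of AIX `netstat -n -I` output; `rx_check`, `ipkts_of` and `RX_SLEEP` are hypothetical names, and the counter reader and ping sender are passed in as commands so the loop can be exercised without a real interface.

```shell
#!/bin/sh
# Sketch of the racgvip "RX packets changed" interface test.

# Print the Ipkts column for interface $1 from AIX-style `netstat -n -I`
# output on stdin (assumed columns: Name Mtu Network Address Ipkts Ierrs ...).
ipkts_of() {
  awk -v ifname="$1" '$1 == ifname { print $5; exit }'
}

# rx_check ATTEMPTS READ_RX_CMD SEND_PING_CMD
# Returns 0 as soon as the RX counter differs from its initial value,
# 1 if it never changes within ATTEMPTS pings (the check then fails).
rx_check() {
  attempts=$1
  read_rx=$2
  send_ping=$3
  before=$($read_rx)
  while [ "$attempts" -gt 0 ]; do
    $send_ping >/dev/null 2>&1      # one ping, e.g. ping -c 1 -w 1 <gateway>
    after=$($read_rx)
    if [ "$before" != "$after" ]; then
      return 0                      # traffic seen: interface considered alive
    fi
    sleep "${RX_SLEEP:-1}"          # racgvip sleeps 1 second between attempts
    attempts=$((attempts - 1))
  done
  return 1                          # no RX change: the VIP check would fail
}
```

On a real node, READ_RX_CMD could be a small function wrapping `netstat -n -I en10 | ipkts_of en10`, and SEND_PING_CMD a function wrapping the ping command shown above. Note that any received packet moves the counter, not only the gateway's reply, which is why the script compares counters rather than ping's exit status.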
PING_TIMEOUT supplies the -c and -w values; the racgvip script on this system sets it as follows:
# timeout of ping in number of loops
PING_TIMEOUT=" -c 1 -w 1"
We also checked the setting in version 10.2.0.3, where PING_TIMEOUT has the following value:
# timeout of ping in number of loops
PING_TIMEOUT=" -c 1 -w 3"
With the current setting, if the ping to the default gateway gets no reply within 1 second, CRS sleeps once and then pings again. If that attempt also fails, CRS concludes that the network is abnormal and takes the VIP OFFLINE.
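The retry behaviour described above puts an upper bound on how long a gateway outage can last before the VIP check fails. A small worked calculation, assuming each attempt waits the full `-w` timeout (no reply arrives) and then sleeps 1 second, and assuming CHECK_TIMES=2, which is inferred from the two ping commands in the reproduction debug log rather than read from the script; `detect_window` is a hypothetical helper.

```shell
#!/bin/sh
# Upper bound, in seconds, on the time racgvip spends pinging before it
# reports the interface as failed, when the default gateway never answers.
# $1: attempts (CHECK_TIMES), $2: ping -w timeout, $3: sleep between attempts
detect_window() {
  echo $(( $1 * ($2 + $3) ))
}

printf 'ping -w 1 (10.2.0.5 setting): %s s\n' "$(detect_window 2 1 1)"
printf 'ping -w 3 (10.2.0.3 setting): %s s\n' "$(detect_window 2 3 1)"
```

With `-w 1` the bound is 4 seconds, which agrees with the `time = 4.304s` reported for the failed check in the reproduction log; with `-w 3` it is 8 seconds, so a network stall would have to last about twice as long before the VIP is taken OFFLINE.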
Conclusion
In 10.2.0.5, Oracle made a small change to the racgvip script, which suggests that Oracle assumes the networks of the 10.2.0.5 era perform better and are more stable. However, reducing PING_TIMEOUT from 3 seconds to 1 second greatly increases the sensitivity of CRS, so that even a slight network anomaly can make the VIP check in CRS fail.
Based on the above analysis, we believe the failure of node 2 was caused as follows:
1. After the upgrade to 10.2.0.5, CRS became more sensitive to network delay.
2. During the failure at around 3 a.m. on April 29, an extremely short network delay made the VIP check fail, and CRS took the VIP and the resources that depend on it OFFLINE.
3. When CRS took the database service OFFLINE, it hit Bug 6955040, which caused all client connections to be killed.
Reproducing the fault
We simulated this abnormal situation in the production environment:
Step  Procedure
1     Enable CRS VIP debugging
2     Find the network device used by the VIP (xxx.xxx.xxx.4), en10, and the corresponding service IP (xxx.xxx.xxx.2)
3     Find the corresponding port
4     Block the port for a short time
5     Observe the CRS and database logs
The CRS and database log information we observed was completely consistent with that recorded during the failure, which fully confirms the earlier inference.
The VIP debug output is shown below. It clearly shows that as soon as sending the packets failed because of the network problem, CRS put the VIP into the OFFLINE state:
2013-05-01 20:36:22.560: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:Tue May 1
20:36:18 BEIST 2012 [ 1139710 ] Checking interface existance
Tue May 1 20:36:18 BEIST 2012 [ 1139710 ] Calling getifbyip
Tue May 1 20:36:18 BEIST 2012 [ 1139710 ] getifbyip: started for xxx.xxx.xxx.4
2013-05-01 20:36:22.560: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:Tue May 1
20:36:18 BEIST 2012 [ 1139710 ] getifbyip: checking if failover is happening (en10)
Tue May 1 20:36:18 BEIST 2012 [ 1139710 ] getifbyip: failover is not happening (en10)
Tue May 1 20:36:18 BEIST 2012 [ 1139710 ] getifbyip: returning IP en10
2013-05-01 20:36:22.560: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:Tue May 1
20:36:18 BEIST 2012 [ 1139710 ] Completed getifbyip en10
Tue May 1 20:36:18 BEIST 2012 [ 1139710 ] Completed with initial interface test
Tue May 1 20:36:18 BEIST 2012 [ 1139710 ] Broadcast = 130.0.255.255
2013-05-01 20:36:22.560: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:Tue May 1
20:36:18 BEIST 2012 [ 1139710 ] checkIf: start for if=en10
Tue May 1 20:36:18 BEIST 2012 [ 1139710 ] IsIfAlive: start for if=en10
Tue May 1 20:36:18 BEIST 2012 [ 1139710 ] defaultgw: started
2013-05-01 20:36:22.560: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:Tue May 1
20:36:18 BEIST 2012 [ 1139710 ] defaultgw: completed with xxx.xxx.xxx.254
Tue May 1 20:36:18 BEIST 2012 [ 1139710 ] About to execute command: /usr/sbin/ping -S xxx.xxx.xxx.2 -c 1 -w 1 xxx.xxx.xxx.254
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:Tue May 1
20:36:20 BEIST 2012 [ 1139710 ] About to execute command: /usr/sbin/ping -S xxx.xxx.xxx.2 -c 1 -w 1 xxx.xxx.xxx.254
Tue May 1 20:36:22 BEIST 2012 [ 1139710 ] IsIfAlive: RX packets checked if=en10 failed
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:Tue May 1
20:36:22 BEIST 2012 [ 1139710 ] Interface en10 checked failed (host=xxxxdb02)
Tue May 1 20:36:22 BEIST 2012 [ 1139710 ] IsIfAlive: end for if=en10
Tue May 1 20:36:22 BEIST 2012 [ 1139710 ] checkIf: end for if=en10
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:Tue May 1
20:36:22 BEIST 2012 [ 1139710 ] Performing CRS_STAT testing
Tue May 1 20:36:22 BEIST 2012 [ 1139710 ] Completed CRS_STAT testing
Tue May 1 20:36:22 BEIST 2012 [ 1139710 ] Completed second gateway test
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:Tue May 1
20:36:22 BEIST 2012 [ 1139710 ] Interface tests Invalid parameters, or failed to bring up VIP (host=xxxxdb02)
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:
clsrcexecut: env ORACLE_CONFIG_HOME=/oracle/product/10.2.0/crs_1
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:
clsrcexecut: cmd = /oracle/product/10.2.0/crs_1/bin/racgeut -e _USR_ORA_DEBUG=5 54
/oracle/product/10.2.0/crs_1/bin/racgvip check xxxxdb02
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:
clsrcexecut: rc = 1, time = 4.304s
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:
clsrcaction: restyp=0 act_typ=2 stat=1
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:
clsrcaction: init 0.000s
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:
clsrcaction: action failed, 4.438s
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:
clsrcaction: post 0.000s
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:
clsrcaction: all 4.438s
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:end for
resource = ora.xxxxdb02.vip, action = check, status = 1, time = 4.442s
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:clsrccln:
exiting ora.xxxxdb02.vip refcount=1
2013-05-01 20:36:22.561: [ RACG][1] [1291828][1][ora.xxxxdb02.vip]:
clsrcprsrgter:gctx->prsrcfgref_clsrcgctx = 0
2013-05-01 20:36:22.664: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcprsrgini:gctx->prsrcfgref_clsrcgctx = 0
2013-05-01 20:36:22.664: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcini_ext: starting ora.xxxxdb02.vip refcount=1 global
2013-05-01 20:36:22.665: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:begin for
resource = ora.xxxxdb02.vip, action = stop
2013-05-01 20:36:22.902: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrccssgetnodename: all 0.238s
2013-05-01 20:36:22.902: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcnodeapps: calling FAILSRVSA
2013-05-01 20:36:22.907: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcrundetach: cmd = /oracle/product/10.2.0/crs_1/bin/racgmain ora.xxxxdb02.vip
rundetach 1 failsrvsa xxxxdb02, rc = 0, time = 0.005s
2013-05-01 20:36:22.908: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcnodeapps: Posting PNWDOWN_EVENT
2013-05-01 20:36:22.908: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrccssgetnodename: all 0.000s
2013-05-01 20:36:22.908: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrccssgetnodename: all 0.000s
2013-05-01 20:36:22.908: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcssgetrhost: using cached local hostname
2013-05-01 20:36:22.909: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrccssgetlhost: all 0.001s
2013-05-01 20:36:22.935: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:
clsrcprsrgini:gctx->prsrcfgref_clsrcgctx = 0
2013-05-01 20:36:22.935: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:
clsrcini_ext: starting ora.xxxxdb02.vip refcount=1 global
2013-05-01 20:36:22.935: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:begin for
resource = ora.xxxxdb02.vip, action = rundetach
2013-05-01 20:36:22.964: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = RESTART_COUNT=0
2013-05-01 20:36:22.982: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = STATE=ONLINE on xxxxdb02
2013-05-01 20:36:22.982: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf =
2013-05-01 20:36:22.982: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = NAME=ora.xxxxdb.xxxxdb1.xxxxdb2.srv
2013-05-01 20:36:22.982: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = FAILURE_THRESHOLD=0
2013-05-01 20:36:22.982: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = TARGET=ONLINE
2013-05-01 20:36:22.982: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = STATE=ONLINE on xxxxdb02
2013-05-01 20:36:22.982: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf =
2013-05-01 20:36:22.982: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = RESTART_COUNT=0
2013-05-01 20:36:22.982: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = FAILURE_THRESHOLD=0
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = TARGET=ONLINE
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = STATE=ONLINE on xxxxdb02
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = RESTART_ATTEMPTS=5
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = NAME=ora.xxxxdb02.ons
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = TYPE=application
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = RESTART_ATTEMPTS=3
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = RESTART_COUNT=0
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = FAILURE_THRESHOLD=0
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = TARGET=ONLINE
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = STATE=ONLINE on xxxxdb02
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf =
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = NAME=ora.xxxxdb02.vip
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = TYPE=application
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = RESTART_ATTEMPTS=0
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = RESTART_COUNT=0
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = FAILURE_THRESHOLD=0
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = TARGET=ONLINE
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = STATE=ONLINE on xxxxdb02
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf =
2013-05-01 20:36:22.983: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcqryapi:
resname = NULL, host = xxxxdb02, time = 0.007s
2013-05-01 20:36:23.075: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcpostevt: EvmEventPost 1 0.000s
2013-05-01 20:36:23.075: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcposthaevt: forward to EVM 0.166s
2013-05-01 20:36:23.080: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcnodeapp: prsr num_env = 0
2013-05-01 20:36:23.080: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcnodeapp: setting ORACLE_CONFIG_HOME=/oracle/product/10.2.0/crs_1
2013-05-01 20:36:23.111: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:
clsrcpostevt: EvmEventPost 1 0.008s
2013-05-01 20:36:23.114: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = NAME=ora.xxxxdb02.vip
2013-05-01 20:36:23.114: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = TYPE=application
2013-05-01 20:36:23.114: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = RESTART_ATTEMPTS=0
2013-05-01 20:36:23.114: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = RESTART_COUNT=0
2013-05-01 20:36:23.114: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = FAILURE_THRESHOLD=0
2013-05-01 20:36:23.114: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = TARGET=ONLINE
2013-05-01 20:36:23.114: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf = STATE=ONLINE on xxxxdb02
2013-05-01 20:36:23.115: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcstatcb:
buf =
2013-05-01 20:36:23.115: [ RACG][1] [889364][1][ora.xxxxdb02.vip]:clsrcqryapi:
resname = ora.xxxxdb02.vip, host = NULL, time = 0.004s
2013-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:Tue May 1
20:36:23 BEIST 2012 [ 921812 ] Checking interface existance
Tue May 1 20:36:23 BEIST 2012 [ 921812 ] Calling getifbyip
Tue May 1 20:36:23 BEIST 2012 [ 921812 ] getifbyip: started for xxx.xxx.xxx.4
2013-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:Tue May 1
20:36:23 BEIST 2012 [ 921812 ] getifbyip: checking if failover is happening ()
Tue May 1 20:36:23 BEIST 2012 [ 921812 ] getifbyip: failover is not happening ()
Tue May 1 20:36:23 BEIST 2012 [ 921812 ] Completed getifbyip
2013-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:Tue May 1
20:36:23 BEIST 2012 [ 921812 ] Completed with initial interface test
2013-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcexecut: env ORACLE_CONFIG_HOME=/oracle/product/10.2.0/crs_1
2013-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcexecut: cmd = /oracle/product/10.2.0/crs_1/bin/racgeut -e _USR_ORA_DEBUG=5 54
/oracle/product/10.2.0/crs_1/bin/racgvip stop xxxxdb02
2013-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcexecut: rc = 0, time = 0.204s
2013-05-01 20:36:23.284: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcposthaevt: reason = failure
2013-05-01 20:36:23.285: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:clsrccln:
exiting ora.xxxxdb02.vip refcount=1
2013-05-01 20:36:23.286: [ RACG][1] [1115450][1][ora.xxxxdb02.vip]:
clsrcprsrgter:gctx->prsrcfgref_clsrcgctx = 0
Solution
Based on the analysis, we believe that CRS in 10.2.0.5 is too sensitive to the network, so network delays can have a large impact on the database cluster. Given the current situation, we recommend the following:
1. Investigate the network problem in detail. Occasional packet loss or delay is common at the network level; the cause may be cabling, switches, server network cards, network configuration and so on, so the network needs a dedicated check.
2. Relax the overly sensitive CRS configuration by setting the ping timeout to 3 seconds (the value used before 10.2.0.5).
Change the following lines in the $ORA_CRS_HOME/bin/racgvip script:
# timeout of ping in number of loops
PING_TIMEOUT=" -c 1 -w 1"
to:
# timeout of ping in number of loops
PING_TIMEOUT=" -c 1 -w 3"
3. Because Bug 6955040 is only triggered after the VIP exception, priority should go to the VIP problem; the bug can be ignored for now.
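Recommendation 2 can be applied with a short script. A minimal sketch, assuming the line appears exactly as quoted above and that the user running it may write to `$ORA_CRS_HOME/bin/racgvip`. Because it assumes a `sed` without an in-place option (as on AIX), it writes a filtered copy and then overwrites the original's contents, which keeps the file's owner and mode. `set_ping_timeout` is a hypothetical name.

```shell
#!/bin/sh
# Raise the racgvip ping timeout from 1 s to 3 s, keeping a timestamped backup.
# $1: path to the racgvip script, e.g. "$ORA_CRS_HOME/bin/racgvip"
set_ping_timeout() {
  f=$1
  cp -p "$f" "$f.bak.$(date +%Y%m%d%H%M%S)" || return 1
  # Rewrite only the exact "-w 1" line; every other line is copied unchanged.
  sed 's/^PING_TIMEOUT=" -c 1 -w 1"$/PING_TIMEOUT=" -c 1 -w 3"/' "$f" > "$f.new" || return 1
  # Overwrite the contents in place so the original owner and mode are kept.
  cat "$f.new" > "$f" && rm -f "$f.new"
  # Show the resulting setting for a quick visual check.
  grep '^PING_TIMEOUT=' "$f"
}
```

Usage: `set_ping_timeout "$ORA_CRS_HOME/bin/racgvip"`. If the script's line differs from the quoted text, nothing is changed and the printed value still shows `-w 1`, which makes a mismatch easy to spot.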
Copyright notice: this is an original article by the blogger; do not reprint without permission.
Publisher: Full stack programmer stack length. Please credit the source when reprinting: https://javaforall.cn/117072.html Original link: https://javaforall.cn