This article walks through the diagnosis of a CRS failure on an Oracle 11.2.0.4 RAC cluster running on Oracle Linux 6.7, following the checks in the order they were made.

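Over the past month, CRS on our Oracle RAC cluster failed and the cluster commands stopped working. Running crsctl stat res -t on either node returned the following errors: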

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

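The databases were still accessible, however, and the business applications kept running normally. The pmon processes of the ASM and database instances were alive on both nodes: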

[root@db1 ~]# ps -ef | grep pmon
root 80242045940 21:11 pts/0 00:00:00 grep pmon
grid     77120     1  0 Dec21 ?        00:04:21 asm_pmon_+ASM1
oracle   77790     1  0 Dec21 ?        00:05:18 ora_pmon_CAIWU1
oracle   77794     1  0 Dec21 ?        00:05:08 ora_pmon_dadb1
oracle   77848     1  0 Dec21 ?        00:05:39 ora_pmon_chdyl1
oracle   77910     1  0 Dec21 ?        00:07:47 ora_pmon_RLZY1
[root@db2 ~]# ps -ef | grep pmon
grid     27745     1  0 Dec21 ?        00:04:21 asm_pmon_+ASM2
oracle   28393     1  0 Dec21 ?        00:05:21 ora_pmon_dadb2
oracle   28569     1  0 Dec21 ?        00:04:58 ora_pmon_CAIWU2
oracle   28573     1  0 Dec21 ?        00:05:36 ora_pmon_chdyl2
oracle   28583     1  0 Dec21 ?        00:07:49 ora_pmon_RLZY2

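Checking the ASM disk group status showed that the OCR disk group was indeed offline: it does not appear in the asmcmd lsdg output on either node.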

[grid@db1 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/
[grid@db2 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/
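Spotting a disk group that is absent from a long lsdg listing is easy to get wrong by eye. The following is a minimal sketch, not part of the original procedure, that compares the MOUNTED names in captured `asmcmd lsdg` output against a list of disk groups you expect to see. The function names and the expected list are assumptions made for illustration, and the sketch assumes the output still has its original column spacing.

```python
def mounted_diskgroups(lsdg_output):
    """Return the names of disk groups that `asmcmd lsdg` reports as MOUNTED."""
    names = set()
    for line in lsdg_output.splitlines():
        fields = line.split()
        # Data rows start with the State column; Name is the last column and ends with '/'.
        if fields and fields[0] == "MOUNTED":
            names.add(fields[-1].rstrip("/"))
    return names


def missing_diskgroups(lsdg_output, expected):
    """Return the expected disk groups that are not mounted, sorted by name."""
    return sorted(set(expected) - mounted_diskgroups(lsdg_output))


# Two rows taken from the listing above; the expected list is hypothetical.
SAMPLE = """\
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
"""

if __name__ == "__main__":
    print(missing_diskgroups(SAMPLE, ["ARCH", "CWDATA", "OCR"]))
```

In practice the text would come from running asmcmd lsdg as the grid user on each node and saving the output.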

Manually mounting the OCR disk group (alter diskgroup ocr mount) succeeded on both nodes, but crsctl stat res -t still failed afterwards.

[grid@db1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:33 2019
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup ocr mount;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[grid@db2 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:05 2019
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup ocr mount;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

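alert_+ASM1.log on node db1 contains the errors below. They say the disks of the OCR disk group could not be accessed, so the disk group was forcibly dismounted; a read test with dd, however, showed that the disks were accessible.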

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:48 2019
SQL> alter diskgroup OCR check /* proxy */
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
NOTE: client exited [77184]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35285] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35305] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35322] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35346] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35363] opening OCR file
Sat Dec 28 05:31:00 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35459] opening OCR file
Sat Dec 28 05:31:02 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35481] opening OCR file
Sat Dec 28 05:31:04 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35520] opening OCR file
Sat Dec 28 05:31:06 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35539] opening OCR file
Sat Dec 28 05:31:08 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35557] opening OCR file
Sat Dec 28 21:00:10 2019
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.

Check the trace file referenced in the alert log:

[root@db1 ~]# more /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/11.2.0/grid
System name:    Linux
Node name:      db1
Release:        3.8.13-68.3.4.el6uek.x86_64
Version:        #2 SMP Tue Jul 14 15:03:36 PDT 2015
Machine:        x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0
Oracle process number: 24
Unix process pid: 77212, image: oracle@db1 (TNS V1-V3)
*** 2019-12-28 05:30:44.894
*** SESSION ID:(2929.3) 2019-12-28 05:30:44.894
*** CLIENT ID:() 2019-12-28 05:30:44.894
*** SERVICE NAME:() 2019-12-28 05:30:44.894
*** MODULE NAME:(crsd.bin@db1 (TNS V1-V3)) 2019-12-28 05:30:44.894
*** ACTION NAME:() 2019-12-28 05:30:44.894
Received ORADEBUG command (#1) 'CLEANUP_KFK_FD' from process 'Unix process pid: 35253, image: '
*** 2019-12-28 05:30:44.895
Finished processing ORADEBUG command (#1) 'CLEANUP_KFK_FD'
*** 2019-12-28 05:30:48.235
WARNING:failed xlate 1
ORA-15078: ASM diskgroup was forcibly dismounted
ksfdrfms:Mirror Read file=+OCR.255.4294967295 fob=0x9b00e5d8 bufp=0x7f5dd012ba00 blkno=25 nbytes=4096
WARNING:failed xlate 1
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=1 logical extent number=0 dskno=65535
WARNING:failed xlate 1
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=2 logical extent number=1 dskno=65535
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted

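alertdb1.log on node db1 (the Grid Infrastructure alert log) shows related errors: crsd could not access the OCR location in ASM, and ohasd stopped restarting ora.crsd after the maximum number of attempts.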

2019-12-28 05:30:48.468: [/u01/app/11.2.0/grid/bin/oraagent.bin(77466)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:4} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.468: [/u01/app/11.2.0/grid/bin/oraagent.bin(77684)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:332} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.471: [/u01/app/11.2.0/grid/bin/orarootagent.bin(77482)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:11497} in /u01/app/11.2.0/grid/log/db1/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.480: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:50.003: [crsd(35285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.021: [crsd(35285)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.520: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:51.918: [crsd(35305)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:51.929: [crsd(35305)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:52.557: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:53.945: [crsd(35322)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:53.956: [crsd(35322)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:54.595: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:55.976: [crsd(35346)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:55.988: [crsd(35346)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:56.633: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:58.010: [crsd(35363)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.020: [crsd(35363)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.669: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:00.043: [crsd(35459)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.054: [crsd(35459)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.706: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:02.093: [crsd(35481)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.103: [crsd(35481)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.742: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:04.109: [crsd(35520)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.119: [crsd(35520)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.777: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:06.141: [crsd(35539)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.151: [crsd(35539)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.810: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.181: [crsd(35557)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.191: [crsd(35557)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.846: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.847: [ohasd(33022)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.848: [ohasd(33022)]CRS-2769:Unable to failover resource 'ora.crsd'.
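The alert log above repeats the same three messages for every crsd restart attempt, which makes the final CRS-2771 line easy to miss. As a rough sketch (the helper names are made up for this example, not an Oracle tool), the log text can be reduced to a count per CRS message code plus a check for whether ohasd gave up on ora.crsd:

```python
import re
from collections import Counter

# Matches message codes such as CRS-1013: or CRS-0804: anywhere in the text.
CRS_CODE = re.compile(r"\bCRS-(\d{4,5}):")


def count_crs_codes(alert_text):
    """Count how often each CRS-nnnn message code appears in the alert log text."""
    return Counter("CRS-" + m.group(1) for m in CRS_CODE.finditer(alert_text))


def restart_gave_up(alert_text, resource="ora.crsd"):
    """Return True if a CRS-2771 (maximum restart attempts) line names the resource."""
    pattern = re.compile(r"CRS-2771:.*'" + re.escape(resource) + r"'")
    return any(pattern.search(line) for line in alert_text.splitlines())


# Three lines taken from the alert log shown above.
SAMPLE = """\
2019-12-28 05:30:50.003: [crsd(35285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.520: [ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.847: [ohasd(33022)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
"""

if __name__ == "__main__":
    for code, n in sorted(count_crs_codes(SAMPLE).items()):
        print(code, n)
    print("ohasd gave up on ora.crsd:", restart_gave_up(SAMPLE))
```

Run against the full excerpt, the counts make the pattern visible at a glance: ten CRS-1013/CRS-0804 pairs, one per crsd start attempt, followed by a single CRS-2771.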

oraagent_grid.log on node db1 shows the ora.OCR.dg resource being stopped and its state changing to OFFLINE:

2019-12-28 05:30:16.531: [AGFW][511039232]{1:30746:2} Agent received the message: AGENT_HB[Engine] ID 12293:113720
2019-12-28 05:30:37.808: [AGFW][511039232]{1:30746:9373} Agent received the message: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.808: [AGFW][511039232]{1:30746:9373} Preparing STOP command for: ora.OCR.dg db1 1
2019-12-28 05:30:37.808: [AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: ONLINE to: STOPPING
2019-12-28 05:30:37.809: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop {
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: enter {
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] getResAttrib: attrib name USR_ORA_OPI value true len 4
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] Agent::flagUsrOraOpiIsSet(true) reason not dependency
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: tha exit }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stopSingle status:2 }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop }
2019-12-28 05:30:37.810: [AGFW][513140480]{1:30746:9373} Command: stop for resource: ora.OCR.dg db1 1 completed with status: SUCCESS
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:37.811: [AGFW][511039232]{1:30746:9373} Agent sending reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.838: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::runCheck: asm stat asmRet 0
2019-12-28 05:30:37.839: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::getConnxn connected
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus excp no data found
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus dgName OCR ret 1
2019-12-28 05:30:37.845: [AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: STOPPING to: OFFLINE
2019-12-28 05:30:37.845: [AGFW][511039232]{1:30746:9373} Agent sending last reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:43.889: [ora.asm][503641856]{1:30746:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:43.920: [ora.asm][503641856]{1:30746:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:48.465: [CRSCOMM][521545472] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Disconnected from CRSD:db1 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.474: [AGENT][511039232]{0:1:4} {0:1:4} Created alert : (:CRSAGF00117:) : Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.474: [AGFW][511039232]{0:1:4} Agent is exiting with exit code: 1
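The lines that matter in this agent log are the two "state changed from: ... to: ..." entries for ora.OCR.dg. A small sketch for pulling such transitions out of a longer agent log follows; it assumes the log still has its original spacing (timestamp, a space, then the bracketed component fields), and the function name is invented for the example.

```python
import re

# Timestamp, anything up to the closing brace of the {n:n:n} context,
# then "<resource> state changed from: OLD to: NEW".
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+): .*\}\s*(?P<resource>.+?) "
    r"state changed from: (?P<old>\w+) to: (?P<new>\w+)"
)


def state_transitions(agent_log_text):
    """Return (timestamp, resource, old_state, new_state) for each transition line."""
    result = []
    for line in agent_log_text.splitlines():
        m = TRANSITION.match(line)
        if m:
            result.append((m["ts"], m["resource"], m["old"], m["new"]))
    return result


# Three lines from the oraagent_grid.log excerpt above; only two are transitions.
SAMPLE = """\
2019-12-28 05:30:37.808: [AGFW][511039232]{1:30746:9373} Preparing STOP command for: ora.OCR.dg db1 1
2019-12-28 05:30:37.808: [AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: ONLINE to: STOPPING
2019-12-28 05:30:37.845: [AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: STOPPING to: OFFLINE
"""

if __name__ == "__main__":
    for ts, resource, old, new in state_transitions(SAMPLE):
        print(ts, resource, old, "->", new)
```

Filtering the result on a resource name narrows a busy log down to the moment a single disk group resource went offline.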

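alert_+ASM2.log on node db2 contains messages such as "Waited 15 secs for write IO to PST disk 0 in group 1". ASM waited 15 seconds or more for PST writes on several disk groups, including group 5, which the same log identifies as OCR. ASM then took the OCR disks offline and force-dismounted the disk group.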

Sat Dec 28 03:02:51 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:02:51 2019
NOTE: process _b000_+asm2 (45488) initiating offline of disk 0.3916160907 (OCR1) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 1.3916160906 (OCR2) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 2.3916160905 (OCR3) with mask 0x7e in group 5
NOTE: checking PST: grp = 5
GMON checking disk modes for group 5 at 19 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 5 done.
NOTE: initiating PST update: grp = 5, dsk = 0/0xe96bdf8b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 1/0xe96bdf8a, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 2/0xe96bdf89, mask = 0x6a, op = clear
GMON updating disk modes for group 5 at 20 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
Sat Dec 28 03:02:51 2019
NOTE: cache dismounting (not clean) group 5/0x8F5B2F9F (OCR)
NOTE: messaging CKPT to quiesce pins Unix process pid: 45490, image: oracle@db2 (B001)
Sat Dec 28 03:02:51 2019
NOTE: halting all I/Os to diskgroup 5 (OCR)
Sat Dec 28 03:02:52 2019
NOTE: LGWR doing non-clean dismount of group 5 (OCR)
NOTE: LGWR sync ABA=23.100 last written ABA 23.100
WARNING: Offline for disk OCR1 in mode 0x7f failed.
WARNING: Offline for disk OCR2 in mode 0x7f failed.
WARNING: Offline for disk OCR3 in mode 0x7f failed.
Sat Dec 28 03:02:52 2019
kjbdomdet send to inst 1
detach from dom 5, sending detach message to inst 1
Sat Dec 28 03:02:52 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE
 0 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Sat Dec 28 03:02:52 2019
WARNING: dirty detached from domain 5
NOTE: cache dismounted group 5/0x8F5B2F9F (OCR)
SQL> alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */
Sat Dec 28 03:02:52 2019
NOTE: cache deleting context for group OCR 5/0x8f5b2f9f
GMON dismounting group 5 at 21 for pid 28, osid 45490
NOTE: Disk OCR1 in mode 0x7f marked for de-assignment
NOTE: Disk OCR2 in mode 0x7f marked for de-assignment
NOTE: Disk OCR3 in mode 0x7f marked for de-assignment
NOTE: Waiting for all pending writes to complete before de-registering: grpnum 5
Sat Dec 28 03:03:03 2019
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:03:03 2019
ASM Health Checker found 1 new failures
Sat Dec 28 03:03:22 2019
SUCCESS: diskgroup OCR was dismounted
SUCCESS: alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCR
Sat Dec 28 03:03:22 2019
NOTE: diskgroup resource ora.OCR.dg is offline
Sat Dec 28 03:03:22 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:34 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 05:30:37 2019
Received dirty detach msg from inst 1 for dom 5
Sat Dec 28 05:30:37 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE
 0 GCS resources traversed, 0 cancelled
freeing rdom 5
Dirty Detach Reconfiguration complete
Sat Dec 28 05:30:37 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 5 logical extent 0 offset 704512 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 5 logical extent 1 offset 704512 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:37 2019
SQL> alter diskgroup OCR check /* proxy */
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
Sat Dec 28 05:30:44 2019
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
Sat Dec 28 05:30:48 2019
NOTE: client exited [27819]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142641] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142660] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142678] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142696] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142723] opening OCR file
Sat Dec 28 05:30:59 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142744] opening OCR file
Sat Dec 28 05:31:01 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142773] opening OCR file
Sat Dec 28 05:31:03 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142792] opening OCR file
Sat Dec 28 05:31:05 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142806] opening OCR file
Sat Dec 28 05:31:07 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142821] opening OCR file
Sat Dec 28 06:18:42 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
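The PST wait warnings are spread across the ASM alert log and affect several disk groups, not only OCR. A minimal sketch for summarising them is shown below: it extracts every "Waited N secs for write IO to PST disk D in group G" warning and keeps the longest wait seen per group number. The function name is hypothetical, and the pattern assumes the log text still has its original spacing.

```python
import re
from collections import defaultdict

PST_WAIT = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)\."
)


def max_pst_wait_by_group(alert_text):
    """Return {group_number: longest PST write wait in seconds} from ASM alert log text."""
    worst = defaultdict(int)
    for secs, _disk, group in PST_WAIT.findall(alert_text):
        worst[int(group)] = max(worst[int(group)], int(secs))
    return dict(worst)


# A few warnings taken from the alert_+ASM2.log excerpt above.
SAMPLE = """\
Sat Dec 28 03:02:51 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
Sat Dec 28 03:03:03 2019
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
"""

if __name__ == "__main__":
    for group, secs in sorted(max_pst_wait_by_group(SAMPLE).items()):
        print("group", group, "max wait", secs, "secs")
```

Group numbers can be mapped back to names from the same log; for example, the excerpt above refers to "group 5 (OCR)".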

alertdb2.log on node db2 reports the same inability to access the OCR location, although a dd test showed that the disks could be read.

2019-12-28 05:30:48.019: [/u01/app/11.2.0/grid/bin/oraagent.bin(28268)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:73} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.019: [/u01/app/11.2.0/grid/bin/scriptagent.bin(37953)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:8} in /u01/app/11.2.0/grid/log/db2/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2019-12-28 05:30:48.020: [/u01/app/11.2.0/grid/bin/oraagent.bin(28009)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:10} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.021: [/u01/app/11.2.0/grid/bin/orarootagent.bin(28025)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:373} in /u01/app/11.2.0/grid/log/db2/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.024: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:49.410: [crsd(142641)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:49.420: [crsd(142641)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:50.063: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:51.442: [crsd(142660)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:51.451: [crsd(142660)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:52.100: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:53.471: [crsd(142678)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:53.480: [crsd(142678)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:54.138: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:55.507: [crsd(142696)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:55.517: [crsd(142696)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:56.176: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:57.551: [crsd(142723)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:57.560: [crsd(142723)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:58.216: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:59.592: [crsd(142744)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:59.602: [crsd(142744)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:00.253: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:01.627: [crsd(142773)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:01.636: [crsd(142773)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:02.290: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:03.658: [crsd(142792)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:03.668: [crsd(142792)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:04.327: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:05.701: [crsd(142806)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:05.711: [crsd(142806)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:06.365: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:07.726: [crsd(142821)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:07.735: [crsd(142821)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:08.402: [ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:08.402: [ohasd(13034)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.403: [ohasd(13034)]CRS-2769:Unable to failover resource 'ora.crsd'.

oraagent_grid.log on node db2 shows routine resource checks followed by the agent losing its connection to crsd and exiting:

2019-12-28 05:29:59.329: [AGFW][3601811200]{2:6928:2} Agent received the message: AGENT_HB[Engine] ID 12293:273304
2019-12-28 05:30:17.162: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl.
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:17.298: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan2.vip, statRet = 0
2019-12-28 05:30:17.881: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl.
2019-12-28 05:30:17.986: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.987: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:18.019: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan3.vip, statRet = 0
2019-12-28 05:30:27.292: [ora.asm][2950686464]{2:6928:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:27.319: [ora.asm][2950686464]{2:6928:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] getOracleHomeAttrib: oracle_home = /u01/app/11.2.0/grid
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] Utils:execCmd action = 3 flags = 6 ohome = /u01/app/11.2.0/grid/opmn/ cmdname = onsctli.
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)ons is running ...
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)
2019-12-28 05:30:34.628: [ora.ons][2950686464]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:37.858: [USRTHRD][3575973632]{1:30748:9373} Processing the event CRS_RESOURCE_STATE_CHANGE
2019-12-28 05:30:38.652: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl.
2019-12-28 05:30:38.757: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:48.017: [CRSCOMM][3612317440] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Disconnected from CRSD:db2 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.020: [AGENT][3601811200]{0:1:10} {0:1:10} Created alert : (:CRSAGF00117:) : Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.020: [AGFW][3601811200]{0:1:10} Agent is exiting with exit code: 1

/var/log/messages on both nodes contains multipath errors, but the affected disks are used for backups, not by the production databases.

Dec 30 05:25:31 db1 multipathd: backup2: sdcr - emc_clariion_checker: query command indicates error
Dec 30 05:25:31 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 05:25:31 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 05:25:31 db1 multipathd: backup2: remaining active paths: 3
Dec 30 05:25:37 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 05:25:37 db1 multipathd: 69:240: reinstated
Dec 30 05:25:37 db1 multipathd: backup2: remaining active paths: 4
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 05:26:03 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:4 -- 1 2002.
Dec 30 06:03:35 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 06:15:23 db1 multipathd: backup3: sdcq - emc_clariion_checker: Read error for WWN 600601608b203300d563752524c1e611. Sense data are 0x0/0x0/0x0.
Dec 30 06:15:23 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:1 -- 1 2002.
Dec 30 06:15:23 db1 kernel: device-mapper: multipath: Failing path 69:224.
Dec 30 06:15:23 db1 multipathd: checker failed path 69:224 in map backup3
Dec 30 06:15:23 db1 multipathd: backup3: remaining active paths: 3
Dec 30 06:15:28 db1 multipathd: backup3: sdcq - emc_clariion_checker: Active path is healthy.
Dec 30 06:15:28 db1 multipathd: 69:224: reinstated
Dec 30 06:15:28 db1 multipathd: backup3: remaining active paths: 4
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: ALUA failover mode detected
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: at SP A Port 5 (owned, default SP A)
Dec 30 06:59:29 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:53:22 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:55:11 db1 multipathd: sdct: couldn't get asymmetric access state
Dec 30 07:55:11 db1 multipathd: backup4: load table [0 2147483648 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 70:16 1 66:240 1 round-robin 0 2 1 8:64 1 67:80 1]
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: at SP A Port 5 (owned, default SP A)
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: at SP A Port 4 (owned, default SP A)
Dec 30 07:55:35 db1 multipathd: backup2: sdcr - emc_clariion_checker: Read error for WWN 600601608b203300d663752524c1e611. Sense data are 0x0/0x0/0x0.
Dec 30 07:55:35 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 07:55:35 db1 multipathd: backup2: remaining active paths: 3
Dec 30 07:55:35 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 07:55:40 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 07:55:40 db1 multipathd: 69:240: reinstated
Dec 30 07:55:40 db1 multipathd: backup2: remaining active paths: 4
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:39:47 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 08:43:36 db1 multipathd: mpathb: load table [0 20971520 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 69:208 1 66:176 1 round-robin 0 2 1 8:0 1 67:16 1]
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: at SP A Port 4 (owned, default SP A)
Dec 30 09:24:04 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 10:13:09 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 11:06:07 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 12:07:36 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 13:08:58 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Sy
nchronizationServicetobesynchronouswiththemeanclustertime.Dec3014:00:19db1CLSD:Theclockonhostdb1hasbeenupdatedbytheClusterTimeSynchronizationServicetobesynchronouswiththemeanclustertime.Dec3014:52:20db1CLSD:Theclockonhostdb1hasbeenupdatedbytheClusterTimeSynchronizationServicetobesynchronouswiththemeanclustertime.Dec3015:40:45db1CLSD:Theclockonhostdb1hasbeenupdatedbytheClusterTimeSynchronizationServicetobesynchronouswiththemeanclustertime.Dec3016:34:38db1CLSD:Theclockonhostdb1hasbeenupdatedbytheClusterTimeSynchronizationServicetobesynchronouswiththemeanclustertime.Dec3017:09:56db1auditd[15975]:AuditdaemonrotatinglogfilesDec3017:38:16db1CLSD:Theclockonhostdb1hasbeenupdatedbytheClusterTimeSynchronizationServicetobesynchronouswiththemeanclustertime.Dec3018:59:38db1CLSD:Theclockonhostdb1hasbeenupdatedbytheClusterTimeSynchronizationServicetobesynchronouswiththemeanclustertime.Dec3019:54:43db1CLSD:Theclockonhostdb1hasbeenupdatedbytheClusterTimeSynchronizationServicetobesynchronouswiththemeanclustertime.
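To see at a glance which multipath maps are losing paths, the "checker failed path" events in the system log can be tallied per map. The following is a minimal sketch, not part of the original troubleshooting; it assumes syslog lines in the usual multipathd format shown in the excerpt above, and the function name count_failed_paths is made up for illustration.

```shell
# Count "checker failed path" events per multipath map.
# Reads syslog-style lines on stdin and prints "<map> <count>", sorted by map.
count_failed_paths() {
    awk '
        /multipathd: checker failed path/ {
            # The map name is the last field:
            # "... checker failed path 69:240 in map backup2"
            failures[$NF]++
        }
        END {
            for (map in failures) print map, failures[map]
        }
    ' | sort
}
```

A typical call would be `count_failed_paths < /var/log/messages`, run as root. For the excerpt above it would report two failures for backup2 and one for backup3.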

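The cluster logs confirm that an I/O problem on the storage disks (either a flapping fibre link or I/O latency) brought CRS down. Oddly, although CRS went offline, the ASM instances and database instances kept running and remained usable.

A search of Oracle Support turned up note 1581864.1, which links access timeouts on the ASM disk group holding the CRS voting files to the hidden parameter _asm_hbeatiowait, and relates that parameter in turn to the polling_interval setting in the operating system's multipath configuration. The root cause is that the operating system's disk access timeout is much longer than ASM's timeout for the voting disks. ASM gives up first, Oracle RAC concludes that the voting disks are unreachable, and the disk group is forced offline.

The fix is to determine the operating system's polling_interval setting and the current value of _asm_hbeatiowait, then raise _asm_hbeatiowait above the OS-side value. The steps below take the OS-side value from the disk timeout in sysfs.

The detailed steps are as follows:
1. Check the current value of _asm_hbeatiowait in the RAC ASM instance (the default is 15 seconds):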

SQL> col ksppinm for a20
SQL> col ksppstvl for a40
SQL> col ksppdesc for a80
SQL> SELECT ksppinm, ksppstvl, ksppdesc
  2  FROM x$ksppi x, x$ksppcv y
  3  WHERE x.indx = y.indx AND ksppinm = '_asm_hbeatiowait';

KSPPINM              KSPPSTVL                                 KSPPDESC
-------------------- ---------------------------------------- --------------------------------------------------------------------------------
_asm_hbeatiowait     15                                       number of secs to wait for PST Async Hbeat IO return

2. Check the operating system's disk access timeout (30 seconds by default on Oracle Linux 6.7):

[root@db1 ~]# cat /sys/block/sdb/device/timeout
30
[root@db2 ~]# cat /sys/block/sdb/device/timeout
30
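The check above reads a single device, sdb. On a host with many LUNs it may be worth scanning every sd* device and comparing the largest timeout with the ASM heartbeat wait. The following is a minimal sketch under that assumption; the function names max_disk_timeout and hbeat_exceeds_timeout are hypothetical, and the directory argument exists only so the function can be pointed at a test tree instead of /sys/block.

```shell
# Print the largest SCSI command timeout (in seconds) found under a sysfs
# block directory. The directory defaults to /sys/block.
max_disk_timeout() {
    sysfs_root=${1:-/sys/block}
    max=0
    for f in "$sysfs_root"/sd*/device/timeout; do
        [ -r "$f" ] || continue      # skip an unmatched glob or unreadable file
        t=$(cat "$f")
        if [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done
    echo "$max"
}

# Succeed (exit status 0) only when the ASM heartbeat wait is strictly
# greater than the OS disk timeout, the condition this article aims for.
hbeat_exceeds_timeout() {
    asm_wait=$1
    os_timeout=$2
    [ "$asm_wait" -gt "$os_timeout" ]
}
```

For example, `hbeat_exceeds_timeout 15 "$(max_disk_timeout)"` fails with the default values found here (15 against 30), while a value of 45 passes.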

3. Set _asm_hbeatiowait to 45 seconds (the parameter is static, so the cluster must be restarted for the change to take effect):

SQL> alter system set "_asm_hbeatiowait"=45 scope=spfile sid='*';

System altered.

4. Restart the cluster.
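The article does not show the commands used for the restart. The following is a minimal sketch, assuming the common 11.2 approach of stopping and starting the Clusterware stack with crsctl as root on each node in turn. The GRID_HOME default is a hypothetical path, and the function restart_crs is made up for illustration; it only prints the commands unless RUN=1 is set.

```shell
# Print (dry run, the default) or execute the per-node CRS restart commands.
# GRID_HOME is an assumed install location; set it to the real Grid home.
restart_crs() {
    grid_home=${GRID_HOME:-/u01/app/11.2.0/grid}   # hypothetical default path
    for action in stop start; do
        cmd="$grid_home/bin/crsctl $action crs"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1         # stop at the first failing command
        else
            echo "$cmd"              # dry run: show what would be executed
        fi
    done
}
```

Run it without RUN=1 first to review the commands. If the stack is already in a degraded state, the stop may need the -f option of `crsctl stop crs`; that depends on the situation and is not covered by this sketch.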

5. Check the cluster status:

[grid@db1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.CWDATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DADATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.OCR.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.SBKDATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.net1.network
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.ons
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db2
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       db1
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       db1
ora.caiwu.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.chdyl.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.cvu
      1        ONLINE  ONLINE       db1
ora.dadb.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.db1.vip
      1        ONLINE  ONLINE       db1
ora.db2.vip
      1        ONLINE  ONLINE       db2
ora.oc4j
      1        ONLINE  ONLINE       db1
ora.rlzy.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.scan1.vip
      1        ONLINE  ONLINE       db2
ora.scan2.vip
      1        ONLINE  ONLINE       db1
ora.scan3.vip
      1        ONLINE  ONLINE       db1

With that, the issue is resolved.
