Learning Oracle Through Case Studies: A "Bloodbath" Caused by an Operator Error on AIX RAC

System environment:

Operating system: AIX 5300-09

Cluster software: CRS 10.2.0.1

Database: Oracle 10.2.0.1


This case uses shared storage based on concurrent volume groups (VG Concurrent); HACMP provides the concurrent access to the volume groups.

Case analysis:

I. Symptoms:

1. The oracle user cannot access the device files

2. The CRS server fails to start


[oracle@aix211 ~]$ls -l /dev

/dev/__vg10: No permission
/dev/audit: No permission
/dev/cd0: No permission
/dev/clone: No permission
/dev/console: No permission
/dev/error: No permission

Checking the attributes of /dev shows that its owner and group had been changed to oracle:dba:

[oracle@aix211 ~]$ls -ld /dev

drw-rw---- 6 oracle dba 3584 Sep 16 11:38 /dev
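The listing above shows more than a wrong owner: the mode drw-rw---- has no execute (search) bit, so entries inside the directory cannot be examined even by the owner, which matches the per-file errors. A check like this can be scripted. The following is only a minimal sketch; the helper names mode_string and owner_can_search are hypothetical and do not come from the original case, and the sketch inspects the mode text printed by ls -ld rather than testing access directly.

```shell
# Hypothetical helpers (not part of the original case) that inspect the
# mode column printed by `ls -ld`.

# Print the first 10 characters of the mode column, e.g. "drw-rw----".
mode_string() {
    ls -ld "$1" | cut -c1-10
}

# Succeed only if the path is a directory whose owner has the search (x) bit.
owner_can_search() {
    case "$(mode_string "$1")" in
        d??[xs]*) return 0 ;;   # 4th character is the owner's x bit
        *)        return 1 ;;
    esac
}
```

For example, `owner_can_search /dev || echo "/dev is not searchable by its owner"` would flag the broken state shown above.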


Restore the ownership and permissions of /dev:

[root@aix211 /]#chown root.system /dev
[root@aix211 /]#ls -ld /dev
drw-rw---- 6 root system 3584 Sep 16 11:38 /dev
[root@aix211 /]#chmod 775 /dev

The oracle user can now access the device files normally:

[root@aix211 /]#su - oracle
[oracle@aix211 ~]$ls -l /dev
total 24
crw-rw----   1 root     system   10,  0 Aug 29 2013  IPL_rootvg
srwxrwxrwx   1 root     system        0 Sep 16 10:22 SRC
brw-rw----   1 oracle   dba      88,  9 Sep 11 12:15 control1_1
brw-rw----   1 oracle   dba      88, 10 Sep 11 12:15 control2_2
brw-rw----   1 oracle   dba      88, 11 Sep 11 12:16 control3_3
crw-rw----   1 root     system   88,  0 Sep 11 12:08 datavg

However, the CRS server still cannot start normally!

II. Reconfigure CRS:

1. Wipe the OCR and voting disk contents (on both nodes)

[root@aix211 /]#dd if=/dev/zero of=/dev/rrac_ocr bs=8192 count=2560
2560+0 records in
2560+0 records out
[root@aix211 /]#dd if=/dev/zero of=/dev/rrac_vote bs=8192 count=2560
2560+0 records in
2560+0 records out
[root@aix211 /]#ls -l /dev | grep ocr
brw-rw----   1 oracle   dba       88,  1 Sep 11 12:15 rac_ocr
crw-r-----   1 root     oinstall  88,  1 Sep 16 11:05 rrac_ocr
[root@aix211 /]#chown oracle:dba /dev/rrac_ocr
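Each dd run above zeroes bs × count = 8192 × 2560 = 20,971,520 bytes (20 MiB) at the start of the raw device. Because the wipe is destructive, it may be worth saving that region first. The case itself does not take such a backup, so the sketch below is an added precaution under that assumption, and the helper names dd_span_bytes and backup_and_zero are hypothetical.

```shell
# Hypothetical helpers (not part of the original case).

# Print the number of bytes covered by a dd run: block size * count.
dd_span_bytes() {
    echo $(( $1 * $2 ))
}

# Copy the first BS*COUNT bytes of TARGET to BACKUP, then overwrite that
# region of TARGET with zeros. conv=notrunc keeps the rest of a regular
# file intact; on a raw device it makes no difference.
backup_and_zero() {
    target=$1
    backup=$2
    bs=$3
    count=$4
    dd if="$target" of="$backup" bs="$bs" count="$count" 2>/dev/null || return 1
    dd if=/dev/zero of="$target" bs="$bs" count="$count" conv=notrunc 2>/dev/null
}
```

With the values used in this case, a call might look like `backup_and_zero /dev/rrac_ocr /tmp/ocr_head.bak 8192 2560`; the backup path here is only an illustration.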

2. Re-run the root.sh script to configure CRS (on both nodes)

node1:

[root@aix211 install]#./rootdelete.sh

Shutting down Oracle Cluster Ready Services (CRS):
Sep 16 11:48:57.011 | ERR | failed to connect to daemon, errno(2)
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'

[root@aix211 install]#/u01/crs_1/root.sh

WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Checking to see if any 9i GSD is up
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: aix211 aix211-priv aix211
node 2: aix212 aix212-priv aix212
clscfg: Arguments check out successfully.
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        aix211
CSS is inactive on these nodes.
        aix212
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.

node2:

[root@aix212 install]#./rootdelete.sh

Shutting down Oracle Cluster Ready Services (CRS):
Sep 16 11:48:57.011 | ERR | failed to connect to daemon, errno(2)
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'

[root@aix212@ /]#/u01/crs_1/root.sh

WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: aix211 aix211-priv aix211
node 2: aix212 aix212-priv aix212
clscfg: Arguments check out successfully.
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        aix211
        aix212
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), "en0" is not public. Public interfaces should be used to configure virtual IPs.

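The silent vipca run at the end of root.sh did not configure the VIPs (see the "en0" is not public message), so run vipca manually on node2 to configure them.

At this point, CRS has been reconfigured successfully!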

[root@aix212@ /]#crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
[root@aix212@ /]#crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.aix211.gsd application    ONLINE    ONLINE    aix211
ora.aix211.ons application    ONLINE    ONLINE    aix211
ora.aix211.vip application    ONLINE    ONLINE    aix211
ora.aix212.gsd application    ONLINE    ONLINE    aix212
ora.aix212.ons application    ONLINE    ONLINE    aix212
ora.aix212.vip application    ONLINE    ONLINE    aix212
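Reading the crs_stat -t table by eye is repeated several times in this case, so it can help to filter it. The sketch below reads that tabular output on standard input and prints the names of resources whose State column is not ONLINE. It assumes the five-column layout shown above (Name, Type, Target, State, Host); the function names not_online_resources and all_resources_online are hypothetical.

```shell
# Hypothetical helpers (not part of the original case) that filter the
# output of `crs_stat -t` read from standard input.

# Print the name of every resource whose State (4th column) is not ONLINE.
# The header row and the dashed separator line are skipped.
not_online_resources() {
    awk '$1 == "Name" || /^-+$/ { next }
         NF >= 4 && $4 != "ONLINE" { print $1 }'
}

# Succeed only if no resource is reported in a state other than ONLINE.
all_resources_online() {
    [ -z "$(not_online_resources)" ]
}
```

Assuming crs_stat is on the PATH, a possible use is `crs_stat -t | not_online_resources`, which prints nothing when every resource is ONLINE.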

III. Re-register the Listener and Database

1. Register the listener

Run the netca tool and reconfigure the listener; this completes the listener registration.

[root@aix212@ /]#crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....11.lsnr application    ONLINE    ONLINE    aix211
ora.aix211.gsd application    ONLINE    ONLINE    aix211
ora.aix211.ons application    ONLINE    ONLINE    aix211
ora.aix211.vip application    ONLINE    ONLINE    aix211
ora....12.lsnr application    ONLINE    ONLINE    aix212
ora.aix212.gsd application    ONLINE    ONLINE    aix212
ora.aix212.ons application    ONLINE    ONLINE    aix212
ora.aix212.vip application    ONLINE    ONLINE    aix212

2. Register the Database and Instances

Register the Database:

[root@aix212@ /]#srvctl add database -h

Usage: srvctl add database -d <name> -o <oracle_home> [-m <domain_name>] [-p <spfile>] [-A <name|ip>/netmask] [-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY}] [-s <start_options>] [-n <db_name>] [-y {AUTOMATIC | MANUAL}]
    -d <name>           Unique name for the database
    -o <oracle_home>    ORACLE_HOME for cluster database
    -m <domain>         Domain for cluster database
    -p <spfile>         Server parameter file for cluster database
    -A <addr_str>       Database cluster alias
    -n <db_name>        Database name (DB_NAME), if different from the unique name given by the -d option
    -r <role>           Role of the database (primary, physical_standby, logical_standby)
    -s <start_options>  Startup options for the database
    -y <dbpolicy>       Management policy for the database (automatic, manual)
    -h                  Print usage

[root@aix212@ /]#su - oracle

[oracle@aix212@ ~]$srvctl add database -d prod -o $ORACLE_HOME

[oracle@aix212@ ~]$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....11.lsnr application    ONLINE    ONLINE    aix211
ora.aix211.gsd application    ONLINE    ONLINE    aix211
ora.aix211.ons application    ONLINE    ONLINE    aix211
ora.aix211.vip application    ONLINE    ONLINE    aix211
ora....12.lsnr application    ONLINE    ONLINE    aix212
ora.aix212.gsd application    ONLINE    ONLINE    aix212
ora.aix212.ons application    ONLINE    ONLINE    aix212
ora.aix212.vip application    ONLINE    ONLINE    aix212
ora.prod.db    application    OFFLINE   OFFLINE

Register the Instances:

[oracle@aix212@ ~]$srvctl add instance -h
Usage: srvctl add instance -d <name> -i <inst_name> -n <node_name>
    -d <name>           Unique name for the database
    -i <inst>           Instance name
    -n <node>           Node name
    -h                  Print usage
[oracle@aix212@ ~]$srvctl add instance -d prod -i prod1 -n aix211
[oracle@aix212@ ~]$srvctl add instance -d prod -i prod2 -n aix212
[oracle@aix212@ ~]$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....11.lsnr application    ONLINE    ONLINE    aix211
ora.aix211.gsd application    ONLINE    ONLINE    aix211
ora.aix211.ons application    ONLINE    ONLINE    aix211
ora.aix211.vip application    ONLINE    ONLINE    aix211
ora....12.lsnr application    ONLINE    ONLINE    aix212
ora.aix212.gsd application    ONLINE    ONLINE    aix212
ora.aix212.ons application    ONLINE    ONLINE    aix212
ora.aix212.vip application    ONLINE    ONLINE    aix212
ora.prod.db    application    OFFLINE   OFFLINE
ora....d1.inst application    OFFLINE   OFFLINE
ora....d2.inst application    OFFLINE   OFFLINE
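The registration above follows a simple pattern: instance N of database prod is named prodN and is bound to the Nth node. For clusters with more nodes, the commands could be generated rather than typed. The following is a dry-run sketch that only prints the srvctl add instance commands and does not execute them; the function name gen_add_instance_cmds is hypothetical, and the database-name-plus-index naming is an assumption taken from the prod1/prod2 names used in this case.

```shell
# Hypothetical helper (not part of the original case). Prints, without
# running them, the registration commands for one instance per node.
# Usage: gen_add_instance_cmds DB_NAME NODE [NODE ...]
gen_add_instance_cmds() {
    db=$1
    shift
    i=1
    for node in "$@"; do
        # Instance names are assumed to be the database name plus an index.
        echo "srvctl add instance -d $db -i ${db}${i} -n $node"
        i=$((i + 1))
    done
}
```

Calling `gen_add_instance_cmds prod aix211 aix212` prints the same two commands that were entered by hand above, which can be reviewed before being run as the oracle user.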

Start the Database with the CRS tools (srvctl):

[oracle@aix212@ ~]$srvctl start database -d prod

PRKP-1001 : Error starting instance prod1 on node aix211

CRS-0184: Cannot communicate with the CRS daemon.

PRKP-1001 : Error starting instance prod2 on node aix212

CRS-0184: Cannot communicate with the CRS daemon.
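The PRKP-1001 lines above name each instance that failed to start and the node it belongs to, which is exactly the list of nodes where a manual startup is needed next. The sketch below extracts those pairs from srvctl output read on standard input. It assumes the message wording shown above, and the function name failed_instances is hypothetical.

```shell
# Hypothetical helper (not part of the original case). Reads srvctl output
# on standard input and prints "<instance> <node>" for each PRKP-1001
# "Error starting instance" message.
failed_instances() {
    awk '/^PRKP-1001 : Error starting instance / { print $6, $NF }'
}
```

One possible use is `srvctl start database -d prod 2>&1 | failed_instances`, which for the output above would list prod1 on aix211 and prod2 on aix212.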

Starting the instances through srvctl failed, so start them manually with sqlplus on each node:

[oracle@aix212@ ~]$sqlplus '/as sysdba'
SQL*Plus: Release 10.2.0.1.0 - Production on Tue Sep 16 12:08:10 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 1258291200 bytes
Fixed Size                  2020552 bytes
Variable Size             352324408 bytes
Database Buffers          889192448 bytes
Redo Buffers               14753792 bytes
Database mounted.
Database opened.

[oracle@aix211 aix211]$sqlplus '/as sysdba'
SQL*Plus: Release 10.2.0.1.0 - Production on Tue Sep 16 12:09:37 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 1258291200 bytes
Fixed Size                  2020552 bytes
Variable Size             335547192 bytes
Database Buffers          905969664 bytes
Redo Buffers               14753792 bytes
Database mounted.
Database opened.

Check the resource status reported by CRS:

[oracle@aix211 aix211]$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....11.lsnr application    ONLINE    ONLINE    aix211
ora.aix211.gsd application    ONLINE    ONLINE    aix211
ora.aix211.ons application    ONLINE    ONLINE    aix211
ora.aix211.vip application    ONLINE    ONLINE    aix211
ora....12.lsnr application    ONLINE    ONLINE    aix212
ora.aix212.gsd application    ONLINE    ONLINE    aix212
ora.aix212.ons application    ONLINE    ONLINE    aix212
ora.aix212.vip application    ONLINE    ONLINE    aix212
ora.prod.db    application    ONLINE    ONLINE    aix211
ora....d1.inst application    ONLINE    ONLINE    aix211
ora....d2.inst application    ONLINE    ONLINE    aix212

Then restart an Instance with the CRS tools to confirm that srvctl now works:

[oracle@aix211 aix211]$srvctl stop instance -d prod -i prod1

[oracle@aix211 aix211]$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....11.lsnr application    ONLINE    ONLINE    aix211
ora.aix211.gsd application    ONLINE    ONLINE    aix211
ora.aix211.ons application    ONLINE    ONLINE    aix211
ora.aix211.vip application    ONLINE    ONLINE    aix211
ora....12.lsnr application    ONLINE    ONLINE    aix212
ora.aix212.gsd application    ONLINE    ONLINE    aix212
ora.aix212.ons application    ONLINE    ONLINE    aix212
ora.aix212.vip application    ONLINE    ONLINE    aix212
ora.prod.db    application    ONLINE    ONLINE    aix211
ora....d1.inst application    OFFLINE   OFFLINE
ora....d2.inst application    ONLINE    ONLINE    aix212

[oracle@aix211 aix211]$srvctl start instance -d prod -i prod1

[oracle@aix211 aix211]$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....11.lsnr application    ONLINE    ONLINE    aix211
ora.aix211.gsd application    ONLINE    ONLINE    aix211
ora.aix211.ons application    ONLINE    ONLINE    aix211
ora.aix211.vip application    ONLINE    ONLINE    aix211
ora....12.lsnr application    ONLINE    ONLINE    aix212
ora.aix212.gsd application    ONLINE    ONLINE    aix212
ora.aix212.ons application    ONLINE    ONLINE    aix212
ora.aix212.vip application    ONLINE    ONLINE    aix212
ora.prod.db    application    ONLINE    ONLINE    aix211
ora....d1.inst application    ONLINE    ONLINE    aix211
ora....d2.inst application    ONLINE    ONLINE    aix212

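At this point, the Database can be started and stopped normally with the CRS tools. The "bloodbath" caused by the operator error has been successfully rescued!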