同事对rac集群生成性能报告时发现rac集群有一个实例没有生成awr快照,另一个实例快照正常。下面是具体处理步骤。
1号实例没有生成awr快照

SQL>selectSNAP_ID,END_INTERVAL_TIME,instance_numberfromdba_hist_snapshotwhereinstance_number=1;norowsselected

2号实快照正常

SQL>setlong200SQL>setlinesize200SQL>select*from(selectSNAP_ID,END_INTERVAL_TIME,instance_numberfromdba_hist_snapshotwhereinstance_number=2orderbySNAP_IDdesc)whererownum<=10;SNAP_IDEND_INTERVAL_TIMEINSTANCE_NUMBER----------------------------------------------------------------------------------------------------2440517-AUG-1907.00.47.595PM22440417-AUG-1906.00.42.150PM22440317-AUG-1905.00.37.041PM22440217-AUG-1904.00.31.774PM22440117-AUG-1903.00.26.414PM22440017-AUG-1902.00.21.176PM22439917-AUG-1901.00.16.316PM22439817-AUG-1912.00.10.997PM22439717-AUG-1911.00.05.446AM22439617-AUG-1910.00.59.801AM210rowsselected.

mmon进程与awr快照相关,mmnl与ash相关,如是查看两个实例的mmon与mmnl进程
2号实例

[root@db2~]#ps-ef|grepmmonroot128329127956018:11pts/200:00:00grepmmonoracle201527102018?17:17:11ora_mmon_RLZY2[root@db2~]#ps-ef|grepmmnlroot131772127956018:17pts/200:00:00grepmmnloracle201531102018?1-06:06:24ora_mmnl_RLZY2

1号实例

[root@db1~]#ps-ef|grepmmonroot239020238963018:52pts/200:00:00grepmmon[root@db1~]#ps-ef|grepmmnlroot239052238963018:52pts/200:00:00grepmmnl

可以看到1号实例没有mmon与mmnl进程了。

如是查看1号实例的mmon进程的跟踪文件

[root@db1trace]#ls-lrt*mmon*.trc-rw-r-----1oracleasmadmin1351052Jan192018RLZY1_mmon_20073.trc-rw-r-----1oracleasmadmin173031Jan222018RLZY1_mmon_49119.trc[root@db1trace]#moreRLZY1_mmon_49119.trcTracefile/u01/app/oracle/diag/rdbms/rlzy/RLZY1/trace/RLZY1_mmon_49119.trcOracleDatabase11gEnterpriseEditionRelease11.2.0.4.0-64bitProductionWiththePartitioning,RealApplicationClusters,AutomaticStorageManagement,OLAP,DataMiningandRealApplicationTestingoptionsORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1Systemname:LinuxNodename:db1Release:3.8.13-68.3.4.el6uek.x86_64Version:#2SMPTueJul1415:03:36PDT2015Machine:x86_64Instancename:RLZY1Redothreadmountedbythisinstance:1Oracleprocessnumber:36Unixprocesspid:49119,image:oracle@db1(MMON)***2018-01-1913:55:20.030***SESSIONID:(1369.1)2018-01-1913:55:20.030***CLIENTID:()2018-01-1913:55:20.030***SERVICENAME:()2018-01-1913:55:20.030***MODULENAME:()2018-01-1913:55:20.030***ACTIONNAME:()2018-01-1913:55:20.030minact-scnslave-status:grec-scn:0x0000.00000000gmin-scn:0x0000.00000000gcalc-scn:0x0000.00000000***2018-01-1914:00:20.643minact-scnmaster-status:grec-scn:0x0e0e.55ad96efgmin-scn:0x0e0e.55abf256gcalc-scn:0x0e0e.55ac2a0a..........KEBM:MMONactionpolicyviolation.'BlockCleanoutOptim,UndoSegmentScan'viol=1;err=12751minact-scnmaster-status:grec-scn:0x0e0e.5f0ebf8cgmin-scn:0x0e0e.5f0eac2egcalc-scn:0x0e0e.5f0ead91DDErulesonlyexecutionfor:ORA12751***2018-01-2207:06:04.060-----STARTEventDrivenActionsDump--------ENDEventDrivenActionsDump---------STARTDDEActionsDump-----ExecutingSYNCactionsExecutingASYNCactions-----STARTDDEAction:'ORA_12751_DUMP'(Sync)-----Runtimeexceeded300secondsTimelimitviolationdetectedat:ksedsts()+465<-kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-dbkePostKGE_kgsf()+71<-kgeselv()+276<-ksesecl0()+162<-ksucin()+147<-kcbzwb()+2727<-kcbgtcr()+31325<-ktucloUsMinScn()+539<-ktucloUsegScan()+992<-ksb_run_managed_action()+384<-ksbcti()+2490<-ksbabs()+1735<-kebm_mmon_main()+209<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253CurrentWaitStack:0:waitingfor'gcbufferbusyacquire'file#=0x5,block#=0x278,class#=0x49wait_id=255378seq_num=59358snap_id=1waittimes:snap=5min5sec,exc=5min5sec,total=5min5secwaittimes:max=infinite,heur=5min5secwaitcounts:calls=358os=358in_wait=1iflags=0x15a2Thereisatleastonesessionblockingthissession.Dumping1directblocker(s):inst:1,sid:990,ser:1Dumpingfinalblocker:inst:1,sid:990,ser:1WaitState:fixed_waits=0flags=0x22boundary=(nil)/-1SessionWaitHistory:elapsedtimeof0.000061secsincecurrentwait0:waitedfor'gccrblock2-way'=0x5,=0x258,=0x47wait_id=255377seq_num=59357snap_id=1waittimes:snap=0.000478sec,exc=0.000478sec,total=0.000478secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000122secofelapsedtime1:waitedfor'gccrblock2-way'=0x5,=0x228,=0x45wait_id=255376seq_num=59356snap_id=1waittimes:snap=0.000741sec,exc=0.000741sec,total=0.000741secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000120secofelapsedtime2:waitedfor'gccrblock2-way'=0x5,=0x138,=0x43wait_id=255375seq_num=59355snap_id=1waittimes:snap=0.000528sec,exc=0.000528sec,total=0.000528secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000111secofelapsedtime3:waitedfor'gccrblock2-way'=0x5,=0xb8,=0x41wait_id=255374seq_num=59354snap_id=1waittimes:snap=0.000583sec,exc=0.000583sec,total=0.000583secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000139secofelapsedtime4:waitedfor'gccrblock2-way'=0x5,=0x110,=0x37wait_id=255373seq_num=59353snap_id=1waittimes:snap=0.000541sec,exc=0.000541sec,total=0.000541secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000110secofelapsedtime5:waitedfor'gccrblock2-way'=0x5,=0x100,=0x35wait_id=255372seq_num=59352snap_id=1waittimes:snap=0.000629sec,exc=0.000629sec,total=0.000629secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000158secofelapsedtime6:waitedfor'gccrblock2-way'=0x5,=0xf0,=0x33wait_id=255371seq_num=59351snap_id=1waittimes:snap=0.000617sec,exc=0.000617sec,total=0.000617secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000128secofelapsedtime7:waitedfor'gccrblock2-way'=0x5,=0xe0,=0x31wait_id=255370seq_num=59350snap_id=1waittimes:snap=0.000561sec,exc=0.000561sec,total=0.000561secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000124secofelapsedtime8:waitedfor'gccrblock2-way'=0x5,=0xd0,=0x2fwait_id=255369seq_num=59349snap_id=1waittimes:snap=0.000565sec,exc=0.000565sec,total=0.000565secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000128secofelapsedtime9:waitedfor'gccrblock2-way'=0x5,=0xc0,=0x2dwait_id=255368seq_num=59348snap_id=1waittimes:snap=0.000555sec,exc=0.000555sec,total=0.000555secwaittimes:max=infinitewaitcounts:calls=1os=1occurredafter0.000125secofelapsedtimeSampledSessionHistoryofsession1369serial1---------------------------------------------------Thesampledsessionhistoryisconstructedbysamplingthetargetsessionevery1second.Thesamplingprocesscapturesateachsampleifthesessionisinanon-idlewait,anidlewait,ornotinawait.Ifthesessionisinanon-idlewaitthenoneintervalisshownforallthesamplesthesessionwasinthesamenon-idlewait.Ifthesessionisinanidlewaitornotinawaitforconsecutivesamplesthenoneintervalisshownforalltheconsecutivesamples.ThoughwedisplaytheseconsecutivesamplesinasingleintervalthesessionmayNOTbecontinuouslyidleornotinawait(thesamplingprocessdoesnotknow).Thehistoryisdisplayedinreversechronologicalorder.sampleinterval:1sec,maxhistory120sec---------------------------------------------------[121samples,07:04:03-07:06:03]waitedfor'gcbufferbusyacquire',seq_num:59358p1:'file#'=0x5p2:'block#'=0x278p3:'class#'=0x49time_waited:>=120sec(stillinwait)---------------------------------------------------SampledSessionHistorySummary:longest_non_idle_wait:'gcbufferbusyacquire'[121samples,07:04:03-07:06:03]time_waited:>=120sec(stillinwait)--------------------------------------------------------ENDDDEAction:'ORA_12751_DUMP'(SUCCESS,1csec)----------ENDDDEActionsDump(total1csec)-----KEBM:MMONactionpolicyviolation.'BlockCleanoutOptim,UndoSegmentScan'viol=1;err=12751minact-scnmaster-status:grec-scn:0x0e0e.5f0ec4cegmin-scn:0x0e0e.5f0eac2egcalc-scn:0x0e0e.5f0ead91***2018-01-2207:11:11.071DDErulesonlyexecutionfor:ORA12751-----STARTEventDrivenActionsDump--------ENDEventDrivenActionsDump----

ORA12751的错误原因是陈旧的SYS对象统计数据会导致生成次优执行计划,从而使AWR自动刷新从操作的语句运行更长时间和超时。

解决方法就是收集新的SYS对象统计信息,为优化器提供更好的统计信息,并生成更高效的执行计划

SQL>EXECDBMS_STATS.GATHER_FIXED_OBJECTS_STATS;PL/SQLproceduresuccessfullycompleted.SQL>EXECDBMS_STATS.GATHER_SCHEMA_STATS('SYS');PL/SQLproceduresuccessfullycompleted.

下面就是重启mmon和mmnl进程

SQL>altersystemenablerestrictedsession;Systemaltered.SQL>altersystemdisablerestrictedsession;Systemaltered.

查看alert日志可以看到mmon和mmnl进程已经重启了

SatAug1719:18:222019StartingbackgroundprocessMMONSatAug1719:18:222019StartingbackgroundprocessMMNLMMONstartedwithpid=399,OSid=10373SatAug1719:18:222019MMNLstartedwithpid=405,OSid=10375ALTERSYSTEMenablerestrictedsession;SatAug1719:18:252019SomeDDEasyncactionsfailedorwerecancelledSatAug1719:18:252019Sweep[inc][48021]:completedSweep[inc][48011]:completedSweep[inc][48002]:completedSweep[inc][35010]:completedSweep[inc][34706]:completedSweep[inc][34242]:completedSweep[inc][33546]:completedSweep[inc][33394]:completedSweep[inc2][48021]:completedSweep[inc2][48011]:completedSweep[inc2][48002]:completedSweep[inc2][35010]:completedSweep[inc2][34706]:completedSweep[inc2][34242]:completedSweep[inc2][33546]:completedSweep[inc2][33394]:completedminact-scn:Inst1isaslaveinc#:30mmonproc-id:10373status:0x2minact-scnstatus:grec-scn:0x0e0e.61cb2e9cgmin-scn:0x0e0e.5f0eac2egcalc-scn:0x0e0e.5f0ead91SatAug1719:18:292019db_recovery_file_dest_sizeof10240MBis20.87%used.Thisisauser-specifiedlimitontheamountofspacethatwillbeusedbythisdatabaseforrecovery-relatedfiles,anddoesnotreflecttheamountofspaceavailableintheunderlyingfilesystemorASMdiskgroup.SatAug1719:18:422019ALTERSYSTEMdisablerestrictedsession;

再查看1号实例的mmon与mmnl进程状态

[root@db1~]#ps-ef|grepmmnloracle103751019:18?00:00:00ora_mmnl_RLZY1root10611238963019:18pts/200:00:00grepmmnl[root@db1~]#ps-ef|grepmmonoracle103731719:18?00:00:02ora_mmon_RLZY1root10630238963019:18pts/200:00:00grepmmon

过了两个小时去查看1号实例已经生成了两条快照信息

SQL>setlong200SQL>setlinesize200SQL>select*from(selectSNAP_ID,END_INTERVAL_TIME,instance_numberfromdba_hist_snapshotwhereinstance_number=1orderbySNAP_IDdesc)whererownum<=10;SNAP_IDEND_INTERVAL_TIMEINSTANCE_NUMBER----------------------------------------------------------------------------------------------------2440717-AUG-1909.00.58.595PM12440617-AUG-1908.00.40.244PM1

到此问题解决了。