怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm

2025-03-13 技术教程

这篇文章主要讲解了“怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm”，文中的讲解内容简单清晰，易于学习与理解，下面请大家跟着小编的思路慢慢深入，一起来研究和学习“怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm”吧！

一、The Deadlock Detection Algorithm

TheDeadlockDetectionAlgorithm--------------------------------死锁检测算法Sinceweallowusertransactionstorequestlocksinanyorder,deadlockispossible.Weuseadeadlockdetection/breakingalgorithmthatisfairlystandardinessence,buttherearemanyspecialconsiderationsneededtodealwithPostgres'generalizedlockingmodel.PG允许事务以乱序的方式申请锁,因此会出现死锁的可能.PG使用的检测算法相当标准,但为了适配PG的锁模型因此有不少特别的考虑.Akeydesignconsiderationisthatwewanttomakeroutineoperations(lockgrantandrelease)runquicklywhenthereisnodeadlock,andavoidtheoverheadofdeadlockhandlingasmuchaspossible.Wedothisusingan"optimisticwaiting"approach:ifaprocesscannotacquirethelockitwantsimmediately,itgoestosleepwithoutanydeadlockcheck.Butitalsosetsadelaytimer,withadelayofDeadlockTimeoutmilliseconds(typicallysettoonesecond).Ifthedelayexpiresbeforetheprocessisgrantedthelockitwants,itrunsthedeadlockdetection/breakingcode.Normallythiscodewilldeterminethatthereisnodeadlockcondition,andthentheprocesswillgobacktosleepandwaitquietlyuntilitisgrantedthelock.Butifadeadlockconditiondoesexist,itwillberesolved,usuallybyabortingthedetectingprocess'transaction.Inthisway,weavoiddeadlockhandlingoverheadwheneverthewaittimeforalockislessthanDeadlockTimeout,whilenotimposinganunreasonabledelayofdetectionwhenthereisanerror.一个设计上考虑的关键点是在没有死锁的情况下可以让锁授予和释放可以执行得更快.通过使用一种"乐观等待"机制来实现:如果进程不能马上获取请求的锁,则马上休眠而不执行任何死锁检测.但同时设置了延迟时钟,延迟时长为DeadlockTimeout毫秒(典型值是1s).如果在进程被授予锁前延迟过期,则执行死锁检测和中断代码.通常来说,这些代码会确定是否存在死锁条件,然后进程会重新休眠并等待直至可以获取锁.但如果死锁条件确实存在,那需解决此问题,通过来说会回滚执行检测的事务.通过这种方法,避免锁等待时间小于DeadlockTimeout时的死锁处理开销,而在出现错误时则不执行不合理的延迟检测.Lockacquisition(routinesLockAcquireandProcSleep)followstheserules:锁获取(LockAcquire和ProcSleep子过程)遵循以下原则:1.Alockrequestisgrantedimmediatelyifitdoesnotconflictwithanyexistingorwaitinglockrequest,oriftheprocessalreadyholdsaninstanceofthesamelocktype(eg,there'snopenaltytoacquireareadlocktwice).Notethataprocessneverconflictswithitself,egonecanobtainreadlockwhenonealreadyholdsexclusivelock.1.对于如无冲突或者进程已持有相同类型的锁的锁请求(如获取两次读锁),则马上授予锁.注意进程永远不会与自己冲突,比如已获取独占锁的情况下可以获取读锁.2.Otherwisetheprocessjoinsthelock'swaitqueue.Normallyitwillbeaddedtotheendofthequeue,butthereisanexception:iftheprocessalreadyholdslocksonthissamelockableobjectthatconflictwiththerequestofanypendingwaiter,thentheprocesswillbeinsertedinthewaitqueuejustaheadofthefirstsuchwaiter.(Ifwedidnotmakethischeck,thedeadlockdetectioncodewouldadjustthequeueordertoresolvetheconflict,butit'srelativelycheaptomakethecheckinProcSleepandavoidadeadlocktimeoutdelayinthiscase.)Notespecialcasewheninsertingbeforetheendofthequeue:iftheprocess'srequestdoesnotconflictwithanyexistinglocknoranywaitingrequestbeforeitsinsertionpoint,thengoaheadandgrantthelockwithoutwaiting.2.否则,进程会加入到锁等待队列.通常来说,会加入到队列的末尾,但存在例外情况:如果进程已在对象上持有锁但与其他等待者的请求存在冲突,进程会插入到队列中这些waiter的前面.(如果不执行该检查,死锁检测代码会调整队列顺序以解决冲突,但在ProcSleep中执行该检查成本相对会低一点,同时在这种情况下可以避免一次死锁检测超时延迟)注意插入时在队列末尾的特殊情况:如果进程在插入点前的请求与现存锁或其他请求不存在冲突,则放在队列的最前面同时无需等待马上授予锁.Whenalockisreleased,thelockreleaseroutine(ProcLockWakeup)scansthelockobject'swaitqueue.Eachwaiterisawokenif(a)itsrequestdoesnotconflictwithalready-grantedlocks,and(b)itsrequestdoesnotconflictwiththerequestsofpriorun-wakablewaiters.Rule(b)ensuresthatconflictingrequestsaregrantedinorderofarrival.Therearecaseswherealaterwaitermustbeallowedtogoinfrontofconflictingearlierwaiterstoavoiddeadlock,butitisnotProcLockWakeup'sresponsibilitytorecognizethesecases;instead,thedeadlockdetectioncodewillre-orderthewaitqueuewhennecessary.锁释放时,ProcLockWakeup会扫描锁定对象的等待队列.唤醒满足(a)锁请求与现存锁不存在冲突(b)请求与先前未唤醒的waiter不存在冲突这两个条件的waiter.规则(b)确保存在冲突的请求必须按到达的先后顺序授予.避免在允许后来者可在出现冲突的waiter前可能出现的死锁,但这不是ProcLockWakeup的职责,相反,死锁检测代码会在需要时重组等待队列.Toperformdeadlockchecking,weusethestandardmethodofviewingthevariousprocessesasnodesinadirectedgraph(thewaits-forgraphorWFG).ThereisagraphedgeleadingfromprocessAtoprocessBifAwaitsforB,ie,AiswaitingforsomelockandBholdsaconflictinglock.ThereisadeadlockconditionifandonlyiftheWFGcontainsacycle.Wedetectcyclesbysearchingoutwardalongwaits-foredgestoseeifwereturntoourstartingpoint.Therearethreepossibleoutcomes:为了执行死锁检测,使用标准方法将各个进程视为有向图(WFG)中的节点.图中,如果A进程等待B,那么会有存在一条从A指向B的边.当且仅当如出现循环时,则会出现死锁.沿着等待的边进行检索看看是否会回到出发点来判断是否出现循环,这里有3种可能的情况:1.Alloutgoingpathsterminateatarunningprocess(whichhasnooutgoingedge).1.所有出发的路径都终止于一个正在运行的进程(而没有从该进程出发的边).2.Adeadlockisdetectedbyloopingbacktothestartpoint.Weresolvesuchadeadlockbycancelingthestartpoint'slockrequestandreportinganerrorinthattransaction,whichnormallyleadstotransactionabortandreleaseofthattransaction'sheldlocks.Notethatit'ssufficienttocancelonerequesttoremovethecycle;wedon'tneedtokillallthetransactionsinvolved.2.通过判断是否回到开始点来进行死锁检测.通过取消开始点的锁请求并在该事务反馈一个错误来解决死锁,该事务会取消并释放持有的锁.注意:取消一个锁就可以消除循环了,而不需要杀掉所有相关的事务.3.Somepath(s)loopbacktoanodeotherthanthestartpoint.Thisindicatesadeadlock,butonethatdoesnotinvolveourstartingprocess.Weignorethisconditiononthegroundsthatresolvingsuchadeadlockistheresponsibilityoftheprocessesinvolved---killingourstart-pointprocesswouldnotresolvethedeadlock.So,cases1and3bothreport"nodeadlock".3.某些路径回到某些节点而不是开始点.这意味着死锁,但不涉及到开始进程.基于由相关进程来解决死锁这一原则,PG会忽略此条件(杀掉开始点进程无助于解决死锁).因此,第1种和第3种情况会反馈"nodeadlock".Postgres'situationisalittlemorecomplexthanthestandarddiscussionofdeadlockdetection,fortworeasons:PG的情况比起标准的死锁检查有一点点的复杂,有2点原因:1.Aprocesscanbewaitingformorethanoneotherprocess,sincetheremightbemultiplePROCLOCKsof(non-conflicting)locktypesthatallconflictwiththewaiter'srequest.Thiscreatesnorealdifficultyhowever;wesimplyneedtobepreparedtotracemorethanoneoutgoingedge.1.进程可等待超过1个其他进程,因为存在多个与等待请求相冲突的锁类型相应的PROCLOCKs.这不会造成实际的困难,仅需要跟踪多个出发的边即可.2.IfaprocessAisbehindaprocessBinsomelock'swaitqueue,andtheirrequestedlocksconflict,thenwemustsaythatAwaitsforB,sinceProcLockWakeupwillneverawakenAbeforeB.ThiscreatesadditionaledgesintheWFG.Wecallthese"soft"edges,asopposedtothe"hard"edgesinducedbylocksalreadyheld.NotethatifBalreadyholdsanylocksconflictingwithA'srequest,thentheirrelationshipisahardedgenotasoftedge.2.如果进程A在等待队列中在B进程之后,而它们的请求锁冲突,这时候我们会认为A等待B,因为ProcLockWakeup在B之前不会唤醒A.这会在WFG中产生额外的我们称之为soft(相对于实际已持有的锁而言)的边.注意如果B已持有所有与A请求冲突的锁,那么它们的关系是硬边而不是软边.A"soft"block,orwait-priorityblock,hasthesamepotentialforinducingdeadlockasahardblock.However,wemaybeabletoresolveasoftblockwithoutabortingthetransactionsinvolved:wecaninsteadrearrangetheorderofthewaitqueue.Thisrearrangementreversesthedirectionofthesoftedgebetweentwoprocesseswithconflictingrequestswhosequeueorderisreversed.Ifwecanfindarearrangementthateliminatesacyclewithoutcreatingnewones,thenwecanavoidanabort.Checkingforsuchpossiblerearrangementsisthetrickiestpartofthealgorithm."soft"阻塞或者等待优先级阻塞,与硬阻塞的处理一样.但是,我们可以不需要取消事务而解决软阻塞:重新排列等待队列的顺序即可.重排会调转相关进程的优先顺序.如果能够重排而解决循环,那么可以避免取消事务.检查是否存在这样的重排是算法中最棘手的部分.TheworkhorseofthedeadlockdetectorisaroutineFindLockCycle()whichisgivenastartingpointprocess(whichmustbeawaitingprocess).Itrecursivelyscansoutwardacrosswaits-foredgesasdiscussedabove.Ifitfindsnocycleinvolvingthestartpoint,itreturns"false".(Asdiscussedabove,wecanignorecyclesnotinvolvingthestartpoint.)Whensuchacycleisfound,FindLockCycle()returns"true",andasitunwindsitalsobuildsalistofany"soft"edgesinvolvedinthecycle.Iftheresultinglistisemptythenthereisaharddeadlockandtheconfigurationcannotsucceed.However,ifthelistisnotempty,thenreversinganyoneofthelistededgesthroughwait-queuerearrangementwilleliminatethatcycle.Sincesuchareversalmightcreatecycleselsewhere,wemayneedtotryeverypossibility.Therefore,weneedtobeabletoinvokeFindLockCycle()onhypotheticalconfigurations(waitorders)aswellasthecurrentrealorder.死锁检测器主要有例程FindLockCycle实现,该例程输入为起始点过程(正在等待的进程).如上所述,递归扫描指向到其他进程的边.如果从开始点没有发现循环,返回"false".(正如上述所讨论的,忽略与开始点无关的循环).如果发现了存在循环,FindLockCycle例程会返回"true",在展开(unwinds)时,它还构建了一个包含涉及所有软边的链表.如果结果链表为空,那么只存在硬死锁,重排不会成功.但是,如果链表非空,递归重排链表中的边检查是否可以消除循环.由于这样的重排可能会在其他地方产生新的循环,因此需要尝试各种可能.因此,我们需要具备在假设配置和实际顺序调用FindLockCycle的能力.Theeasiestwaytohandlethisseemstobetohavealookasidetablethatshowstheproposednewqueueorderforeachwaitqueuethatweareconsideringrearranging.ThistableischeckedbyFindLockCycle,anditbelievestheproposedqueueorderratherthantherealorderforeachlockthathasanentryinthelookasidetable.看起来最简单的方法是使用一个后备表,该表显示了我们正在考虑队列重排的每个建议的新顺序.该表通过FindLockCycle检测,并且该例程相信建议的队列顺序而不是实际顺序.Webuildaproposednewqueueorderbydoinga"topologicalsort"oftheexistingentries.Eachsoftedgethatwearecurrentlyconsideringreversingcreatesapropertyofthepartialorderthatthetopologicalsorthastoenforce.Wemustuseasortmethodthatpreservestheinputorderingasmuchaspossible,soasnottogratuitouslybreakarrivalorderforprocessesnotinvolvedinadeadlock.(ThisisnottrueofthetsortmethodshowninKnuth,forexample,butit'seasilydonebyasimpledoubly-nested-loopmethodthatemitsthefirstlegalcandidateateachstep.Fortunately,wedon'tneedahighlyefficientsortalgorithm,sincethenumberofpartialorderconstraintsisnotlikelytobelarge.)Notethatfailureofthetopologicalsorttellsuswehaveconflictingorderingconstraints,andthereforethatthelast-addedsoftedgereversalconflictswithaprioredgereversal.Weneedtodetectthiscasetoavoidaninfiniteloopinthecasewherenopossiblerearrangementwillwork:otherwise,wemighttryareversal,findthatitstillleadstoacycle,thentrytoun-reversethereversalwhiletryingtogetridofthatcycle,etcetc.Topologicalsortfailuretellsustheun-reversalisnotalegitimatemoveinthiscontext.对每一存在的条目使用"topologicalsort"创建可能的新队列顺序.我们目前考虑反转的每个软边都创建了拓扑排序必须强制执行的偏序属性.我们必须使用一种尽可能保留输入顺序的排序方法,这样就不会无缘无故破坏不涉及死锁进程的到达顺序.注意拓扑排序如果失败则意味着存在冲突排序约束,因此最好添加的软边反转与先前的反转存在冲突.需要检测这种情况以避免出现死循环.可以试着反转,如果发现它仍然会导致循环,那么再反转,试图摆脱那个循环,等等.拓扑排序失败意味着在这种情况下反向操作是不合法的.So,thebasicstepinourrearrangementmethodistotakealistofsoftedgesinacycle(asreturnedbyFindLockCycle())andsuccessivelytrythereversalofeachoneasatopological-sortconstraintaddedtowhateverconstraintswearealreadyconsidering.Werecursivelysearchthroughallsuchsetsofconstraintstoseeifanyoneeliminatesallthedeadlockcyclesatonce.Althoughthismightseemimpossiblyinefficient,itshouldn'tbeabigprobleminpractice,becausetherewillnormallybeveryfew,andnotverylarge,deadlockcycles---ifanyatall.Sothecombinatorialinefficiencyisn'tgoingtohurtus.Besides,it'sbettertospendsometimetoguaranteethatwe'vecheckedallpossibleescaperoutesthantoabortatransactionwhenwedidn'treallyhaveto.因此,重排方法的基础步骤是获取循环中的软边链表(FindLockCycle返回),依次尝试将每个约束的反转作为拓扑排序约束添加到已经考虑的其他约束中.递归检索所有这样的约束集合来看看是否存在可以消除循环的可能.虽然这看来不太可能很有效,但在实践中也没有什么问题,因为死锁循环的数量通常很小.因此这样的组合并不会有什么问题.话又说回来,最好花点时间来保证已检测所有可能的路径而不是什么都不做就取消事务.EachedgereversalconstraintcanbeviewedasrequestingthatthewaitingprocessAbemovedtobeforetheblockingprocessBinthewaitqueuetheyarebothin.Thisactionwillreversethedesiredsoftedge,aswellasanyothersoftedgesbetweenAandotherprocessesitisadvancedover.Nootheredgeswillbeaffected(notethisisactuallyaconstraintonourtopologicalsortmethodtonotre-orderthequeuemorethannecessary.)Therefore,wecanbesurewehavenotcreatedanynewdeadlockcyclesifneitherFindLockCycle(A)norFindLockCycle(B)discoversanycycle.Giventheabove-definedbehaviorofFindLockCycle,eachofthesesearchesisnecessaryaswellassufficient,sinceFindLockCyclestartingattheoriginalstartpointwillnotcomplainaboutcyclesthatincludeAorBbutnottheoriginalstartpoint.每个边的反转约束都可以看做是请求等待进程A移到等待队列的阻塞进程B的前面.这样的做法会反转有向软边,现对于在A和其他进程之间的其他软边,它是advancedover的.没有其他边受影响,因此可以确保不会出现新的死锁循环.根据以上定义的FindLockCycle行为,这些搜索都是必要的,也是充分的,因为从起始点开始的FindLockCycle不会认为包含A或B但不包含初始起始点的循环有问题.Inshortthen,aproposedrearrangementofthewaitqueue(s)isdeterminedbyoneormorebrokensoftedgesA->B,fullyspecifiedbytheoutputoftopologicalsortsofeachwaitqueueinvolved,andthentestedbyinvokingFindLockCycle()startingattheoriginalstartpointaswellaseachofthementionedprocesses(A'sandB's).Ifnoneofthetestsdetectacycle,thenwehaveavalidconfigurationandcanimplementitbyreorderingthewaitqueuesperthesortoutputs(andthenapplyingProcLockWakeuponeachreorderedqueue,incaseawaiterhasbecomewakable).Ifanytestdetectsasoftcycle,wecantrytoresolveitbyaddingeachsoftlinkinthatcycle,inturn,totheproposedrearrangementlist.Thisisrepeatedrecursivelyuntilweeitherfindaworkablerearrangementordeterminethatnoneexists.Inthelattercase,theouterlevelresolvesthedeadlockbyabortingtheoriginalstart-pointtransaction.简短的来说,等待队列的重排通过打破一个或多个A->B这样的软边来确定,由所涉及的每个等待队列的拓扑排序的输出完全指定,然后通过调用FindLockCycle进行测试,该函数从原始的起始点以及上面提到的每个进程(A&B)开始.如果没有一个测试检测到循环,那么我们有了一个有效的配置,通过重排每个重新排序的队列来实现这一点.如果测试发现了软循环,可以尝试通过将该循环中的每个软链接依次添加到建议的重排链表中来解决.递归处理直至找到了可用的重排或者确定重排不存在.在后续的情况中,外层通过取消开始点事务来解决死锁.TheparticularorderinwhichrearrangementsaretrieddependsontheorderFindLockCycle()happenstoscanin,soiftherearemultipleworkablerearrangementsofthewaitqueues,thenitisunspecifiedwhichonewillbechosen.What'smoreimportantisthatweguaranteetotryeveryqueuerearrangementthatcouldleadtosuccess.(Forexample,ifwehaveAbeforeBbeforeCandtheneededorderconstraintsareCbeforeAandBbeforeC,wewouldfirstdiscoverthatAbeforeCdoesn'tworkandtrytherearrangementCbeforeAbeforeB.ThiswouldeventuallyleadtothediscoveryoftheadditionalconstraintBbeforeC.)尝试重排的特定顺序取决于FindLockCycle碰巧扫描进来的顺序,因此如果存在多个等待队列上的重排,那么选择哪一个就不确定.更重要的是我们确保尝试每个队列重排可能会成功.(比如,如果A在B和C的前面,B在C的前面,排序约束是C在A之前和B在C之前,首先会发现A在C之前是不行的,这时候会重排C在A和B之前.这会导致额外的约束B在C之前)

感谢各位的阅读，以上就是“怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm”的内容了，经过本文的学习后，相信大家对怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm这一问题有了更深刻的体会，具体使用情况还需要大家实践验证。这里是亿速云，小编将为大家推送更多相关知识点的文章，欢迎关注！