What does the BufferAlloc function do in PostgreSQL?

2025-03-12 Technical tutorial

This article explains what the BufferAlloc function does in PostgreSQL. It first covers the data structures the function relies on, then walks through its source code, and finally traces one call with gdb.

I. Data structures

BufferDesc
The shared descriptor (state) data for a single shared buffer.

/*
 * Flags for buffer descriptors
 *
 * Note: TAG_VALID essentially means that there is a buffer hashtable
 * entry associated with the buffer's tag.
 */
#define BM_LOCKED               (1U << 22)  /* buffer header is locked */
#define BM_DIRTY                (1U << 23)  /* data needs writing */
#define BM_VALID                (1U << 24)  /* data is valid */
#define BM_TAG_VALID            (1U << 25)  /* tag is assigned */
#define BM_IO_IN_PROGRESS       (1U << 26)  /* read or write in progress */
#define BM_IO_ERROR             (1U << 27)  /* previous I/O failed */
#define BM_JUST_DIRTIED         (1U << 28)  /* dirtied since write started */
#define BM_PIN_COUNT_WAITER     (1U << 29)  /* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED    (1U << 30)  /* must write for checkpoint */
#define BM_PERMANENT            (1U << 31)  /* permanent buffer (not unlogged,
                                             * or init fork) */

/*
 * BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either.  To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */
typedef struct BufferDesc
{
    BufferTag   tag;            /* ID of page contained in buffer */
    int         buf_id;         /* buffer's index number (from 0) */

    /* state of the tag, containing flags, refcount and usagecount */
    pg_atomic_uint32 state;

    int         wait_backend_pid;   /* backend PID of pin-count waiter */
    int         freeNext;       /* link in freelist chain */

    LWLock      content_lock;   /* to lock access to buffer contents */
} BufferDesc;
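The rule stated in the BufferDesc comment (without the header lock, the state word may only be changed by compare-and-swap, and only while BM_LOCKED is clear) can be illustrated with a minimal sketch in portable C11. The names `sketch_pin` and `SKETCH_*` are hypothetical. This is not PostgreSQL's PinBuffer, which also maintains the usage count and a backend-private refcount table, and which waits for the lock to clear rather than returning.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BM_LOCKED     (1U << 22)  /* same bit position as BM_LOCKED */
#define SKETCH_REFCOUNT_ONE  1U          /* assumption: refcount sits in the low bits */

/*
 * Try to add one pin to the state word without taking the header lock.
 * Returns false if the header is currently locked (the caller would wait
 * and retry); returns true once the compare-and-swap has succeeded.
 */
static bool
sketch_pin(_Atomic uint32_t *state)
{
    uint32_t    old = atomic_load(state);

    for (;;)
    {
        if (old & SKETCH_BM_LOCKED)
            return false;       /* the lock holder may rewrite the whole word */

        /* On failure 'old' is refreshed with the current value and we loop. */
        if (atomic_compare_exchange_weak(state, &old, old + SKETCH_REFCOUNT_ONE))
            return true;
    }
}
```

A plain atomic add would be unsafe here: it could land between the lock holder's read of the word and its single combined write-and-unlock, and the increment would be silently overwritten.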

BufferTag
The buffer tag identifies which disk block the buffer contains.

/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;
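The note about pad bytes matters because a fixed-size hash key is typically hashed and compared byte by byte. The following sketch shows the idea with simplified stand-in types. `SketchRelFileNode`, `SketchTag`, and both functions are hypothetical, and the field types are assumptions rather than PostgreSQL's definitions.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for RelFileNode and BufferTag (assumed field types). */
typedef struct SketchRelFileNode
{
    uint32_t    spcNode;
    uint32_t    dbNode;
    uint32_t    relNode;
} SketchRelFileNode;

typedef struct SketchTag
{
    SketchRelFileNode rnode;
    int32_t     forkNum;
    uint32_t    blockNum;
} SketchTag;

/* Zero the whole struct first so any padding bytes are identical, then fill it. */
static void
sketch_init_tag(SketchTag *tag, SketchRelFileNode rnode, int32_t fork, uint32_t blk)
{
    memset(tag, 0, sizeof(*tag));
    tag->rnode = rnode;
    tag->forkNum = fork;
    tag->blockNum = blk;
}

/* Byte-wise equality, the way a generic hash table compares fixed-size keys. */
static int
sketch_tags_equal(const SketchTag *a, const SketchTag *b)
{
    return memcmp(a, b, sizeof(SketchTag)) == 0;
}
```

With this layout there happen to be no padding bytes, so the memset is defensive. It becomes essential as soon as a field is added that introduces padding.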

SMgrRelation
smgr.c maintains a hash table of SMgrRelation objects, which are essentially cached file handles.

/*
 * smgr.c maintains a table of SMgrRelation objects, which are essentially
 * cached file handles.  An SMgrRelation is created (if not already present)
 * by smgropen(), and destroyed by smgrclose().  Note that neither of these
 * operations imply I/O, they just create or destroy a hashtable entry.
 * (But smgrclose() may release associated resources, such as OS-level file
 * descriptors.)
 *
 * An SMgrRelation may have an "owner", which is just a pointer to it from
 * somewhere else; smgr.c will clear this pointer if the SMgrRelation is
 * closed.  We use this to avoid dangling pointers from relcache to smgr
 * without having to make the smgr explicitly aware of relcache.  There
 * can't be more than one "owner" pointer per SMgrRelation, but that's
 * all we need.
 *
 * SMgrRelations that do not have an "owner" are considered to be transient,
 * and are deleted at end of transaction.
 */
typedef struct SMgrRelationData
{
    /* rnode is the hashtable lookup key, so it must be first! */
    RelFileNodeBackend smgr_rnode;  /* relation physical identifier */

    /* pointer to owning pointer, or NULL if none */
    struct SMgrRelationData **smgr_owner;

    /*
     * These next three fields are not actually used or manipulated by smgr,
     * except that they are reset to InvalidBlockNumber upon a cache flush
     * event (in particular, upon truncation of the relation).  Higher levels
     * store cached state here so that it will be reset when truncation
     * happens.  In all three cases, InvalidBlockNumber means "unknown".
     */
    BlockNumber smgr_targblock;     /* current insertion target block */
    BlockNumber smgr_fsm_nblocks;   /* last known size of fsm fork */
    BlockNumber smgr_vm_nblocks;    /* last known size of vm fork */

    /* additional public fields may someday exist here */

    /*
     * Fields below here are intended to be private to smgr.c and its
     * submodules.  Do not touch them from elsewhere.
     */
    int         smgr_which;     /* storage manager selector */

    /*
     * for md.c; per-fork arrays of the number of open segments
     * (md_num_open_segs) and the segments themselves (md_seg_fds).
     */
    int         md_num_open_segs[MAX_FORKNUM + 1];
    struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];

    /* if unowned, list link in list of all unowned SMgrRelations */
    struct SMgrRelationData *next_unowned_reln;
} SMgrRelationData;

typedef SMgrRelationData *SMgrRelation;

RelFileNodeBackend
A relfilenode combined with a backend ID, which together provide all the information needed to locate the physical storage.

/*
 * Augmenting a relfilenode with the backend ID provides all the information
 * we need to locate the physical storage.  The backend ID is InvalidBackendId
 * for regular relations (those accessible to more than one backend), or the
 * owning backend's ID for backend-local relations.  Backend-local relations
 * are always transient and removed in case of a database crash; they are
 * never WAL-logged or fsync'd.
 */
typedef struct RelFileNodeBackend
{
    RelFileNode node;
    BackendId   backend;
} RelFileNodeBackend;

II. Source code walk-through

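BufferAlloc is a subroutine of ReadBuffer that handles the lookup of a shared buffer. If the requested block is not already in a buffer, it selects a replacement victim and evicts the old page, but it does NOT read in the new page.
The main logic of the function is as follows:
1. Initialize: build the tag, then derive its hash value and partition lock from it
2. Check whether the block is already in the buffer pool
3. The block is found in the buffer pool (buf_id >= 0)
3.1 Get the buffer descriptor and pin the buffer
3.2 If PinBuffer returns false (the page is not yet valid), call StartBufferIO; if that returns true, set *foundPtr to false
3.3 Return buf
4. The block is not found in the buffer pool (buf_id < 0)
4.1 Release newPartitionLock
4.2 Loop to find a suitable victim buffer
4.2.1 Ensure, while no spinlock is held yet, that a free private refcount entry is available
4.2.2 Select a victim buffer
4.2.3 Copy the buffer flags into oldFlags
4.2.4 Pin the buffer, then release the buffer header spinlock
4.2.5 If the buffer is flagged BM_DIRTY, write it out with FlushBuffer (if the content lock cannot be taken conditionally, or the strategy rejects the buffer, unpin it and pick another victim)
4.2.6 If the buffer is flagged BM_TAG_VALID, compute the old tag's hash code and partition lock, then take both the old and new partition locks exclusively, lower-numbered partition first
Otherwise only the new partition lock is needed: take it, and set oldPartitionLock to NULL and oldHash to 0
4.2.7 Try to insert a hash table entry for the buffer under its new tag
4.2.8 Collision (buf_id >= 0): another backend has already mapped the block, so give up the victim and handle the case as if the buffer had been found in the pool in the first place
4.2.9 No collision (buf_id < 0): lock the buffer header; if the refcount is 1 (only our own pin) and the buffer is not dirty, the victim is usable, so leave the loop
Otherwise unlock the buffer header, delete the new hash table entry, release the partition locks, unpin the buffer, and look for another victim
4.3 The buffer can now be retagged: set the new tag and flags, unlock the buffer header, delete the old hash table entry (if any), and release the partition locks
4.4 Call StartBufferIO and set *foundPtr accordingly
4.5 Return buf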
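Step 4.2.6 avoids deadlock by always taking the two partition locks in the same global order. A minimal sketch of that ordering rule follows. `SketchLock`, `sketch_acquire`, and `sketch_lock_both` are hypothetical stand-ins that only record the order of acquisition and do no real locking.

```c
#include <stdint.h>

/* Hypothetical stand-in for a partition lock; it only records acquisition order. */
typedef struct SketchLock
{
    int         acquired_seq;   /* 0 = not held, otherwise 1-based order */
} SketchLock;

static int  sketch_seq = 0;

static void
sketch_acquire(SketchLock *lock)
{
    lock->acquired_seq = ++sketch_seq;
}

/*
 * Take the old and new partition locks in a globally consistent order
 * (lower address first).  If both tags map to the same partition there
 * is only one lock, and it is taken once.
 */
static void
sketch_lock_both(SketchLock *oldLock, SketchLock *newLock)
{
    if ((uintptr_t) oldLock < (uintptr_t) newLock)
    {
        sketch_acquire(oldLock);
        sketch_acquire(newLock);
    }
    else if ((uintptr_t) oldLock > (uintptr_t) newLock)
    {
        sketch_acquire(newLock);
        sketch_acquire(oldLock);
    }
    else
        sketch_acquire(newLock);
}
```

Because every backend follows the same order, two backends that need the same pair of partitions cannot each hold one lock while waiting for the other.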

/*
 * BufferAlloc -- subroutine for ReadBuffer.  Handles lookup of a shared
 *      buffer.  If no buffer exists already, selects a replacement
 *      victim and evicts the old page, but does NOT read in new page.
 *
 * "strategy" can be a buffer replacement strategy object, or NULL for
 * the default strategy.  The selected buffer's usage_count is advanced when
 * using the default strategy, but otherwise possibly not (see PinBuffer).
 *
 * The returned buffer is pinned and is already marked as holding the
 * desired page.  If it already did have the desired page, *foundPtr is
 * set true.  Otherwise, *foundPtr is set false and the buffer is marked
 * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
 *
 * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
 * we keep it for simplicity in ReadBuffer.
 *
 * No locks are held either at entry or exit.
 */
static BufferDesc *
BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
            BlockNumber blockNum,
            BufferAccessStrategy strategy,
            bool *foundPtr)
{
    BufferTag   newTag;         /* identity of requested block */
    uint32      newHash;        /* hash value for newTag */
    LWLock     *newPartitionLock;   /* buffer partition lock for it */
    BufferTag   oldTag;         /* previous identity of selected buffer */
    uint32      oldHash;        /* hash value for oldTag */
    LWLock     *oldPartitionLock;   /* buffer partition lock for it */
    uint32      oldFlags;
    int         buf_id;
    BufferDesc *buf;
    bool        valid;
    uint32      buf_state;

    /* create a tag so we can lookup the buffer */
    INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);

    /* determine its hash code and partition lock ID */
    newHash = BufTableHashCode(&newTag);
    newPartitionLock = BufMappingPartitionLock(newHash);

    /* see if the block is in the buffer pool already */
    LWLockAcquire(newPartitionLock, LW_SHARED);
    buf_id = BufTableLookup(&newTag, newHash);
    if (buf_id >= 0)
    {
        /*
         * Found it.  Now, pin the buffer so no one can steal it from the
         * buffer pool, and check to see if the correct data has been loaded
         * into the buffer.
         */
        buf = GetBufferDescriptor(buf_id);

        valid = PinBuffer(buf, strategy);

        /* Can release the mapping lock as soon as we've pinned it */
        LWLockRelease(newPartitionLock);

        *foundPtr = true;

        if (!valid)
        {
            /*
             * We can only get here if (a) someone else is still reading in
             * the page, or (b) a previous read attempt failed.  We have to
             * wait for any active read attempt to finish, and then set up our
             * own read attempt if the page is still not BM_VALID.
             * StartBufferIO does it all.
             */
            if (StartBufferIO(buf, true))
            {
                /*
                 * If we get here, previous attempts to read the buffer must
                 * have failed ... but we shall bravely try again.
                 */
                *foundPtr = false;
            }
        }

        return buf;
    }

    /*
     * Didn't find it in the buffer pool.  We'll have to initialize a new
     * buffer.  Remember to unlock the mapping lock while doing the work.
     */
    LWLockRelease(newPartitionLock);

    /* Loop here in case we have to try another victim buffer */
    for (;;)
    {
        /*
         * Ensure, while the spinlock's not yet held, that there's a free
         * refcount entry.
         */
        ReservePrivateRefCountEntry();

        /*
         * Select a victim buffer.  The buffer is returned with its header
         * spinlock still held!
         */
        buf = StrategyGetBuffer(strategy, &buf_state);

        Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);

        /* Must copy buffer flags while we still hold the spinlock */
        oldFlags = buf_state & BUF_FLAG_MASK;

        /* Pin the buffer and then release the buffer spinlock */
        PinBuffer_Locked(buf);

        /*
         * If the buffer was dirty, try to write it out.  There is a race
         * condition here, in that someone might dirty it after we released it
         * above, or even while we are writing it out (since our share-lock
         * won't prevent hint-bit updates).  We will recheck the dirty bit
         * after re-locking the buffer header.
         */
        if (oldFlags & BM_DIRTY)
        {
            /*
             * We need a share-lock on the buffer contents to write it out
             * (else we might write invalid data, eg because someone else is
             * compacting the page contents while we write).  We must use a
             * conditional lock acquisition here to avoid deadlock.  Even
             * though the buffer was not pinned (and therefore surely not
             * locked) when StrategyGetBuffer returned it, someone else could
             * have pinned and exclusive-locked it by the time we get here. If
             * we try to get the lock unconditionally, we'd block waiting for
             * them; if they later block waiting for us, deadlock ensues.
             * (This has been observed to happen when two backends are both
             * trying to split btree index pages, and the second one just
             * happens to be trying to split the page the first one got from
             * StrategyGetBuffer.)
             */
            if (LWLockConditionalAcquire(BufferDescriptorGetContentLock(buf),
                                         LW_SHARED))
            {
                /*
                 * If using a nondefault strategy, and writing the buffer
                 * would require a WAL flush, let the strategy decide whether
                 * to go ahead and write/reuse the buffer or to choose another
                 * victim.  We need lock to inspect the page LSN, so this
                 * can't be done inside StrategyGetBuffer.
                 */
                if (strategy != NULL)
                {
                    XLogRecPtr  lsn;

                    /* Read the LSN while holding buffer header lock */
                    buf_state = LockBufHdr(buf);
                    lsn = BufferGetLSN(buf);
                    UnlockBufHdr(buf, buf_state);

                    if (XLogNeedsFlush(lsn) &&
                        StrategyRejectBuffer(strategy, buf))
                    {
                        /* Drop lock/pin and loop around for another buffer */
                        LWLockRelease(BufferDescriptorGetContentLock(buf));
                        UnpinBuffer(buf, true);
                        continue;
                    }
                }

                /* OK, do the I/O */
                TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(forkNum, blockNum,
                                                          smgr->smgr_rnode.node.spcNode,
                                                          smgr->smgr_rnode.node.dbNode,
                                                          smgr->smgr_rnode.node.relNode);

                FlushBuffer(buf, NULL);
                LWLockRelease(BufferDescriptorGetContentLock(buf));

                ScheduleBufferTagForWriteback(&BackendWritebackContext,
                                              &buf->tag);

                TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_DONE(forkNum, blockNum,
                                                         smgr->smgr_rnode.node.spcNode,
                                                         smgr->smgr_rnode.node.dbNode,
                                                         smgr->smgr_rnode.node.relNode);
            }
            else
            {
                /*
                 * Someone else has locked the buffer, so give it up and loop
                 * back to get another one.
                 */
                UnpinBuffer(buf, true);
                continue;
            }
        }

        /*
         * To change the association of a valid buffer, we'll need to have
         * exclusive lock on both the old and new mapping partitions.
         */
        if (oldFlags & BM_TAG_VALID)
        {
            /*
             * Need to compute the old tag's hashcode and partition lock ID.
             * XXX is it worth storing the hashcode in BufferDesc so we need
             * not recompute it here?  Probably not.
             */
            oldTag = buf->tag;
            oldHash = BufTableHashCode(&oldTag);
            oldPartitionLock = BufMappingPartitionLock(oldHash);

            /*
             * Must lock the lower-numbered partition first to avoid
             * deadlocks.
             */
            if (oldPartitionLock < newPartitionLock)
            {
                LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
            }
            else if (oldPartitionLock > newPartitionLock)
            {
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
                LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
            }
            else
            {
                /* only one partition, only one lock */
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
            }
        }
        else
        {
            /* if it wasn't valid, we need only the new partition */
            LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
            /* remember we have no old-partition lock or tag */
            oldPartitionLock = NULL;
            /* this just keeps the compiler quiet about uninit variables */
            oldHash = 0;
        }

        /*
         * Try to make a hashtable entry for the buffer under its new tag.
         * This could fail because while we were writing someone else
         * allocated another buffer for the same block we want to read in.
         * Note that we have not yet removed the hashtable entry for the old
         * tag.
         */
        buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);

        if (buf_id >= 0)
        {
            /*
             * Got a collision.  Someone has already done what we were about to
             * do.  We'll just handle this as if it were found in the buffer
             * pool in the first place.  First, give up the buffer we were
             * planning to use.
             */
            UnpinBuffer(buf, true);

            /* Can give up that buffer's mapping partition lock now */
            if (oldPartitionLock != NULL &&
                oldPartitionLock != newPartitionLock)
                LWLockRelease(oldPartitionLock);

            /* remaining code should match code at top of routine */

            buf = GetBufferDescriptor(buf_id);

            valid = PinBuffer(buf, strategy);

            /* Can release the mapping lock as soon as we've pinned it */
            LWLockRelease(newPartitionLock);

            *foundPtr = true;

            if (!valid)
            {
                /*
                 * We can only get here if (a) someone else is still reading
                 * in the page, or (b) a previous read attempt failed.  We
                 * have to wait for any active read attempt to finish, and
                 * then set up our own read attempt if the page is still not
                 * BM_VALID.  StartBufferIO does it all.
                 */
                if (StartBufferIO(buf, true))
                {
                    /*
                     * If we get here, previous attempts to read the buffer
                     * must have failed ... but we shall bravely try again.
                     */
                    *foundPtr = false;
                }
            }

            return buf;
        }

        /*
         * Need to lock the buffer header too in order to change its tag.
         */
        buf_state = LockBufHdr(buf);

        /*
         * Somebody could have pinned or re-dirtied the buffer while we were
         * doing the I/O and making the new hashtable entry.  If so, we can't
         * recycle this buffer; we must undo everything we've done and start
         * over with a new victim buffer.
         */
        oldFlags = buf_state & BUF_FLAG_MASK;
        if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 && !(oldFlags & BM_DIRTY))
            break;

        UnlockBufHdr(buf, buf_state);
        BufTableDelete(&newTag, newHash);
        if (oldPartitionLock != NULL &&
            oldPartitionLock != newPartitionLock)
            LWLockRelease(oldPartitionLock);
        LWLockRelease(newPartitionLock);
        UnpinBuffer(buf, true);
    }

    /*
     * Okay, it's finally safe to rename the buffer.
     *
     * Clearing BM_VALID here is necessary, clearing the dirtybits is just
     * paranoia.  We also reset the usage_count since any recency of use of
     * the old content is no longer relevant.  (The usage_count starts out at
     * 1 so that the buffer can survive one clock-sweep pass.)
     *
     * Make sure BM_PERMANENT is set for buffers that must be written at every
     * checkpoint.  Unlogged buffers only need to be written at shutdown
     * checkpoints, except for their "init" forks, which need to be treated
     * just like permanent relations.
     */
    buf->tag = newTag;
    buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
                   BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
                   BUF_USAGECOUNT_MASK);
    if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
        buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
    else
        buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;

    UnlockBufHdr(buf, buf_state);

    if (oldPartitionLock != NULL)
    {
        BufTableDelete(&oldTag, oldHash);
        if (oldPartitionLock != newPartitionLock)
            LWLockRelease(oldPartitionLock);
    }

    LWLockRelease(newPartitionLock);

    /*
     * Buffer contents are currently invalid.  Try to get the io_in_progress
     * lock.  If StartBufferIO returns false, then someone else managed to
     * read it before we did, so there's nothing left for BufferAlloc() to do.
     */
    if (StartBufferIO(buf, true))
        *foundPtr = false;
    else
        *foundPtr = true;

    return buf;
}

III. Trace analysis

Test script: query a table.

10:01:54 (xdb@[local]:5432) testdb=# select * from t1 limit 10;

Start gdb and set a breakpoint.

(gdb) b BufferAlloc
Breakpoint 1 at 0x8778ad: file bufmgr.c, line 1005.
(gdb) c
Continuing.

Breakpoint 1, BufferAlloc (smgr=0x2267430, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, strategy=0x0, foundPtr=0x7ffcc97fb4f3) at bufmgr.c:1005
1005        INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
(gdb)

Input parameters
smgr - pointer to an SMgrRelationData struct
relpersistence - the relation's persistence ('p', permanent, in this trace)
forkNum - fork type; MAIN_FORKNUM is the main data file, alongside the FSM and VM forks
blockNum - block number
strategy - buffer access strategy, NULL here
*foundPtr - output parameter

(gdb) p *smgr
$1 = {smgr_rnode = {node = {spcNode = 1663, dbNode = 16402, relNode = 51439}, backend = -1}, smgr_owner = 0x7f86133f3778, smgr_targblock = 4294967295, smgr_fsm_nblocks = 4294967295, smgr_vm_nblocks = 4294967295, smgr_which = 0, md_num_open_segs = {0, 0, 0, 0}, md_seg_fds = {0x0, 0x0, 0x0, 0x0}, next_unowned_reln = 0x0}
(gdb) p *smgr->smgr_owner
$2 = (struct SMgrRelationData *) 0x2267430
(gdb) p **smgr->smgr_owner
$3 = {smgr_rnode = {node = {spcNode = 1663, dbNode = 16402, relNode = 51439}, backend = -1}, smgr_owner = 0x7f86133f3778, smgr_targblock = 4294967295, smgr_fsm_nblocks = 4294967295, smgr_vm_nblocks = 4294967295, smgr_which = 0, md_num_open_segs = {0, 0, 0, 0}, md_seg_fds = {0x0, 0x0, 0x0, 0x0}, next_unowned_reln = 0x0}
(gdb)

1. Initialize: build the tag, then derive its hash value and partition lock from it

(gdb) n
1008        newHash = BufTableHashCode(&newTag);
(gdb) p newTag
$4 = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 51439}, forkNum = MAIN_FORKNUM, blockNum = 0}
(gdb) n
1009        newPartitionLock = BufMappingPartitionLock(newHash);
(gdb)
1012        LWLockAcquire(newPartitionLock, LW_SHARED);
(gdb)
1013        buf_id = BufTableLookup(&newTag, newHash);
(gdb) p newHash
$5 = 1398580903
(gdb) p newPartitionLock
$6 = (LWLock *) 0x7f85e5db9600
(gdb) p *newPartitionLock
$7 = {tranche = 59, state = {value = 536870913}, waiters = {head = 2147483647, tail = 2147483647}}
(gdb)
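To relate the hash value to a partition, the sketch below maps a hash code to a partition index by taking it modulo the number of buffer mapping partitions. The constant 128 is an assumption (the compile-time default of NUM_BUFFER_PARTITIONS in stock PostgreSQL builds of this generation) and does not appear in the trace. `sketch_hash_partition` is a hypothetical name.

```c
#include <stdint.h>

/* Assumption: the default number of buffer mapping partitions. */
#define SKETCH_NUM_BUFFER_PARTITIONS 128

/* Map a buffer tag hash code to the index of its mapping partition. */
static uint32_t
sketch_hash_partition(uint32_t hashcode)
{
    return hashcode % SKETCH_NUM_BUFFER_PARTITIONS;
}
```

Under that assumption, the newHash value 1398580903 from the trace falls into partition 39.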

2. Check whether the block is already in the buffer pool

(gdb) n
1014        if (buf_id >= 0)
(gdb) p buf_id
$8 = -1

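4. The block is not found in the buffer pool (buf_id < 0)
4.1 Release newPartitionLock
4.2 Loop to find a suitable victim buffer
4.2.1 Ensure, while no spinlock is held yet, that a free private refcount entry is available (ReservePrivateRefCountEntry)

(gdb) n
1056        LWLockRelease(newPartitionLock);
(gdb)
1065            ReservePrivateRefCountEntry();
(gdb)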

4.2.2 Select a victim buffer

(gdb) n
1071            buf = StrategyGetBuffer(strategy, &buf_state);
(gdb) n
1073            Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);
(gdb) p buf
$9 = (BufferDesc *) 0x7f85e705fd80
(gdb) p *buf
$10 = {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = InvalidForkNumber, blockNum = 4294967295}, buf_id = 104, state = {value = 4194304}, wait_backend_pid = 0, freeNext = -2, content_lock = {tranche = 54, state = {value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}}
(gdb)

4.2.3 Copy the buffer flags into oldFlags

(gdb) n
1076            oldFlags = buf_state & BUF_FLAG_MASK;
(gdb)

4.2.4 Pin the buffer, then release the buffer header spinlock

(gdb)
1079            PinBuffer_Locked(buf);
(gdb)

4.2.5 If the buffer is flagged BM_DIRTY, write it out with FlushBuffer (not the case in this trace)

1088            if (oldFlags & BM_DIRTY)
(gdb)

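4.2.6 If the buffer is flagged BM_TAG_VALID, compute the old tag's hash code and partition lock, then take both the old and new partition locks
Otherwise only the new partition lock is needed: take it, and set oldPartitionLock to NULL and oldHash to 0 (the path taken here, since oldFlags contains only BM_LOCKED)

(gdb)
1166            if (oldFlags & BM_TAG_VALID)
(gdb)
1200                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
(gdb)
1202                oldPartitionLock = NULL;
(gdb)
1204                oldHash = 0;
(gdb) p oldFlags
$11 = 4194304
(gdb)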

4.2.7 Try to insert a hash table entry for the buffer under its new tag

(gdb)
1214            buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
(gdb) n
1216            if (buf_id >= 0)
(gdb) p buf_id
$12 = -1
(gdb)

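4.2.9 No collision (buf_id < 0): lock the buffer header; if the refcount is 1 (only our own pin) and the buffer is not dirty, the victim is usable, so leave the loop
Otherwise unlock the buffer header, delete the new hash table entry, release the locks, and look for another victim

(gdb) n
1267            buf_state = LockBufHdr(buf);
(gdb)
1275            oldFlags = buf_state & BUF_FLAG_MASK;
(gdb)
1276            if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 && !(oldFlags & BM_DIRTY))
(gdb)
1277                break;
(gdb)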

4.3 The buffer can now be retagged: set the new tag and flags, unlock the buffer header, delete the old hash table entry (if any), and release the partition locks

1301        buf->tag = newTag;
(gdb)
1302        buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
(gdb)
1305        if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
(gdb)
1306            buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
(gdb)
1310        UnlockBufHdr(buf, buf_state);
(gdb)
1312        if (oldPartitionLock != NULL)
(gdb)
1319        LWLockRelease(newPartitionLock);
(gdb) p *buf
$13 = {tag = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 51439}, forkNum = MAIN_FORKNUM, blockNum = 0}, buf_id = 104, state = {value = 2181300225}, wait_backend_pid = 0, freeNext = -2, content_lock = {tranche = 54, state = {value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}}
(gdb)
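The two state values printed in the trace (4194304 before retagging, 2181300225 after) are easier to read once split into their parts. The sketch below assumes the layout used by this generation of PostgreSQL's buf_internals.h: 18 low bits of refcount, 4 bits of usage count, and the flags from bit 22 upward. The masks are not shown in the article, so treat them as assumptions, and the `sketch_*` names as hypothetical.

```c
#include <stdint.h>

/* Assumed layout: 18 bits refcount | 4 bits usage count | 10 bits flags. */
#define SKETCH_REFCOUNT_MASK     ((1U << 18) - 1)
#define SKETCH_USAGECOUNT_MASK   0x003C0000U
#define SKETCH_USAGECOUNT_SHIFT  18
#define SKETCH_FLAG_MASK         0xFFC00000U

/* Number of pins currently held on the buffer. */
static uint32_t
sketch_refcount(uint32_t state)
{
    return state & SKETCH_REFCOUNT_MASK;
}

/* Clock-sweep usage count. */
static uint32_t
sketch_usagecount(uint32_t state)
{
    return (state & SKETCH_USAGECOUNT_MASK) >> SKETCH_USAGECOUNT_SHIFT;
}

/* The BM_* flag bits, left in place. */
static uint32_t
sketch_flags(uint32_t state)
{
    return state & SKETCH_FLAG_MASK;
}
```

Under these assumptions, 4194304 is 1U << 22, that is BM_LOCKED alone with refcount 0 and usage count 0, which matches a never-used buffer returned with its header spinlock held. 2181300225 is 0x82040001: BM_PERMANENT | BM_TAG_VALID, usage count 1, refcount 1, exactly what the assignment at line 1306 plus our own pin should produce.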

4.4 Call StartBufferIO and set *foundPtr accordingly

(gdb)
1326        if (StartBufferIO(buf, true))
(gdb) n
1327            *foundPtr = false;
(gdb)

4.5 Return buf

(gdb)
1331        return buf;
(gdb)
1332    }
(gdb)

Execution is complete and control returns to ReadBuffer_common.

(gdb)
ReadBuffer_common (smgr=0x2267430, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0, hit=0x7ffcc97fb5eb) at bufmgr.c:747
747         if (found)
(gdb)
750             pgBufferUsage.shared_blks_read++;
(gdb)
