What the mdread function does in PostgreSQL
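In PostgreSQL's storage management layer, mdread is the function of the magnetic disk (md) storage manager that reads a block from a relation.
1. Data structures
smgrsw
The f_smgr struct of function pointers defines the API between smgr.c and any individual storage manager module.
md is short for magnetic disk.
Besides md, PG previously also supported two other storage managers, one for the Sony WORM optical disk jukebox and one for persistent main memory,
but only magnetic disk remains; the others were dropped and are no longer supported.
The name "magnetic disk" is itself misleading: md works with any kind of device for which the operating system provides a standard file system.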
/*
 * This struct of function pointers defines the API between smgr.c and
 * any individual storage manager module.  Note that smgr subfunctions are
 * generally expected to report problems via elog(ERROR).  An exception is
 * that smgr_unlink should use elog(WARNING), rather than erroring out,
 * because we normally unlink relations during post-commit/abort cleanup,
 * and so it's too late to raise an error.  Also, various conditions that
 * would normally be errors should be allowed during bootstrap and/or WAL
 * recovery --- see comments in md.c for details.
 */
typedef struct f_smgr
{
    void        (*smgr_init) (void);        /* may be NULL */
    void        (*smgr_shutdown) (void);    /* may be NULL */
    void        (*smgr_close) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_create) (SMgrRelation reln, ForkNumber forknum,
                                bool isRedo);
    bool        (*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_unlink) (RelFileNodeBackend rnode, ForkNumber forknum,
                                bool isRedo);
    void        (*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
                                BlockNumber blocknum, char *buffer, bool skipFsync);
    void        (*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
                                  BlockNumber blocknum);
    void        (*smgr_read) (SMgrRelation reln, ForkNumber forknum,
                              BlockNumber blocknum, char *buffer);
    void        (*smgr_write) (SMgrRelation reln, ForkNumber forknum,
                               BlockNumber blocknum, char *buffer, bool skipFsync);
    void        (*smgr_writeback) (SMgrRelation reln, ForkNumber forknum,
                                   BlockNumber blocknum, BlockNumber nblocks);
    BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
                                  BlockNumber nblocks);
    void        (*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_pre_ckpt) (void);    /* may be NULL */
    void        (*smgr_sync) (void);        /* may be NULL */
    void        (*smgr_post_ckpt) (void);   /* may be NULL */
} f_smgr;

/* The switch table has a single entry: the magnetic disk manager. */
static const f_smgr smgrsw[] = {
    /* magnetic disk */
    {
        .smgr_init = mdinit,
        .smgr_shutdown = NULL,
        .smgr_close = mdclose,
        .smgr_create = mdcreate,
        .smgr_exists = mdexists,
        .smgr_unlink = mdunlink,
        .smgr_extend = mdextend,
        .smgr_prefetch = mdprefetch,
        .smgr_read = mdread,
        .smgr_write = mdwrite,
        .smgr_writeback = mdwriteback,
        .smgr_nblocks = mdnblocks,
        .smgr_truncate = mdtruncate,
        .smgr_immedsync = mdimmedsync,
        .smgr_pre_ckpt = mdpreckpt,
        .smgr_sync = mdsync,
        .smgr_post_ckpt = mdpostckpt
    }
};
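The dispatch idea behind smgrsw can be illustrated with a much smaller table. The following is only a sketch, not PostgreSQL code: the names toy_smgr, toy_smgrsw, toy_mdread, toy_smgrread and the tiny block size TOY_BLCKSZ are all hypothetical, chosen to mirror how a caller in smgr.c reaches mdread through a function pointer rather than by name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLCKSZ 16           /* hypothetical tiny block size for the demo */

/* Hypothetical, cut-down analogue of f_smgr: one slot per operation. */
typedef struct toy_smgr
{
    void        (*init) (void);     /* may be NULL, like smgr_init */
    void        (*read) (unsigned blocknum, char *buffer);
} toy_smgr;

/* Stand-in "md" reader: fills the block with a letter derived from blocknum. */
static void
toy_mdread(unsigned blocknum, char *buffer)
{
    memset(buffer, 'A' + (int) (blocknum % 26), TOY_BLCKSZ);
}

/* The switch table: entry 0 is the only storage manager, as in smgrsw[]. */
static const toy_smgr toy_smgrsw[] = {
    {.init = NULL, .read = toy_mdread},
};

/* Analogue of a generic read entry point: index the table, call the slot. */
static void
toy_smgrread(int which, unsigned blocknum, char *buffer)
{
    assert(which >= 0 &&
           (size_t) which < sizeof(toy_smgrsw) / sizeof(toy_smgrsw[0]));
    toy_smgrsw[which].read(blocknum, buffer);
}
```

Calling toy_smgrread(0, 2, buf) with a TOY_BLCKSZ-byte buffer goes through the table and fills the buffer with 'C'. Adding another storage manager would only mean adding another table entry, which is the flexibility the f_smgr layout preserves even though md is the sole implementation left.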
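MdfdVec
The magnetic disk storage manager keeps track of open file descriptors in its own descriptor pool.
This makes it easier to support relations that are larger than the operating system's file size limit (often 2 GB).
To do that, a relation is broken up into "segment" files, each smaller than the OS file size limit.
The segment size is set by the RELSEG_SIZE configuration constant in pg_config.h.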
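As a rough illustration of the segment scheme, the sketch below maps a block number to its segment number and to a segment file name. It assumes the default build value RELSEG_SIZE = 131072 blocks and PostgreSQL's convention that the first segment carries the relation's plain file name while later segments append ".1", ".2", and so on. The helper names demo_segno and demo_segpath are hypothetical, and the relation path passed in is whatever the caller supplies.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed default: 131072 blocks per segment (1 GB with 8 kB blocks). */
#define DEMO_RELSEG_SIZE 131072u

/* Segment that holds the given block: plain integer division. */
static unsigned
demo_segno(unsigned blocknum)
{
    return blocknum / DEMO_RELSEG_SIZE;
}

/*
 * Build the file name of a segment.  Segment 0 uses the relation path
 * unchanged; segment N appends ".N".  Returns the length snprintf reports.
 */
static int
demo_segpath(char *out, size_t outlen, const char *relpath, unsigned segno)
{
    assert(out != NULL && outlen > 0);
    if (segno == 0)
        return snprintf(out, outlen, "%s", relpath);
    return snprintf(out, outlen, "%s.%u", relpath, segno);
}
```

With these assumptions, block 50 lives in segment 0 and block 300000 in segment 2, so a relation stored at some path P would keep that block in the file P.2.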
/*
 *  The magnetic disk storage manager keeps track of open file
 *  descriptors in its own descriptor pool.  This is done to make it
 *  easier to support relations that are larger than the operating
 *  system's file size limit (often 2GBytes).  In order to do that,
 *  we break relations up into "segment" files that are each shorter than
 *  the OS file size limit.  The segment size is set by the RELSEG_SIZE
 *  configuration constant in pg_config.h.
 *
 *  On disk, a relation must consist of consecutively numbered segment
 *  files in the pattern
 *      -- Zero or more full segments of exactly RELSEG_SIZE blocks each
 *      -- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
 *      -- Optionally, any number of inactive segments of size 0 blocks.
 *  The full and partial segments are collectively the "active" segments.
 *  Inactive segments are those that once contained data but are currently
 *  not needed because of an mdtruncate() operation.  The reason for leaving
 *  them present at size zero, rather than unlinking them, is that other
 *  backends and/or the checkpointer might be holding open file references to
 *  such segments.  If the relation expands again after mdtruncate(), such
 *  that a deactivated segment becomes active again, it is important that
 *  such file references still be valid --- else data might get written
 *  out to an unlinked old copy of a segment file that will eventually
 *  disappear.
 *
 *  File descriptors are stored in the per-fork md_seg_fds arrays inside
 *  SMgrRelation. The length of these arrays is stored in md_num_open_segs.
 *  Note that a fork's md_num_open_segs having a specific value does not
 *  necessarily mean the relation doesn't have additional segments; we may
 *  just not have opened the next segment yet.  (We could not have "all
 *  segments are in the array" as an invariant anyway, since another backend
 *  could extend the relation while we aren't looking.)  We do not have
 *  entries for inactive segments, however; as soon as we find a partial
 *  segment, we assume that any subsequent segments are inactive.
 *
 *  The entire MdfdVec array is palloc'd in the MdCxt memory context.
 */
typedef struct _MdfdVec
{
    File        mdfd_vfd;       /* fd number in fd.c's pool */
    BlockNumber mdfd_segno;     /* segment number, from 0 */
} MdfdVec;
2. Source code walkthrough
mdread(): read the specified block from a relation.
The source is fairly simple; the actual read is done by calling FileRead.
/*
 *  mdread() -- Read the specified block from a relation.
 */
void
mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
       char *buffer)
{
    off_t       seekpos;    /* byte offset of the block within its segment */
    int         nbytes;     /* number of bytes actually read */
    MdfdVec    *v;          /* entry for the segment holding the block */

    TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
                                        reln->smgr_rnode.node.spcNode,
                                        reln->smgr_rnode.node.dbNode,
                                        reln->smgr_rnode.node.relNode,
                                        reln->smgr_rnode.backend);

    /* find the segment that contains the block */
    v = _mdfd_getseg(reln, forknum, blocknum, false,
                     EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);

    /* compute the block's offset inside that segment */
    seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));

    /* sanity check: the offset must fall inside one segment */
    Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);

    /* read the block into buffer; returns the number of bytes read */
    nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_READ);

    /* tracing */
    TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
                                       reln->smgr_rnode.node.spcNode,
                                       reln->smgr_rnode.node.dbNode,
                                       reln->smgr_rnode.node.relNode,
                                       reln->smgr_rnode.backend,
                                       nbytes,
                                       BLCKSZ);

    if (nbytes != BLCKSZ)
    {
        /* fewer bytes than one block were read */
        if (nbytes < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not read block %u in file \"%s\": %m",
                            blocknum, FilePathName(v->mdfd_vfd))));

        /*
         * Short read: we are at or past EOF, or we read a partial block at
         * EOF.  Normally this is an error; upper levels should never try to
         * read a nonexistent block.  However, if zero_damaged_pages is ON or
         * we are InRecovery, we should instead return zeroes without
         * complaining.  This allows, for example, the case of trying to
         * update a block that was later truncated away.
         */
        if (zero_damaged_pages || InRecovery)
            MemSet(buffer, 0, BLCKSZ);
        else
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                            blocknum, FilePathName(v->mdfd_vfd),
                            nbytes, BLCKSZ)));
    }
}
3. Trace analysis
Test script
11:15:11 (xdb@[local]:5432)testdb=# insert into t1(id) select generate_series(100,500);
Start gdb and trace
Inspect the call stack
(gdb) b mdread
Breakpoint 3 at 0x8b669b: file md.c, line 738.
(gdb) c
Continuing.

Breakpoint 3, mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
738         TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
(gdb) bt
#0  mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
#1  0x00000000008b92d5 in smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at smgr.c:628
#2  0x00000000008793f9 in ReadBuffer_common (smgr=0x2d09be0, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=50, mode=RBM_NORMAL, strategy=0x0, hit=0x7ffd5fb2948b) at bufmgr.c:890
#3  0x0000000000878cd4 in ReadBufferExtended (reln=0x7f3836e1e788, forkNum=MAIN_FORKNUM, blockNum=50, mode=RBM_NORMAL, strategy=0x0) at bufmgr.c:664
#4  0x0000000000878bb1 in ReadBuffer (reln=0x7f3836e1e788, blockNum=50) at bufmgr.c:596
#5  0x00000000004eeb96 in ReadBufferBI (relation=0x7f3836e1e788, targetBlock=50, bistate=0x0) at hio.c:87
#6  0x00000000004ef387 in RelationGetBufferForTuple (relation=0x7f3836e1e788, len=32, otherBuffer=0, options=0, bistate=0x0, vmbuffer=0x7ffd5fb295ec, vmbuffer_other=0x0) at hio.c:415
#7  0x00000000004df1f8 in heap_insert (relation=0x7f3836e1e788, tup=0x2ca6770, cid=0, options=0, bistate=0x0) at heapam.c:2468
#8  0x0000000000709dda in ExecInsert (mtstate=0x2ca4c40, slot=0x2ca3418, planSlot=0x2ca3418, estate=0x2ca48d8, canSetTag=true) at nodeModifyTable.c:529
#9  0x000000000070c475 in ExecModifyTable (pstate=0x2ca4c40) at nodeModifyTable.c:2159
#10 0x00000000006e05cb in ExecProcNodeFirst (node=0x2ca4c40) at execProcnode.c:445
#11 0x00000000006d552e in ExecProcNode (node=0x2ca4c40) at ../../../src/include/executor/executor.h:247
#12 0x00000000006d7d66 in ExecutePlan (estate=0x2ca48d8, planstate=0x2ca4c40, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x2d41a30, execute_once=true) at execMain.c:1723
#13 0x00000000006d5af8 in standard_ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
#14 0x00000000006d5920 in ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:307
#15 0x00000000008c1092 in ProcessQuery (plan=0x2d418b8, sourceText=0x2c7eec8 "insert into t1(id) select generate_series(100,500);", params=0x0, queryEnv=0x0, dest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:161
#16 0x00000000008c29a1 in PortalRunMulti (portal=0x2ce4488, isTopLevel=true, setHoldSnapshot=false, dest=0x2d41a30, altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:1286
#17 0x00000000008c1f7a in PortalRun (portal=0x2ce4488, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2d41a30, altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:799
#18 0x00000000008bbf16 in exec_simple_query (query_string=0x2c7eec8 "insert into t1(id) select generate_series(100,500);") at postgres.c:1145
#19 0x00000000008c01a1 in PostgresMain (argc=1, argv=0x2ca8af8, dbname=0x2ca8960 "testdb", username=0x2c7bba8 "xdb") at postgres.c:4182
#20 0x000000000081e07c in BackendRun (port=0x2ca0940) at postmaster.c:4361
#21 0x000000000081d7ef in BackendStartup (port=0x2ca0940) at postmaster.c:4033
#22 0x0000000000819be9 in ServerLoop () at postmaster.c:1706
#23 0x000000000081949f in PostmasterMain (argc=1, argv=0x2c79b60) at postmaster.c:1379
#24 0x0000000000742941 in main (argc=1, argv=0x2c79b60) at main.c:228
(gdb)
Compute the read offset
(gdb) n
744         v = _mdfd_getseg(reln, forknum, blocknum, false,
(gdb)
747         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
(gdb) p *v
$1 = {mdfd_vfd = 26, mdfd_segno = 0}
(gdb) p BLCKSZ
$2 = 8192
(gdb) p blocknum
$3 = 50
(gdb) p RELSEG_SIZE
$4 = 131072
(gdb) n
749         Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
(gdb) p seekpos
$5 = 409600
(gdb)
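The value printed for seekpos can be checked by hand: 50 % 131072 = 50, and 8192 * 50 = 409600. A small sketch of the same arithmetic follows. The constants are the values gdb printed in this session, and demo_seekpos is a hypothetical helper name, not a PostgreSQL function.

```c
#include <assert.h>
#include <sys/types.h>          /* off_t */

/* Values printed by gdb in the trace above. */
#define DEMO_BLCKSZ      8192
#define DEMO_RELSEG_SIZE 131072

/* Byte offset of a block inside its own segment file. */
static off_t
demo_seekpos(unsigned blocknum)
{
    off_t       seekpos;

    seekpos = (off_t) DEMO_BLCKSZ * (blocknum % (unsigned) DEMO_RELSEG_SIZE);

    /* Same sanity check as in mdread: the offset stays inside one segment. */
    assert(seekpos < (off_t) DEMO_BLCKSZ * DEMO_RELSEG_SIZE);
    return seekpos;
}
```

Because of the modulo, the offset wraps at every segment boundary: block 131072 is the first block of segment 1, so its offset within that file is 0 again.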
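Perform the read
The traced build calls FileSeek and then a four-argument FileRead, whereas the mdread listing above passes seekpos directly to FileRead, so the two apparently come from different source revisions. The logic is otherwise the same.
(gdb) n
751         if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)
(gdb)
757         nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, WAIT_EVENT_DATA_FILE_READ);
(gdb)
759         TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
(gdb) p nbytes
$6 = 8192
(gdb) p *buffer
$7 = 1 '\001'
(gdb) n
767         if (nbytes != BLCKSZ)
(gdb)
792     }
(gdb)
smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "\001") at smgr.c:629
629     }
(gdb)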
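The read-and-check pattern traced above can be reproduced outside PostgreSQL as a sketch. It uses plain POSIX open and pread instead of fd.c's virtual file descriptors, return codes instead of ereport, and a boolean flag standing in for the zero_damaged_pages || InRecovery test. The function name demo_read_block and its return codes are hypothetical.

```c
#define _XOPEN_SOURCE 700       /* for pread() under strict compiler modes */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_BLCKSZ 8192        /* block size seen in the trace */

/*
 * Read block number blocknum of the file at path into buffer.
 * Returns 0 on success (including a tolerated short read, which yields a
 * zero-filled buffer), -1 on an open or read error, and -2 on a short
 * read that is not tolerated.
 */
static int
demo_read_block(const char *path, unsigned blocknum, char *buffer,
                bool zero_on_short)
{
    off_t       seekpos = (off_t) DEMO_BLCKSZ * blocknum;
    ssize_t     nbytes;
    int         fd;

    assert(buffer != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Positioned read: no separate seek call is needed. */
    nbytes = pread(fd, buffer, DEMO_BLCKSZ, seekpos);
    close(fd);

    if (nbytes == DEMO_BLCKSZ)
        return 0;               /* a full block was read */
    if (nbytes < 0)
        return -1;              /* I/O error */

    /* Short read: at or past EOF, or a partial block at EOF. */
    if (zero_on_short)
    {
        memset(buffer, 0, DEMO_BLCKSZ);
        return 0;
    }
    return -2;
}
```

For a file holding one full block plus 100 extra bytes, reading block 0 succeeds, while reading block 1 is a short read: it returns -2 by default, or a zeroed buffer when zero_on_short is true. This matches the two branches mdread takes when nbytes differs from BLCKSZ.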