分析PostgreSQL中的数据结构HTAB
这篇文章主要讲解了“分析PostgreSQL中的数据结构HTAB”,文中的讲解内容简单清晰,易于学习与理解,下面请大家跟着小编的思路慢慢深入,一起来研究和学习“分析PostgreSQL中的数据结构HTAB”吧!
一、数据结构/**Topcontrolstructureforahashtable---inasharedtable,eachbackend*hasitsowncopy(OKsincenofieldschangeatruntime)*哈希表的顶层控制结构.*在这个共享哈希表中,每一个后台进程都有自己的拷贝*(之所以没有问题是因为fork出来后,在运行期没有字段会变化)*/structHTAB{//指向共享的控制信息HASHHDR*hctl;/*=>sharedcontrolinformation*///段目录HASHSEGMENT*dir;/*directoryofsegmentstarts*///哈希函数HashValueFunchash;/*hashfunction*///哈希键比较函数HashCompareFuncmatch;/*keycomparisonfunction*///哈希键拷贝函数HashCopyFunckeycopy;/*keycopyingfunction*///内存分配器HashAllocFuncalloc;/*memoryallocator*///内存上下文MemoryContexthcxt;/*memorycontextifdefaultallocatorused*///表名(用于错误信息)char*tabname;/*tablename(forerrormessages)*///如在共享内存中,则为Tboolisshared;/*trueiftableisinsharedmemory*///如为T,则固定大小不能扩展boolisfixed;/*iftrue,don'tenlarge*//*freezingasharedtableisn'tallowed,sowecankeepstatehere*///不允许冻结共享表,因此这里会保存相关状态boolfrozen;/*true=nomoreinsertsallowed*//*Wekeeplocalcopiesofthesefixedvaluestoreducecontention*///保存这些固定值的本地拷贝,以减少冲突//哈希键长度(以字节为单位)Sizekeysize;/*hashkeylengthinbytes*///段大小,必须为2的幂longssize;/*segmentsize---mustbepowerof2*///段偏移,ssize的对数intsshift;/*segmentshift=log2(ssize)*/};/**Headerstructureforahashtable---containsallchangeableinfo*哈希表的头部结构--存储所有可变信息**Inashared-memoryhashtable,theHASHHDRisinsharedmemory,while*eachbackendhasalocalHTABstruct.Foranon-sharedtable,thereisn't*anyfunctionaldifferencebetweenHASHHDRandHTAB,butweseparatethem*anywaytosharecodebetweensharedandnon-sharedtables.*在共享内存哈希表中,HASHHDR位于共享内存中,每一个后台进程都有一个本地HTAB结构.*对于非共享哈希表,HASHHDR和HTAB没有任何功能性的不同,*但无论如何,我们还是把它们区分为共享和非共享表.*/structHASHHDR{/**Thefreelistcanbecomeapointofcontentioninhigh-concurrencyhash*tables,soweuseanarrayoffreelists,eachwithitsownmutexand*nentriescount,insteadofjustasingleone.Althoughthefreelists*normallyoperateindependently,wewillscavengeentriesfromfreelists*otherthanahashcode'sdefaultfreelistwhennecessary.*在高并发的哈希表中,空闲链表会成为竞争热点,因此我们使用空闲链表数组,*数组中的每一个元素都有自己的mutex和条目统计,而不是使用一个.**Ifthehashtableisnotpartitioned,onlyfreeList[0]isusedandits*spinlockisnotusedatall;callers'lockingisassumedsufficient.*如果哈希表没有分区,那么只有freelist[0]元素是有用的,自旋锁没有任何用处;*调用者锁定被认为已足够OK.*//*Numberoffreeliststobeusedforapartitionedhashtable.*///#defineNUM_FREELISTS32FreeListDatafreeList[NUM_FREELISTS];/*Thesefieldscanchange,butnotinapartitionedtable*///这些域字段可以改变,但不适用于分区表/*Also,dsizecan'tchangeinasharedtable,evenifunpartitioned*///同时,就算是非分区表,共享表的dsize也不能改变//目录大小longdsize;/*directorysize*///已分配的段大小(<=dsize)longnsegs;/*numberofallocatedsegments(<=dsize)*///正在使用的最大桶IDuint32max_bucket;/*IDofmaximumbucketinuse*///进入整个哈希表的模掩码uint32high_mask;/*masktomodulointoentiretable*///进入低位哈希表的模掩码uint32low_mask;/*masktomodulointolowerhalfoftable*//*Thesefieldsarefixedathashtablecreation*///下面这些字段在哈希表创建时已固定//哈希键大小(以字节为单位)Sizekeysize;/*hashkeylengthinbytes*///所有用户元素大小(以字节为单位)Sizeentrysize;/*totaluserelementsizeinbytes*///分区个数(2的幂),或者为0longnum_partitions;/*#partitions(mustbepowerof2),or0*///目标的填充因子longffactor;/*targetfillfactor*///如目录是固定大小,则该值为dsize的上限值longmax_dsize;/*'dsize'limitifdirectoryisfixedsize*///段大小,必须是2的幂longssize;/*segmentsize---mustbepowerof2*///段偏移,ssize的对数intsshift;/*segmentshift=log2(ssize)*///一次性分配的条目个数intnelem_alloc;/*numberofentriestoallocateatonce*/#ifdefHASH_STATISTICS/**Countstatisticshere.NB:statscodedoesn'tbotherwithmutex,so*countscouldbecorruptedabitinapartitionedtable.*统计信息.*注意:统计相关的代码不会影响mutex,因此对于分区表,统计可能有一点点问题*/longaccesses;longcollisions;#endif};/**Per-freelistdata.*空闲链表数据.**Inapartitionedhashtable,eachfreelistisassociatedwithaspecific*setofhashcodes,asdeterminedbytheFREELIST_IDX()macrobelow.*nentriestracksthenumberoflivehashtableentrieshavingthosehashcodes*(NOTthenumberofentriesinthefreelist,asyoumightexpect).*在一个分区哈希表中,每一个空闲链表与特定的hashcodes集合相关,通过下面的FREELIST_IDX()宏进行定义.*nentries跟踪有这些hashcodes的仍存活的hashtable条目个数.*(注意不要搞错,不是空闲的条目个数)**Thecoverageofafreelistmightbemoreorlessthanonepartition,soit*needsitsownlockratherthanrelyingoncallerlocking.Relyingonthat*wouldn'tworkevenifthecoveragewasthesame,becauseoftheoccasional*needto"borrow"entriesfromanotherfreelist;seeget_hash_entry().*空闲链表的覆盖范围可能比一个分区多或少,因此需要自己的锁而不能仅仅依赖调用者的锁.*依赖调用者锁在覆盖面一样的情况下也不会起效,因为偶尔需要从另一个自由列表“借用”条目,详细参见get_hash_entry()**UsinganarrayofFreeListDatainsteadofseparatearraysofmutexes,*nentriesandfreeListshelpstoreducesharingofcachelinesbetween*differentmutexes.*使用FreeListData数组而不是一个独立的mutexes,nentries和freelists数组有助于减少不同mutexes之间的缓存线共享.*/typedefstruct{//该空闲链表的自旋锁slock_tmutex;/*spinlockforthisfreelist*///相关桶中的条目个数longnentries;/*numberofentriesinassociatedbuckets*///空闲元素链HASHELEMENT*freeList;/*chainoffreeelements*/}FreeListData;/**HASHELEMENTistheprivatepartofahashtableentry.Thecaller'sdata*followstheHASHELEMENTstructure(onaMAXALIGN'dboundary).Thehashkey*isexpectedtobeatthestartofthecaller'shashentrydatastructure.*HASHELEMENT是哈希表条目的私有部分.*调用者的数据按照HASHELEMENT结构组织(位于MAXALIGN的边界).*哈希键应位于调用者hash条目数据结构的开始位置.*/typedefstructHASHELEMENT{//链接到相同桶中的下一个条目structHASHELEMENT*link;/*linktonextentryinsamebucket*///该条目的哈希函数结果uint32hashvalue;/*hashfunctionresultforthisentry*/}HASHELEMENT;/*Hashtableheaderstructisanopaquetypeknownonlywithindynahash.c*///哈希表头部结构,非透明类型,用于dynahash.ctypedefstructHASHHDRHASHHDR;/*Hashtablecontrolstructisanopaquetypeknownonlywithindynahash.c*///哈希表控制结构,非透明类型,用于dynahash.ctypedefstructHTABHTAB;/*Parameterdatastructureforhash_create*///hash_create使用的参数数据结构/*Onlythosefieldsindicatedbyhash_flagsneedbeset*///根据hash_flags标记设置相应的字段typedefstructHASHCTL{//分区个数(必须是2的幂)longnum_partitions;/*#partitions(mustbepowerof2)*///段大小longssize;/*segmentsize*///初始化目录大小longdsize;/*(initial)directorysize*///dsize上限longmax_dsize;/*limittodsizeifdirsizeislimited*///填充因子longffactor;/*fillfactor*///哈希键大小(字节为单位)Sizekeysize;/*hashkeylengthinbytes*///参见上述数据结构注释Sizeentrysize;/*totaluserelementsizeinbytes*///HashValueFunchash;/*hashfunction*/HashCompareFuncmatch;/*keycomparisonfunction*/HashCopyFunckeycopy;/*keycopyingfunction*/HashAllocFuncalloc;/*memoryallocator*/MemoryContexthcxt;/*memorycontexttouseforallocations*///共享内存中的哈希头部结构地址HASHHDR*hctl;/*locationofheaderinsharedmem*/}HASHCTL;/*AhashbucketisalinkedlistofHASHELEMENTs*///哈希桶是HASHELEMENTs链表typedefHASHELEMENT*HASHBUCKET;/*Ahashsegmentisanarrayofbucketheaders*///hashsegment是桶数组typedefHASHBUCKET*HASHSEGMENT;/**Hashfunctionsmusthavethissignature.*Hash函数必须有它自己的标识*/typedefuint32(*HashValueFunc)(constvoid*key,Sizekeysize);/**Keycomparisonfunctionsmusthavethissignature.Comparisonfunctions*returnzeroformatch,nonzerofornomatch.(Thecomparisonfunction*definitionisdesignedtoallowmemcmp()andstrncmp()tobeuseddirectly*askeycomparisonfunctions.)*哈希键对比函数必须有自己的标识.*如匹配则对比函数返回0,不匹配返回非0.*(对比函数定义被设计为允许在对比键值时可直接使用memcmp()和strncmp())*/typedefint(*HashCompareFunc)(constvoid*key1,constvoid*key2,Sizekeysize);/**Keycopyingfunctionsmusthavethissignature.Thereturnvalueisnot*used.(Thedefinitionissetuptoallowmemcpy()andstrlcpy()tobe*useddirectly.)*键拷贝函数必须有自己的标识.*返回值无用.*/typedefvoid*(*HashCopyFunc)(void*dest,constvoid*src,Sizekeysize);/**Spaceallocationfunctionforahashtable---designedtomatchmalloc().*Note:thereisnofreefunctionAPI;can'tdestroyahashtableunlessyou*usethedefaultallocator.*哈希表的恐惧分配函数--被设计为与malloc()函数匹配.*注意:这里没有释放函数API;不能销毁哈希表,除非使用默认的分配器.*/typedefvoid*(*HashAllocFunc)(Sizerequest);
其结构如下图所示:
测试脚本
\psetfooteroff\psettuples_onlyon\o/tmp/drop.sqlSELECT'droptableifexiststbl'||id||';'as"--"FROMgenerate_series(1,20000)ASid;\i/tmp/drop.sql\psetfooteroff\psettuples_onlyon\o/tmp/create.sqlSELECT'CREATETABLEtbl'||id||'(idint);'as"--"FROMgenerate_series(1,10000)ASid;begin;\o/tmp/ret.txt\i/tmp/create.sql
跟踪分析
...HASHSEGMENT*dir-->HASHELEMENT***dir;dir-->HASHELEMENT***(gdb)p*hctl$1={freeList={{mutex=0'\000',nentries=312,freeList=0x7fd906ab84c0},{mutex=0'\000',nentries=298,freeList=0x7fd907097c40},{mutex=0'\000',nentries=292,freeList=0x7fd906ac2520},{mutex=0'\000',nentries=321,freeList=0x7fd906ac8120},{mutex=0'\000',nentries=341,freeList=0x7fd907229980},{mutex=0'\000',nentries=334,freeList=0x7fd906ad3f08},{mutex=0'\000',nentries=316,freeList=0x7fd906ad6fb8},{mutex=0'\000',nentries=299,freeList=0x7fd906ade550},{mutex=0'\000',nentries=328,freeList=0x7fd906ae1600},{mutex=0'\000',nentries=328,freeList=0x7fd906ae62e8},{mutex=0'\000',nentries=308,freeList=0x7fd906aeb660},{mutex=0'\000',nentries=327,freeList=0x7fd90706f338},{mutex=0'\000',nentries=346,freeList=0x7fd906af6bc0},{mutex=0'\000',nentries=323,freeList=0x7fd907237bc0},{mutex=0'\000',nentries=304,freeList=0x7fd9071ddb40},{mutex=0'\000',nentries=311,freeList=0x7fd906b06238},{mutex=0'\000',nentries=292,freeList=0x7fd90707b620},{mutex=0'\000',nentries=303,freeList=0x7fd90723dd20},{mutex=0'\000',nentries=302,freeList=0x7fd906b137e0},{mutex=0'\000',nentries=307,freeList=0x7fd9070873c8},{mutex=0'\000',nentries=314,freeList=0x7fd90723bb68},{mutex=0'\000',nentries=279,freeList=0x7fd906b22678},{mutex=0'\000',nentries=297,freeList=0x7fd907073e08},{mutex=0'\000',nentries=309,freeList=0x7fd90721f888},{mutex=0'\000',nentries=317,freeList=0x7fd906b33880},{mutex=0'\000',nentries=283,freeList=0x7fd907086168},{mutex=0'\000',nentries=331,freeList=0x7fd906b3d838},{mutex=0'\000',nentries=330,freeList=0x7fd906b41f38},{mutex=0'\000',nentries=313,freeList=0x7fd906b46440},{mutex=0'\000',nentries=304,freeList=0x7fd906b4b5c0},{mutex=0'\000',nentries=310,freeList=0x7fd90720ed80},{mutex=0'\000',nentries=323,freeList=0x7fd906b575a0}},dsize=256,nsegs=16,max_bucket=4095,high_mask=8191,low_mask=4095,keysize=16,entrysize=152,num_partitions=16,ffactor=1,max_dsize=256,ssize=256,sshift=8,nelem_alloc=48}(gdb)p*hashp$2={hctl=0x7fd906aae980,dir=0x7fd906aaecd8,hash=0xa79ac6<tag_hash>,match=0x47cb70<memcmp@plt>,keycopy=0x47d0a0<memcpy@plt>,alloc=0x8c3419<ShmemAllocNoError>,hcxt=0x0,tabname=0x160f1d0"LOCKhash",isshared=true,isfixed=false,frozen=false,keysize=16,ssize=256,sshift=8}(gdb)p*hashp->dir$3=(HASHSEGMENT)0x7fd906aaf500(gdb)phashp->dir$4=(HASHSEGMENT*)0x7fd906aaecd8(gdb)p**hashp->dir$5=(HASHBUCKET)0x7fd907212dd0(gdb)p***hashp->dir$6={link=0x7fd9071a7b90,hashvalue=1748602880}(gdb)n949if(action==HASH_ENTER||action==HASH_ENTER_NULL)(gdb)956if(!IS_PARTITIONED(hctl)&&!hashp->frozen&&(gdb)965bucket=calc_bucket(hctl,hashvalue);-->hash桶(gdb)967segment_num=bucket>>hashp->sshift;-->桶号右移8位得到段号(gdb)968segment_ndx=MOD(bucket,hashp->ssize);-->桶号取模得到段内偏移(gdb)970segp=hashp->dir[segment_num];-->获取段(HASHELEMENT**)(gdb)972if(segp==NULL)(gdb)pbucket$7=2072(gdb)psegment_num$8=8(gdb)psegment_ndx$9=24(gdb)psegp-->$10=(HASHSEGMENT)0x7fd906ab3500(gdb)(gdb)n975prevBucketPtr=&segp[segment_ndx];-->HASHELEMENT**(gdb)976currBucket=*prevBucketPtr;-->HASHELEMENT*(gdb)981match=hashp->match;/*saveonefetchininnerloop*/(gdb)pprevBucketPtr$12=(HASHBUCKET*)0x7fd906ab35c0(gdb)pcurrBucket$13=(HASHBUCKET)0x7fd90714da68(gdb)
感谢各位的阅读,以上就是“分析PostgreSQL中的数据结构HTAB”的内容了,经过本文的学习后,相信大家对分析PostgreSQL中的数据结构HTAB这一问题有了更深刻的体会,具体使用情况还需要大家实践验证。这里是亿速云,小编将为大家推送更多相关知识点的文章,欢迎关注!
声明:本站所有文章资源内容,如无特殊说明或标注,均为采集网络资源。如若本站内容侵犯了原著者的合法权益,可联系本站删除。