PostgreSQL中用于计算merge join的Cost函数有哪些

2025-03-11 技术教程

本篇内容主要讲解“PostgreSQL中用于计算merge join的Cost函数有哪些”，感兴趣的朋友不妨来看看。本文介绍的方法操作简单快捷，实用性强。下面就让小编来带大家学习“PostgreSQL中用于计算merge join的Cost函数有哪些”吧!

一、数据结构

Cost相关
注意:实际使用的参数值通过系统配置文件定义,而不是这里的常量定义!

typedefdoubleCost;/*executioncost(inpage-accessunits)*//*defaultsforcostsize.c'sCostparameters*//*NB:cost-estimationcodeshouldusethevariables,nottheseconstants!*//*注意:实际值通过系统配置文件定义,而不是这里的常量定义!*//*Ifyouchangethese,updatebackend/utils/misc/postgresql.sample.conf*/#defineDEFAULT_SEQ_PAGE_COST1.0//顺序扫描page的成本#defineDEFAULT_RANDOM_PAGE_COST4.0//随机扫描page的成本#defineDEFAULT_CPU_TUPLE_COST0.01//处理一个元组的CPU成本#defineDEFAULT_CPU_INDEX_TUPLE_COST0.005//处理一个索引元组的CPU成本#defineDEFAULT_CPU_OPERATOR_COST0.0025//执行一次操作或函数的CPU成本#defineDEFAULT_PARALLEL_TUPLE_COST0.1//并行执行,从一个worker传输一个元组到另一个worker的成本#defineDEFAULT_PARALLEL_SETUP_COST1000.0//构建并行执行环境的成本#defineDEFAULT_EFFECTIVE_CACHE_SIZE524288/*先前已有介绍,measuredinpages*/doubleseq_page_cost=DEFAULT_SEQ_PAGE_COST;doublerandom_page_cost=DEFAULT_RANDOM_PAGE_COST;doublecpu_tuple_cost=DEFAULT_CPU_TUPLE_COST;doublecpu_index_tuple_cost=DEFAULT_CPU_INDEX_TUPLE_COST;doublecpu_operator_cost=DEFAULT_CPU_OPERATOR_COST;doubleparallel_tuple_cost=DEFAULT_PARALLEL_TUPLE_COST;doubleparallel_setup_cost=DEFAULT_PARALLEL_SETUP_COST;inteffective_cache_size=DEFAULT_EFFECTIVE_CACHE_SIZE;Costdisable_cost=1.0e10;//1后面10个0,通过设置一个巨大的成本,让优化器自动放弃此路径intmax_parallel_workers_per_gather=2;//每次gather使用的worker数二、源码解读

merge join算法实现的伪代码如下:
READ data_set_1 SORT BY JOIN KEY TO temp_ds1
READ data_set_2 SORT BY JOIN KEY TO temp_ds2
READ ds1_row FROM temp_ds1
READ ds2_row FROM temp_ds2
WHILE NOT eof ON temp_ds1,temp_ds2 LOOP
IF ( temp_ds1.key = temp_ds2.key ) OUTPUT JOIN ds1_row,ds2_row
ELSIF ( temp_ds1.key <= temp_ds2.key ) READ ds1_row FROM temp_ds1
ELSIF ( temp_ds1.key => temp_ds2.key ) READ ds2_row FROM temp_ds2
END LOOP

try_mergejoin_path->initial_cost_mergejoin
该函数用于实现merge join路径成本的初步估计。

//-----------------------try_mergejoin_path->initial_cost_mergejoin/**initial_cost_mergejoin*Preliminaryestimateofthecostofamergejoinpath.*mergejoin路径成本的初步估计。**Thismustquicklyproducelower-boundestimatesofthepath'sstartupand*totalcosts.Ifweareunabletoeliminatetheproposedpathfrom*considerationusingthelowerbounds,final_cost_mergejoinwillbecalled*toobtainthefinalestimates.*必须快速的对path启动和总成本做出较低的估算。*如果无法使用下界消除尝试对建议的路径进行比较，那么将调用final_cost_mergejoin来获得最终估计。**Theexactdivisionoflaborbetweenthisfunctionandfinal_cost_mergejoin*isprivatetothem,andrepresentsatradeoffbetweenspeedoftheinitial*estimateandgettingatightlowerbound.Wechoosetonotexaminethe*joinqualshere,exceptforobtainingthescanselectivityestimatewhich*isreallyessential(butfortunately,useofcachingkeepsthecostof*gettingthatdowntosomethingreasonable).*Wealsoassumethatcost_sortischeapenoughtousehere.*这个函数和final_cost_mergejoin之间存在确切的划分,两者是没有相关的，*希望做到的是在初始估计的性能和得到严格的下限之间取得权衡。*在这里,不检查joinquals，除非获得非常重要的选择性估计值*(幸运的是，使用缓存可以将成本降至合理的水平)。*我们会假定cost_sort参数设置得足够的低.**'workspace'istobefilledwithstartup_cost,total_cost,andperhaps*otherdatatobeusedbyfinal_cost_mergejoin*workspace-返回值,用于函数final_cost_mergejoin*'jointype'isthetypeofjointobeperformed*jointype-连接类型*'mergeclauses'isthelistofjoinclausestobeusedasmergeclauses*mergeclauses-作为合并条件的连接条件链表*'outer_path'istheouterinputtothejoin*'inner_path'istheinnerinputtothejoin*'outersortkeys'isthelistofsortkeysfortheouterpath*'innersortkeys'isthelistofsortkeysfortheinnerpath*'extra'containsmiscellaneousinformationaboutthejoin**Note:outersortkeysandinnersortkeysshouldbeNILifnoexplicit*sortisneededbecausetherespectivesourcepathisalreadyordered.*/voidinitial_cost_mergejoin(PlannerInfo*root,JoinCostWorkspace*workspace,JoinTypejointype,List*mergeclauses,Path*outer_path,Path*inner_path,List*outersortkeys,List*innersortkeys,JoinPathExtraData*extra){Coststartup_cost=0;Costrun_cost=0;doubleouter_path_rows=outer_path->rows;doubleinner_path_rows=inner_path->rows;Costinner_run_cost;doubleouter_rows,inner_rows,outer_skip_rows,inner_skip_rows;Selectivityouterstartsel,outerendsel,innerstartsel,innerendsel;Pathsort_path;/*dummyforresultofcost_sort*//*确保行数大于0,并且不是NaN.Protectsomeassumptionsbelowthatrowcountsaren'tzeroorNaN*/if(outer_path_rows<=0||isnan(outer_path_rows))outer_path_rows=1;if(inner_path_rows<=0||isnan(inner_path_rows))inner_path_rows=1;/**Amergejoinwillstopassoonasitexhaustseitherinputstream*(unlessit'sanouterjoin,inwhichcasetheoutersidehastobe*scannedallthewayanyway).Estimatefractionoftheleftandright*inputsthatwillactuallyneedtobescanned.Likewise,wecan*estimatethenumberofrowsthatwillbeskippedbeforethefirstjoin*pairisfound,whichshouldbefactoredintostartupcost.Weuseonly*thefirst(mostsignificant)mergeclauseforthispurpose.Since*mergejoinscansel()isafairlyexpensivecomputation,wecachethe*resultsinthemergeclauseRestrictInfo.*mergejoin将在耗尽其中任意一个输入流后立即停止*(除非它是一个外部连接，在这种情况下无论如何都必须对外部进行扫描)。*估计需要扫描的左右输入的部分。*同样，可以估计在找到第一个连接对之前要跳过的行数，这应该考虑到启动成本。*为此，考虑到mergejoinscansel()是一个相当昂贵的计算，只使用第一个(最重要的)merge子句,*我们将结果缓存到merge子句的RestrictInfo中。*/if(mergeclauses&&jointype!=JOIN_FULL)//非全外连接{RestrictInfo*firstclause=(RestrictInfo*)linitial(mergeclauses);List*opathkeys;List*ipathkeys;PathKey*opathkey;PathKey*ipathkey;MergeScanSelCache*cache;/*Gettheinputpathkeystodeterminethesort-orderdetails*/opathkeys=outersortkeys?outersortkeys:outer_path->pathkeys;ipathkeys=innersortkeys?innersortkeys:inner_path->pathkeys;Assert(opathkeys);Assert(ipathkeys);opathkey=(PathKey*)linitial(opathkeys);ipathkey=(PathKey*)linitial(ipathkeys);/*debuggingcheck*/if(opathkey->pk_opfamily!=ipathkey->pk_opfamily||opathkey->pk_eclass->ec_collation!=ipathkey->pk_eclass->ec_collation||opathkey->pk_strategy!=ipathkey->pk_strategy||opathkey->pk_nulls_first!=ipathkey->pk_nulls_first)elog(ERROR,"leftandrightpathkeysdonotmatchinmergejoin");/*Gettheselectivitywithcaching*/cache=cached_scansel(root,firstclause,opathkey);if(bms_is_subset(firstclause->left_relids,outer_path->parent->relids)){/*子句左侧为outer.leftsideofclauseisouter*/outerstartsel=cache->leftstartsel;outerendsel=cache->leftendsel;innerstartsel=cache->rightstartsel;innerendsel=cache->rightendsel;}else{/*子句左侧为innerer.leftsideofclauseisinner*/outerstartsel=cache->rightstartsel;outerendsel=cache->rightendsel;innerstartsel=cache->leftstartsel;innerendsel=cache->leftendsel;}if(jointype==JOIN_LEFT||jointype==JOIN_ANTI){outerstartsel=0.0;outerendsel=1.0;}elseif(jointype==JOIN_RIGHT){innerstartsel=0.0;innerendsel=1.0;}}else{/*无条件或者全外连接.copewithclauselessorfullmergejoin*/outerstartsel=innerstartsel=0.0;outerendsel=innerendsel=1.0;}/**Convertselectivitiestorowcounts.Weforceouter_rowsand*inner_rowstobeatleast1,buttheskip_rowsestimatescanbezero.*转换选择率为行数.*outer_rows和inner_rows至少为1,但skip_rows可以为0.*/outer_skip_rows=rint(outer_path_rows*outerstartsel);inner_skip_rows=rint(inner_path_rows*innerstartsel);outer_rows=clamp_row_est(outer_path_rows*outerendsel);inner_rows=clamp_row_est(inner_path_rows*innerendsel);Assert(outer_skip_rows<=outer_rows);Assert(inner_skip_rows<=inner_rows);/**Readjustscanselectivitiestoaccountforaboverounding.Thisis*normallyaninsignificanteffect,butwhenthereareonlyafewrowsin*theinputs,failingtodothismakesforalargepercentageerror.*重新调整扫描选择性以考虑四舍五入。*通常来说,这是一个无关紧要的事情，但是当输入中只有几行时，如果不这样做，就会造成很大的误差。*/outerstartsel=outer_skip_rows/outer_path_rows;innerstartsel=inner_skip_rows/inner_path_rows;outerendsel=outer_rows/outer_path_rows;innerendsel=inner_rows/inner_path_rows;Assert(outerstartsel<=outerendsel);Assert(innerstartsel<=innerendsel);/*costofsourcedata*/if(outersortkeys)/*doweneedtosortouter?*/{cost_sort(&sort_path,root,outersortkeys,outer_path->total_cost,outer_path_rows,outer_path->pathtarget->width,0.0,work_mem,-1.0);//计算outerrelation的排序成本startup_cost+=sort_path.startup_cost;startup_cost+=(sort_path.total_cost-sort_path.startup_cost)*outerstartsel;run_cost+=(sort_path.total_cost-sort_path.startup_cost)*(outerendsel-outerstartsel);}else{startup_cost+=outer_path->startup_cost;startup_cost+=(outer_path->total_cost-outer_path->startup_cost)*outerstartsel;run_cost+=(outer_path->total_cost-outer_path->startup_cost)*(outerendsel-outerstartsel);}if(innersortkeys)/*doweneedtosortinner?*/{cost_sort(&sort_path,root,innersortkeys,inner_path->total_cost,inner_path_rows,inner_path->pathtarget->width,0.0,work_mem,-1.0);//计算innerrelation的排序成本startup_cost+=sort_path.startup_cost;startup_cost+=(sort_path.total_cost-sort_path.startup_cost)*innerstartsel;inner_run_cost=(sort_path.total_cost-sort_path.startup_cost)*(innerendsel-innerstartsel);}else{startup_cost+=inner_path->startup_cost;startup_cost+=(inner_path->total_cost-inner_path->startup_cost)*innerstartsel;inner_run_cost=(inner_path->total_cost-inner_path->startup_cost)*(innerendsel-innerstartsel);}/**Wecan'tyetdeterminewhetherrescanningoccurs,orwhether*materializationoftheinnerinputshouldbedone.Theminimum*possibleinnerinputcost,regardlessofrescanandmaterialization*considerations,isinner_run_cost.Weincludethatin*workspace->total_cost,butnotyetinrun_cost.*现在还不能确定是否会重新扫描，或者是否应该物化innerrelation。*不管是否重新扫描和是否尝试物化，最小的innerrelationinput成本是inner_run_cost。*将其包含在workspace—>total_cost中，但不包含在run_cost中。*//*CPUcostsleftforlater*//*Publicresultfields*/workspace->startup_cost=startup_cost;workspace->total_cost=startup_cost+run_cost+inner_run_cost;/*Saveprivatedataforfinal_cost_mergejoin*/workspace->run_cost=run_cost;workspace->inner_run_cost=inner_run_cost;workspace->outer_rows=outer_rows;workspace->inner_rows=inner_rows;workspace->outer_skip_rows=outer_skip_rows;workspace->inner_skip_rows=inner_skip_rows;}

try_mergejoin_path->add_path_precheck
该函数检查新路径是否可能被接受.

//-----------------------try_mergejoin_path->add_path_precheck/**add_path_precheck*Checkwhetheraproposednewpathcouldpossiblygetaccepted.*Weassumeweknowthepath'spathkeysandparameterizationaccurately,*andhavelowerboundsforitscosts.*检查新路径是否可能被接受。假设已经准确知道路径的排序键和参数化，并且已知其成本的下限。**Notethatwedonotknowthepath'srowcount,sincegettinganestimatefor*thatistooexpensivetodobeforeprechecking.Weassumeherethatpaths*ofasupersetparameterizationwillgeneratefewerrows;ifthatholds,*thenpathswithdifferentparameterizationscannotdominateeachother*andsowecansimplyignoreexistingpathsofanotherparameterization.*(Intheinfrequentcaseswherethatruleofthumbfails,add_pathwill*getridoftheinferiorpath.)*注意:我们不知道访问路径的行数，因为在进行预检查之前获取它的估计值是非常昂贵的。*只能假设超集参数化的路径会产生更少的行;*如果该假设成立，那么具有不同参数化的路径不能互相控制，因此可以简单地忽略另一个参数化的现有路径。*(在这种经验法则失效的情况下-虽然这种情况很少发生-add_path函数将删除相对较差的路径。)**Atthetimethisiscalled,wehaven'tactuallybuiltaPathstructure,*sotherequiredinformationhastobepassedpiecemeal.*在调用这个函数时，实际上还没有构建一个Path结构体，因此需要的信息必须逐段传递。*/booladd_path_precheck(RelOptInfo*parent_rel,Coststartup_cost,Costtotal_cost,List*pathkeys,Relidsrequired_outer){List*new_path_pathkeys;boolconsider_startup;ListCell*p1;/*Pretendparameterizedpathshavenopathkeys,peradd_pathpolicy*/new_path_pathkeys=required_outer?NIL:pathkeys;/*Decidewhethernewpath'sstartupcostisinteresting*/consider_startup=required_outer?parent_rel->consider_param_startup:parent_rel->consider_startup;foreach(p1,parent_rel->pathlist){Path*old_path=(Path*)lfirst(p1);PathKeysComparisonkeyscmp;/**Wearelookingforanold_pathwiththesameparameterization(and*byassumptionthesamerowcount)thatdominatesthenewpathon*pathkeysaswellasbothcostmetrics.Ifwefindone,wecan*rejectthenewpath.*寻找一个old_path，它具有相同的参数化(并假设有相同的行数)，*在PathKey上主导新的路径以及两个成本指标。如果找到一个，就可以丢弃新的路径。**Costcomparisonshereshouldmatchcompare_path_costs_fuzzily.*这里的成本比较应该与compare_path_costs_fuzzily匹配。*/if(total_cost>old_path->total_cost*STD_FUZZ_FACTOR){/*newpathcanwinonstartupcostonlyifconsider_startup*/if(startup_cost>old_path->startup_cost*STD_FUZZ_FACTOR||!consider_startup){/*新路径成本较高,检查排序键.newpathlosesoncost,socheckpathkeys...*/List*old_path_pathkeys;old_path_pathkeys=old_path->param_info?NIL:old_path->pathkeys;keyscmp=compare_pathkeys(new_path_pathkeys,old_path_pathkeys);if(keyscmp==PATHKEYS_EQUAL||keyscmp==PATHKEYS_BETTER2){/*排序键也不占优,丢弃之.newpathdoesnotwinonpathkeys...*/if(bms_equal(required_outer,PATH_REQ_OUTER(old_path))){/*Foundanoldpaththatdominatesthenewone*/returnfalse;}}}}else{/**Sincethepathlistissortedbytotal_cost,wecanstoplooking*oncewereachapathwithatotal_costlargerthanthenew*path's.*由于路径链表是按total_cost排序的，所以当发现一个total_cost大于新路径成本的路径时，停止查找。*/break;}}returntrue;}

create_mergejoin_path->final_cost_mergejoin
该函数确定mergejoin path最终的成本和大小.

//--------------create_mergejoin_path->final_cost_mergejoin/**final_cost_mergejoin*Finalestimateofthecostandresultsizeofamergejoinpath.*确定mergejoinpath最终的成本和大小**Unlikeothercostsizefunctions,thisroutinemakestwoactualdecisions:*whethertheexecutorwillneedtodomark/restore,andwhetherweshould*materializetheinnerpath.Itwouldbelogicallycleanertobuild*separatepathstestingthesealternatives,butthatwouldrequirerepeating*mostofthecostcalculations,whicharenotallthatcheap.Sincethe*choicewillnotaffectoutputpathkeysorstartupcost,onlytotalcost,*thereisnopossibilityofwantingtokeepmorethanonepath.Soitseems*besttomakethedecisionshereandrecordtheminthepath's*skip_mark_restoreandmaterialize_innerfields.*与其他costsize函数不同，该函数做出了两个实际决策:执行器是否需要执行mark/restore，以及是否应该物化内部访问路径。*在逻辑上，构建单独的路径来测试这些替代方案会更简洁，但这需要重复大部分影响性能的成本计算过程。*由于选择率不会影响输出PathKey或启动成本，只会影响总成本，因此不可能保留多个路径。*因此，最好在这里做出决定，并将其记录在path的skip_mark_restore和materialize_inner字段中。**Mark/restoreoverheadisusuallyrequired,butcanbeskippedifweknow*thattheexecutorneedfindonlyonematchperoutertuple,andthatthe*mergeclausesaresufficienttoidentifyamatch.*通常需要mark/restore的开销，但是如果知道执行器只需要为每个外部元组找到一个匹配，*并且mergeclauses足以识别匹配，那么可以忽略这个开销。**Wematerializetheinnerpathifweneedmark/restoreandeithertheinner*pathcan'tsupportmark/restore,orit'scheapertouseaninterposed*Materialnodetohandlemark/restore.*如果需要mark/restore，或者内部路径不支持mark/restore，*或者使用插入的物化节点来处理mark/restore，优化器就会物化innerrelation的访问路径。**'path'isalreadyfilledinexceptfortherowsandcostfieldsand*skip_mark_restoreandmaterialize_inner*path-MergePath节点*'workspace'istheresultfrominitial_cost_mergejoin*workspace-initial_cost_mergejoin函数返回的结果*'extra'containsmiscellaneousinformationaboutthejoin*extra-额外的信息*/voidfinal_cost_mergejoin(PlannerInfo*root,MergePath*path,JoinCostWorkspace*workspace,JoinPathExtraData*extra){Path*outer_path=path->jpath.outerjoinpath;Path*inner_path=path->jpath.innerjoinpath;doubleinner_path_rows=inner_path->rows;List*mergeclauses=path->path_mergeclauses;List*innersortkeys=path->innersortkeys;Coststartup_cost=workspace->startup_cost;Costrun_cost=workspace->run_cost;Costinner_run_cost=workspace->inner_run_cost;doubleouter_rows=workspace->outer_rows;doubleinner_rows=workspace->inner_rows;doubleouter_skip_rows=workspace->outer_skip_rows;doubleinner_skip_rows=workspace->inner_skip_rows;Costcpu_per_tuple,bare_inner_cost,mat_inner_cost;QualCostmerge_qual_cost;QualCostqp_qual_cost;doublemergejointuples,rescannedtuples;doublerescanratio;/*确保行数合法.Protectsomeassumptionsbelowthatrowcountsaren'tzeroorNaN*/if(inner_path_rows<=0||isnan(inner_path_rows))inner_path_rows=1;/*标记访问路径的行数.Markthepathwiththecorrectrowestimate*/if(path->jpath.path.param_info)path->jpath.path.rows=path->jpath.path.param_info->ppi_rows;elsepath->jpath.path.rows=path->jpath.path.parent->rows;/*并行执行,调整行数.Forpartialpaths,scalerowestimate.*/if(path->jpath.path.parallel_workers>0){doubleparallel_divisor=get_parallel_divisor(&path->jpath.path);path->jpath.path.rows=clamp_row_est(path->jpath.path.rows/parallel_divisor);}/**Wecouldincludedisable_costinthepreliminaryestimate,butthat*wouldamounttooptimizingforthecasewherethejoinmethodis*disabled,whichdoesn'tseemlikethewaytobet.*不允许mergejoin则设置为高成本*/if(!enable_mergejoin)startup_cost+=disable_cost;/**Computecostofthemergequalsandqpquals(otherrestrictionclauses)*separately.*分别计算mergequals和qpquals的成本*/cost_qual_eval(&merge_qual_cost,mergeclauses,root);cost_qual_eval(&qp_qual_cost,path->jpath.joinrestrictinfo,root);qp_qual_cost.startup-=merge_qual_cost.startup;qp_qual_cost.per_tuple-=merge_qual_cost.per_tuple;/**WithaSEMIorANTIjoin,oriftheinnerrelisknownunique,the*executorwillstopscanningformatchesafterthefirstmatch.When*allthejoinclausesaremergeclauses,thismeanswedon'teverneedto*backupthemerge,andsowecanskipmark/restoreoverhead.*使用半连接或反连接，或者innerrelation返回的结果唯一，*执行程序将在第一次匹配后停止扫描匹配。*当所有的joinclauses都是merge子句时，意味着不需要备份merge，这样就可以跳过mark/restore步骤。*/if((path->jpath.jointype==JOIN_SEMI||path->jpath.jointype==JOIN_ANTI||extra->inner_unique)&&(list_length(path->jpath.joinrestrictinfo)==list_length(path->path_mergeclauses)))path->skip_mark_restore=true;elsepath->skip_mark_restore=false;/**Getapprox#tuplespassingthemergequals.Weuseapprox_tuple_count*herebecauseweneedanestimatedonewithJOIN_INNERsemantics.*通过mergequals条件获得大约的元组数。*这里使用approx_tuple_count，因为需要使用JOIN_INNER语义进行评估。*/mergejointuples=approx_tuple_count(root,&path->jpath,mergeclauses);/**Whenthereareequalmergekeysintheouterrelation,themergejoin*mustrescananymatchingtuplesintheinnerrelation.Thismeans*re-fetchinginnertuples;wehavetoestimatehowoftenthathappens.*当外部关系中有相等的merge键时，mergejoin必须重新扫描内部关系中的所有匹配元组。*这意味着需要重新获取内部元组;必须估计这种情况发生的频率。**Forregularinnerandouterjoins,thenumberofre-fetchescanbe*estimatedapproximatelyassizeofmergejoinoutputminussizeof*innerrelation.Assumethatthedistinctkeyvaluesare1,2,...,and*denotethenumberofvaluesofeachkeyintheouterrelationasm1,*m2,...;intheinnerrelation,n1,n2,...Thenwehave**sizeofjoin=m1*n1+m2*n2+...**numberofrescannedtuples=(m1-1)*n1+(m2-1)*n2+...=m1**n1+m2*n2+...-(n1+n2+...)=sizeofjoin-sizeofinner*relation**对于常规的内连接和外连接，重新获取的次数可以近似估计为合并连接输出的大小减去内部关系的大小。*假设唯一键值是1,2…，表示外部关系中每个键的值个数为m1,m2，…;在内部关系中，n1,n2，…*那么会得到:*连接的大小=m1*n1+m2*n2+...*重新扫描的元组数=(m1-1)*n1+(m2-1)*n2+...=m1*n1+m2*n2+...-(n1+n2+...)*=sizeofjoin-sizeofinnerrelation**Thisequationworkscorrectlyforoutertupleshavingnoinnermatch*(nk=0),butnotforinnertupleshavingnooutermatch(mk=0);we*areeffectivelysubtractingthosefromthenumberofrescannedtuples,*whenweshouldnot.Canwedobetterwithoutexpensiveselectivity*computations?*对于没有内部匹配(nk=0)的外部元组，这个方程是正确的，*但是对于没有外部匹配(mk=0)的内部元组，这个方程就不正确了;*不这样做时，实际上是在从重新扫描元组的数量中减去这些元组。*如果没有昂贵的选择性计算，能做得更好吗?**Thewholeissueismootifweareworkingfromaunique-ifiedouter*input,orifweknowwedon'tneedtomark/restoreatall.*如果使用的是唯一的外部输入，或者知道根本不需要mark/restore，那么探讨该问题是没有意义的。*/if(IsA(outer_path,UniquePath)||path->skip_mark_restore)rescannedtuples=0;else{rescannedtuples=mergejointuples-inner_path_rows;/*Mustclampbecauseofpossibleunderestimate*/if(rescannedtuples<0)rescannedtuples=0;}/*对于重复扫描,需要额外增加成本.We'llinflatevariouscoststhismuchtoaccountforrescanning*/rescanratio=1.0+(rescannedtuples/inner_path_rows);/**Decidewhetherwewanttomaterializetheinnerinputtoshielditfrom*mark/restoreandperformingre-fetches.Ourcostmodelforregular*re-fetchesisthatare-fetchcoststhesameasanoriginalfetch,*whichisprobablyanoverestimate;butontheotherhandweignorethe*bookkeepingcostsofmark/restore.Notclearifit'sworthdeveloping*amorerefinedmodel.Sowejustneedtoinflatetheinnerruncostby*rescanratio.*决定是否要物化innerrelation的访问路径，以使其不受mark/restore和执行重新获取操作的影响。*对于定期重取的成本模型是，一次重取的成本与一次原始取回的成本相同，当然这可能高估了该成本;*但另一方面，忽略了mark/restore的簿记成本。*目前尚不清楚是否值得开发一个更完善的模型。所以只需要把内部运行成本乘上相应的比例。*/bare_inner_cost=inner_run_cost*rescanratio;/**WhenweinterposeaMaterialnodethere-fetchcostisassumedtobe*justcpu_operator_costpertuple,independentlyoftheunderlying*plan'scost;andwechargeanextracpu_operator_costperoriginal*fetchaswell.Notethatwe'reassumingthematerializenodewill*neverspilltodisk,sinceitonlyhastoremembertuplesbacktothe*lastmark.(Ifthereareahugenumberofduplicates,ourothercost*factorswillmakethepathsoexpensivethatitprobablywon'tget*chosenanyway.)Sowedon'tusecost_rescanhere.*当插入一个物化节点时，重新取回的成本被假设为cpu_operator_cost/每个元组，该成本独立于底层计划的成本;*对每次原始取回增加额外的cpu_operator_cost。*注意，我们假设物化节点永远不会溢出到磁盘上，因为它只需要记住元组的最后一个标记。*(如果有大量的重复项，其他成本因素会使路径变得非常昂贵，可能无论如何都不会被选中。)*因此在这里,我们不调用cost_rescan。**Note:keepthisestimateinsyncwithcreate_mergejoin_plan'slabeling*ofthegeneratedMaterialnode.*注意:与create_mergejoin_plan函数生成的物化节点标记保持一致。*/mat_inner_cost=inner_run_cost+cpu_operator_cost*inner_path_rows*rescanratio;/**Ifwedon'tneedmark/restoreatall,wedon'tneedmaterialization.*如果不需要mark/restore,那么也不需要物化*/if(path->skip_mark_restore)path->materialize_inner=false;/**Prefermaterializingifitlookscheaper,unlesstheuserhasaskedto*suppressmaterialization.*如果物化看起来成本更低，那么就选择物化，前提是用户允许物化。*/elseif(enable_material&&mat_inner_cost<bare_inner_cost)path->materialize_inner=true;/**Evenifmaterializingdoesn'tlookcheaper,we*must*doitifthe*innerpathistobeuseddirectly(withoutsorting)anditdoesn't*supportmark/restore.*即使物化看起来成本不低，*但如果innerrelation的访问路径是直接使用(没有排序)并且不支持mark/restore，我们也必须这样做。**Sincetheinnersidemustbeordered,andonlySortsandIndexScanscan*createordertobeginwith,andtheybothsupportmark/restore,you*mightthinkthere'snoproblem---butyou'dbewrong.Nestloopand*mergejoinscan*preserve*theorderoftheirinputs,sotheycanbe*selectedastheinputofamergejoin,andtheydon'tsupport*mark/restoreatpresent.*因为inner端必须是有序的，并且只有Sorts和IndexScan才能从一开始就创建排序，*而且它们都支持mark/restore，所以可能认为这样处理没有问题——但是错了。*Nestloopjoin和mergejoin可以“保留”它们的输入顺序，*因此它们可以被选择为mergejoin的输入，而且它们目前不支持mark/restore。**Wedon'ttestthevalueofenable_materialhere,because*materializationisrequiredforcorrectnessinthiscase,andturning*itoffdoesnotentitleustodeliveraninvalidplan.*在这里不会理会enable_material的设置，因为在这种情况下，优先保证正确性，*而关闭它会让优化器产生无效的执行计划。*/elseif(innersortkeys==NIL&&!ExecSupportsMarkRestore(inner_path))path->materialize_inner=true;/**Also,forcematerializingiftheinnerpathistobesortedandthe*sortisexpectedtospilltodisk.Thisisbecausethefinalmerge*passcanbedoneon-the-flyifitdoesn'thavetosupportmark/restore.*Wedon'ttrytoadjustthecostestimatesforthisconsideration,*though.*另外，如果要对innerrelation访问路径进行排序，且排序可能会溢出到磁盘，则强制实现。*这是因为如果不需要支持mark/restore，最终的merge过程可以在运行中完成。*不过，我们并没有试图为此调整成本估算。**Sincematerializationisaperformanceoptimizationinthiscase,*ratherthannecessaryforcorrectness,weskipitifenable_materialis*off.*由于在这种情况下，物化处理是一种性能优化，而不是保证正确的必要条件，*所以如果enable_material关闭了，那么就忽略它。*/elseif(enable_material&&innersortkeys!=NIL&&relation_byte_size(inner_path_rows,inner_path->pathtarget->width)>(work_mem*1024L))path->materialize_inner=true;elsepath->materialize_inner=false;/*调整运行期成本.Chargetherightincrementalcostforthechosencase*/if(path->materialize_inner)run_cost+=mat_inner_cost;elserun_cost+=bare_inner_cost;/*CPU成本计算.CPUcosts*//**Thenumberoftuplecomparisonsneededisapproximatelynumberofouter*rowsplusnumberofinnerrowsplusnumberofrescannedtuples(canwe*refinethis?).Ateachone,weneedtoevaluatethemergejoinquals.*需要比较的元组数量大约是外部行数加上内部行数加上重扫描元组数(可以改进它吗?)*在每一个点上，需要计算合并后的数量。*/startup_cost+=merge_qual_cost.startup;startup_cost+=merge_qual_cost.per_tuple*(outer_skip_rows+inner_skip_rows*rescanratio);run_cost+=merge_qual_cost.per_tuple*((outer_rows-outer_skip_rows)+(inner_rows-inner_skip_rows)*rescanratio);/**Foreachtuplethatgetsthroughthemergejoinproper,wecharge*cpu_tuple_costplusthecostofevaluatingadditionalrestriction*clausesthataretobeappliedatthejoin.(Thisispessimisticsince*notallofthequalsmaygetevaluatedateachtuple.)*对于每个通过合并连接得到的元组，每一个元组的成本是cpu_tuple_cost，加上将在连接上应用的附加限制条件的成本。*(当然这是悲观的做法，因为并不是所有的条件都能在每个元组上得到应用。)**Note:wecouldadjustforSEMI/ANTIjoinsskippingsomequal*evaluationshere,butit'sprobablynotworththetrouble.*注意:我们可以对半/反连接进行调整，跳过一些条件评估，但这可能并不值得。*/startup_cost+=qp_qual_cost.startup;cpu_per_tuple=cpu_tuple_cost+qp_qual_cost.per_tuple;run_cost+=cpu_per_tuple*mergejointuples;/*投影列的估算按输出行计算,而不是扫描的元组.tlistevalcostsarepaidperoutputrow,notpertuplescanned*/startup_cost+=path->jpath.path.pathtarget->cost.startup;run_cost+=path->jpath.path.pathtarget->cost.per_tuple*path->jpath.path.rows;path->jpath.path.startup_cost=startup_cost;path->jpath.path.total_cost=startup_cost+run_cost;}三、跟踪分析

测试脚本如下

selecta.*,b.grbh,b.jefromt_dwxxa,lateral(selectt1.dwbh,t1.grbh,t2.jefromt_grxxt1innerjoint_jfxxt2ont1.dwbh=a.dwbhandt1.grbh=t2.grbh)borderbyb.dwbh;

启动gdb,设置断点

(gdb)btry_mergejoin_pathBreakpoint1at0x7aeeaf:filejoinpath.c,line572.(gdb)cContinuing.Breakpoint1,try_mergejoin_path(root=0x166c880,joinrel=0x16864d0,outer_path=0x167f190,inner_path=0x167f9d0,pathkeys=0x1innersortkeys=0x1686c28,jointype=JOIN_INNER,extra=0x7ffea604f500,is_partial=false)atjoinpath.c:572572if(is_partial)

进入initial_cost_mergejoin函数

(gdb)615initial_cost_mergejoin(root,&workspace,jointype,mergeclauses,(gdb)stepinitial_cost_mergejoin(root=0x166c880,workspace=0x7ffea604f360,jointype=JOIN_INNER,mergeclauses=0x1686bc8,outer_path=0x167f190,inner_path=0x167f9d0,outersortkeys=0x1686b68,innersortkeys=0x1686c28,extra=0x7ffea604f500)atcostsize.c:26072607Coststartup_cost=0;

初始化参数

2607Coststartup_cost=0;(gdb)n2608Costrun_cost=0;...

存在merge条件,JOIN_INNER连接,进入相应的分支

...2639if(mergeclauses&&jointype!=JOIN_FULL)(gdb)2641RestrictInfo*firstclause=(RestrictInfo*)linitial(mergeclauses);...

查看约束条件,即t_dwxx.dwbh = t_grxx.dwbh

(gdb)set$arg1=(RelabelType*)((OpExpr*)firstclause->clause)->args->head->data.ptr_value(gdb)set$arg2=(RelabelType*)((OpExpr*)firstclause->clause)->args->head->next->data.ptr_value(gdb)p*(Var*)$arg1->arg$9={xpr={type=T_Var},varno=1,varattno=2,vartype=1043,vartypmod=24,varcollid=100,varlevelsup=0,varnoold=1,varoattno=2,location=218}(gdb)p*(Var*)$arg2->arg$10={xpr={type=T_Var},varno=3,varattno=1,vartype=1043,vartypmod=14,varcollid=100,varlevelsup=0,varnoold=3,varoattno=1,location=208}

获取连接的outer和inner relation的排序键

...(gdb)2649opathkeys=outersortkeys?outersortkeys:outer_path->pathkeys;(gdb)n2650ipathkeys=innersortkeys?innersortkeys:inner_path->pathkeys;...

获取缓存的选择率

...(gdb)2663cache=cached_scansel(root,firstclause,opathkey);(gdb)2666outer_path->parent->relids))(gdb)p*cache$15={opfamily=1994,collation=100,strategy=1,nulls_first=false,leftstartsel=0,leftendsel=0.99989010989010996,rightstartsel=0.0091075159436798652,rightendsel=1}

选择率赋值,连接子句左侧为outer relation

2665if(bms_is_subset(firstclause->left_relids,(gdb)2669outerstartsel=cache->leftstartsel;(gdb)2670outerendsel=cache->leftendsel;(gdb)2671innerstartsel=cache->rightstartsel;(gdb)2672innerendsel=cache->rightendsel;

把选择率转换为行数

(gdb)2705outer_skip_rows=rint(outer_path_rows*outerstartsel);(gdb)2706inner_skip_rows=rint(inner_path_rows*innerstartsel);(gdb)2707outer_rows=clamp_row_est(outer_path_rows*outerendsel);(gdb)2708inner_rows=clamp_row_est(inner_path_rows*innerendsel);(gdb)pouter_skip_rows$16=0(gdb)pinner_skip_rows$17=911(gdb)pouter_rows$18=9999(gdb)pinner_rows$19=100000

计算outer relation的排序成本并赋值

...(gdb)2728if(outersortkeys)/*doweneedtosortouter?*/(gdb)2730cost_sort(&sort_path,(gdb)n2735outer_path->pathtarget->width,...(gdb)n2739startup_cost+=sort_path.startup_cost;(gdb)psort_path$24={type=T_Invalid,pathtype=T_Invalid,parent=0x167f9d0,pathtarget=0x0,param_info=0x0,parallel_aware=false,parallel_safe=false,parallel_workers=0,rows=10000,startup_cost=828.38561897747286,total_cost=853.38561897747286,pathkeys=0x167f9d0}

计算inner relation的排序成本并赋值

...2754if(innersortkeys)/*doweneedtosortinner?*/(gdb)2756cost_sort(&sort_path,...(gdb)psort_path$25={type=T_Invalid,pathtype=T_Invalid,parent=0x167f9d0,pathtarget=0x0,param_info=0x0,parallel_aware=false,parallel_safe=false,parallel_workers=0,rows=100000,startup_cost=10030.82023721841,total_cost=10280.82023721841,pathkeys=0x167f9d0}

赋值给workspace,结束initial_cost_mergejoin函数调用.

(gdb)2791workspace->startup_cost=startup_cost;(gdb)2792workspace->total_cost=startup_cost+run_cost+inner_run_cost;(gdb)2794workspace->run_cost=run_cost;(gdb)2795workspace->inner_run_cost=inner_run_cost;(gdb)2796workspace->outer_rows=outer_rows;(gdb)2797workspace->inner_rows=inner_rows;(gdb)2798workspace->outer_skip_rows=outer_skip_rows;(gdb)2799workspace->inner_skip_rows=inner_skip_rows;(gdb)2800}

进入add_path_precheck函数

(gdb)ntry_mergejoin_path(root=0x166c880,joinrel=0x16864d0,outer_path=0x167f190,inner_path=0x167f9d0,pathkeys=0x1686b68,mergeclauses=0x1686bc8,outersortkeys=0x1686b68,innersortkeys=0x1686c28,jointype=JOIN_INNER,extra=0x7ffea604f500,is_partial=false)atjoinpath.c:620620if(add_path_precheck(joinrel,(gdb)stepadd_path_precheck(parent_rel=0x16864d0,startup_cost=10861.483356195882,total_cost=11134.203356195881,pathkeys=0x1686b68,required_outer=0x0)atpathnode.c:666666new_path_pathkeys=required_outer?NIL:pathkeys;

parent_rel->pathlist为NULL,结束调用,返回true

671foreach(p1,parent_rel->pathlist)(gdb)p*parent_rel->pathlistCannotaccessmemoryataddress0x0(gdb)n719returntrue;

进入final_cost_mergejoin函数

(gdb)bfinal_cost_mergejoinBreakpoint2at0x7a00c9:filecostsize.c,line2834.(gdb)cContinuing.Breakpoint2,final_cost_mergejoin(root=0x166c880,path=0x1686cb8,workspace=0x7ffea604f360,extra=0x7ffea604f500)atcostsize.c:28342834Path*outer_path=path->jpath.outerjoinpath;

查看outer relation和inner relation的访问路径

2834Path*outer_path=path->jpath.outerjoinpath;(gdb)n2835Path*inner_path=path->jpath.innerjoinpath;(gdb)2836doubleinner_path_rows=inner_path->rows;(gdb)p*outer_path$26={type=T_Path,pathtype=T_SeqScan,parent=0x166c2c0,pathtarget=0x1671670,param_info=0x0,parallel_aware=false,parallel_safe=true,parallel_workers=0,rows=10000,startup_cost=0,total_cost=164,pathkeys=0x0}(gdb)p*inner_path$27={type=T_Path,pathtype=T_SeqScan,parent=0x166c4d8,pathtarget=0x1673510,param_info=0x0,parallel_aware=false,parallel_safe=true,parallel_workers=0,rows=100000,startup_cost=0,total_cost=1726,pathkeys=0x0}

其他参数的赋值

(gdb)n2837List*mergeclauses=path->path_mergeclauses;(gdb)2838List*innersortkeys=path->innersortkeys;...

分别计算mergequals和qpquals的成本

...(gdb)2886cost_qual_eval(&merge_qual_cost,mergeclauses,root);(gdb)2887cost_qual_eval(&qp_qual_cost,path->jpath.joinrestrictinfo,root);(gdb)2888qp_qual_cost.startup-=merge_qual_cost.startup;(gdb)pmerge_qual_cost$28={startup=0,per_tuple=0.0025000000000000001}(gdb)pqp_qual_cost$29={startup=0,per_tuple=0.0025000000000000001}

通过mergequals条件获得大约的元组数

...(gdb)2910mergejointuples=approx_tuple_count(root,&path->jpath,mergeclauses);(gdb)2938if(IsA(outer_path,UniquePath)||path->skip_mark_restore)(gdb)pmergejointuples$30=100000

计算相同元组导致的重复扫描率

(gdb)n2942rescannedtuples=mergejointuples-inner_path_rows;(gdb)2944if(rescannedtuples<0)(gdb)2948rescanratio=1.0+(rescannedtuples/inner_path_rows);(gdb)prescanratio$31=1(gdb)prescannedtuples$32=0

物化处理,结果是inner relation扫描无需物化:path->materialize_inner = false;

(gdb)2974mat_inner_cost=inner_run_cost+(gdb)n2980if(path->skip_mark_restore)(gdb)2987elseif(enable_material&&mat_inner_cost<bare_inner_cost)(gdb)3006elseif(innersortkeys==NIL&&(gdb)3021elseif(enable_material&&innersortkeys!=NIL&&(gdb)3023inner_path->pathtarget->width)>(gdb)pmat_inner_cost$34=497.72250000000003...3021elseif(enable_material&&innersortkeys!=NIL&&(gdb)3027path->materialize_inner=false;

计算成本

(gdb)n3043startup_cost+=merge_qual_cost.per_tuple*(gdb)3044(outer_skip_rows+inner_skip_rows*rescanratio);(gdb)3043startup_cost+=merge_qual_cost.per_tuple*(gdb)3045run_cost+=merge_qual_cost.per_tuple*(gdb)3046((outer_rows-outer_skip_rows)+(gdb)3047(inner_rows-inner_skip_rows)*rescanratio);(gdb)3046((outer_rows-outer_skip_rows)+(gdb)3045run_cost+=merge_qual_cost.per_tuple*(gdb)3058startup_cost+=qp_qual_cost.startup;(gdb)3059cpu_per_tuple=cpu_tuple_cost+qp_qual_cost.per_tuple;(gdb)3060run_cost+=cpu_per_tuple*mergejointuples;(gdb)3063startup_cost+=path->jpath.path.pathtarget->cost.startup;(gdb)3064run_cost+=path->jpath.path.pathtarget->cost.per_tuple*path->jpath.path.rows;(gdb)3066path->jpath.path.startup_cost=startup_cost;(gdb)3067path->jpath.path.total_cost=startup_cost+run_cost;(gdb)3068}

完成调用

(gdb)create_mergejoin_path(root=0x166c880,joinrel=0x16864d0,jointype=JOIN_INNER,workspace=0x7ffea604f360,extra=0x7ffea604f500,outer_path=0x167f190,inner_path=0x167f9d0,restrict_clauses=0x16869a8,pathkeys=0x1686b68,required_outer=0x0,mergeclauses=0x1686bc8,outersortkeys=0x1686b68,innersortkeys=0x1686c28)atpathnode.c:22982298returnpathnode;

MergePath节点信息

2298returnpathnode;(gdb)p*pathnode$36={jpath={path={type=T_MergePath,pathtype=T_MergeJoin,parent=0x16864d0,pathtarget=0x1686708,param_info=0x0,parallel_aware=false,parallel_safe=true,parallel_workers=0,rows=100000,startup_cost=10863.760856195882,total_cost=12409.200856195883,pathkeys=0x1686b68},jointype=JOIN_INNER,inner_unique=false,outerjoinpath=0x167f190,innerjoinpath=0x167f9d0,joinrestrictinfo=0x16869a8},path_mergeclauses=0x1686bc8,outersortkeys=0x1686b68,innersortkeys=0x1686c28,skip_mark_restore=false,materialize_inner=false}

到此，相信大家对“PostgreSQL中用于计算merge join的Cost函数有哪些”有了更深的了解，不妨来实际操作一番吧！这里是亿速云网站，更多相关内容可以进入相关频道进行查询，关注我们，继续学习！