分析PostgreSQL中的大表连接
这篇文章主要介绍“分析PostgreSQL中的大表连接”,在日常操作中,相信很多人在分析PostgreSQL中的大表连接问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希望对大家解答”分析PostgreSQL中的大表连接”的疑惑有所帮助!接下来,请跟着小编一起来学习吧!
数据库配置
主机CPU 4核,内存4G,PG共享缓存128MB,work_mem 4MB。
测试数据
创建4张表,每张表1000w行,数据量约1G,是PG共享内存的8倍。
droptablet_big_1;droptablet_big_2;droptablet_big_3;droptablet_big_4;createtablet_big_1(idint,c1varchar(30),c2varchar(30),c3varchar(30));createtablet_big_2(idint,c1varchar(30),c2varchar(30),c3varchar(30));createtablet_big_3(idint,c1varchar(30),c2varchar(30),c3varchar(30));createtablet_big_4(idint,c1varchar(30),c2varchar(30),c3varchar(30));insertintot_big_1selectx,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3')fromgenerate_series(1,10000000)asx;insertintot_big_2selectx,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3')fromgenerate_series(1,10000000)asx;insertintot_big_3selectx,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3')fromgenerate_series(1,10000000)asx;insertintot_big_4selectx,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3')fromgenerate_series(1,10000000)asx;showshared_buffers;showeffective_cache_size;showwork_mem;selectpg_size_pretty(pg_table_size('t_big_1'));selectpg_size_pretty(pg_table_size('t_big_2'));selectpg_size_pretty(pg_table_size('t_big_3'));selectpg_size_pretty(pg_table_size('t_big_4'));analyzet_big_1,t_big_2,t_big_3,t_big_4;explainverboseselecta.*fromt_big_1ajoint_big_2bona.c1=b.c1;explainverboseselecta.id,b.c1,c.c2,d.c3fromt_big_1a,t_big_2b,t_big_3c,t_big_4dwherea.id=b.idandb.id=c.idandc.id=d.id;explainverboseselecta.id,b.c1,c.c2,d.c3fromt_big_1a,t_big_2b,t_big_3c,t_big_4dwherea.id=b.idandb.c1=c.c1andc.c2=d.c2;
大表连接
未分析数据表前
[local:/data/run/pg12]:5120pg12@testdb=#explainverbosepg12@testdb-#selecta.id,b.c1,c.c2,d.c3pg12@testdb-#fromt_big_1a,t_big_2b,t_big_3c,t_big_4dpg12@testdb-#wherea.id=b.idandb.c1=c.c1andc.c2=d.c2;QUERYPLAN-----------------------------------------------------------------------------------------------------------------MergeJoin(cost=164722831406.26..1096915306139605248.00rows=73127676034285903872width=238)Output:a.id,b.c1,c.c2,d.c3MergeCond:((b.c1)::text=(c.c1)::text)->Sort(cost=58799667920.13..59102008117.66rows=120936079012width=82)Output:a.id,b.c1SortKey:b.c1->MergeJoin(cost=2124653.55..1816202724.10rows=120936079012width=82)Output:a.id,b.c1MergeCond:(a.id=b.id)->Sort(cost=894232.27..906527.40rows=4918050width=4)Output:a.idSortKey:a.id->SeqScanonpublic.t_big_1a(cost=0.00..213115.50rows=4918050width=4)Output:a.id->Materialize(cost=1230421.27..1255011.52rows=4918050width=82)Output:b.c1,b.id->Sort(cost=1230421.27..1242716.40rows=4918050width=82)Output:b.c1,b.idSortKey:b.id->SeqScanonpublic.t_big_2b(cost=0.00..213115.50rows=4918050width=82)Output:b.c1,b.id->Materialize(cost=105923163486.13..106527843881.19rows=120936079012width=234)Output:c.c2,c.c1,d.c3->Sort(cost=105923163486.13..106225503683.66rows=120936079012width=234)Output:c.c2,c.c1,d.c3SortKey:c.c1->MergeJoin(cost=3066006.55..1817144077.10rows=120936079012width=234)Output:c.c2,c.c1,d.c3MergeCond:((c.c2)::text=(d.c2)::text)->Sort(cost=1533003.27..1545298.40rows=4918050width=156)Output:c.c2,c.c1SortKey:c.c2->SeqScanonpublic.t_big_3c(cost=0.00..213115.50rows=4918050width=156)Output:c.c2,c.c1->Materialize(cost=1533003.27..1557593.52rows=4918050width=156)Output:d.c3,d.c2->Sort(cost=1533003.27..1545298.40rows=4918050width=156)Output:d.c3,d.c2SortKey:d.c2->SeqScanonpublic.t_big_4d(cost=0.00..213115.50rows=4918050width=156)Output:d.c3,d.c2(41rows)
可以看到,未分析前,执行计划使用merge join,计划的cost是一个大数。
执行分析后
[local:/data/run/pg12]:5120pg12@testdb=#explain(analyze,buffers,verbose)selecta.id,b.c1,c.c2,d.c3fromt_big_1a,t_big_2b,t_big_3c,t_big_4dwherea.id=b.idandb.c1=c.c1andc.c2=d.c2;QUERYPLAN-----------------------------------------------------------------------------------------------------------------------------------------------------------------------Gather(cost=896126.19..2564935.91rows=9999844width=97)(actualtime=393803.655..404902.025rows=10000000loops=1)Output:a.id,b.c1,c.c2,d.c3WorkersPlanned:2WorkersLaunched:2Buffers:sharedhit=266read=655676dirtied=486717written=486414,tempread=473954written=486068->ParallelHashJoin(cost=895126.19..1563951.51rows=4166602width=97)(actualtime=393672.896..398825.027rows=3333333loops=3)Output:a.id,b.c1,c.c2,d.c3HashCond:((c.c2)::text=(d.c2)::text)Buffers:sharedhit=266read=655676dirtied=486717written=486414,tempread=473954written=486068Worker0:actualtime=393629.565..399028.498rows=3549817loops=1Buffers:sharedhit=118read=218079dirtied=161599written=161495,tempread=162307written=161880Worker1:actualtime=393585.994..399049.295rows=3609509loops=1Buffers:sharedhit=119read=217313dirtied=161014written=160913,tempread=163324written=160736->ParallelHashJoin(cost=592683.65..1070481.02rows=4166681width=66)(actualtime=328335.871..378143.916rows=3333333loops=3)Output:a.id,b.c1,c.c2HashCond:((b.c1)::text=(c.c1)::text)Buffers:sharedhit=63read=491773dirtied=352782written=352575,tempread=267125written=274312Worker0:actualtime=328475.430..378240.528rows=3325497loops=1Buffers:sharedhit=25read=164024dirtied=117445written=117373,tempread=88941written=91448Worker1:actualtime=328084.038..377943.176rows=3311112loops=1Buffers:sharedhit=29read=163900dirtied=117550written=117481,tempread=88747written=91320->ParallelHashJoin(cost=290238.33..609558.42rows=4166681width=35)(actualtime=158380.042..198763.345rows=3333333loops=3)Output:a.id,b.c1HashCond:(a.id=b.id)Buffers:sharedhit=63read=327838dirtied=218847written=218710,tempread=98317written=100856Worker0:actualtime=158518.764..199077.411rows=3331104loops=1Buffers:sharedhit=25read=109394dirtied=72893written=72845,tempread=32790written=33668Worker1:actualtime=158520.409..198920.394rows=3332824loops=1Buffers:sharedhit=29read=109323dirtied=73002written=72956,tempread=32934written=33560->ParallelSeqScanonpublic.t_big_1a(cost=0.00..205601.81rows=4166681width=4)(actualtime=239.830..75704.152rows=3333333loops=3)Output:a.idBuffers:sharedread=163935dirtied=109449written=109391Worker0:actualtime=239.584..75677.703rows=3327794loops=1Buffers:sharedread=54554dirtied=36489written=36468Worker1:actualtime=240.355..75258.837rows=3347802loops=1Buffers:sharedread=54882dirtied=36486written=36467->ParallelHash(cost=205601.81..205601.81rows=4166681width=35)(actualtime=65812.428..65812.431rows=3333333loops=3)Output:b.c1,b.idBuckets:65536Batches:256MemoryUsage:3328kBBuffers:sharedhit=32read=163903dirtied=109398written=109319,tempwritten=70136Worker0:actualtime=65812.900..65812.904rows=3345876loops=1Buffers:sharedhit=11read=54840dirtied=36404written=36377,tempwritten=23428Worker1:actualtime=65812.873..65812.875rows=3321816loops=1Buffers:sharedhit=15read=54441dirtied=36516written=36489,tempwritten=23320->ParallelSeqScanonpublic.t_big_2b(cost=0.00..205601.81rows=4166681width=35)(actualtime=1.490..47839.237rows=3333333loops=3)Output:b.c1,b.idBuffers:sharedhit=32read=163903dirtied=109398written=109319Worker0:actualtime=1.464..47814.446rows=3345876loops=1Buffers:sharedhit=11read=54840dirtied=36404written=36377Worker1:actualtime=1.470..47104.413rows=3321816loops=1Buffers:sharedhit=15read=54441dirtied=36516written=36489->ParallelHash(cost=205601.81..205601.81rows=4166681width=62)(actualtime=113720.080..113720.080rows=3333333loops=3)Output:c.c2,c.c1Buckets:65536Batches:512MemoryUsage:2432kBBuffers:sharedread=163935dirtied=133935written=133865,tempwritten=103856Worker0:actualtime=113719.124..113719.124rows=3332395loops=1Buffers:sharedread=54630dirtied=44552written=44528,tempwritten=34648Worker1:actualtime=113720.557..113720.558rows=3329197loops=1Buffers:sharedread=54577dirtied=44548written=44525,tempwritten=34576->ParallelSeqScanonpublic.t_big_3c(cost=0.00..205601.81rows=4166681width=62)(actualtime=0.126..80608.068rows=3333333loops=3)Output:c.c2,c.c1Buffers:sharedread=163935dirtied=133935written=133865Worker0:actualtime=0.260..80737.065rows=3332395loops=1Buffers:sharedread=54630dirtied=44552written=44528Worker1:actualtime=0.049..80943.448rows=3329197loops=1Buffers:sharedread=54577dirtied=44548written=44525->ParallelHash(cost=205601.02..205601.02rows=4166602width=62)(actualtime=10279.722..10279.722rows=3333333loops=3)Output:d.c3,d.c2Buckets:65536Batches:512MemoryUsage:2400kBBuffers:sharedhit=32read=163903dirtied=133935written=133839,tempwritten=103004Worker0:actualtime=10222.812..10222.812rows=3297904loops=1Buffers:sharedhit=9read=54055dirtied=44154written=44122,tempwritten=34236Worker1:actualtime=10222.839..10222.839rows=3258559loops=1Buffers:sharedhit=6read=53413dirtied=43464written=43432,tempwritten=33504->ParallelSeqScanonpublic.t_big_4d(cost=0.00..205601.02rows=4166602width=62)(actualtime=0.163..7282.409rows=3333333loops=3)Output:d.c3,d.c2Buffers:sharedhit=32read=163903dirtied=133935written=133839Worker0:actualtime=0.108..7244.071rows=3297904loops=1Buffers:sharedhit=9read=54055dirtied=44154written=44122Worker1:actualtime=0.034..7223.191rows=3258559loops=1Buffers:sharedhit=6read=53413dirtied=43464written=43432PlanningTime:1.134msExecutionTime:405878.841ms(83rows)[local:/data/run/pg12]:5120pg12@testdb=#
可以看到,执行计划中的成本回归一个正常的数值,算法使用Hash Join。由于内存不足,PG把数据拆分为N份,使用临时表来临时缓存Hash Table,使用不同的Batch来执行Join。
到此,关于“分析PostgreSQL中的大表连接”的学习就结束了,希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习,快去试试吧!若想继续学习更多相关知识,请继续关注亿速云网站,小编会继续努力为大家带来更多实用的文章!
声明:本站所有文章资源内容,如无特殊说明或标注,均为采集网络资源。如若本站内容侵犯了原著者的合法权益,可联系本站删除。