PostgreSQL搜索插件有什么优点
本篇内容主要讲解“PostgreSQL搜索插件有什么优点”,感兴趣的朋友不妨来看看。本文介绍的方法操作简单快捷,实用性强。下面就让小编来带大家学习“PostgreSQL搜索插件有什么优点”吧!
gitclonehttps://github.com/postgrespro/rumcdrum./var/lib/pgsql/.bash_profileUSE_PGXS=1makeUSE_PGXS=1makeinstallcreateextensionrum;
1、生成随机浮点数组的UDF接口
createorreplacefunctiongen_rand_float4(int,int)returnsfloat4[]as$$selectarray(select(random()*$1)::float4fromgenerate_series(1,$2));$$languagesqlstrict;
2、建表,索引
createunloggedtablet_rum(idintprimarykey,arrfloat4[]);createindexidx_t_rum_1ont_rumusingrum(arr);
4、写入随机浮点数数组
vitest.sql\setidrandom(1,2000000000)insertintot_rumvalues(:id,gen_rand_float4(10,16))onconflict(id)donothing;
pgbench-Mprepared-n-r-P1-f./test.sql-c64-j64-t10000000
postgres=#select*fromt_rumlimit2;id|arr-----------+-----------------------------------------------------------------------------------------------------------------------------------------182025544|{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}51515704|{0.123099,9.26626,0.00549683,9.01483,0.911669,3.44338,4.55135,4.65002,0.820029,9.66546,1.93433,3.00254,1.28121,8.99883,1.85269,6.39579}(2rows)postgres=#selectcount(*)fromt_rum;count---------3244994(1row)
5、使用rum提供的数组相似搜索(元素重叠率计算)
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromt_rumorderbyarr<=>'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'limit1;QUERYPLAN------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit(cost=22435.67..22435.68rows=1width=97)(actualtime=12527.447..12527.450rows=1loops=1)Output:id,arr,((arr<=>'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'::real[]))Buffers:sharedhit=50450->Sort(cost=22435.67..29469.15rows=3244994width=97)(actualtime=12527.445..12527.446rows=1loops=1)Output:id,arr,((arr<=>'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'::real[]))SortKey:((t_rum.arr<=>'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'::real[]))SortMethod:top-NheapsortMemory:25kBBuffers:sharedhit=50450->SeqScanonpublic.t_rum(cost=0.00..8368.72rows=3244994width=97)(actualtime=0.054..11788.483rows=3244994loops=1)Output:id,arr,(arr<=>'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'::real[])Buffers:sharedhit=50447PlanningTime:0.115msExecutionTime:12527.498ms(13rows)
你会发现,走了索引,但是并不快。扫描了大量(50447)的索引PAGE。
原因是我们没有管它的阈值,导致扫描了大量的index BLOCK。默认的阈值为0.5,太低了。
postgres=#showrum.array_similarity_thresholdpostgres-#;rum.array_similarity_threshold--------------------------------0.5(1row)
调成0.9,只输出90%以上相似(重叠度)的数组。性能瞬间暴增,扫描的数据块也变少了。
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromt_rumwherearr%'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'orderbyarr<=>'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'limit1;QUERYPLAN------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit(cost=1.54..1.56rows=1width=97)(actualtime=0.664..0.664rows=0loops=1)Output:id,arr,((arr<=>'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'::real[]))Buffers:sharedhit=128read=40->IndexScanusingidx_t_rum_1onpublic.t_rum(cost=1.54..87.65rows=3245width=97)(actualtime=0.662..0.662rows=0loops=1)Output:id,arr,(arr<=>'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'::real[])IndexCond:(t_rum.arr%'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'::real[])OrderBy:(t_rum.arr<=>'{5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444}'::real[])Buffers:sharedhit=128read=40PlanningTime:0.184msExecutionTime:0.691ms(10rows)元素重叠度相似搜索优化
1、调整阈值,阶梯化解题。
《PostgreSQL 相似搜索设计与性能 - 地址、QA、POI等文本 毫秒级相似搜索实践》
实际上图像特征值近似搜索,也有优化的空间,接下来进入正题。
部署imgsmlr (on PG 11)1、假设yum安装的PG 11
2、克隆源码
yuminstall-ygitgitclonehttps://github.com/postgrespro/imgsmlrcdimgsmlr
3、修改头文件
viimgsmlr.h//追加#ifndefFALSE#defineFALSE(0)#endif#ifndefTRUE#defineTRUE(!FALSE)#endif
4、安装依赖的图像转换包
yuminstall-ygd-devel
5、编译安装IMGSMLR插件
./var/lib/pgsql/.bash_profileUSE_PGXS=1makeUSE_PGXS=1makeinstall单节点 单表图像搜索 (4亿图像)
1、创建生成随机图像特征值signature的UDF。
createorreplacefunctiongen_rand_img_sig(int)returnssignatureas$$select('('||rtrim(ltrim(array(select(random()*$1)::float4fromgenerate_series(1,16))::text,'{'),'}')||')')::signature;$$languagesqlstrict;
postgres=#select*fromgen_rand_img_sig(10);gen_rand_img_sig------------------------------------------------------------------------------------------------------------------------------------------------------------------(6.744310,5.105020,0.087113,3.808010,8.129480,2.834540,2.495250,0.940481,0.033208,6.583490,2.840330,1.422440,6.683830,0.080847,8.327730,2.471430)(1row)postgres=#select*fromgen_rand_img_sig(10);gen_rand_img_sig------------------------------------------------------------------------------------------------------------------------------------------------------------------(3.013650,6.170690,0.601905,2.692030,1.268540,7.803740,9.757770,5.537750,0.391753,4.440790,1.201580,5.501380,6.166980,0.240686,9.768680,2.911290)(1row)
2、建表,建图像特征值索引
createtablet_img_sig(idintprimarykey,sigsignature);createindexidx_t_img_sig_1ont_img_sigusinggist(sig);
3、写入约4亿随机图像特征值
vitestsig.sql\setidrandom(1,2000000000)insertintot_img_sigvalues(:id,gen_rand_img_sig(10))onconflict(id)donothing;
pgbench-Mprepared-n-r-P1-f./testsig.sql-c32-j32-t20000000
postgres=#select*fromt_imglimit10;id|sig-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------47902935|(5.861920,1.062770,8.318020,2.205840,0.202951,6.956610,1.413190,2.898480,8.961630,6.377800,1.110450,6.684520,2.286290,7.850760,1.832650,0.074348)174656795|(2.165030,0.183753,9.913950,9.208260,5.165660,6.603510,2.008380,8.117910,2.358590,5.466330,9.139280,8.893700,4.664190,9.361670,9.016990,2.271000)96186891|(9.605980,4.395920,4.336720,3.174360,8.706960,0.155107,9.408940,4.531100,2.783530,5.681780,9.792380,6.428320,2.983760,9.733290,7.635160,7.035780)55061667|(7.567960,5.874530,5.222040,5.638520,3.488960,8.770750,7.054610,7.239630,9.202280,9.465020,4.079080,5.729770,0.475227,8.434800,6.873730,5.140080)64659434|(4.860650,3.984440,3.009900,5.116680,6.489150,4.224800,0.609752,8.731120,6.577390,8.542540,9.096120,8.976700,8.936000,2.836270,7.186250,6.264300)87143098|(4.801570,7.870150,0.939599,3.666670,1.102340,5.819580,6.511330,6.430760,0.584531,3.024190,6.255460,8.823820,5.076960,0.181344,8.137380,1.230360)109245945|(7.541850,7.201460,6.858400,2.605210,1.283090,7.525200,4.213240,8.413760,9.707390,1.916970,1.719320,1.255280,9.006780,4.851420,2.168250,5.997360)4979218|(8.463000,4.051410,9.057320,1.367980,3.344340,7.032640,8.583770,1.873090,5.524810,0.187254,5.783270,6.141040,2.479410,6.406450,9.371700,0.050690)72846137|(7.018560,4.039150,9.114800,2.911170,5.531180,8.557330,6.739050,0.103649,3.691390,7.584640,8.184180,0.599390,9.037130,4.090610,4.369770,6.480000)36813995|(4.643480,8.704640,1.073880,2.665530,3.298300,9.244280,5.768050,0.887555,5.990350,2.991390,6.186550,6.464940,6.187140,0.150242,2.123070,2.932270)(10rows)Time:58.101ms
写入约4.39亿图像特征值。
postgres=#selectcount(*)fromt_img_sig;count-----------438924137(1row)
4、输入一个图像特征值,搜索与之最相似的图像。
explain(analyze,verbose,timing,costs,buffers)select*fromt_img_sigorderbysig<->'(5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444)'limit1;
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromt_img_sigwheresignature_distance(sig,'(5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444)')>0.9orderbysig<->'(5.07998,6.80827,5.42024,2.53619,4.10843,0.532198,4.33886,9.60262,6.68369,8.01305,9.60298,8.087,1.25819,6.54424,6.04902,5.3444)'limit1;QUERYPLAN----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit(cost=0.48..0.51rows=1width=72)(actualtime=4094.810..4094.812rows=1loops=1)Output:id,sig,((sig<->'(5.079980,6.808270,5.420240,2.536190,4.108430,0.532198,4.338860,9.602620,6.683690,8.013050,9.602980,8.087000,1.258190,6.544240,6.049020,5.344400)'::signature))Buffers:sharedhit=205999->IndexScanusingidx_t_img_sig_1onpublic.t_img_sig(cost=0.48..5361351.06rows=146395778width=72)(actualtime=4094.808..4094.808rows=1loops=1)Output:id,sig,(sig<->'(5.079980,6.808270,5.420240,2.536190,4.108430,0.532198,4.338860,9.602620,6.683690,8.013050,9.602980,8.087000,1.258190,6.544240,6.049020,5.344400)'::signature)OrderBy:(t_img_sig.sig<->'(5.079980,6.808270,5.420240,2.536190,4.108430,0.532198,4.338860,9.602620,6.683690,8.013050,9.602980,8.087000,1.258190,6.544240,6.049020,5.344400)'::signature)Filter:(signature_distance(t_img_sig.sig,'(5.079980,6.808270,5.420240,2.536190,4.108430,0.532198,4.338860,9.602620,6.683690,8.013050,9.602980,8.087000,1.258190,6.544240,6.049020,5.344400)'::signature)>'0.9'::doubleprecision)Buffers:sharedhit=205999PlanningTime:0.073msExecutionTime:4194.485ms(10rows)性能与瓶颈
性能:4.39亿图像特征值,以图搜图约4.2秒。
瓶颈:
1、扫描了大量的索引页(205999)。
优化思路1、压缩精度,比如使用3位小数。据用户说有10倍性能提升。
精度优化如下,使用新的生成图像特征值的函数,使用3位小数。
createorreplacefunctiongen_rand_img_sig3(int)returnssignatureas$$select('('||rtrim(ltrim(array(selecttrunc((random()*$1)::numeric,3)fromgenerate_series(1,16))::text,'{'),'}')||')')::signature;$$languagesqlstrict;
例子如下
postgres=#selectgen_rand_img_sig3(10);gen_rand_img_sig3------------------------------------------------------------------------------------------------------------------------------------------------------------------(2.984000,3.323000,4.083000,6.292000,5.008000,9.029000,6.208000,1.141000,1.796000,9.257000,1.397000,1.235000,7.157000,3.745000,0.112000,7.723000)(1row)
2、使用分区表+dblink异步接口并行调用。(内核层面直接支持imgsmlr gist index scan并行更好)
下一篇介绍
3、使用citus sharding。多机,提高整体计算能力。 (因为扫描大量索引页,即使CPU没有瓶颈,将来内存带宽也会成为瓶颈。多机可以解决这个问题。)
下一篇介绍
4、内核层面,支持维度更低的signature,现在是16片,比如支持降低到4片,性能也可以提升。
精度现象1、当有记录可以完全匹配时,扫描少量INDEX PAGE。
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromt_img_sigorderbysig<->'(3.727000,2.594000,0.185000,3.996000,6.450000,7.126000,5.499000,1.540000,8.239000,6.262000,2.053000,2.566000,4.522000,6.929000,1.582000,2.179000)'::signaturelimit1;QUERYPLAN---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit(cost=0.48..0.49rows=1width=72)(actualtime=1.596..1.598rows=1loops=1)Output:id,sig,((sig<->'(3.727000,2.594000,0.185000,3.996000,6.450000,7.126000,5.499000,1.540000,8.239000,6.262000,2.053000,2.566000,4.522000,6.929000,1.582000,2.179000)'::signature))Buffers:sharedhit=125->IndexScanusingt_img_sig1_sig_idxonpublic.t_img_sig(cost=0.48..7318159.22rows=785457848width=72)(actualtime=1.594..1.595rows=1loops=1)Output:id,sig,(sig<->'(3.727000,2.594000,0.185000,3.996000,6.450000,7.126000,5.499000,1.540000,8.239000,6.262000,2.053000,2.566000,4.522000,6.929000,1.582000,2.179000)'::signature)OrderBy:(t_img_sig.sig<->'(3.727000,2.594000,0.185000,3.996000,6.450000,7.126000,5.499000,1.540000,8.239000,6.262000,2.053000,2.566000,4.522000,6.929000,1.582000,2.179000)'::signature)Buffers:sharedhit=125PlanningTime:0.072msExecutionTime:1.621ms(9rows)
2、当修改少量内容,少量值完全匹配,其他值不完全匹配时,扫描的INDEX PAGE增加。
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromt_img_sigorderbysig<->'(3.727000,2.594000,0.185000,3.996000,6.450000,7.126000,5.499000,1.540000,8.239000,6.262000,2.053000,2.566000,4.522000,6.929000,1.582000,2.179001)'::signaturelimit1;QUERYPLAN---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit(cost=0.48..0.49rows=1width=72)(actualtime=7.051..7.052rows=1loops=1)Output:id,sig,((sig<->'(3.727000,2.594000,0.185000,3.996000,6.450000,7.126000,5.499000,1.540000,8.239000,6.262000,2.053000,2.566000,4.522000,6.929000,1.582000,2.179001)'::signature))Buffers:sharedhit=454->IndexScanusingt_img_sig1_sig_idxonpublic.t_img_sig(cost=0.48..7324626.56rows=786152016width=72)(actualtime=7.049..7.049rows=1loops=1)Output:id,sig,(sig<->'(3.727000,2.594000,0.185000,3.996000,6.450000,7.126000,5.499000,1.540000,8.239000,6.262000,2.053000,2.566000,4.522000,6.929000,1.582000,2.179001)'::signature)OrderBy:(t_img_sig.sig<->'(3.727000,2.594000,0.185000,3.996000,6.450000,7.126000,5.499000,1.540000,8.239000,6.262000,2.053000,2.566000,4.522000,6.929000,1.582000,2.179001)'::signature)Buffers:sharedhit=454PlanningTime:0.074msExecutionTime:7.076ms(9rows)
3、当大量修改值,不能完全匹配时,需要扫描大量INDEX PAGE。
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromt_img_sigorderbysig<->'(7.727000,3.594000,1.185000,4.996000,6.950000,7.129000,5.429000,1.520000,8.219000,6.222000,2.013000,2.536000,4.532000,6.939000,1.538000,2.178000)'::signaturelimit1;QUERYPLAN---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit(cost=0.47..0.48rows=1width=72)(actualtime=2528.890..2528.891rows=1loops=1)Output:id,sig,((sig<->'(7.727000,3.594000,1.185000,4.996000,6.950000,7.129000,5.429000,1.520000,8.219000,6.222000,2.013000,2.536000,4.532000,6.939000,1.538000,2.178000)'::signature))Buffers:sharedhit=121510->IndexScanusingt_img_sig1_sig_idxonpublic.t_img_sig(cost=0.47..1361409.21rows=146121007width=72)(actualtime=2528.887..2528.888rows=1loops=1)Output:id,sig,(sig<->'(7.727000,3.594000,1.185000,4.996000,6.950000,7.129000,5.429000,1.520000,8.219000,6.222000,2.013000,2.536000,4.532000,6.939000,1.538000,2.178000)'::signature)OrderBy:(t_img_sig.sig<->'(7.727000,3.594000,1.185000,4.996000,6.950000,7.129000,5.429000,1.520000,8.219000,6.222000,2.013000,2.536000,4.532000,6.939000,1.538000,2.178000)'::signature)Buffers:sharedhit=121510PlanningTime:0.092msExecutionTime:2582.558ms(9rows)
到此,相信大家对“PostgreSQL搜索插件有什么优点”有了更深的了解,不妨来实际操作一番吧!这里是亿速云网站,更多相关内容可以进入相关频道进行查询,关注我们,继续学习!
声明:本站所有文章资源内容,如无特殊说明或标注,均为采集网络资源。如若本站内容侵犯了原著者的合法权益,可联系本站删除。