怎么理解PostgreSQL全表扫描问题

2025-03-13 技术教程

这篇文章主要讲解了“怎么理解PostgreSQL全表扫描问题”，文中的讲解内容简单清晰，易于学习与理解，下面请大家跟着小编的思路慢慢深入，一起来研究和学习“怎么理解PostgreSQL全表扫描问题”吧！

本节内容来源于PGer的一个问题:
Q:
由于多版本的存在，那么全表扫描是不是需要更长的时间了呢？

A:
关于全表扫描,不妨考虑2种极端的情况:
1.insert数据(事务已提交,下同),没有执行update/delete,没有dead tuple,全表扫描效率没有影响;
2.insert数据,执行了大量的update/delete,同时禁用了autovacuum也没有手工执行vacuum,那么存在大量的dead tuple,性能上一是需要更多的IO操作,二是需要执行额外的CPU判断(对于所有的tuple都要执行可见性判断).
其判断逻辑如下:

((Xmin==my-transaction&&　　　　　　　insertedbythecurrenttransaction　Cmin<my-command&&　　　　　　　　　　beforethiscommand,and　(Xmaxisnull||　　　　　　　　　　　　therowhasnotbeendeleted,or　　(Xmax==my-transaction&&　　　　　　itwasdeletedbythecurrenttransaction　　　Cmax>=my-command)))　　　　　　　butnotbeforethiscommand,||　　　　　　　　　　　　　　　　　　　　　or　(Xminiscommitted&&　　　　　　　　　　therowwasinsertedbyacommittedtransaction,and　　(Xmaxisnull||　　　　　　　　　　　　therowhasnotbeendeleted,or　　　(Xmax==my-transaction&&　　　　　therowisbeingdeletedbythistransaction　　　　Cmax>=my-command)||　　　　　　butit’snotdeleted"yet",or　　　　(Xmax!=my-transaction&&　　　　therowwasdeletedbyanothertransaction　　　　　Xmaxisnotcommitted))))　　　　thathasnotbeencommitted

简单做个实验,创建一张表t_fts,
1.插入数据,大小为s1,执行全表扫描,时间为m秒;
2.update所有行,大小为s2,执行全表扫描,时间为n秒.
理论上来说,n应为m的s2/s1倍左右(相对于IO时间,如果tuple数不多,CPU时间可以忽略不计).
创建数据表,插入数据:

testdb=#droptableifexistst_fts;DROPTABLEtestdb=#createtablet_fts(idint,c1varchar(200),c2varchar(200));CREATETABLEtestdb=#testdb=#insertintot_ftsselectx,lpad('c1'||x,200,'x'),lpad('c1'||x,200,'x')fromgenerate_series(1,2000000)asx;INSERT02000000testdb=#selectpg_size_pretty(pg_table_size('t_fts'));pg_size_pretty----------------868MB(1row)

禁用autovacuum,执行查询:

testdb=#altersystemsetautovacuum=off;ALTERSYSTEMtestdb=#showautovacuum;autovacuum------------off(1row)testdb=#explainanalyzeverboseselect*fromt_fts;QUERYPLAN----------------------------------------------------------------------------------------------------------------------------SeqScanonpublic.t_fts(cost=0.00..131112.16rows=2000016width=412)(actualtime=0.048..1086.289rows=2000000loops=1)Output:id,c1,c2PlanningTime:30.762msExecutionTime:1181.360ms(4rows)

执行update:

testdb=#updatet_ftssetc1=lpad('c1'||(id+1),200,id+1||''),c2=lpad('c1'||(id+1),200,id+1||'');UPDATE2000000testdb=#selectpg_size_pretty(pg_table_size('t_fts'));pg_size_pretty----------------1737MB(1row)testdb=#explainanalyzeverboseselect*fromt_fts;QUERYPLAN-------------------------------------------------------------------------------------------------------------------------------SeqScanonpublic.t_fts(cost=0.00..262223.14rows=4000014width=412)(actualtime=3168.414..6117.780rows=2000000loops=1)Output:id,c1,c2PlanningTime:5.493msExecutionTime:6205.705ms(4rows)testdb=#explainanalyzeverboseselect*fromt_fts;QUERYPLAN------------------------------------------------------------------------------------------------------------------------------SeqScanonpublic.t_fts(cost=0.00..262223.14rows=4000014width=412)(actualtime=776.660..2311.270rows=2000000loops=1)Output:id,c1,c2PlanningTime:0.426msExecutionTime:2391.895ms(4rows)testdb=#explainanalyzeverboseselect*fromt_fts;QUERYPLAN------------------------------------------------------------------------------------------------------------------------------SeqScanonpublic.t_fts(cost=0.00..262223.14rows=4000014width=412)(actualtime=728.758..2293.157rows=2000000loops=1)Output:id,c1,c2PlanningTime:0.481msExecutionTime:2373.241ms(4rows)

感谢各位的阅读，以上就是“怎么理解PostgreSQL全表扫描问题”的内容了，经过本文的学习后，相信大家对怎么理解PostgreSQL全表扫描问题这一问题有了更深刻的体会，具体使用情况还需要大家实践验证。这里是亿速云，小编将为大家推送更多相关知识点的文章，欢迎关注！