This section walks through the main logic PostgreSQL uses to obtain a transaction snapshot. The function that implements it is GetTransactionSnapshot.

I. Data Structures

Global/static variables

/*
 * Currently registered Snapshots.  Ordered in a heap by xmin, so that we can
 * quickly find the one with lowest xmin, to advance our MyPgXact->xmin.
 */
static int xmin_cmp(const pairingheap_node *a, const pairingheap_node *b,
                    void *arg);

static pairingheap RegisteredSnapshots = {&xmin_cmp, NULL, NULL};

/* first GetTransactionSnapshot call in a transaction? */
bool FirstSnapshotSet = false;

/*
 * Remember the serializable transaction snapshot, if any.  We cannot trust
 * FirstSnapshotSet in combination with IsolationUsesXactSnapshot(), because
 * GUC may be reset before us, changing the value of IsolationUsesXactSnapshot.
 */
static Snapshot FirstXactSnapshot = NULL;

/*
 * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
 * mode, and to the latest one taken in a read-committed transaction.
 * SecondarySnapshot is a snapshot that's always up-to-date as of the current
 * instant, even in transaction-snapshot mode.  It should only be used for
 * special-purpose code (say, RI checking.)  CatalogSnapshot points to an
 * MVCC snapshot intended to be used for catalog scans; we must invalidate it
 * whenever a system catalog change occurs.
 *
 * These SnapshotData structs are static to simplify memory allocation
 * (see the hack in GetSnapshotData to avoid repeated malloc/free).
 */
static SnapshotData CurrentSnapshotData = {HeapTupleSatisfiesMVCC};
static SnapshotData SecondarySnapshotData = {HeapTupleSatisfiesMVCC};
SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};

/* Pointers to valid snapshots */
static Snapshot CurrentSnapshot = NULL;
static Snapshot SecondarySnapshot = NULL;
static Snapshot CatalogSnapshot = NULL;
static Snapshot HistoricSnapshot = NULL;

/*
 * These are updated by GetSnapshotData.  We initialize them this way
 * for the convenience of TransactionIdIsInProgress: even in bootstrap
 * mode, we don't want it to say that BootstrapTransactionId is in progress.
 *
 * RecentGlobalXmin and RecentGlobalDataXmin are initialized to
 * InvalidTransactionId, to ensure that no one tries to use a stale
 * value.  Readers should ensure that it has been set to something else
 * before using it.
 */
TransactionId TransactionXmin = FirstNormalTransactionId;
TransactionId RecentXmin = FirstNormalTransactionId;
TransactionId RecentGlobalXmin = InvalidTransactionId;
TransactionId RecentGlobalDataXmin = InvalidTransactionId;

/* (table, ctid) => (cmin, cmax) mapping during timetravel */
static HTAB *tuplecid_data = NULL;

MyPgXact
Information about the current transaction.

/*
 * Flags for PGXACT->vacuumFlags
 *
 * Note: If you modify these flags, you need to modify PROCARRAY_XXX flags
 * in src/include/storage/procarray.h.
 *
 * PROC_RESERVED may later be assigned for use in vacuumFlags, but its value is
 * used for PROCARRAY_SLOTS_XMIN in procarray.h, so GetOldestXmin won't be able
 * to match and ignore processes with this flag set.
 */
#define PROC_IS_AUTOVACUUM          0x01    /* is it an autovac worker? */
#define PROC_IN_VACUUM              0x02    /* currently running lazy vacuum */
#define PROC_IN_ANALYZE             0x04    /* currently running analyze */
#define PROC_VACUUM_FOR_WRAPAROUND  0x08    /* set by autovac only */
#define PROC_IN_LOGICAL_DECODING    0x10    /* currently doing logical
                                             * decoding outside xact */
#define PROC_RESERVED               0x20    /* reserved for procarray */

/* flags reset at EOXact */
#define PROC_VACUUM_STATE_MASK \
    (PROC_IN_VACUUM | PROC_IN_ANALYZE | PROC_VACUUM_FOR_WRAPAROUND)

/*
 * Prior to PostgreSQL 9.2, the fields below were stored as part of the
 * PGPROC.  However, benchmarking revealed that packing these particular
 * members into a separate array as tightly as possible sped up GetSnapshotData
 * considerably on systems with many CPU cores, by reducing the number of
 * cache lines needing to be fetched.  Thus, think very carefully before adding
 * anything else here.
 */
typedef struct PGXACT
{
    /*
     * Note: this is the top-level transaction ID, not a subtransaction ID.
     * As an optimization, a read-only transaction is not assigned an XID
     * (xid = 0).
     */
    TransactionId xid;          /* id of top-level transaction currently being
                                 * executed by this proc, if running and XID
                                 * is assigned; else InvalidTransactionId */

    TransactionId xmin;         /* minimal running XID as it was when we were
                                 * starting our xact, excluding LAZY VACUUM:
                                 * vacuum must not remove tuples deleted by
                                 * xid >= xmin ! */

    uint8       vacuumFlags;    /* vacuum-related flags, see above */
    bool        overflowed;
    bool        delayChkpt;     /* true if this proc delays checkpoint start;
                                 * previously called InCommit */

    uint8       nxids;
} PGXACT;

extern PGDLLIMPORT struct PGXACT *MyPgXact;

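Snapshot
Snapshot is a pointer to a SnapshotData struct. The struct is able to represent every possible kind of snapshot.
There are several different kinds of snapshots:
1. Normal MVCC snapshots
2. MVCC snapshots taken during recovery (in Hot-Standby mode)
3. Historic MVCC snapshots used during logical decoding
4. Snapshots passed to HeapTupleSatisfiesDirty()
5. Snapshots passed to HeapTupleSatisfiesNonVacuumable()
6. Snapshots used for SatisfiesAny, Toast, and Self, where no members are accessed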

/* Pointer to a SnapshotData struct */
typedef struct SnapshotData *Snapshot;

/* An invalid snapshot */
#define InvalidSnapshot     ((Snapshot) NULL)

/*
 * We use SnapshotData structures to represent both "regular" (MVCC)
 * snapshots and "special" snapshots that have non-MVCC semantics.
 * The specific semantics of a snapshot are encoded by the "satisfies"
 * function.
 */
typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
                                       Snapshot snapshot, Buffer buffer);

/*
 * Commonly seen visibility routines:
 *   HeapTupleSatisfiesMVCC:         is the tuple visible to the given MVCC snapshot?
 *   HeapTupleSatisfiesDirty:        like Self, but also sees the effects of
 *                                   transactions that are still in progress
 *   HeapTupleSatisfiesSelf:         is the tuple visible "now", including
 *                                   changes made by the current command?
 *   HeapTupleSatisfiesToast:        visibility test for tuples in TOAST tables
 *   HeapTupleSatisfiesAny:          every tuple is visible
 *   HeapTupleSatisfiesHistoricMVCC: used for catalog access during logical decoding
 * Two related routines answer different questions and have their own
 * signatures, so they are not SnapshotSatisfiesFunc callbacks:
 *   HeapTupleSatisfiesUpdate:       can the tuple be updated (handles concurrent
 *                                   updates of the same tuple)?
 *   HeapTupleSatisfiesVacuum:       can the tuple be removed by VACUUM?
 */

/*
 * Struct representing all kind of possible snapshots.
 *
 * There are several different kinds of snapshots:
 * * Normal MVCC snapshots
 * * MVCC snapshots taken during recovery (in Hot-Standby mode)
 * * Historic MVCC snapshots used during logical decoding
 * * snapshots passed to HeapTupleSatisfiesDirty()
 * * snapshots passed to HeapTupleSatisfiesNonVacuumable()
 * * snapshots used for SatisfiesAny, Toast, Self where no members are
 *   accessed.
 *
 * TODO: It's probably a good idea to split this struct using a NodeTag
 * similar to how parser and executor nodes are handled, with one type for
 * each different kind of snapshot to avoid overloading the meaning of
 * individual fields.
 */
typedef struct SnapshotData
{
    SnapshotSatisfiesFunc satisfies;    /* tuple test function */

    /*
     * The remaining fields are used only for MVCC snapshots, and are normally
     * just zeroes in special snapshots.  (But xmin and xmax are used
     * specially by HeapTupleSatisfiesDirty, and xmin is used specially by
     * HeapTupleSatisfiesNonVacuumable.)
     *
     * An MVCC snapshot can never see the effects of XIDs >= xmax. It can see
     * the effects of all older XIDs except those listed in the snapshot. xmin
     * is stored as an optimization to avoid needing to search the XID arrays
     * for most tuples.
     */
    TransactionId xmin;         /* all XID < xmin are visible to me */
    TransactionId xmax;         /* all XID >= xmax are invisible to me */

    /*
     * For normal MVCC snapshot this contains the all xact IDs that are in
     * progress, unless the snapshot was taken during recovery in which case
     * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
     * it contains *committed* transactions between xmin and xmax.
     *
     * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
     */
    TransactionId *xip;
    uint32      xcnt;           /* # of xact ids in xip[] */

    /*
     * For non-historic MVCC snapshots, this contains subxact IDs that are in
     * progress (and other transactions that are in progress if taken during
     * recovery). For historic snapshot it contains *all* xids assigned to the
     * replayed transaction, including the toplevel xid.
     *
     * note: all ids in subxip[] are >= xmin, but we don't bother filtering
     * out any that are >= xmax
     */
    TransactionId *subxip;
    int32       subxcnt;        /* # of xact ids in subxip[] */
    bool        suboverflowed;  /* has the subxip array overflowed? */

    bool        takenDuringRecovery;    /* recovery-shaped snapshot? */
    bool        copied;         /* false if it's a static snapshot */

    CommandId   curcid;         /* in my xact, CID < curcid are visible */

    /*
     * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
     * snapshots.
     */
    uint32      speculativeToken;

    /*
     * Book-keeping information, used by the snapshot manager
     */
    uint32      active_count;   /* refcount on ActiveSnapshot stack */
    uint32      regd_count;     /* refcount on RegisteredSnapshots */
    pairingheap_node ph_node;   /* link in the RegisteredSnapshots heap */

    TimestampTz whenTaken;      /* timestamp when snapshot was taken */
    XLogRecPtr  lsn;            /* position in the WAL stream when taken */
} SnapshotData;

II. Source Code Walkthrough

GetTransactionSnapshot obtains the appropriate snapshot for a new query in a transaction.

/*
 * GetTransactionSnapshot
 *      Get the appropriate snapshot for a new query in a transaction.
 *
 * Note that the return value may point at static storage that will be modified
 * by future calls and by CommandCounterIncrement().  Callers should call
 * RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
 * used very long.
 */
Snapshot
GetTransactionSnapshot(void)
{
    /*
     * Return historic snapshot if doing logical decoding. We'll never need a
     * non-historic transaction snapshot in this (sub-)transaction, so there's
     * no need to be careful to set one up for later calls to
     * GetTransactionSnapshot().
     */
    if (HistoricSnapshotActive())
    {
        Assert(!FirstSnapshotSet);
        return HistoricSnapshot;
    }

    /* First call in transaction? */
    if (!FirstSnapshotSet)
    {
        /*
         * Don't allow catalog snapshot to be older than xact snapshot.  Must
         * do this first to allow the empty-heap Assert to succeed.
         */
        InvalidateCatalogSnapshot();

        Assert(pairingheap_is_empty(&RegisteredSnapshots));
        Assert(FirstXactSnapshot == NULL);

        if (IsInParallelMode())
            elog(ERROR,
                 "cannot take query snapshot during a parallel operation");

        /*
         * In transaction-snapshot mode, the first snapshot must live until
         * end of xact regardless of what the caller does with it, so we must
         * make a copy of it rather than returning CurrentSnapshotData
         * directly.  Furthermore, if we're running in serializable mode,
         * predicate.c needs to wrap the snapshot fetch in its own processing.
         */
        if (IsolationUsesXactSnapshot())
        {
            /* transaction-snapshot mode */
            /* First, create the snapshot in CurrentSnapshotData */
            if (IsolationIsSerializable())
                /* isolation level is SERIALIZABLE */
                CurrentSnapshot = GetSerializableTransactionSnapshot(&CurrentSnapshotData);
            else
                /* any other transaction-snapshot isolation level */
                CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
            /* Make a saved copy */
            CurrentSnapshot = CopySnapshot(CurrentSnapshot);
            FirstXactSnapshot = CurrentSnapshot;
            /* Mark it as "registered" in FirstXactSnapshot */
            FirstXactSnapshot->regd_count++;
            pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
        }
        else
            /* not transaction-snapshot mode: take the snapshot directly */
            CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);

        /* remember that the first snapshot has been taken */
        FirstSnapshotSet = true;
        return CurrentSnapshot;
    }

    /* transaction-snapshot mode: keep returning the first snapshot */
    if (IsolationUsesXactSnapshot())
        return CurrentSnapshot;

    /* Don't allow catalog snapshot to be older than xact snapshot. */
    InvalidateCatalogSnapshot();

    /* read-committed mode: take a fresh snapshot */
    CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);

    return CurrentSnapshot;
}

III. Trace Analysis

Running a simple query is enough to trigger the snapshot-acquisition logic.

16:35:08 (xdb@[local]:5432)testdb=# begin;
BEGIN
16:35:13 (xdb@[local]:5432)testdb=#* select 1;

Start gdb and set a breakpoint.

(gdb) b GetTransactionSnapshot
Breakpoint 1 at 0xa9492e: file snapmgr.c, line 312.
(gdb) c
Continuing.

Breakpoint 1, GetTransactionSnapshot () at snapmgr.c:312
312         if (HistoricSnapshotActive())
(gdb)

If logical decoding is in progress, the historic snapshot is returned (not the case in this example).

(gdb) n
319         if (!FirstSnapshotSet)
(gdb)

Is this the first call in the transaction? Yes, so the first-call branch is taken.

319         if (!FirstSnapshotSet)
(gdb) n
325             InvalidateCatalogSnapshot();
(gdb)
327             Assert(pairingheap_is_empty(&RegisteredSnapshots));
(gdb)
328             Assert(FirstXactSnapshot == NULL);
(gdb) n
330             if (IsInParallelMode())
(gdb)

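The session is not in transaction-snapshot mode, so GetSnapshotData is called directly. Note that gdb stops before line 356 runs, so the CurrentSnapshotData printed below still holds the values left over from the previous call.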

(gdb)
341             if (IsolationUsesXactSnapshot())
(gdb)
356                 CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
(gdb) p CurrentSnapshotData
$1 = {satisfies = 0xa9310d <HeapTupleSatisfiesMVCC>, xmin = 2342, xmax = 2350, xip = 0x14bee40, xcnt = 2, subxip = 0x1514fa0, subxcnt = 0, suboverflowed = false, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0, active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
(gdb)

The call completes; inspect CurrentSnapshot.
Note: the process that was running transaction 2342 has been killed.

(gdb) n
358
(gdb) p CurrentSnapshot
$2 = (Snapshot) 0xf9be60 <CurrentSnapshotData>
(gdb) p *CurrentSnapshot
$3 = {satisfies = 0xa9310d <HeapTupleSatisfiesMVCC>, xmin = 2350, xmax = 2350, xip = 0x14bee40, xcnt = 0, subxip = 0x1514fa0, subxcnt = 0, suboverflowed = false, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0, active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
(gdb)

The function returns successfully.

(gdb) n
359             return CurrentSnapshot;
(gdb)
371     }
(gdb)
exec_simple_query (query_string=0x149aec8 "select 1;") at postgres.c:1059
1059                snapshot_set = true;
(gdb)

Inspect the global variable MyPgXact.

(gdb) p MyPgXact
$7 = (struct PGXACT *) 0x7f47103c01f4
(gdb) p *MyPgXact
$8 = {xid = 0, xmin = 2350, vacuumFlags = 0 '\000', overflowed = false, delayChkpt = false, nxids = 0 '\000'}
(gdb)

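Notes:
1. xid = 0 means no transaction ID has been assigned yet. As an optimization, PostgreSQL assigns an XID only when the transaction first modifies data.
2. The txid_current() function assigns an XID if none has been assigned yet; txid_current_if_assigned() does not.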

DONE!

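Open questions:
1. When is the information in the global variable CurrentSnapshotData initialized and changed?
2. How is GetSnapshotData implemented? (Covered in the next section.)

IV. References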

PG Source Code