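PostgreSQL Source Code Walkthrough (135) - MVCC #19 (the vacuum process: the heap_execute_freeze_tuple function)
This section briefly covers how PostgreSQL processes a manually executed VACUUM, focusing on the implementation of heap_execute_freeze_tuple, which is reached through the call chain ExecVacuum->vacuum->vacuum_rel->heap_vacuum_rel->lazy_scan_heap->heap_execute_freeze_tuple. The function performs the actual tuple freezing, after the preparation work has already been done.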
I. Data Structures and Macro Definitions
Vacuum and Analyze command options
/* ----------------------
 *      Vacuum and Analyze Statements
 *
 * Even though these are nominally two statements, it's convenient to use
 * just one node type for both.  Note that at least one of VACOPT_VACUUM
 * and VACOPT_ANALYZE must be set in options.
 * ----------------------
 */
typedef enum VacuumOption
{
    VACOPT_VACUUM = 1 << 0,     /* do VACUUM */
    VACOPT_ANALYZE = 1 << 1,    /* do ANALYZE */
    VACOPT_VERBOSE = 1 << 2,    /* print progress info */
    VACOPT_FREEZE = 1 << 3,     /* FREEZE option */
    VACOPT_FULL = 1 << 4,       /* FULL (non-concurrent) vacuum */
    VACOPT_SKIP_LOCKED = 1 << 5,    /* skip if cannot get lock */
    VACOPT_SKIPTOAST = 1 << 6,  /* don't process the TOAST table, if any */
    VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7   /* don't skip any pages */
} VacuumOption;
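Because each option occupies its own bit, several options can be OR-ed into a single integer and tested individually. The following is a minimal sketch of that idea; the helpers vacopt_is_valid and vacopt_has are hypothetical names made up for this illustration and do not exist in the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the enum shown above. */
typedef enum VacuumOption
{
    VACOPT_VACUUM = 1 << 0,
    VACOPT_ANALYZE = 1 << 1,
    VACOPT_VERBOSE = 1 << 2,
    VACOPT_FREEZE = 1 << 3,
    VACOPT_FULL = 1 << 4,
    VACOPT_SKIP_LOCKED = 1 << 5,
    VACOPT_SKIPTOAST = 1 << 6,
    VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7
} VacuumOption;

/*
 * Hypothetical helper: the header comment requires that at least one of
 * VACOPT_VACUUM and VACOPT_ANALYZE is set in the option word.
 */
static bool
vacopt_is_valid(int options)
{
    return (options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0;
}

/* Hypothetical helper: test whether one single-bit option is set. */
static bool
vacopt_has(int options, VacuumOption flag)
{
    /* precondition: flag must be exactly one bit */
    assert(flag != 0 && ((int) flag & ((int) flag - 1)) == 0);
    return (options & (int) flag) != 0;
}
```

For example, VACUUM (FREEZE, VERBOSE) would correspond to VACOPT_VACUUM | VACOPT_FREEZE | VACOPT_VERBOSE, for which vacopt_is_valid returns true and vacopt_has(..., VACOPT_FULL) returns false.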
HeapTupleHeaderData
The heap tuple header. To avoid wasting space, its fields are laid out so that the struct needs no alignment padding.
/*
 * Heap tuple header.  To avoid wasting space, the fields should be
 * laid out in such a way as to avoid structure padding.
 *
 * Datums of composite types (row types) share the same general structure
 * as on-disk tuples, so that the same routines can be used to build and
 * examine them.  However the requirements are slightly different: a Datum
 * does not need any transaction visibility information, and it does need
 * a length word and some embedded type information.  We can achieve this
 * by overlaying the xmin/cmin/xmax/cmax/xvac fields of a heap tuple
 * with the fields needed in the Datum case.  Typically, all tuples built
 * in-memory will be initialized with the Datum fields; but when a tuple is
 * about to be inserted in a table, the transaction fields will be filled,
 * overwriting the datum fields.
 *
 * The overall structure of a heap tuple looks like:
 *          fixed fields (HeapTupleHeaderData struct)
 *          nulls bitmap (if HEAP_HASNULL is set in t_infomask)
 *          alignment padding (as needed to make user data MAXALIGN'd)
 *          object ID (if HEAP_HASOID_OLD is set in t_infomask, not created
 *          anymore)
 *          user data fields
 *
 * We store five "virtual" fields Xmin, Cmin, Xmax, Cmax, and Xvac in three
 * physical fields.  Xmin and Xmax are always really stored, but Cmin, Cmax
 * and Xvac share a field.  This works because we know that Cmin and Cmax
 * are only interesting for the lifetime of the inserting and deleting
 * transaction respectively.  If a tuple is inserted and deleted in the same
 * transaction, we store a "combo" command id that can be mapped to the real
 * cmin and cmax, but only by use of local state within the originating
 * backend.  See combocid.c for more details.  Meanwhile, Xvac is only set by
 * old-style VACUUM FULL, which does not have any command sub-structure and so
 * does not need either Cmin or Cmax.  (This requires that old-style VACUUM
 * FULL never try to move a tuple whose Cmin or Cmax is still interesting,
 * ie, an insert-in-progress or delete-in-progress tuple.)
 *
 * A word about t_ctid: whenever a new tuple is stored on disk, its t_ctid
 * is initialized with its own TID (location).  If the tuple is ever updated,
 * its t_ctid is changed to point to the replacement version of the tuple.  Or
 * if the tuple is moved from one partition to another, due to an update of
 * the partition key, t_ctid is set to a special value to indicate that
 * (see ItemPointerSetMovedPartitions).  Thus, a tuple is the latest version
 * of its row iff XMAX is invalid or
 * t_ctid points to itself (in which case, if XMAX is valid, the tuple is
 * either locked or deleted).  One can follow the chain of t_ctid links
 * to find the newest version of the row, unless it was moved to a different
 * partition.  Beware however that VACUUM might
 * erase the pointed-to (newer) tuple before erasing the pointing (older)
 * tuple.  Hence, when following a t_ctid link, it is necessary to check
 * to see if the referenced slot is empty or contains an unrelated tuple.
 * Check that the referenced tuple has XMIN equal to the referencing tuple's
 * XMAX to verify that it is actually the descendant version and not an
 * unrelated tuple stored into a slot recently freed by VACUUM.  If either
 * check fails, one may assume that there is no live descendant version.
 *
 * t_ctid is sometimes used to store a speculative insertion token, instead
 * of a real TID.  A speculative token is set on a tuple that's being
 * inserted, until the inserter is sure that it wants to go ahead with the
 * insertion.  Hence a token should only be seen on a tuple with an XMAX
 * that's still in-progress, or invalid/aborted.  The token is replaced with
 * the tuple's real TID when the insertion is confirmed.  One should never
 * see a speculative insertion token while following a chain of t_ctid links,
 * because they are not used on updates, only insertions.
 *
 * Following the fixed header fields, the nulls bitmap is stored (beginning
 * at t_bits).  The bitmap is *not* stored if t_infomask shows that there
 * are no nulls in the tuple.  If an OID field is present (as indicated by
 * t_infomask), then it is stored just before the user data, which begins at
 * the offset shown by t_hoff.  Note that t_hoff must be a multiple of
 * MAXALIGN.
 */
typedef struct HeapTupleFields
{
    TransactionId t_xmin;       /* inserting xact ID */
    TransactionId t_xmax;       /* deleting or locking xact ID */

    union
    {
        CommandId   t_cid;      /* inserting or deleting command ID, or both */
        TransactionId t_xvac;   /* old-style VACUUM FULL xact ID */
    }           t_field3;
} HeapTupleFields;

typedef struct DatumTupleFields
{
    int32       datum_len_;     /* varlena header (do not touch directly!) */

    int32       datum_typmod;   /* -1, or identifier of a record type */

    Oid         datum_typeid;   /* composite type OID, or RECORDOID */

    /*
     * datum_typeid cannot be a domain over composite, only plain composite,
     * even if the datum is meant as a value of a domain-over-composite type.
     * This is in line with the general principle that CoerceToDomain does not
     * change the physical representation of the base type value.
     *
     * Note: field ordering is chosen with thought that Oid might someday
     * widen to 64 bits.
     */
} DatumTupleFields;

struct HeapTupleHeaderData
{
    union
    {
        HeapTupleFields t_heap;
        DatumTupleFields t_datum;
    }           t_choice;

    ItemPointerData t_ctid;     /* current TID of this or newer tuple (or a
                                 * speculative insertion token) */

    /* Fields below here must match MinimalTupleData! */

#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK2 2
    uint16      t_infomask2;    /* number of attributes + various flags */

#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK 3
    uint16      t_infomask;     /* various flag bits, see below */

#define FIELDNO_HEAPTUPLEHEADERDATA_HOFF 4
    uint8       t_hoff;         /* sizeof header incl. bitmap, padding */

    /* ^ - 23 bytes - ^ */

#define FIELDNO_HEAPTUPLEHEADERDATA_BITS 5
    bits8       t_bits[FLEXIBLE_ARRAY_MEMBER];  /* bitmap of NULLs */

    /* MORE DATA FOLLOWS AT END OF STRUCT */
};

typedef HeapTupleHeaderData* HeapTupleHeader;

/*
 * The fixed header fields, expanded into a table:
 *
 * Field        Type             Length   Offset  Description
 * t_xmin       TransactionId    4 bytes  0       insert XID stamp
 * t_xmax       TransactionId    4 bytes  4       delete XID stamp
 * t_cid        CommandId        4 bytes  8       insert and/or delete CID stamp (overlays with t_xvac)
 * t_xvac       TransactionId    4 bytes  8       XID for VACUUM operation moving a row version
 * t_ctid       ItemPointerData  6 bytes  12      current TID of this or newer row version
 * t_infomask2  uint16           2 bytes  18      number of attributes, plus various flag bits
 * t_infomask   uint16           2 bytes  20      various flag bits
 * t_hoff       uint8            1 byte   22      offset to user data
 *
 * Note: t_cid and t_xvac are members of a union and share the same storage.
 */

// Example: t_infomask = \x0802, which is 2050 in decimal and 100000000010 in binary.
// t_infomask flags (the left column shows each value in binary).
// HEAP_HASOID_OLD was named HEAP_HASOID before PostgreSQL 12.
               1 #define HEAP_HASNULL          0x0001  /* has null attribute(s) */
              10 #define HEAP_HASVARWIDTH      0x0002  /* has variable-width attribute(s) */
             100 #define HEAP_HASEXTERNAL      0x0004  /* has external stored attribute(s) */
            1000 #define HEAP_HASOID_OLD       0x0008  /* has an object-id field */
           10000 #define HEAP_XMAX_KEYSHR_LOCK 0x0010  /* xmax is a key-shared locker */
          100000 #define HEAP_COMBOCID         0x0020  /* t_cid is a combo cid */
         1000000 #define HEAP_XMAX_EXCL_LOCK   0x0040  /* xmax is exclusive locker */
        10000000 #define HEAP_XMAX_LOCK_ONLY   0x0080  /* xmax, if valid, is only a locker */
                 /* xmax is a shared locker */
                 #define HEAP_XMAX_SHR_LOCK    (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
                 #define HEAP_LOCK_MASK        (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
                                                HEAP_XMAX_KEYSHR_LOCK)
       100000000 #define HEAP_XMIN_COMMITTED   0x0100  /* t_xmin committed */
      1000000000 #define HEAP_XMIN_INVALID     0x0200  /* t_xmin invalid/aborted */
                 #define HEAP_XMIN_FROZEN      (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
     10000000000 #define HEAP_XMAX_COMMITTED   0x0400  /* t_xmax committed */
    100000000000 #define HEAP_XMAX_INVALID     0x0800  /* t_xmax invalid/aborted */
   1000000000000 #define HEAP_XMAX_IS_MULTI    0x1000  /* t_xmax is a MultiXactId */
  10000000000000 #define HEAP_UPDATED          0x2000  /* this is UPDATEd version of row */
 100000000000000 #define HEAP_MOVED_OFF        0x4000  /* moved to another place by pre-9.0
                                                        * VACUUM FULL; kept for binary
                                                        * upgrade support */
1000000000000000 #define HEAP_MOVED_IN         0x8000  /* moved from another place by pre-9.0
                                                        * VACUUM FULL; kept for binary
                                                        * upgrade support */
                 #define HEAP_MOVED            (HEAP_MOVED_OFF | HEAP_MOVED_IN)
1111111111110000 #define HEAP_XACT_MASK        0xFFF0  /* visibility-related bits */

// In \x0802 (binary 100000000010) the 2nd and 12th bits are set, which means
// the tuple has variable-width attributes (HEAP_HASVARWIDTH) and its XMAX is
// invalid (HEAP_XMAX_INVALID).

/*
 * information stored in t_infomask2:
 */
#define HEAP_NATTS_MASK         0x07FF  /* 11 bits for number of attributes */
/* bits 0x1800 are available */
#define HEAP_KEYS_UPDATED       0x2000  /* tuple was updated and key cols
                                         * modified, or tuple deleted */
#define HEAP_HOT_UPDATED        0x4000  /* tuple was HOT-updated */
#define HEAP_ONLY_TUPLE         0x8000  /* this is heap-only tuple */

#define HEAP2_XACT_MASK         0xE000  /* visibility-related bits */

// The same hexadecimal values shown in binary
     11111111111 #define HEAP_NATTS_MASK       0x07FF
  10000000000000 #define HEAP_KEYS_UPDATED     0x2000
 100000000000000 #define HEAP_HOT_UPDATED      0x4000
1000000000000000 #define HEAP_ONLY_TUPLE       0x8000
1110000000000000 #define HEAP2_XACT_MASK       0xE000
1111111111111110 #define SpecTokenOffsetNumber 0xfffe

// The low 11 bits of t_infomask2 hold the number of attributes; a value of 3
// means the tuple has 3 attributes (columns).
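The flag arithmetic above can be checked mechanically. The sketch below decodes a t_infomask value into flag names, extracts the attribute count from t_infomask2, and tests the frozen-xmin bit pair. The flag values are copied from the listing; the FlagName type and the helpers describe_infomask, heap_natts and xmin_is_frozen are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
#define HEAP_XMIN_FROZEN    (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)
#define HEAP_NATTS_MASK     0x07FF

/* Hypothetical lookup table: one entry per single-bit t_infomask flag. */
typedef struct FlagName
{
    uint16_t    bit;
    const char *name;
} FlagName;

static const FlagName infomask_flags[] = {
    {0x0001, "HEAP_HASNULL"},
    {0x0002, "HEAP_HASVARWIDTH"},
    {0x0004, "HEAP_HASEXTERNAL"},
    {0x0008, "HEAP_HASOID_OLD"},
    {0x0010, "HEAP_XMAX_KEYSHR_LOCK"},
    {0x0020, "HEAP_COMBOCID"},
    {0x0040, "HEAP_XMAX_EXCL_LOCK"},
    {0x0080, "HEAP_XMAX_LOCK_ONLY"},
    {0x0100, "HEAP_XMIN_COMMITTED"},
    {0x0200, "HEAP_XMIN_INVALID"},
    {0x0400, "HEAP_XMAX_COMMITTED"},
    {0x0800, "HEAP_XMAX_INVALID"},
    {0x1000, "HEAP_XMAX_IS_MULTI"},
    {0x2000, "HEAP_UPDATED"},
    {0x4000, "HEAP_MOVED_OFF"},
    {0x8000, "HEAP_MOVED_IN"}
};

/* Write the names of all set bits into buf, separated by '|'. */
static void
describe_infomask(uint16_t mask, char *buf, size_t buflen)
{
    size_t      used = 0;
    size_t      nflags = sizeof(infomask_flags) / sizeof(infomask_flags[0]);

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < nflags; i++)
    {
        int         n;

        if ((mask & infomask_flags[i].bit) == 0)
            continue;
        n = snprintf(buf + used, buflen - used, "%s%s",
                     used > 0 ? "|" : "", infomask_flags[i].name);
        if (n < 0 || (size_t) n >= buflen - used)
        {
            buf[used] = '\0';   /* out of space: drop the partial name */
            break;
        }
        used += (size_t) n;
    }
}

/* The low 11 bits of t_infomask2 are the attribute count. */
static int
heap_natts(uint16_t infomask2)
{
    return infomask2 & HEAP_NATTS_MASK;
}

/* A frozen xmin is marked by both XMIN bits being set at once. */
static bool
xmin_is_frozen(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN;
}
```

Calling describe_infomask with 0x0802 yields the two flags named in the note above, HEAP_HASVARWIDTH and HEAP_XMAX_INVALID. The xmin_is_frozen check compares against the full mask rather than testing for any overlap, since either bit alone has a different meaning (committed, or invalid/aborted).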
xl_heap_freeze_tuple
xl_heap_freeze_tuple represents a 'freeze plan': the information needed to freeze a single tuple during vacuum.
/*
 * This struct represents a 'freeze plan', which is what we need to know about
 * a single tuple being frozen during vacuum.
 */
/* 0x01 was XLH_FREEZE_XMIN */
#define     XLH_FREEZE_XVAC     0x02
#define     XLH_INVALID_XVAC    0x04

typedef struct xl_heap_freeze_tuple
{
    TransactionId xmax;
    OffsetNumber offset;
    uint16      t_infomask2;
    uint16      t_infomask;
    uint8       frzflags;
} xl_heap_freeze_tuple;
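II. Source Code Walkthrough
heap_execute_freeze_tuple performs the actual tuple freezing, after the preparation work has already been done. The logic is simple: it sets xmax from the freeze plan, overwrites xvac with FrozenTransactionId or InvalidTransactionId if the corresponding flag is set in frzflags, and copies the prepared t_infomask and t_infomask2 into the tuple header.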
/*
 * heap_execute_freeze_tuple
 *      Execute the prepared freezing of a tuple.
 *
 * Caller is responsible for ensuring that no other backend can access the
 * storage underlying this tuple, either by holding an exclusive lock on the
 * buffer containing it (which is what lazy VACUUM does), or by having it be
 * in private storage (which is what CLUSTER and friends do).
 *
 * Note: it might seem we could make the changes without exclusive lock, since
 * TransactionId read/write is assumed atomic anyway.  However there is a race
 * condition: someone who just fetched an old XID that we overwrite here could
 * conceivably not finish checking the XID against pg_xact before we finish
 * the VACUUM and perhaps truncate off the part of pg_xact he needs.  Getting
 * exclusive lock ensures no other backend is in process of checking the
 * tuple status.  Also, getting exclusive lock makes it safe to adjust the
 * infomask bits.
 *
 * NB: All code in here must be safe to execute during crash recovery!
 */
void
heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
{
    HeapTupleHeaderSetXmax(tuple, frz->xmax);

    if (frz->frzflags & XLH_FREEZE_XVAC)
        HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);

    if (frz->frzflags & XLH_INVALID_XVAC)
        HeapTupleHeaderSetXvac(tuple, InvalidTransactionId);

    tuple->t_infomask = frz->t_infomask;
    tuple->t_infomask2 = frz->t_infomask2;
}

// Set the tuple's xmax
#define HeapTupleHeaderSetXmax(tup, xid) \
( \
    (tup)->t_choice.t_heap.t_xmax = (xid) \
)

// Set the tuple's xvac (only valid for tuples moved by old-style VACUUM FULL)
#define HeapTupleHeaderSetXvac(tup, xid) \
do { \
    Assert((tup)->t_infomask & HEAP_MOVED); \
    (tup)->t_choice.t_heap.t_field3.t_xvac = (xid); \
} while (0)
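The function can be exercised outside the server with a reduced model of the tuple header. The sketch below is such a model, under stated assumptions: MockTupleHeader and mock_execute_freeze_tuple are hypothetical stand-ins that keep only the fields the real function touches, and t_xvac is stored as a plain field rather than in a union with t_cid. The constants (FrozenTransactionId is 2, InvalidTransactionId is 0) match the values used by PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint16_t OffsetNumber;

#define InvalidTransactionId ((TransactionId) 0)
#define FrozenTransactionId  ((TransactionId) 2)

#define HEAP_MOVED_OFF   0x4000
#define HEAP_MOVED_IN    0x8000
#define HEAP_MOVED       (HEAP_MOVED_OFF | HEAP_MOVED_IN)

#define XLH_FREEZE_XVAC  0x02
#define XLH_INVALID_XVAC 0x04

/* Hypothetical reduced header: only the fields the function touches. */
typedef struct MockTupleHeader
{
    TransactionId t_xmin;
    TransactionId t_xmax;
    TransactionId t_xvac;       /* a union member with t_cid in the real struct */
    uint16_t    t_infomask2;
    uint16_t    t_infomask;
} MockTupleHeader;

typedef struct xl_heap_freeze_tuple
{
    TransactionId xmax;
    OffsetNumber offset;
    uint16_t    t_infomask2;
    uint16_t    t_infomask;
    uint8_t     frzflags;
} xl_heap_freeze_tuple;

/* Mirrors heap_execute_freeze_tuple step by step on the mock header. */
static void
mock_execute_freeze_tuple(MockTupleHeader *tuple,
                          const xl_heap_freeze_tuple *frz)
{
    tuple->t_xmax = frz->xmax;

    if (frz->frzflags & XLH_FREEZE_XVAC)
    {
        /* same precondition as the Assert in HeapTupleHeaderSetXvac */
        assert(tuple->t_infomask & HEAP_MOVED);
        tuple->t_xvac = FrozenTransactionId;
    }

    if (frz->frzflags & XLH_INVALID_XVAC)
    {
        assert(tuple->t_infomask & HEAP_MOVED);
        tuple->t_xvac = InvalidTransactionId;
    }

    /* The infomask words are replaced wholesale, not OR-ed in. */
    tuple->t_infomask = frz->t_infomask;
    tuple->t_infomask2 = frz->t_infomask2;
}
```

Applying a plan whose t_infomask has both XMIN bits set leaves t_xmin itself unchanged; in this code path the frozen state of xmin is carried by the infomask bits, and only xmax and xvac are ever overwritten with a new transaction ID.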
III. Trace Analysis
N/A
IV. References
PG Source Code