本节简单介绍了PostgreSQL缓存管理(Buffer Manager)中的实现函数ReadBuffer_common->BufferAlloc->BufTableHashCode,该函数根据BufferTag计算Hash Code。

一、数据结构

BufferDesc
共享缓冲区的共享描述符(状态)数据

/* * Flags for buffer descriptors * buffer描述器标记 * * Note: TAG_VALID essentially means that there is a buffer hashtable * entry associated with the buffer's tag. * 注意:TAG_VALID本质上意味着有一个与缓冲区的标记相关联的缓冲区散列表条目。 *///buffer header锁定#define BM_LOCKED (1U << 22) /* buffer header is locked *///数据需要写入(标记为DIRTY)#define BM_DIRTY (1U << 23) /* data needs writing *///数据是有效的#define BM_VALID (1U << 24) /* data is valid *///已分配buffer tag#define BM_TAG_VALID (1U << 25) /* tag is assigned *///正在R/W#define BM_IO_IN_PROGRESS (1U << 26) /* read or write in progress *///上一个I/O出现错误#define BM_IO_ERROR (1U << 27) /* previous I/O failed *///开始写则变DIRTY#define BM_JUST_DIRTIED (1U << 28) /* dirtied since write started *///存在等待sole pin的其他进程#define BM_PIN_COUNT_WAITER (1U << 29) /* have waiter for sole pin *///checkpoint发生,必须刷到磁盘上#define BM_CHECKPOINT_NEEDED (1U << 30) /* must write for checkpoint *///持久化buffer(不是unlogged或者初始化fork)#define BM_PERMANENT (1U << 31) /* permanent buffer (not unlogged, * or init fork) *//* * BufferDesc -- shared descriptor/state data for a single shared buffer. * BufferDesc -- 共享缓冲区的共享描述符(状态)数据 * * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change * the tag, state or wait_backend_pid fields. In general, buffer header lock * is a spinlock which is combined with flags, refcount and usagecount into * single atomic variable. This layout allow us to do some operations in a * single atomic operation, without actually acquiring and releasing spinlock; * for instance, increase or decrease refcount. buf_id field never changes * after initialization, so does not need locking. freeNext is protected by * the buffer_strategy_lock not buffer header lock. The LWLock can take care * of itself. The buffer header lock is *not* used to control access to the * data in the buffer! * 注意:必须持有Buffer header锁(BM_LOCKED标记)才能检查或修改tag/state/wait_backend_pid字段. * 通常来说,buffer header lock是spinlock,它与标记位/参考计数/使用计数组合到单个原子变量中. * 这个布局设计允许我们执行原子操作,而不需要实际获得或者释放spinlock(比如,增加或者减少参考计数). * buf_id字段在初始化后不会出现变化,因此不需要锁定. * freeNext通过buffer_strategy_lock锁而不是buffer header lock保护. * LWLock可以很好的处理自己的状态. * 务请注意的是:buffer header lock不用于控制buffer中的数据访问! * * It's assumed that nobody changes the state field while buffer header lock * is held. Thus buffer header lock holder can do complex updates of the * state variable in single write, simultaneously with lock release (cleaning * BM_LOCKED flag). On the other hand, updating of state without holding * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag * is not set. Atomic increment/decrement, OR/AND etc. are not allowed. * 假定在持有buffer header lock的情况下,没有人改变状态字段. * 持有buffer header lock的进程可以执行在单个写操作中执行复杂的状态变量更新, * 同步的释放锁(清除BM_LOCKED标记). * 换句话说,如果没有持有buffer header lock的状态更新,会受限于CAS, * 这种情况下确保BM_LOCKED没有被设置. * 比如原子的增加/减少(AND/OR)等操作是不允许的. * * An exception is that if we have the buffer pinned, its tag can't change * underneath us, so we can examine the tag without locking the buffer header. * Also, in places we do one-time reads of the flags without bothering to * lock the buffer header; this is generally for situations where we don't * expect the flag bit being tested to be changing. * 一种例外情况是如果我们已有buffer pinned,该buffer的tag不能改变(在本进程之下), * 因此不需要锁定buffer header就可以检查tag了. * 同时,在执行一次性的flags读取时不需要锁定buffer header. * 这种情况通常用于我们不希望正在测试的flag bit将被改变. * * We can't physically remove items from a disk page if another backend has * the buffer pinned. Hence, a backend may need to wait for all other pins * to go away. This is signaled by storing its own PID into * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER. At present, * there can be only one such waiter per buffer. * 如果其他进程有buffer pinned,那么进程不能物理的从磁盘页面中删除items. * 因此,后台进程需要等待其他pins清除.这可以通过存储它自己的PID到wait_backend_pid中, * 并设置标记位BM_PIN_COUNT_WAITER. * 目前,每个缓冲区只能由一个等待进程. * * We use this same struct for local buffer headers, but the locks are not * used and not all of the flag bits are useful either. To avoid unnecessary * overhead, manipulations of the state field should be done without actual * atomic operations (i.e. only pg_atomic_read_u32() and * pg_atomic_unlocked_write_u32()). * 本地缓冲头部使用同样的结构,但并不需要使用locks,而且并不是所有的标记位都使用. * 为了避免不必要的负载,状态域的维护不需要实际的原子操作 * (比如只有pg_atomic_read_u32() and pg_atomic_unlocked_write_u32()) * * Be careful to avoid increasing the size of the struct when adding or * reordering members. Keeping it below 64 bytes (the most common CPU * cache line size) is fairly important for performance. * 在增加或者记录成员变量时,小心避免增加结构体的大小. * 保持结构体大小在64字节内(通常的CPU缓存线大小)对于性能是非常重要的. */typedef struct BufferDesc{ //buffer tag BufferTag tag; /* ID of page contained in buffer */ //buffer索引编号(0开始),指向相应的buffer pool slot int buf_id; /* buffer's index number (from 0) */ /* state of the tag, containing flags, refcount and usagecount */ //tag状态,包括flags/refcount和usagecount pg_atomic_uint32 state; //pin-count等待进程ID int wait_backend_pid; /* backend PID of pin-count waiter */ //空闲链表链中下一个空闲的buffer int freeNext; /* link in freelist chain */ //缓冲区内容锁 LWLock content_lock; /* to lock access to buffer contents */} BufferDesc;

BufferTag
Buffer tag标记了buffer存储的是磁盘中哪个block

/* * Buffer tag identifies which disk block the buffer contains. * Buffer tag标记了buffer存储的是磁盘中哪个block * * Note: the BufferTag data must be sufficient to determine where to write the * block, without reference to pg_class or pg_tablespace entries. It's * possible that the backend flushing the buffer doesn't even believe the * relation is visible yet (its xact may have started before the xact that * created the rel). The storage manager must be able to cope anyway. * 注意:BufferTag必须足以确定如何写block而不需要参照pg_class或者pg_tablespace数据字典信息. * 有可能后台进程在刷新缓冲区的时候深圳不相信关系是可见的(事务可能在创建rel的事务之前). * 存储管理器必须可以处理这些事情. * * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have * to be fixed to zero them, since this struct is used as a hash key. * 注意:如果在结构体中有填充的字节,INIT_BUFFERTAG必须将它们固定为零,因为这个结构体用作散列键. */typedef struct buftag{ //物理relation标识符 RelFileNode rnode; /* physical relation identifier */ ForkNumber forkNum; //相对于relation起始的块号 BlockNumber blockNum; /* blknum relative to begin of reln */} BufferTag;

HTAB
哈希表的顶层控制结构.

/* * Top control structure for a hashtable --- in a shared table, each backend * has its own copy (OK since no fields change at runtime) * 哈希表的顶层控制结构. * 在这个共享哈希表中,每一个后台进程都有自己的拷贝 * (之所以没有问题是因为fork出来后,在运行期没有字段会变化) */struct HTAB{ //指向共享的控制信息 HASHHDR *hctl; /* => shared control information */ //段开始目录 HASHSEGMENT *dir; /* directory of segment starts */ //哈希函数 HashValueFunc hash; /* hash function */ //哈希键比较函数 HashCompareFunc match; /* key comparison function */ //哈希键拷贝函数 HashCopyFunc keycopy; /* key copying function */ //内存分配器 HashAllocFunc alloc; /* memory allocator */ //内存上下文 MemoryContext hcxt; /* memory context if default allocator used */ //表名(用于错误信息) char *tabname; /* table name (for error messages) */ //如在共享内存中,则为T bool isshared; /* true if table is in shared memory */ //如为T,则固定大小不能扩展 bool isfixed; /* if true, don't enlarge */ /* freezing a shared table isn't allowed, so we can keep state here */ //不允许冻结共享表,因此这里会保存相关状态 bool frozen; /* true = no more inserts allowed */ /* We keep local copies of these fixed values to reduce contention */ //保存这些固定值的本地拷贝,以减少冲突 //哈希键长度(以字节为单位) Size keysize; /* hash key length in bytes */ //段大小,必须为2的幂 long ssize; /* segment size --- must be power of 2 */ //段偏移,ssize的对数 int sshift; /* segment shift = log2(ssize) */};/* * Header structure for a hash table --- contains all changeable info * 哈希表的头部结构 -- 存储所有可变信息 * * In a shared-memory hash table, the HASHHDR is in shared memory, while * each backend has a local HTAB struct. For a non-shared table, there isn't * any functional difference between HASHHDR and HTAB, but we separate them * anyway to share code between shared and non-shared tables. * 在共享内存哈希表中,HASHHDR位于共享内存中,每一个后台进程都有一个本地HTAB结构. * 对于非共享哈希表,HASHHDR和HTAB没有任何功能性的不同, * 但无论如何,我们还是把它们区分为共享和非共享表. */struct HASHHDR{ /* * The freelist can become a point of contention in high-concurrency hash * tables, so we use an array of freelists, each with its own mutex and * nentries count, instead of just a single one. Although the freelists * normally operate independently, we will scavenge entries from freelists * other than a hashcode's default freelist when necessary. * 在高并发的哈希表中,空闲链表会成为竞争热点,因此我们使用空闲链表数组, * 数组中的每一个元素都有自己的mutex和条目统计,而不是使用一个. * * If the hash table is not partitioned, only freeList[0] is used and its * spinlock is not used at all; callers' locking is assumed sufficient. * 如果哈希表没有分区,那么只有freelist[0]元素是有用的,自旋锁没有任何用处; * 调用者锁定被认为已足够OK. */ FreeListData freeList[NUM_FREELISTS]; /* These fields can change, but not in a partitioned table */ //这些域字段可以改变,但不适用于分区表 /* Also, dsize can't change in a shared table, even if unpartitioned */ //同时,就算是非分区表,共享表的dsize也不能改变 //目录大小 long dsize; /* directory size */ //已分配的段大小(<= dbsize) long nsegs; /* number of allocated segments (<= dsize) */ //正在使用的最大桶ID uint32 max_bucket; /* ID of maximum bucket in use */ //进入整个哈希表的模掩码 uint32 high_mask; /* mask to modulo into entire table */ //进入低于半个哈希表的模掩码 uint32 low_mask; /* mask to modulo into lower half of table */ /* These fields are fixed at hashtable creation */ //下面这些字段在哈希表创建时已固定 //哈希键大小(以字节为单位) Size keysize; /* hash key length in bytes */ //所有用户元素大小(以字节为单位) Size entrysize; /* total user element size in bytes */ //分区个数(2的幂),或者为0 long num_partitions; /* # partitions (must be power of 2), or 0 */ //目标的填充因子 long ffactor; /* target fill factor */ //如目录是固定大小,则该值为dsize的上限值 long max_dsize; /* 'dsize' limit if directory is fixed size */ //段大小,必须是2的幂 long ssize; /* segment size --- must be power of 2 */ //端偏移,ssize的对数 int sshift; /* segment shift = log2(ssize) */ //一次性分配的条目个数 int nelem_alloc; /* number of entries to allocate at once */#ifdef HASH_STATISTICS /* * Count statistics here. NB: stats code doesn't bother with mutex, so * counts could be corrupted a bit in a partitioned table. * 统计信息. * 注意:统计相关的代码不会影响mutex,因此对于分区表,统计可能有一点点问题 */ long accesses; long collisions;#endif};/* * HASHELEMENT is the private part of a hashtable entry. The caller's data * follows the HASHELEMENT structure (on a MAXALIGN'd boundary). The hash key * is expected to be at the start of the caller's hash entry data structure. * HASHELEMENT是哈希表条目的私有部分. * 调用者的数据按照HASHELEMENT结构组织(位于MAXALIGN的边界). * 哈希键应位于调用者hash条目数据结构的开始位置. */typedef struct HASHELEMENT{ //链接到相同桶中的下一个条目 struct HASHELEMENT *link; /* link to next entry in same bucket */ //该条目的哈希函数结果 uint32 hashvalue; /* hash function result for this entry */} HASHELEMENT;/* Hash table header struct is an opaque type known only within dynahash.c *///哈希表头部结构,非透明类型,用于dynahash.ctypedef struct HASHHDR HASHHDR;/* Hash table control struct is an opaque type known only within dynahash.c *///哈希表控制结构,非透明类型,用于dynahash.ctypedef struct HTAB HTAB;/* Parameter data structure for hash_create *///hash_create使用的参数数据结构/* Only those fields indicated by hash_flags need be set *///根据hash_flags标记设置相应的字段typedef struct HASHCTL{ //分区个数(必须是2的幂) long num_partitions; /* # partitions (must be power of 2) */ //段大小 long ssize; /* segment size */ //初始化目录大小 long dsize; /* (initial) directory size */ //dsize上限 long max_dsize; /* limit to dsize if dir size is limited */ //填充因子 long ffactor; /* fill factor */ //哈希键大小(字节为单位) Size keysize; /* hash key length in bytes */ //参见上述数据结构注释 Size entrysize; /* total user element size in bytes */ // HashValueFunc hash; /* hash function */ HashCompareFunc match; /* key comparison function */ HashCopyFunc keycopy; /* key copying function */ HashAllocFunc alloc; /* memory allocator */ MemoryContext hcxt; /* memory context to use for allocations */ //共享内存中的哈希头部结构地址 HASHHDR *hctl; /* location of header in shared mem */} HASHCTL;/* A hash bucket is a linked list of HASHELEMENTs *///哈希桶是HASHELEMENTs链表typedef HASHELEMENT *HASHBUCKET;/* A hash segment is an array of bucket headers *///hash segment是桶数组typedef HASHBUCKET *HASHSEGMENT;/* * Hash functions must have this signature. * Hash函数必须有它自己的标识 */typedef uint32 (*HashValueFunc) (const void *key, Size keysize); /* * Key comparison functions must have this signature. Comparison functions * return zero for match, nonzero for no match. (The comparison function * definition is designed to allow memcmp() and strncmp() to be used directly * as key comparison functions.) * 哈希键对比函数必须有自己的标识. * 如匹配则对比函数返回0,不匹配返回非0. * (对比函数定义被设计为允许在对比键值时可直接使用memcmp()和strncmp()) */typedef int (*HashCompareFunc) (const void *key1, const void *key2, Size keysize); /* * Key copying functions must have this signature. The return value is not * used. (The definition is set up to allow memcpy() and strlcpy() to be * used directly.) * 键拷贝函数必须有自己的标识. * 返回值无用. */typedef void *(*HashCopyFunc) (void *dest, const void *src, Size keysize);/* * Space allocation function for a hashtable --- designed to match malloc(). * Note: there is no free function API; can't destroy a hashtable unless you * use the default allocator. * 哈希表的恐惧分配函数 -- 被设计为与malloc()函数匹配. * 注意:这里没有释放函数API;不能销毁哈希表,除非使用默认的分配器. */typedef void *(*HashAllocFunc) (Size request);

FreeListData
在一个分区哈希表中,每一个空闲链表与特定的hashcodes集合相关,通过下面的FREELIST_IDX()宏进行定义.
nentries跟踪有这些hashcodes的仍存活的hashtable条目个数.

/* * Per-freelist data. * 空闲链表数据. * * In a partitioned hash table, each freelist is associated with a specific * set of hashcodes, as determined by the FREELIST_IDX() macro below. * nentries tracks the number of live hashtable entries having those hashcodes * (NOT the number of entries in the freelist, as you might expect). * 在一个分区哈希表中,每一个空闲链表与特定的hashcodes集合相关,通过下面的FREELIST_IDX()宏进行定义. * nentries跟踪有这些hashcodes的仍存活的hashtable条目个数. * (注意不要搞错,不是空闲的条目个数) * * The coverage of a freelist might be more or less than one partition, so it * needs its own lock rather than relying on caller locking. Relying on that * wouldn't work even if the coverage was the same, because of the occasional * need to "borrow" entries from another freelist; see get_hash_entry(). * 空闲链表的覆盖范围可能比一个分区多或少,因此需要自己的锁而不能仅仅依赖调用者的锁. * 依赖调用者锁在覆盖面一样的情况下也不会起效,因为偶尔需要从另一个自由列表“借用”条目,详细参见get_hash_entry() * * Using an array of FreeListData instead of separate arrays of mutexes, * nentries and freeLists helps to reduce sharing of cache lines between * different mutexes. * 使用FreeListData数组而不是一个独立的mutexes,nentries和freelists数组有助于减少不同mutexes之间的缓存线共享. */typedef struct{ //该空闲链表的自旋锁 slock_t mutex; /* spinlock for this freelist */ //相关桶中的条目个数 long nentries; /* number of entries in associated buckets */ //空闲元素链 HASHELEMENT *freeList; /* chain of free elements */} FreeListData;

BufferLookupEnt

/* entry for buffer lookup hashtable *///检索hash表的条目typedef struct{ //磁盘page的tag BufferTag key; /* Tag of a disk page */ //相关联的buffer ID int id; /* Associated buffer ID */} BufferLookupEnt;二、源码解读

BufTableHashCode
BufTableHashCode函数根据BufferTag计算Hash Code,主要的逻辑在函数hash_any中实现

/* * BufTableHashCode * Compute the hash code associated with a BufferTag * 根据BufferTag计算Hash Code * * This must be passed to the lookup/insert/delete routines along with the * tag. We do it like this because the callers need to know the hash code * in order to determine which buffer partition to lock, and we don't want * to do the hash computation twice (hash_any is a bit slow). * 该函数的返回值需要作为参数传递给与tag相关的检索/插入/删除处理过程. * 之所以这样处理是因为调用者需要知道Hash Code以便确定那个buffer partition需要锁定, * 而且我们不希望多次计算hash(hash_any有一点点慢). */uint32BufTableHashCode(BufferTag *tagPtr){ return get_hash_value(SharedBufHash, (void *) tagPtr);}/* * get_hash_value -- exported routine to calculate a key's hash value * get_hash_value -- exported过程,用于计算key的hash值 * * We export this because for partitioned tables, callers need to compute * the partition number (from the low-order bits of the hash value) before * searching. * 之所以export这个过程是因为分区表,调用者需要在搜索前计算分区编号(根据hash值的lower-order bits) */uint32get_hash_value(HTAB *hashp, const void *keyPtr){ return hashp->hash(keyPtr, hashp->keysize);} /* * tag_hash: hash function for fixed-size tag values * tag_hash:固定tag大小的hash函数 */uint32tag_hash(const void *key, Size keysize){ return DatumGetUInt32(hash_any((const unsigned char *) key, (int) keysize));} /* * DatumGetUInt32 * Returns 32-bit unsigned integer value of a datum. * DatumGetUInt32返回datum的32位无符号整型值 */#define DatumGetUInt32(X) ((uint32) (X))

hash_any
hash_any函数hash一个可变长键值到一个32位值

/* * hash_any() -- hash a variable-length key into a 32-bit value * k : the key (the unaligned variable-length array of bytes) * len : the length of the key, counting by bytes * hash_any() -- hash一个可变长键值到一个32位值. * k : the key(未对齐的可变长字节数组) * * Returns a uint32 value. Every bit of the key affects every bit of * the return value. Every 1-bit and 2-bit delta achieves avalanche. * About 6*len+35 instructions. The best hash table sizes are powers * of 2. There is no need to do mod a prime (mod is sooo slow!). * If you need less than 32 bits, use a bitmask. * 返回无符号32位整型值. * key的每一位都会影响返回值的每一位.每一个1位和2位增量都会产生雪崩. * 大概有6*len+35个指令.最好的hash表大小是2的幂.不需要进行很慢的mod操作. * 如果需要少于32bits的值,那使用bitmask. * * This procedure must never throw elog(ERROR); the ResourceOwner code * relies on this not to fail. * 这个过程永远都不要抛出elog(ERROR);依赖此函数的ResourceOwner代码永远都不会出现异常. * * Note: we could easily change this function to return a 64-bit hash value * by using the final values of both b and c. b is perhaps a little less * well mixed than c, however. * 注意:不能轻易的改变该函数,通过使用b和c的最后值来返回64-bit的hash值.b的混合度可能没有c好 * */Datumhash_any(register const unsigned char *k, register int keylen){ register uint32 a, b, c, len; /* Set up the internal state */ //设置内部状态,初始化a/b/c len = keylen; a = b = c = 0x9e3779b9 + len + 3923095; /* If the source pointer is word-aligned, we use word-wide fetches */ //如果源指针是字对齐的,那么我们使用字宽提取 if (((uintptr_t) k & UINT32_ALIGN_MASK) == 0) { //源数据是对齐的 /* Code path for aligned source data */ register const uint32 *ka = (const uint32 *) k; /* handle most of the key */ while (len >= 12) { a += ka[0]; b += ka[1]; c += ka[2]; mix(a, b, c); ka += 3; len -= 12; } /* handle the last 11 bytes */ //处理后面11个字节 k = (const unsigned char *) ka;#ifdef WORDS_BIGENDIAN//大码端 switch (len) { case 11: c += ((uint32) k[10] << 8); /* fall through */ case 10: c += ((uint32) k[9] << 16); /* fall through */ case 9: c += ((uint32) k[8] << 24); /* fall through */ case 8: /* the lowest byte of c is reserved for the length */ b += ka[1]; a += ka[0]; break; case 7: b += ((uint32) k[6] << 8); /* fall through */ case 6: b += ((uint32) k[5] << 16); /* fall through */ case 5: b += ((uint32) k[4] << 24); /* fall through */ case 4: a += ka[0]; break; case 3: a += ((uint32) k[2] << 8); /* fall through */ case 2: a += ((uint32) k[1] << 16); /* fall through */ case 1: a += ((uint32) k[0] << 24); /* case 0: nothing left to add */ }#else /* 小码端; !WORDS_BIGENDIAN */ switch (len) { case 11: c += ((uint32) k[10] << 24); /* fall through */ case 10: c += ((uint32) k[9] << 16); /* fall through */ case 9: c += ((uint32) k[8] << 8); /* fall through */ case 8: /* the lowest byte of c is reserved for the length */ b += ka[1]; a += ka[0]; break; case 7: b += ((uint32) k[6] << 16); /* fall through */ case 6: b += ((uint32) k[5] << 8); /* fall through */ case 5: b += k[4]; /* fall through */ case 4: a += ka[0]; break; case 3: a += ((uint32) k[2] << 16); /* fall through */ case 2: a += ((uint32) k[1] << 8); /* fall through */ case 1: a += k[0]; /* case 0: nothing left to add */ }#endif /* WORDS_BIGENDIAN */ } else//---------- 非字对齐 { /* Code path for non-aligned source data */ /* handle most of the key */ while (len >= 12) {#ifdef WORDS_BIGENDIAN a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24)); b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24)); c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));#else /* !WORDS_BIGENDIAN */ a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24)); b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24)); c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));#endif /* WORDS_BIGENDIAN */ mix(a, b, c); k += 12; len -= 12; } /* handle the last 11 bytes */#ifdef WORDS_BIGENDIAN switch (len) { case 11: c += ((uint32) k[10] << 8); /* fall through */ case 10: c += ((uint32) k[9] << 16); /* fall through */ case 9: c += ((uint32) k[8] << 24); /* fall through */ case 8: /* the lowest byte of c is reserved for the length */ b += k[7]; /* fall through */ case 7: b += ((uint32) k[6] << 8); /* fall through */ case 6: b += ((uint32) k[5] << 16); /* fall through */ case 5: b += ((uint32) k[4] << 24); /* fall through */ case 4: a += k[3]; /* fall through */ case 3: a += ((uint32) k[2] << 8); /* fall through */ case 2: a += ((uint32) k[1] << 16); /* fall through */ case 1: a += ((uint32) k[0] << 24); /* case 0: nothing left to add */ }#else /* !WORDS_BIGENDIAN */ switch (len) { case 11: c += ((uint32) k[10] << 24); /* fall through */ case 10: c += ((uint32) k[9] << 16); /* fall through */ case 9: c += ((uint32) k[8] << 8); /* fall through */ case 8: /* the lowest byte of c is reserved for the length */ b += ((uint32) k[7] << 24); /* fall through */ case 7: b += ((uint32) k[6] << 16); /* fall through */ case 6: b += ((uint32) k[5] << 8); /* fall through */ case 5: b += k[4]; /* fall through */ case 4: a += ((uint32) k[3] << 24); /* fall through */ case 3: a += ((uint32) k[2] << 16); /* fall through */ case 2: a += ((uint32) k[1] << 8); /* fall through */ case 1: a += k[0]; /* case 0: nothing left to add */ }#endif /* WORDS_BIGENDIAN */ } final(a, b, c); /* report the result */ return UInt32GetDatum(c);}/*---------- * mix -- mix 3 32-bit values reversibly. * * This is reversible, so any information in (a,b,c) before mix() is * still in (a,b,c) after mix(). * * If four pairs of (a,b,c) inputs are run through mix(), or through * mix() in reverse, there are at least 32 bits of the output that * are sometimes the same for one pair and different for another pair. * This was tested for: * * pairs that differed by one bit, by two bits, in any combination * of top bits of (a,b,c), or in any combination of bottom bits of * (a,b,c). * * "differ" is defined as +, -, ^, or ~^. For + and -, I transformed * the output delta to a Gray code (a^(a>>1)) so a string of 1's (as * is commonly produced by subtraction) look like a single 1-bit * difference. * * the base values were pseudorandom, all zero but one bit set, or * all zero plus a counter that starts at zero. * * This does not achieve avalanche. There are input bits of (a,b,c) * that fail to affect some output bits of (a,b,c), especially of a. The * most thoroughly mixed value is c, but it doesn't really even achieve * avalanche in c. * * This allows some parallelism. Read-after-writes are good at doubling * the number of bits affected, so the goal of mixing pulls in the opposite * direction from the goal of parallelism. I did what I could. Rotates * seem to cost as much as shifts on every machine I could lay my hands on, * and rotates are much kinder to the top and bottom bits, so I used rotates. *---------- */#define mix(a,b,c) \{ \ a -= c; a ^= rot(c, 4); c += b; \ b -= a; b ^= rot(a, 6); a += c; \ c -= b; c ^= rot(b, 8); b += a; \ a -= c; a ^= rot(c,16); c += b; \ b -= a; b ^= rot(a,19); a += c; \ c -= b; c ^= rot(b, 4); b += a; \}/* * UInt32GetDatum * Returns datum representation for a 32-bit unsigned integer. */#define UInt32GetDatum(X) ((Datum) (X))三、跟踪分析

N/A

四、参考资料

PG Source Code
Buffer Manager