PostgreSQL Source Code Walkthrough (214) - Background Processes #13 (checkpointer-IsCheckpointOnSchedule)

2025-04-14 Technical Tutorial

This section covers IsCheckpointOnSchedule, the function the checkpointer uses to control how fast a checkpoint flushes data to disk.

1. Data Structures

Macro definitions
checkpoints request flag bits
These macros define the flag bits carried by a checkpoint request.

/*
 * OR-able request flag bits for checkpoints.  The "cause" bits are used only
 * for logging purposes.  Note: the flags must be defined so that it's
 * sensible to OR together request flags arising from different requestors.
 */

/* These directly affect the behavior of CreateCheckPoint and subsidiaries */
#define CHECKPOINT_IS_SHUTDOWN      0x0001  /* Checkpoint is for shutdown */
#define CHECKPOINT_END_OF_RECOVERY  0x0002  /* Like shutdown checkpoint, but
                                             * issued at end of WAL recovery */
#define CHECKPOINT_IMMEDIATE        0x0004  /* Do it without delays */
#define CHECKPOINT_FORCE            0x0008  /* Force even if no activity */
#define CHECKPOINT_FLUSH_ALL        0x0010  /* Flush all pages, including those
                                             * belonging to unlogged tables */
/* These are important to RequestCheckpoint */
#define CHECKPOINT_WAIT             0x0020  /* Wait for completion */
#define CHECKPOINT_REQUESTED        0x0040  /* Checkpoint request has been made */
/* These indicate the cause of a checkpoint request */
#define CHECKPOINT_CAUSE_XLOG       0x0080  /* XLOG consumption */
#define CHECKPOINT_CAUSE_TIME       0x0100  /* Elapsed time */

WRITES_PER_ABSORB

/* interval for calling AbsorbSyncRequests in CheckpointWriteDelay */
// Number of writes between calls to AbsorbSyncRequests; the value is 1000
#define WRITES_PER_ABSORB       1000

2. Source Code Walkthrough

IsCheckpointOnSchedule
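This function decides whether the checkpoint is on schedule to finish in time. If it returns true, the checkpointer is ahead of schedule and can rest for a moment; if it returns false, the checkpointer is behind and has to keep writing.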

/*
 * Calculate CheckPointSegments based on max_wal_size_mb and
 * checkpoint_completion_target.
 */
static void
CalculateCheckpointSegments(void)
{
    double      target;

    /*-------
     * Calculate the distance at which to trigger a checkpoint, to avoid
     * exceeding max_wal_size_mb. This is based on two assumptions:
     *
     * a) we keep WAL for only one checkpoint cycle (prior to PG11 we kept
     *    WAL for two checkpoint cycles to allow us to recover from the
     *    secondary checkpoint if the first checkpoint failed, though we
     *    only did this on the master anyway, not on standby. Keeping just
     *    one checkpoint simplifies processing and reduces disk space in
     *    many smaller databases.)
     * b) during checkpoint, we consume checkpoint_completion_target *
     *    number of segments consumed between checkpoints.
     *-------
     */
    //#define ConvertToXSegs(x,segsize) (x / ((segsize) / (1024 * 1024)))
    target = (double) ConvertToXSegs(max_wal_size_mb, wal_segment_size) /
        (1.0 + CheckPointCompletionTarget);

    /* round down */
    CheckPointSegments = (int) target;

    if (CheckPointSegments < 1)
        CheckPointSegments = 1;
}

/*
 * IsCheckpointOnSchedule -- are we on schedule to finish this checkpoint
 *       (or restartpoint) in time?
 *
 * Compares the current progress against the time/segments elapsed since last
 * checkpoint, and returns true if the progress we've made this far is greater
 * than the elapsed time/segments.
 * Annotation: if progress is ahead of both the elapsed time and the elapsed
 * xlog segments, return true and the caller may rest.
 */
static bool
IsCheckpointOnSchedule(double progress)
{
    XLogRecPtr  recptr;
    struct timeval now;
    double      elapsed_xlogs,
                elapsed_time;

    Assert(ckpt_active);

    /* Scale progress according to checkpoint_completion_target. */
    // The effective progress becomes progress * checkpoint_completion_target
    progress *= CheckPointCompletionTarget;

    /*
     * Check against the cached value first. Only do the more expensive
     * calculations once we reach the target previously calculated. Since
     * neither time or WAL insert pointer moves backwards, a freshly
     * calculated value can only be greater than or equal to the cached value.
     * Annotation: if progress is below the cached value, return false; the
     * checkpoint has to speed up.
     */
    if (progress < ckpt_cached_elapsed)
        return false;

    /*
     * Check progress against WAL segments written and CheckPointSegments.
     * Annotation: progress vs. WAL
     *
     * We compare the current WAL insert location against the location
     * computed before calling CreateCheckPoint. The code in XLogInsert that
     * actually triggers a checkpoint when CheckPointSegments is exceeded
     * compares against RedoRecptr, so this is not completely accurate.
     * However, it's good enough for our purposes, we're only calculating an
     * estimate anyway.
     *
     * During recovery, we compare last replayed WAL record's location with
     * the location computed before calling CreateRestartPoint. That maintains
     * the same pacing as we have during checkpoints in normal operation, but
     * we might exceed max_wal_size by a fair amount. That's because there can
     * be a large gap between a checkpoint's redo-pointer and the checkpoint
     * record itself, and we only start the restartpoint after we've seen the
     * checkpoint record. (The gap is typically up to CheckPointSegments *
     * checkpoint_completion_target where checkpoint_completion_target is the
     * value that was in effect when the WAL was generated).
     */
    if (RecoveryInProgress())
        recptr = GetXLogReplayRecPtr(NULL);
    else
        recptr = GetInsertRecPtr();
    elapsed_xlogs = (((double) (recptr - ckpt_start_recptr)) /
                     wal_segment_size) / CheckPointSegments;

    if (progress < elapsed_xlogs)
    {
        // Progress lags behind WAL generation: keep working
        ckpt_cached_elapsed = elapsed_xlogs;
        return false;
    }

    /*
     * Check progress against time elapsed and checkpoint_timeout.
     * Annotation: progress vs. time
     */
    gettimeofday(&now, NULL);
    elapsed_time = ((double) ((pg_time_t) now.tv_sec - ckpt_start_time) +
                    now.tv_usec / 1000000.0) / CheckPointTimeout;

    if (progress < elapsed_time)
    {
        // Progress lags behind the elapsed time: keep working
        ckpt_cached_elapsed = elapsed_time;
        return false;
    }

    /* It looks like we're on schedule. */
    // On schedule: the caller may rest
    return true;
}

3. Trace Analysis

N/A

4. References

PG Source Code
PgSQL · 特性分析 · 谈谈checkpoint的调度 (PgSQL · Feature Analysis · On Checkpoint Scheduling)