
[Experience Sharing] PostgreSQL Startup Internals, Part 7: Initializing Shared Memory and Semaphores (2): Initializing xlog in shmem


Posted on 2016-11-21 07:46:46
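        Once PostgreSQL has finished initializing shmem and added the index "ShmemIndex" to it, it goes on to initialize xlog in shmem.
1. First, a diagram outlining the function call sequence (some details in the middle are omitted).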

[Figure: Call flow of the xlog initialization functions]

  
 

  
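2. Initializing the xlog-related structures
  
The call path is main() -> ... -> PostmasterMain() -> ... -> reset_shared() -> CreateSharedMemoryAndSemaphores() -> ... -> XLOGShmemInit(), which initializes the data structures for the control file data/global/pg_control and for the transaction log (xlog). The relevant structure definitions follow.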
  
 
  
typedef struct ControlFileData
{
    /*
     * Unique system identifier --- to ensure we match up xlog files with the
     * installation that produced them.
     */
    uint64      system_identifier;

    /*
     * Version identifier information.  Keep these fields at the same offset,
     * especially pg_control_version; they won't be real useful if they move
     * around.  (For historical reasons they must be 8 bytes into the file
     * rather than immediately at the front.)
     *
     * pg_control_version identifies the format of pg_control itself.
     * catalog_version_no identifies the format of the system catalogs.
     *
     * There are additional version identifiers in individual files; for
     * example, WAL logs contain per-page magic numbers that can serve as
     * version cues for the WAL log.
     */
    uint32      pg_control_version;     /* PG_CONTROL_VERSION */
    uint32      catalog_version_no;     /* see catversion.h */

    /*
     * System status data
     */
    DBState     state;          /* see enum above */
    pg_time_t   time;           /* time stamp of last pg_control update */
    XLogRecPtr  checkPoint;     /* last check point record ptr */
    XLogRecPtr  prevCheckPoint; /* previous check point record ptr */

    CheckPoint  checkPointCopy; /* copy of last check point record */

    /*
     * These two values determine the minimum point we must recover up to
     * before starting up:
     *
     * minRecoveryPoint is updated to the latest replayed LSN whenever we
     * flush a data change during archive recovery. That guards against
     * starting archive recovery, aborting it, and restarting with an earlier
     * stop location. If we've already flushed data changes from WAL record X
     * to disk, we mustn't start up until we reach X again. Zero when not
     * doing archive recovery.
     *
     * backupStartPoint is the redo pointer of the backup start checkpoint, if
     * we are recovering from an online backup and haven't reached the end of
     * backup yet. It is reset to zero when the end of backup is reached, and
     * we mustn't start up before that. A boolean would suffice otherwise, but
     * we use the redo pointer as a cross-check when we see an end-of-backup
     * record, to make sure the end-of-backup record corresponds the base
     * backup we're recovering from.
     */
    XLogRecPtr  minRecoveryPoint;
    XLogRecPtr  backupStartPoint;

    /*
     * Parameter settings that determine if the WAL can be used for archival
     * or hot standby.
     */
    int         wal_level;
    int         MaxConnections;
    int         max_prepared_xacts;
    int         max_locks_per_xact;

    /*
     * This data is used to check for hardware-architecture compatibility of
     * the database and the backend executable.  We need not check endianness
     * explicitly, since the pg_control version will surely look wrong to a
     * machine of different endianness, but we do need to worry about MAXALIGN
     * and floating-point format.  (Note: storage layout nominally also
     * depends on SHORTALIGN and INTALIGN, but in practice these are the same
     * on all architectures of interest.)
     *
     * Testing just one double value is not a very bulletproof test for
     * floating-point compatibility, but it will catch most cases.
     */
    uint32      maxAlign;       /* alignment requirement for tuples */
    double      floatFormat;    /* constant 1234567.0 */
#define FLOATFORMAT_VALUE   1234567.0

    /*
     * This data is used to make sure that configuration of this database is
     * compatible with the backend executable.
     */
    uint32      blcksz;         /* data block size for this DB */
    uint32      relseg_size;    /* blocks per segment of large relation */

    uint32      xlog_blcksz;    /* block size within WAL files */
    uint32      xlog_seg_size;  /* size of each WAL segment */

    uint32      nameDataLen;    /* catalog name field width */
    uint32      indexMaxKeys;   /* max number of columns in an index */

    uint32      toast_max_chunk_size;   /* chunk size in TOAST tables */

    /* flag indicating internal format of timestamp, interval, time */
    bool        enableIntTimes; /* int64 storage enabled? */

    /* flags indicating pass-by-value status of various types */
    bool        float4ByVal;    /* float4 pass-by-value? */
    bool        float8ByVal;    /* float8, int8, etc pass-by-value? */

    /* CRC of all above ... MUST BE LAST! */
    pg_crc32    crc;
} ControlFileData;
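The comment "CRC of all above ... MUST BE LAST!" means the checksum covers every byte of the struct that precedes the crc field, while floatFormat holds a known constant so that a binary with an incompatible floating-point format notices the mismatch. Below is a minimal sketch of that kind of check, roughly in the spirit of what the startup code does when it reads pg_control, under stated assumptions: DemoControlFile is a hypothetical, trimmed-down stand-in for ControlFileData, all demo_* names are made up, and the bitwise CRC-32 (the zlib polynomial) is only a placeholder for pg_crc32, whose exact variant depends on the PostgreSQL version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FLOATFORMAT_VALUE 1234567.0

/* Hypothetical, trimmed-down stand-in for ControlFileData. */
typedef struct DemoControlFile
{
    uint64_t system_identifier;
    uint32_t pg_control_version;
    uint32_t maxAlign;
    double   floatFormat;       /* always DEMO_FLOATFORMAT_VALUE */
    uint32_t blcksz;
    uint32_t crc;               /* CRC of all above ... must be last */
} DemoControlFile;

/* Compile-time check that nothing (not even padding) follows crc. */
static_assert(offsetof(DemoControlFile, crc) + sizeof(uint32_t) == sizeof(DemoControlFile),
              "crc must be the last field");

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320); a placeholder for pg_crc32. */
static uint32_t demo_crc32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Zero everything first so any padding bytes are deterministic. */
static void demo_control_init(DemoControlFile *cf, uint64_t sysid, uint32_t blcksz)
{
    memset(cf, 0, sizeof(*cf));
    cf->system_identifier = sysid;
    cf->pg_control_version = 1;     /* arbitrary demo value */
    cf->maxAlign = 8;               /* arbitrary demo value */
    cf->floatFormat = DEMO_FLOATFORMAT_VALUE;
    cf->blcksz = blcksz;
}

/* Seal: checksum every byte that precedes the crc field. */
static void demo_control_seal(DemoControlFile *cf)
{
    cf->crc = demo_crc32(cf, offsetof(DemoControlFile, crc));
}

/* Returns 1 if the file passes the checks, 0 otherwise. */
static int demo_control_check(const DemoControlFile *cf, uint32_t my_blcksz)
{
    if (demo_crc32(cf, offsetof(DemoControlFile, crc)) != cf->crc)
        return 0;                   /* corrupted, or written with another layout */
    if (cf->floatFormat != DEMO_FLOATFORMAT_VALUE)
        return 0;                   /* incompatible floating-point format */
    if (cf->blcksz != my_blcksz)
        return 0;                   /* database built for a different block size */
    return 1;
}
```

Because the CRC is computed over offsetof(..., crc) bytes, putting crc anywhere but last would leave later fields unprotected, which is why the real struct insists on it.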
  
 

  
/*
 * Body of CheckPoint XLOG records.  This is declared here because we keep
 * a copy of the latest one in pg_control for possible disaster recovery.
 * Changing this struct requires a PG_CONTROL_VERSION bump.
 */
typedef struct CheckPoint
{
    XLogRecPtr  redo;           /* next RecPtr available when we began to
                                 * create CheckPoint (i.e. REDO start point) */
    TimeLineID  ThisTimeLineID; /* current TLI */
    uint32      nextXidEpoch;   /* higher-order bits of nextXid */
    TransactionId nextXid;      /* next free XID */
    Oid         nextOid;        /* next free OID */
    MultiXactId nextMulti;      /* next free MultiXactId */
    MultiXactOffset nextMultiOffset;    /* next free MultiXact offset */
    TransactionId oldestXid;    /* cluster-wide minimum datfrozenxid */
    Oid         oldestXidDB;    /* database with minimum datfrozenxid */
    pg_time_t   time;           /* time stamp of checkpoint */

    /*
     * Oldest XID still running. This is only needed to initialize hot standby
     * mode from an online checkpoint, so we only bother calculating this for
     * online checkpoints and only when wal_level is hot_standby. Otherwise
     * it's set to InvalidTransactionId.
     */
    TransactionId oldestActiveXid;
} CheckPoint;
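nextXidEpoch supplies the higher-order bits of nextXid: a 32-bit XID wraps around, and the epoch counts how many times it has done so. The sketch below (all demo_* helper names are hypothetical) glues the two halves into one 64-bit value and shows one way to derive the current epoch from an epoch/XID pair saved at the last checkpoint (the same pairing that ckptXidEpoch/ckptXid in XLogCtlData carries): since the current nextXid is logically later than the checkpoint's XID, a numerically smaller value means the counter has wrapped. It assumes at most one wraparound since that checkpoint and ignores the reserved XIDs that real PostgreSQL skips when wrapping.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* 32-bit XID, as in PostgreSQL */

static_assert(sizeof(TransactionId) == 4, "XIDs are 32 bits wide");

/* Epoch in the high 32 bits, XID in the low 32 bits: a value that never wraps. */
static uint64_t demo_full_xid(uint32_t epoch, TransactionId xid)
{
    return ((uint64_t) epoch << 32) | xid;
}

/*
 * Derive the current epoch from the epoch/XID saved at the last checkpoint.
 * nextXid is logically later than ckptXid, so if it is numerically smaller
 * the counter must have wrapped into the next epoch.
 * Assumes at most one wraparound since that checkpoint.
 */
static uint32_t demo_current_epoch(uint32_t ckptXidEpoch, TransactionId ckptXid,
                                   TransactionId nextXid)
{
    return (nextXid < ckptXid) ? ckptXidEpoch + 1 : ckptXidEpoch;
}
```

For example, with a checkpoint at epoch 2, XID 4000000000, a current nextXid of 100 yields epoch 3, and the combined 64-bit value still compares as later than the checkpoint's.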
  
 
  
/*
 * Total shared-memory state for XLOG.
 */
typedef struct XLogCtlData
{
    /* Protected by WALInsertLock: */
    XLogCtlInsert Insert;

    /* Protected by info_lck: */
    XLogwrtRqst LogwrtRqst;
    XLogwrtResult LogwrtResult;
    uint32      ckptXidEpoch;   /* nextXID & epoch of latest checkpoint */
    TransactionId ckptXid;
    XLogRecPtr  asyncXactLSN;   /* LSN of newest async commit/abort */
    uint32      lastRemovedLog; /* latest removed/recycled XLOG segment */
    uint32      lastRemovedSeg;

    /* Protected by WALWriteLock: */
    XLogCtlWrite Write;

    /*
     * These values do not change after startup, although the pointed-to pages
     * and xlblocks values certainly do.  Permission to read/write the pages
     * and xlblocks values depends on WALInsertLock and WALWriteLock.
     */
    char       *pages;          /* buffers for unwritten XLOG pages */
    XLogRecPtr *xlblocks;       /* 1st byte ptr-s + XLOG_BLCKSZ */
    int         XLogCacheBlck;  /* highest allocated xlog buffer index */
    TimeLineID  ThisTimeLineID;
    TimeLineID  RecoveryTargetTLI;

    /*
     * archiveCleanupCommand is read from recovery.conf but needs to be in
     * shared memory so that the bgwriter process can access it.
     */
    char        archiveCleanupCommand[MAXPGPATH];

    /*
     * SharedRecoveryInProgress indicates if we're still in crash or archive
     * recovery. Protected by info_lck.
     */
    bool        SharedRecoveryInProgress;

    /*
     * SharedHotStandbyActive indicates if we're still in crash or archive
     * recovery. Protected by info_lck.
     */
    bool        SharedHotStandbyActive;

    /*
     * recoveryWakeupLatch is used to wake up the startup process to continue
     * WAL replay, if it is waiting for WAL to arrive or failover trigger file
     * to appear.
     */
    Latch       recoveryWakeupLatch;

    /*
     * During recovery, we keep a copy of the latest checkpoint record here.
     * Used by the background writer when it wants to create a restartpoint.
     *
     * Protected by info_lck.
     */
    XLogRecPtr  lastCheckPointRecPtr;
    CheckPoint  lastCheckPoint;

    /* end+1 of the last record replayed (or being replayed) */
    XLogRecPtr  replayEndRecPtr;
    /* end+1 of the last record replayed */
    XLogRecPtr  recoveryLastRecPtr;
    /* timestamp of last COMMIT/ABORT record replayed (or being replayed) */
    TimestampTz recoveryLastXTime;
    /* Are we requested to pause recovery? */
    bool        recoveryPause;

    slock_t     info_lck;       /* locks shared variables shown above */
} XLogCtlData;
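pages, xlblocks and XLogCacheBlck together describe a fixed ring of WAL page buffers: pages holds XLogCacheBlck + 1 buffers of XLOG_BLCKSZ bytes back to back, and xlblocks[i] records the end+1 WAL position of the page currently cached in buffer i (its start plus XLOG_BLCKSZ). The sketch below illustrates that index arithmetic under simplifying assumptions: DemoRecPtr is a plain 64-bit offset (in this PostgreSQL version XLogRecPtr is a two-field struct), the 8192-byte page size is an assumed value, and the demo_* functions are hypothetical helpers that only mirror the idea, not PostgreSQL's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_XLOG_BLCKSZ 8192       /* assumed WAL page size */

typedef uint64_t DemoRecPtr;        /* simplified; 9.1 uses {xlogid, xrecoff} */

/* Hypothetical subset of XLogCtlData: just the buffer ring. */
typedef struct DemoXLogBuffers
{
    char       *pages;          /* XLogCacheBlck + 1 pages, back to back */
    DemoRecPtr *xlblocks;       /* end+1 WAL position cached in each buffer */
    int         XLogCacheBlck;  /* highest valid buffer index */
} DemoXLogBuffers;

/* Advance a buffer index, wrapping to 0 after the last one. */
static int demo_next_buf_idx(const DemoXLogBuffers *b, int idx)
{
    return (idx == b->XLogCacheBlck) ? 0 : idx + 1;
}

/* Start address of buffer idx inside the pages array. */
static char *demo_buf_page(const DemoXLogBuffers *b, int idx)
{
    assert(idx >= 0 && idx <= b->XLogCacheBlck);
    return b->pages + (size_t) idx * DEMO_XLOG_BLCKSZ;
}

/* Record that buffer idx now holds the WAL page starting at pagestart. */
static void demo_map_page(DemoXLogBuffers *b, int idx, DemoRecPtr pagestart)
{
    b->xlblocks[idx] = pagestart + DEMO_XLOG_BLCKSZ;    /* "1st byte ptr-s + XLOG_BLCKSZ" */
}

/* Does buffer idx currently hold WAL position ptr? (0 means "empty") */
static int demo_buf_holds(const DemoXLogBuffers *b, int idx, DemoRecPtr ptr)
{
    DemoRecPtr end = b->xlblocks[idx];

    return end != 0 && ptr >= end - DEMO_XLOG_BLCKSZ && ptr < end;
}
```

Storing the end position rather than the start lets a zeroed xlblocks entry double as "nothing cached yet" in this sketch.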

  
 

  
/*
 * Shared state data for XLogInsert.
 */
typedef struct XLogCtlInsert
{
    XLogwrtResult LogwrtResult; /* a recent value of LogwrtResult */
    XLogRecPtr  PrevRecord;     /* start of previously-inserted record */
    int         curridx;        /* current block index in cache */
    XLogPageHeader currpage;    /* points to header of block in cache */
    char       *currpos;        /* current insertion point in cache */
    XLogRecPtr  RedoRecPtr;     /* current redo point for insertions */
    bool        forcePageWrites;    /* forcing full-page writes for PITR? */

    /*
     * exclusiveBackup is true if a backup started with pg_start_backup() is
     * in progress, and nonExclusiveBackups is a counter indicating the number
     * of streaming base backups currently in progress. forcePageWrites is set
     * to true when either of these is non-zero. lastBackupStart is the latest
     * checkpoint redo location used as a starting point for an online backup.
     */
    bool        exclusiveBackup;
    int         nonExclusiveBackups;
    XLogRecPtr  lastBackupStart;
} XLogCtlInsert;
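The comment on exclusiveBackup and nonExclusiveBackups states the rule directly: forcePageWrites is true whenever either of them is non-zero. A minimal sketch of keeping that derived flag consistent as backups start and stop; DemoInsertState and the demo_* functions are hypothetical, and the real code also holds WALInsertLock while touching these fields, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of XLogCtlInsert: only the backup bookkeeping. */
typedef struct DemoInsertState
{
    bool exclusiveBackup;       /* pg_start_backup() backup in progress */
    int  nonExclusiveBackups;   /* streaming base backups in progress */
    bool forcePageWrites;       /* derived: force full-page writes */
} DemoInsertState;

/* forcePageWrites is true when either backup indicator is non-zero. */
static void demo_recompute_force(DemoInsertState *s)
{
    s->forcePageWrites = s->exclusiveBackup || s->nonExclusiveBackups > 0;
}

static void demo_start_nonexclusive(DemoInsertState *s)
{
    s->nonExclusiveBackups++;
    demo_recompute_force(s);
}

static void demo_stop_nonexclusive(DemoInsertState *s)
{
    assert(s->nonExclusiveBackups > 0);     /* must match an earlier start */
    s->nonExclusiveBackups--;
    demo_recompute_force(s);
}
```

Using a counter rather than a flag for non-exclusive backups is what allows several streaming base backups to overlap: full-page writes stay forced until the last one finishes.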

  
 
  
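In XLOGShmemInit(), a HashElement and a ShmemIndexEnt (entry) for the control file pg_control are first added to the shmem hash index "ShmemIndex". ShmemAlloc() is then called to allocate space in shmem of size sizeof(ControlFileData); the location member of the ShmemIndexEnt is set to point at that space, and its size member records the size of the space.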
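The lookup-or-create pattern behind ShmemInitStruct() can be sketched as follows. This is a toy under stated assumptions: a fixed-size array with linear search stands in for the dynahash table behind ShmemIndex, a bump allocator over a static arena stands in for ShmemAlloc(), and all demo_* names are hypothetical. DemoIndexEnt mirrors the key/location/size fields of ShmemIndexEnt. The found flag is how a caller tells "just created, initialize it" apart from "already exists, just attach".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_INDEX_KEYSIZE 48
#define DEMO_MAX_ENTRIES   16
#define DEMO_ARENA_SIZE    4096

/* Mirrors the key/location/size fields of ShmemIndexEnt. */
typedef struct DemoIndexEnt
{
    char    key[DEMO_INDEX_KEYSIZE];    /* structure name, e.g. "XLOG Ctl" */
    void   *location;                   /* start of the allocated space */
    size_t  size;                       /* bytes allocated */
} DemoIndexEnt;

/* Toy "shared memory": a static arena plus a bump pointer. */
static union { max_align_t align; char bytes[DEMO_ARENA_SIZE]; } demo_arena;
static size_t demo_free_offset;
static DemoIndexEnt demo_index[DEMO_MAX_ENTRIES];
static int demo_nentries;

/* Stand-in for ShmemAlloc(): round the size up, then hand out the next chunk. */
static void *demo_alloc(size_t size)
{
    size_t align = _Alignof(max_align_t);

    size = (size + align - 1) & ~(align - 1);
    if (size > DEMO_ARENA_SIZE - demo_free_offset)
        return NULL;
    void *p = demo_arena.bytes + demo_free_offset;
    demo_free_offset += size;
    return p;
}

/* Stand-in for ShmemInitStruct(): attach if the name exists, else create. */
static void *demo_init_struct(const char *name, size_t size, bool *found)
{
    for (int i = 0; i < demo_nentries; i++)
    {
        if (strcmp(demo_index[i].key, name) == 0)
        {
            *found = true;
            assert(demo_index[i].size == size);     /* PostgreSQL reports an error here */
            return demo_index[i].location;
        }
    }

    *found = false;
    if (demo_nentries == DEMO_MAX_ENTRIES || strlen(name) >= DEMO_INDEX_KEYSIZE)
        return NULL;

    /* Add the index entry first, then allocate the space it describes. */
    DemoIndexEnt *ent = &demo_index[demo_nentries++];
    strcpy(ent->key, name);
    ent->location = demo_alloc(size);
    if (ent->location == NULL)
    {
        demo_nentries--;            /* out of space: undo the entry */
        return NULL;
    }
    ent->size = size;
    return ent->location;
}
```

Because the allocator only moves forward, a structure allocated later sits at a higher address, which matches the low-to-high layout drawn on the left of the memory-layout figure.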
  
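XLOGShmemInit() then calls ShmemInitStruct(), which calls hash_search() to look up "XLOG Ctl" in the hash index "ShmemIndex". If it is not there, a HashElement and a ShmemIndexEnt (entry) are allocated for "XLOG Ctl" in ShmemIndex, and "XLOG Ctl" is written into the entry as its key. Back in ShmemInitStruct(), ShmemAlloc() is called to allocate space in shared memory for the "XLOG Ctl" structures (see the "XLog-related structures" figure below); the location member of the entry (here, a variable of type ShmemIndexEnt) is set to point at that space, and its size member records the size. Control finally returns to XLOGShmemInit(), which makes the static global variable XLogCtl (of type XLogCtlData *) point at the memory allocated in shmem for "XLOG Ctl" and sets the members of that XLogCtlData structure. After initialization the data structures look like the figure below.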
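The step "set the members of XLogCtlData" can be pictured as carving one contiguous shmem region into three parts: the XLogCtlData header, the xlblocks array, and the page buffers, with padding so the buffers start on an aligned boundary. The sketch below is a simplified model of that layout under stated assumptions: DemoXLogCtl keeps only a few fields, malloc() stands in for ShmemInitStruct(), the 32-byte buffer alignment and 8192-byte page size are assumed values, and the initial flag values (recovery assumed in progress, hot standby not yet active) are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_XLOG_BLCKSZ    8192    /* assumed WAL page size */
#define DEMO_ALIGNOF_BUFFER 32      /* assumed alignment for WAL buffers */

typedef uint64_t DemoRecPtr;        /* simplified XLogRecPtr */

/* Hypothetical, heavily trimmed XLogCtlData. */
typedef struct DemoXLogCtl
{
    char       *pages;
    DemoRecPtr *xlblocks;
    int         XLogCacheBlck;
    bool        SharedRecoveryInProgress;
    bool        SharedHotStandbyActive;
} DemoXLogCtl;

/* Round a pointer up to a multiple of a (a must be a power of two). */
#define DEMO_TYPEALIGN(a, p) \
    ((char *) (((uintptr_t) (p) + ((a) - 1)) & ~((uintptr_t) ((a) - 1))))

/* Header + xlblocks array + alignment slack + the page buffers. */
static size_t demo_xlog_shmem_size(int nbuffers)
{
    return sizeof(DemoXLogCtl)
        + sizeof(DemoRecPtr) * (size_t) nbuffers
        + DEMO_ALIGNOF_BUFFER
        + (size_t) DEMO_XLOG_BLCKSZ * (size_t) nbuffers;
}

/* Carve one contiguous region into the three parts and set initial values. */
static DemoXLogCtl *demo_xlog_shmem_init(char *region, int nbuffers)
{
    DemoXLogCtl *ctl = (DemoXLogCtl *) region;
    char       *allocptr = region + sizeof(DemoXLogCtl);

    assert(nbuffers > 0);
    memset(ctl, 0, sizeof(DemoXLogCtl));

    ctl->xlblocks = (DemoRecPtr *) allocptr;    /* right after the header */
    memset(ctl->xlblocks, 0, sizeof(DemoRecPtr) * (size_t) nbuffers);
    allocptr += sizeof(DemoRecPtr) * (size_t) nbuffers;

    allocptr = DEMO_TYPEALIGN(DEMO_ALIGNOF_BUFFER, allocptr);
    ctl->pages = allocptr;                      /* aligned page buffers */
    memset(ctl->pages, 0, (size_t) DEMO_XLOG_BLCKSZ * (size_t) nbuffers);

    ctl->XLogCacheBlck = nbuffers - 1;          /* highest buffer index */
    ctl->SharedRecoveryInProgress = true;
    ctl->SharedHotStandbyActive = false;
    return ctl;
}

/* Stand-in for ShmemInitStruct("XLOG Ctl", ...): one malloc'd region. */
static DemoXLogCtl *demo_xlog_shmem_create(int nbuffers)
{
    char *region = malloc(demo_xlog_shmem_size(nbuffers));

    return region != NULL ? demo_xlog_shmem_init(region, nbuffers) : NULL;
}
```

The extra DEMO_ALIGNOF_BUFFER bytes in the size calculation are what guarantee the aligned page area still fits inside the region after rounding up.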
  
 
[Figure: Memory layout after xlog initialization]

  
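       To keep the figure above compact, the HCTL structure created along with the shmem hash index "ShmemIndex" has been left out; that structure records the parameters used to create the expandable hash table. A gray-shaded area has been added on the left to give an overview of the physical layout of the variables in shared memory (shmem), from bottom to top, that is, from low to high addresses. Separate diagrams for "Control File" (ControlFileData) and "XLOG Ctl" (the xlog structures) are given below; otherwise the figure above would be too large.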

[Figure: Control file structure]

  
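       In the figure above, the XLogRecPtr and CheckPoint members of ControlFileData are not pointers but embedded structures, so strictly speaking they should be drawn using the corresponding structure diagrams on the right. Merging those in was too much trouble, so please bear with the figure as it is.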


[Figure: XLog-related structures]
