PostgreSQL source code learning (XX) -- fault recovery ① - transaction log format
2022-06-11 03:19:00 【Hehuyi_ In】
For background on WAL, see the earlier articles listed below. This article focuses on the source code.
pg crash recovery (1): WAL and the full-page-write mechanism (Hehuyi_In's blog, CSDN)
pg crash recovery (2): WAL file structure and management (Hehuyi_In's blog, CSDN)
One 、 Log structure
The overall layout is shown in the figure.

There are several levels. Specifically:
- Each WAL file (log segment) is 16 MB by default and is divided internally into pages of 8 KB each (which is also why pg needs full-page writes)
- Each log page consists of header information (Header) plus log records (Record)
- There are two kinds of Header:
- XLogLongPageHeaderData: the header of the first page of a log segment (one per segment, the dark blue part in the figure). It stores the segment size, the page size, and similar information
- XLogPageHeaderData: the header of every other page of a log segment (one on each page except the first, the light blue part in the figure). It stores the transaction log version, the timeline, and similar information
- XLogLongPageHeaderData contains XLogPageHeaderData plus some additional fields
- Each log record consists of an XLogRecord struct plus data (XLOG Record data). It is the smallest unit of the transaction log, and each record represents one action that modifies the database
- The record data can be divided further into: block headers (XLogRecordBlockHeader) + data header (XLogRecordDataHeader) + block data (Block Data) + main data (Main Data)
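The sizes in the list above translate into simple arithmetic. The following is a minimal sketch, assuming the default 16 MB segment and 8 KB page sizes from the text and treating a WAL position as a plain byte offset. The helper names are hypothetical and are not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the text: 16 MB segments and 8 KB pages (the defaults). */
#define WAL_SEGMENT_SIZE (UINT64_C(16) * 1024 * 1024)
#define WAL_PAGE_SIZE    (UINT64_C(8) * 1024)

/* Hypothetical helper: number of pages in one log segment. */
uint64_t pages_per_segment(void)
{
    assert(WAL_SEGMENT_SIZE % WAL_PAGE_SIZE == 0);
    return WAL_SEGMENT_SIZE / WAL_PAGE_SIZE;
}

/* Hypothetical helper: index of the segment that contains byte position pos. */
uint64_t segment_number(uint64_t pos)
{
    return pos / WAL_SEGMENT_SIZE;
}

/* Hypothetical helper: index of the page inside that segment. */
uint64_t page_in_segment(uint64_t pos)
{
    return (pos % WAL_SEGMENT_SIZE) / WAL_PAGE_SIZE;
}

/* Hypothetical helper: byte offset inside that page. */
uint64_t offset_in_page(uint64_t pos)
{
    return pos % WAL_PAGE_SIZE;
}

/* Only the first page of a segment carries the long header. */
int page_has_long_header(uint64_t pos)
{
    return page_in_segment(pos) == 0;
}
```

With these defaults a segment holds 2048 pages, and byte position 0x01002028 falls in segment 1, page 1, at offset 0x28 within that page.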
Two 、 Log page header information

The structures below are presented in the order in which they appear in the code.
1. General page header XLogPageHeaderData
This is the header of every page of a log segment other than the first (the light blue part in the figure); the first page carries the long header, which embeds this one. It stores the transaction log version, the timeline, and similar information.
/*
* Each page of XLOG file has a header like this:
*/
#define XLOG_PAGE_MAGIC 0xD10D /* can be used as WAL version indicator */
typedef struct XLogPageHeaderData
{
uint16 xlp_magic; /* magic value for correctness checks */
uint16 xlp_info; /* flag bits, see below */
TimeLineID xlp_tli; /* TimeLineID of first record on page */
XLogRecPtr xlp_pageaddr; /* XLOG address of this page */
/*
* When there is not enough space left on a page to store a whole record, the record continues on the next page. xlp_rem_len is the number of bytes remaining from a previous page; it tracks xl_tot_len in the initial header.
*/
uint32 xlp_rem_len; /* total len of remaining data for record */
} XLogPageHeaderData;
// Size of XLogPageHeaderData
#define SizeOfXLogShortPHD MAXALIGN(sizeof(XLogPageHeaderData))
// Pointer type for XLogPageHeaderData
typedef XLogPageHeaderData *XLogPageHeader;
A record that crosses a page boundary is laid out as in the figure:

2. First-page header XLogLongPageHeaderData
- XLogLongPageHeaderData: the header of the first page of a log segment (one per segment, the dark blue part in the figure). It stores the segment size, the page size, and similar information
- XLogLongPageHeaderData contains XLogPageHeaderData plus some additional fields
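Because the long header embeds the standard one, both header sizes can be worked out from the field widths. The sketch below re-declares the two structs with fixed-width stand-in types and assumes an 8-byte maximum alignment; the typedefs, `MAXIMUM_ALIGNOF` value, and the function `page_header_size` are assumptions made for illustration, while `XLP_LONG_HEADER` (0x0002) is the flag listed with the macros later in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL typedefs (assumed widths). */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/* Assumption: 8-byte maximum alignment, as on common 64-bit builds. */
#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(len) \
    (((size_t) (len) + (MAXIMUM_ALIGNOF - 1)) & ~((size_t) (MAXIMUM_ALIGNOF - 1)))

#define XLP_LONG_HEADER 0x0002

typedef struct XLogPageHeaderData
{
    uint16_t   xlp_magic;
    uint16_t   xlp_info;
    TimeLineID xlp_tli;
    XLogRecPtr xlp_pageaddr;
    uint32_t   xlp_rem_len;
} XLogPageHeaderData;

typedef struct XLogLongPageHeaderData
{
    XLogPageHeaderData std;      /* the standard header comes first */
    uint64_t xlp_sysid;
    uint32_t xlp_seg_size;
    uint32_t xlp_xlog_blcksz;
} XLogLongPageHeaderData;

/* Hypothetical function mirroring what the XLogPageHeaderSize macro selects. */
size_t page_header_size(const XLogPageHeaderData *hdr)
{
    assert(hdr != NULL);
    return (hdr->xlp_info & XLP_LONG_HEADER)
        ? MAXALIGN(sizeof(XLogLongPageHeaderData))
        : MAXALIGN(sizeof(XLogPageHeaderData));
}
```

Under these assumptions the standard header occupies 24 bytes and the long header 40 bytes, so log records on the first page of a segment start 16 bytes later than on other pages.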
/*
* When the XLP_LONG_HEADER flag is set (which happens only on the first page of each WAL log segment), some additional fields are stored in the header. These additional fields serve to identify the file accurately.
*/
typedef struct XLogLongPageHeaderData
{
XLogPageHeaderData std; /* standard header fields */
uint64 xlp_sysid; /* system identifier from pg_control */
uint32 xlp_seg_size; /* just as a cross-check (log segment size) */
uint32 xlp_xlog_blcksz; /* just as a cross-check (log page size) */
} XLogLongPageHeaderData;
// Size of XLogLongPageHeaderData
#define SizeOfXLogLongPHD MAXALIGN(sizeof(XLogLongPageHeaderData))
// Pointer type for XLogLongPageHeaderData
typedef XLogLongPageHeaderData *XLogLongPageHeader;
3. Some macro definitions
/* When record crosses page boundary, set this flag in new page's header */
#define XLP_FIRST_IS_CONTRECORD 0x0001
/* This flag indicates a "long" page header (i.e. XLogLongPageHeaderData) */
#define XLP_LONG_HEADER 0x0002
/* This flag indicates backup blocks starting in this page are optional. After pg_start_backup is called, the database is forced into full-page-write mode; once the backup stops, WAL pages are marked with XLP_BKP_REMOVABLE, meaning that full-page images starting from here are no longer mandatory and become optional. */
#define XLP_BKP_REMOVABLE 0x0004
/* All defined flag bits in xlp_info (used for validity checking of header) */
#define XLP_ALL_FLAGS 0x0007
// Header size of a page: long or standard, depending on XLP_LONG_HEADER
#define XLogPageHeaderSize(hdr) \
(((hdr)->xlp_info & XLP_LONG_HEADER) ? SizeOfXLogLongPHD : SizeOfXLogShortPHD)
/* wal_segment_size can range from 1MB to 1GB (minimum and maximum log segment sizes) */
#define WalSegMinSize 1024 * 1024
#define WalSegMaxSize 1024 * 1024 * 1024
Next, the log record part.

A log record has many parts. They are introduced level by level, following the figure:
- Log record common header: XLogRecord
- Log record header information: block header XLogRecordBlockHeader + data header XLogRecordDataHeader
- Log record data: block data (Block Data) + main data (Main Data)
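The levels listed above are stored back to back, so the start of each part is the running sum of the lengths before it. The following is a hedged sketch of that bookkeeping; `RecordParts`, `RecordOffsets`, and `layout_record` are hypothetical names introduced for illustration, and the lengths passed in are whatever the caller has measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the parts of one record (not a PostgreSQL type). */
typedef struct RecordParts
{
    uint32_t fixed_header_len;   /* XLogRecord */
    uint32_t block_headers_len;  /* all XLogRecordBlockHeader entries and their optional parts */
    uint32_t data_header_len;    /* XLogRecordDataHeaderShort/Long, 0 if there is no main data */
    uint32_t block_data_len;     /* block data */
    uint32_t main_data_len;      /* main data */
} RecordParts;

/* Byte offsets of each part from the start of the record, plus the total length. */
typedef struct RecordOffsets
{
    uint32_t block_headers;
    uint32_t data_header;
    uint32_t block_data;
    uint32_t main_data;
    uint32_t total;
} RecordOffsets;

/* Lay the parts out in the order given in the list: headers first, then data. */
RecordOffsets layout_record(const RecordParts *p)
{
    RecordOffsets o;

    assert(p != NULL);
    o.block_headers = p->fixed_header_len;
    o.data_header   = o.block_headers + p->block_headers_len;
    o.block_data    = o.data_header + p->data_header_len;
    o.main_data     = o.block_data + p->block_data_len;
    o.total         = o.main_data + p->main_data_len;
    return o;
}
```

As an illustration with made-up lengths of 24, 20, 2, 30, and 3 bytes, the block data would start at offset 46, the main data at offset 76, and the whole record would be 79 bytes long.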
Three 、 Log record common header XLogRecord

typedef struct XLogRecord
{
uint32 xl_tot_len; /* total len of entire record */
TransactionId xl_xid; /* xact id (transaction ID) */
XLogRecPtr xl_prev; /* ptr to previous record in log */
uint8 xl_info; /* flag bits, see below (flags plus the action that generated the record) */
RmgrId xl_rmid; /* resource manager for this record */
/* 2 bytes of padding here, initialize to zero */
pg_crc32c xl_crc; /* CRC for this record (cyclic redundancy check) */
/* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */
} XLogRecord;
xl_info holds flag bits together with the action that generated the record:
- The low 4 bits store two flags, XLR_SPECIAL_REL_UPDATE and XLR_CHECK_CONSISTENCY, which are passed in by the caller of XLogInsert
/*
* Set if a WAL record modifies a relation's storage files in a special way that is not covered by the usual block references. PostgreSQL itself does not use this flag for anything, but it allows external tools that read WAL and keep track of modified blocks to recognize such special record types.
*/
#define XLR_SPECIAL_REL_UPDATE 0x01
/*
* Enforces consistency checks at recovery. If enabled, a full-page image is logged for each block the record modifies and is used afterwards for the consistency check during recovery. The caller of XLogInsert can set this flag when needed, but if wal_consistency_checking is enabled for the rmgr, the check is performed unconditionally.
*/
#define XLR_CHECK_CONSISTENCY 0x02
- The high 4 bits identify the action that generated the record (at most 16 kinds). The meaning of these bits differs from one resource manager to another, so each resource manager can define only a limited number of actions. Take heap operations, whose resource manager ID is RM_HEAP_ID, as an example:
/*
* XLOG allows to store some information in high 4 bits of log
* record xl_info field. We use 3 for opcode and one for init bit.
*/
#define XLOG_HEAP_INSERT 0x00
#define XLOG_HEAP_DELETE 0x10
#define XLOG_HEAP_UPDATE 0x20
#define XLOG_HEAP_TRUNCATE 0x30
#define XLOG_HEAP_HOT_UPDATE 0x40
#define XLOG_HEAP_CONFIRM 0x50
#define XLOG_HEAP_LOCK 0x60
#define XLOG_HEAP_INPLACE 0x70
#define XLOG_HEAP_OPMASK 0x70
/*
* When we insert 1st item on new page in INSERT, UPDATE, HOT_UPDATE,
* or MULTI_INSERT, we can (and we do) restore entire page in redo.
* (This bit marks the first item written to a new page, so redo can rebuild the whole page.)
*/
#define XLOG_HEAP_INIT_PAGE 0x80
Four 、 Log record block headers

1. XLogRecordBlockHeader
/*
* Header info for block data appended to an XLOG record.
*/
typedef struct XLogRecordBlockHeader
{
uint8 id; /* block reference ID */
uint8 fork_flags; /* fork within the relation, and flags */
uint16 data_length; /* number of payload bytes (not including page image); the XLogRecordBlockHeader struct itself is not counted either */
/* If BKPBLOCK_HAS_IMAGE, an XLogRecordBlockImageHeader struct follows */
/* If BKPBLOCK_SAME_REL is not set, a RelFileNode follows */
/* BlockNumber follows */
} XLogRecordBlockHeader;
#define SizeOfXLogRecordBlockHeader (offsetof(XLogRecordBlockHeader, data_length) + sizeof(uint16))
BlockNumber is defined in block.h. It is a 32-bit unsigned integer whose valid values are 0 to 0xFFFFFFFE.
typedef uint32 BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
#define MaxBlockNumber ((BlockNumber) 0xFFFFFFFE)
As the figure shows, XLogRecordBlockHeader may be followed by several optional parts:
- XLogRecordBlockImageHeader: present when the record carries a full-page image (also called a backup block, used for full-page writes); covered below
- XLogRecordBlockCompressHeader: present when the page image is compressed and has a hole
- RelFileNode (relfilenode.h): present if BKPBLOCK_SAME_REL is not set
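Which optional parts follow a block header is decided by flag bits, so the size of one block reference header can be computed from them. The sketch below is an illustration under stated assumptions: the `BKPBLOCK_*` values are quoted from memory of PostgreSQL's xlogrecord.h and should be checked against the version in use, the `BKPIMAGE_*` values are the ones listed later in this article, the byte sizes are derived from the struct fields, and `block_ref_header_size` is a hypothetical function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits in fork_flags (assumed values, verify against xlogrecord.h). */
#define BKPBLOCK_HAS_IMAGE 0x10
#define BKPBLOCK_SAME_REL  0x80

/* Flag bits in bimg_info, as listed in this article. */
#define BKPIMAGE_HAS_HOLE      0x01
#define BKPIMAGE_IS_COMPRESSED 0x02

/* Sizes derived from the struct fields shown in this article. */
#define BLOCK_HEADER_SIZE    4   /* uint8 id + uint8 fork_flags + uint16 data_length */
#define IMAGE_HEADER_SIZE    5   /* uint16 length + uint16 hole_offset + uint8 bimg_info */
#define COMPRESS_HEADER_SIZE 2   /* uint16 hole_length */
#define RELFILENODE_SIZE     12  /* three 4-byte Oids */
#define BLOCKNUMBER_SIZE     4   /* uint32 */

#define MAX_BLOCK_REF_HEADER_SIZE \
    (BLOCK_HEADER_SIZE + IMAGE_HEADER_SIZE + COMPRESS_HEADER_SIZE + \
     RELFILENODE_SIZE + BLOCKNUMBER_SIZE)

/* Hypothetical helper: bytes taken by one block reference header. */
size_t block_ref_header_size(uint8_t fork_flags, uint8_t bimg_info)
{
    size_t size = BLOCK_HEADER_SIZE;

    if (fork_flags & BKPBLOCK_HAS_IMAGE)
    {
        size += IMAGE_HEADER_SIZE;
        /* The compress header appears only for a compressed image with a hole. */
        if ((bimg_info & BKPIMAGE_HAS_HOLE) && (bimg_info & BKPIMAGE_IS_COMPRESSED))
            size += COMPRESS_HEADER_SIZE;
    }
    /* The relation is repeated only when it differs from the previous block's. */
    if (!(fork_flags & BKPBLOCK_SAME_REL))
        size += RELFILENODE_SIZE;
    size += BLOCKNUMBER_SIZE;    /* the block number is always present */

    assert(size <= MAX_BLOCK_REF_HEADER_SIZE);
    return size;
}
```

Under these assumptions the smallest header (same relation, no image) takes 8 bytes and the largest takes 27 bytes.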
2. XLogRecordBlockImageHeader
Additional header information that is present when a full-page image is included (a backup block, i.e. when BKPBLOCK_HAS_IMAGE is set).
/*
* Additional header information when a full-page image is included
* (i.e. when BKPBLOCK_HAS_IMAGE is set).
*
* The XLOG code knows that PG data pages usually contain an unused "hole" (free space) in the middle, which contains only zero bytes. Since the hole is known to be all zeros, it is removed from the stored data (and it is not counted in the XLOG record's CRC either). Hence, the amount of block data actually present is BLCKSZ - hole length.
*
* In addition, when wal_compression is enabled, the full-page image is compressed with the PGLZ algorithm after the hole has been removed. This can reduce the WAL volume, at the cost of extra CPU time.
* In this case the hole length can no longer be calculated by subtracting the number of page image bytes from BLCKSZ, so it has to be stored as extra information. If no hole exists, the hole length can be assumed to be 0 and no extra information needs to be stored.
* Note that if compression saves fewer bytes than the length of the extra information, the original version of the page image is stored in WAL instead of the compressed one.
* Hence, when a page image is successfully compressed, the amount of block data actually present is less than BLCKSZ - hole length - length of the extra information.
*/
typedef struct XLogRecordBlockImageHeader
{
uint16 length; /* number of page image bytes */
uint16 hole_offset; /* number of bytes before "hole" */
uint8 bimg_info; /* flag bits, see below */
/*
* If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED, an
* XLogRecordBlockCompressHeader struct follows.
*/
} XLogRecordBlockImageHeader;
/* Information stored in bimg_info */
#define BKPIMAGE_HAS_HOLE 0x01 /* page image has "hole" */
#define BKPIMAGE_IS_COMPRESSED 0x02 /* page image is compressed */
#define BKPIMAGE_APPLY 0x04 /* page image should be restored during replay */
3. XLogRecordBlockCompressHeader
/*
* Extra header information used when page image has "hole" and
* is compressed.
*/
typedef struct XLogRecordBlockCompressHeader
{
uint16 hole_length; /* number of bytes in "hole" */
} XLogRecordBlockCompressHeader;
#define SizeOfXLogRecordBlockCompressHeader \
sizeof(XLogRecordBlockCompressHeader)
4. RelFileNode
This structure is very simple:
typedef struct RelFileNode
{
Oid spcNode; /* tablespace */
Oid dbNode; /* database */
Oid relNode; /* relation */
} RelFileNode;
5. MaxSizeOfXLogRecordBlockHeader
The maximum size of a block header. The maximum is reached when every optional part is present, so it is simply the sum of all the parts.
/*
* Maximum size of the header for a block reference. This is used to size a
* temporary buffer for constructing the header.
*/
#define MaxSizeOfXLogRecordBlockHeader \
(SizeOfXLogRecordBlockHeader + \
SizeOfXLogRecordBlockImageHeader + \
SizeOfXLogRecordBlockCompressHeader + \
sizeof(RelFileNode) + \
sizeof(BlockNumber))
Five 、 Log record data header XLogRecordDataHeaderShort/Long
This is the header for the main data, and it comes in two forms. If the data is shorter than 256 bytes, the short form is used and the length is stored in a single byte; otherwise the long form is used.
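The choice between the two forms can be sketched as a small encoder. This is an illustration rather than PostgreSQL's own code: `encode_main_data_header` is a hypothetical function, and the ID values 255 and 254 for `XLR_BLOCK_ID_DATA_SHORT` and `XLR_BLOCK_ID_DATA_LONG` are quoted from memory of xlogrecord.h and should be treated as assumptions. The sketch also assumes that no header is written when there is no main data at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed ID values, verify against xlogrecord.h. */
#define XLR_BLOCK_ID_DATA_SHORT 255
#define XLR_BLOCK_ID_DATA_LONG  254

/*
 * Hypothetical encoder: writes the main-data header into out (at least
 * 5 bytes) and returns the number of bytes written.
 */
size_t encode_main_data_header(uint8_t *out, uint32_t main_data_len)
{
    assert(out != NULL);

    if (main_data_len == 0)
        return 0;                           /* assumption: no main data, no header */

    if (main_data_len <= 255)
    {
        /* Short form: id byte followed by a one-byte length. */
        out[0] = XLR_BLOCK_ID_DATA_SHORT;
        out[1] = (uint8_t) main_data_len;
        return 2;
    }

    /* Long form: id byte followed by an unaligned 4-byte length. */
    out[0] = XLR_BLOCK_ID_DATA_LONG;
    memcpy(out + 1, &main_data_len, sizeof(uint32_t));
    return 1 + sizeof(uint32_t);
}
```

The long form copies the length with memcpy because the 4-byte field directly follows the id byte and is therefore not aligned. The short form costs 2 bytes and the long form 5 bytes, which match the two size macros in the listing that follows.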
/*
* These structs are currently not used in the code, they are here just for
* documentation purposes.
*/
typedef struct XLogRecordDataHeaderShort
{
uint8 id; /* XLR_BLOCK_ID_DATA_SHORT */
uint8 data_length; /* number of payload bytes */
} XLogRecordDataHeaderShort;
#define SizeOfXLogRecordDataHeaderShort (sizeof(uint8) * 2)
typedef struct XLogRecordDataHeaderLong
{
uint8 id; /* XLR_BLOCK_ID_DATA_LONG */
/* followed by uint32 data_length, unaligned */
} XLogRecordDataHeaderLong;
#define SizeOfXLogRecordDataHeaderLong (sizeof(uint8) + sizeof(uint32))
Six 、 The actual log record data
Block data (Block Data) and main data (Main Data) are introduced together here because they are related.
By the content they store, XLOG records fall roughly into three categories:
- Record for backup block: stores the full-page image of a block, to solve the problem of partially written pages
- Record for tuple data block (non-backup block): after the full-page write, records the tuple changes in the corresponding page
- Record for checkpoint: when a checkpoint occurs, records the checkpoint information (including the redo point) in the transaction log file
Each category carries different header and data information; compare them with the structures introduced above.
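One rough way to tell the three categories apart is to look at the resource manager, the high bits of xl_info, and the flags of the first block reference. The sketch below is a simplified heuristic, not how PostgreSQL classifies records: `RecordKind` and `classify_record` are hypothetical, and the constants (`RM_XLOG_ID`, the two checkpoint opcodes, the masks, and the `BKPBLOCK_*` bits) are quoted from memory of the PostgreSQL headers and should be treated as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, verify against the PostgreSQL headers in use. */
#define XLR_INFO_MASK            0x0F  /* low 4 bits: XLogInsert flags */
#define XLR_RMGR_INFO_MASK       0xF0  /* high 4 bits: resource manager action */
#define RM_XLOG_ID               0
#define XLOG_CHECKPOINT_SHUTDOWN 0x00
#define XLOG_CHECKPOINT_ONLINE   0x10
#define BKPBLOCK_HAS_IMAGE       0x10
#define BKPBLOCK_HAS_DATA        0x20

/* Hypothetical labels for the three categories in the text. */
typedef enum RecordKind
{
    REC_CHECKPOINT,
    REC_BACKUP_BLOCK,
    REC_TUPLE_DATA,
    REC_OTHER
} RecordKind;

/* Simplified heuristic based on the first block reference only. */
RecordKind classify_record(uint8_t xl_rmid, uint8_t xl_info, uint8_t fork_flags)
{
    uint8_t rmgr_info = xl_info & XLR_RMGR_INFO_MASK;

    assert((XLR_INFO_MASK & XLR_RMGR_INFO_MASK) == 0);

    if (xl_rmid == RM_XLOG_ID &&
        (rmgr_info == XLOG_CHECKPOINT_SHUTDOWN || rmgr_info == XLOG_CHECKPOINT_ONLINE))
        return REC_CHECKPOINT;
    if (fork_flags & BKPBLOCK_HAS_IMAGE)
        return REC_BACKUP_BLOCK;    /* carries a full-page image */
    if (fork_flags & BKPBLOCK_HAS_DATA)
        return REC_TUPLE_DATA;      /* carries per-block payload only */
    return REC_OTHER;
}
```

A real record can reference several blocks, and a block can carry both an image and data, so this check is only meant to connect the three categories to the flag bits introduced earlier.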

The earlier article pg crash recovery (3): a closer look at XLOG Record (Hehuyi_In's blog, CSDN) also covers this and can be consulted.
References
"PostgreSQL Technology Insider: In-Depth Exploration of Transaction Processing", Chapter 4
PostgreSQL DBA (17) - XLOG Record data internal structure (Jianshu)
PostgreSQL source code interpretation (109) - WAL#5 (related data structures) (Jianshu)
PostgreSQL xlog format: backup full page (yzs87's blog, CSDN)
PostgreSQL xlog format: checkpoint (yzs87's blog, CSDN)