PostgreSQL source code learning (XX) -- fault recovery ① - transaction log format

2022-06-11 03:19:00 Hehuyi_In

For background on the WAL, see the earlier articles listed below. This article focuses on the source code.

PG crash recovery (1): WAL and the full-page write mechanism (Hehuyi_In's blog, CSDN)
PG crash recovery (2): WAL file structure and management (Hehuyi_In's blog, CSDN)
 

I. Log structure

The diagram shows the overall layout. It has several levels:

  • Each WAL file (log segment) is 16 MB by default and is divided internally into pages of 8 KB each (which is also why PG needs full-page writes).
  • Each log page consists of a header followed by log records.
  • There are two kinds of header:
      • XLogLongPageHeaderData: the header of the first page of a log segment (one per segment; the dark blue part of the diagram). It additionally stores the segment size, the page size, and similar information.
      • XLogPageHeaderData: the header of every other page in the segment (the light blue part of the diagram). It stores the WAL version, the timeline, and similar information.
      • XLogLongPageHeaderData embeds XLogPageHeaderData plus some additional fields.
  • Each log record consists of an XLogRecord structure followed by data (XLOG record data). A record is the smallest unit of the transaction log, and each one represents one action that modifies the database.
  • The record data can be further divided into: block headers (XLogRecordBlockHeader) + data header (XLogRecordDataHeader) + block data + main data.
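
The segment and page arithmetic described above can be sketched roughly as follows. This is only an illustration that assumes the default sizes (16 MB segments, 8 KB pages). The DEMO_* constants and demo_* functions are hypothetical names, not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults; both sizes are configurable in a real installation. */
#define DEMO_SEG_SIZE   (16u * 1024 * 1024)   /* bytes per WAL segment */
#define DEMO_PAGE_SIZE  (8u * 1024)           /* bytes per WAL page */

/* Number of pages in one segment. */
static uint32_t demo_pages_per_segment(void)
{
    assert(DEMO_SEG_SIZE % DEMO_PAGE_SIZE == 0);
    return DEMO_SEG_SIZE / DEMO_PAGE_SIZE;
}

/* Segment number that contains a given 64-bit WAL position. */
static uint64_t demo_segment_no(uint64_t lsn)
{
    return lsn / DEMO_SEG_SIZE;
}

/* Zero-based page index of the position inside its segment. */
static uint32_t demo_page_in_segment(uint64_t lsn)
{
    return (uint32_t) ((lsn % DEMO_SEG_SIZE) / DEMO_PAGE_SIZE);
}

/* True if the position lies on the segment's first page (long header). */
static int demo_on_first_page(uint64_t lsn)
{
    return demo_page_in_segment(lsn) == 0;
}
```

With these defaults a segment holds 2048 pages, and only page 0 of each segment carries the long header.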

II. Log page headers

The structures below are presented in the order in which they appear in the code.

1. Standard page header: XLogPageHeaderData

This is the header of every page in a log segment other than the first (the light blue part of the diagram). It stores the WAL version, the timeline, and similar information.

/*
 * Each page of XLOG file has a header like this:
 */

#define XLOG_PAGE_MAGIC 0xD10D	/* can be used as WAL version indicator */

typedef struct XLogPageHeaderData
{
	uint16		xlp_magic;		/* magic value for correctness checks */
	uint16		xlp_info;		/* flag bits, see below */
	TimeLineID	xlp_tli;		/* TimeLineID of first record on page */
	XLogRecPtr	xlp_pageaddr;	/* XLOG address of this page */

	/*
	 * When there is not enough space left on a page to hold a whole record,
	 * the record continues on the next page. xlp_rem_len is the number of
	 * bytes remaining from a previous page; it tracks xl_tot_len in the
	 * initial header.
	 */
	uint32		xlp_rem_len;	/* total len of remaining data for record */
} XLogPageHeaderData;

/* Size of XLogPageHeaderData */
#define SizeOfXLogShortPHD	MAXALIGN(sizeof(XLogPageHeaderData))
/* Pointer type for XLogPageHeaderData */
typedef XLogPageHeaderData *XLogPageHeader;

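A record that crosses a page boundary is continued on the next page, whose header stores the number of bytes still to come in xlp_rem_len.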
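
As a rough sketch of how xlp_rem_len relates to a record that does not fit on its page, the function below computes how many bytes spill over. It is a simplification: it assumes an 8 KB page and a record that spills onto at most one further page, and it ignores the space the next page's own header takes. demo_rem_len and DEMO_PAGE_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 8192u   /* assumed WAL page size */

/*
 * Given a record of tot_len bytes that starts at byte offset start_off
 * within a page, return how many bytes spill onto the next page.
 * That is the value the next page's xlp_rem_len would hold; 0 means the
 * record fits and no continuation is needed.
 */
static uint32_t demo_rem_len(uint32_t start_off, uint32_t tot_len)
{
    uint32_t room;

    assert(start_off < DEMO_PAGE_SIZE);
    room = DEMO_PAGE_SIZE - start_off;   /* free bytes left on this page */
    return (tot_len > room) ? tot_len - room : 0;
}
```

For example, a 500-byte record starting at offset 8000 leaves 192 bytes on the current page and 308 bytes for the next one.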

2. Long page header: XLogLongPageHeaderData

  • XLogLongPageHeaderData: the header of the first page of a log segment (one per segment; the dark blue part of the diagram). It additionally stores the segment size, the page size, and similar information.
  • XLogLongPageHeaderData embeds XLogPageHeaderData plus some additional fields.
/*
 * When the XLP_LONG_HEADER flag is set (normally only on the first page of
 * each WAL segment), the page header carries additional fields. These extra
 * fields serve to identify the file accurately.
 */
typedef struct XLogLongPageHeaderData
{
	XLogPageHeaderData std;		/* standard header fields */
	uint64		xlp_sysid;		/* system identifier from pg_control */
	uint32		xlp_seg_size;	/* log segment size, just as a cross-check */
	uint32		xlp_xlog_blcksz;	/* log page size, just as a cross-check */
} XLogLongPageHeaderData;

/* Size of XLogLongPageHeaderData */
#define SizeOfXLogLongPHD	MAXALIGN(sizeof(XLogLongPageHeaderData))

/* Pointer type for XLogLongPageHeaderData */
typedef XLogLongPageHeaderData *XLogLongPageHeader;
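
To get a feel for the two header sizes, the sketch below mirrors both structs with fixed-width types and applies an 8-byte MAXALIGN. The Demo* types and DEMO_MAXALIGN are hypothetical stand-ins. It assumes that TimeLineID is a 32-bit integer, that XLogRecPtr is a 64-bit integer, and that the platform's maximum alignment is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8-byte maximum alignment, as on common 64-bit platforms. */
#define DEMO_MAXALIGN(len) (((size_t) (len) + 7) & ~(size_t) 7)

typedef struct
{
    uint16_t xlp_magic;
    uint16_t xlp_info;
    uint32_t xlp_tli;        /* stands in for TimeLineID */
    uint64_t xlp_pageaddr;   /* stands in for XLogRecPtr */
    uint32_t xlp_rem_len;
} DemoPageHeader;

typedef struct
{
    DemoPageHeader std;      /* the standard header comes first */
    uint64_t xlp_sysid;
    uint32_t xlp_seg_size;
    uint32_t xlp_xlog_blcksz;
} DemoLongPageHeader;

/* Aligned size of the standard (short) page header. */
static size_t demo_short_header_size(void)
{
    return DEMO_MAXALIGN(sizeof(DemoPageHeader));
}

/* Aligned size of the long page header. */
static size_t demo_long_header_size(void)
{
    /* A long header can be read as a short one only if std is at offset 0. */
    assert(offsetof(DemoLongPageHeader, std) == 0);
    return DEMO_MAXALIGN(sizeof(DemoLongPageHeader));
}
```

Under these assumptions the short header occupies 24 bytes and the long header 40 bytes.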

3. Some macro definitions

/* When record crosses page boundary, set this flag in new page's header */
#define XLP_FIRST_IS_CONTRECORD		0x0001

/* This flag indicates a "long" page header (i.e. XLogLongPageHeaderData) */
#define XLP_LONG_HEADER				0x0002

/*
 * This flag indicates backup blocks starting in this page are optional.
 * After pg_start_backup begins, the database is forced into full-page-write
 * mode. Once the backup stops, WAL pages are marked with XLP_BKP_REMOVABLE,
 * meaning that full-page images from that point on are no longer mandatory.
 */
#define XLP_BKP_REMOVABLE			0x0004

/* All defined flag bits in xlp_info (used for validity checking of header) */
#define XLP_ALL_FLAGS				0x0007

/* Header size of a page: long or standard, depending on XLP_LONG_HEADER */
#define XLogPageHeaderSize(hdr)		\
	(((hdr)->xlp_info & XLP_LONG_HEADER) ? SizeOfXLogLongPHD : SizeOfXLogShortPHD)

/* wal_segment_size can range from 1MB to 1GB */
#define WalSegMinSize 1024 * 1024
#define WalSegMaxSize 1024 * 1024 * 1024
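
The macros above lend themselves to simple validity checks. The sketch below is illustrative only: the demo_* functions and DEMO_* constants are hypothetical, the header sizes 24 and 40 assume 8-byte alignment, and the power-of-two requirement comes from the initdb documentation for --wal-segsize rather than from the excerpt above.

```c
#include <assert.h>
#include <stdint.h>

#define XLP_FIRST_IS_CONTRECORD 0x0001
#define XLP_LONG_HEADER         0x0002
#define XLP_BKP_REMOVABLE       0x0004
#define XLP_ALL_FLAGS           0x0007

/* Parenthesized copies of the size limits, safe inside larger expressions. */
#define DEMO_WAL_SEG_MIN (1024 * 1024)
#define DEMO_WAL_SEG_MAX (1024 * 1024 * 1024)

/* A header is suspicious if any bit outside XLP_ALL_FLAGS is set. */
static int demo_flags_valid(uint16_t xlp_info)
{
    return (xlp_info & ~XLP_ALL_FLAGS) == 0;
}

/* True if the page carries the long header. */
static int demo_is_long_header(uint16_t xlp_info)
{
    return (xlp_info & XLP_LONG_HEADER) != 0;
}

/* Header length implied by the flags; callers validate the flags first. */
static unsigned demo_header_size(uint16_t xlp_info)
{
    assert(demo_flags_valid(xlp_info));
    return demo_is_long_header(xlp_info) ? 40u : 24u;   /* assumed sizes */
}

/* Segment size must be a power of two within [1 MB, 1 GB]. */
static int demo_seg_size_valid(uint32_t size)
{
    return size >= DEMO_WAL_SEG_MIN && size <= DEMO_WAL_SEG_MAX &&
           (size & (size - 1)) == 0;
}
```

The local copies of the size limits are parenthesized so that they behave as single values when used inside a larger expression.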

Next comes the log record part.

It covers a lot of ground, so it is introduced level by level, following the diagram:

  • The common record header, XLogRecord
  • The record header information: the block header XLogRecordBlockHeader + the data header XLogRecordDataHeader
  • The record data: block data + main data

 

III. Common record header: XLogRecord

typedef struct XLogRecord
{
	uint32		xl_tot_len;		/* total len of entire record */
	TransactionId xl_xid;		/* xact id */
	XLogRecPtr	xl_prev;		/* ptr to previous record in log */
	uint8		xl_info;		/* flag bits plus the action that generated the record, see below */
	RmgrId		xl_rmid;		/* resource manager for this record */
	/* 2 bytes of padding here, initialize to zero */
	pg_crc32c	xl_crc;			/* CRC (cyclic redundancy check) for this record */

	/* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */

} XLogRecord;

xl_info holds flag bits and the action that generated the record:

  • The low 4 bits hold flags. Two of them are XLR_SPECIAL_REL_UPDATE and XLR_CHECK_CONSISTENCY, which are passed in by the caller of XLogInsert.
/*
 * If a WAL record modifies any relation files in ways not covered by the
 * usual block references, this flag is set. PostgreSQL itself does not use
 * it, but it lets external tools that read WAL and track modified blocks
 * recognize such special record types.
 */
#define XLR_SPECIAL_REL_UPDATE	0x01

/*
 * Enforce consistency checks during recovery. If enabled, a full-page image
 * is written and used for the consistency check at replay. The caller of
 * XLogInsert can set this flag when needed, but if wal_consistency_checking
 * is enabled for the rmgr, the check is performed unconditionally.
 */
#define XLR_CHECK_CONSISTENCY	0x02
  • The high 4 bits identify the action that generated the record (at most 16 kinds). Their meaning differs from one resource manager ID to another, so each resource manager is limited in how many actions it can define. Take heap operations, whose resource manager ID is RM_HEAP_ID, as an example:
/*
 * XLOG allows to store some information in high 4 bits of log
 * record xl_info field.  We use 3 for opcode and one for init bit.
 */
#define XLOG_HEAP_INSERT		0x00
#define XLOG_HEAP_DELETE		0x10
#define XLOG_HEAP_UPDATE		0x20
#define XLOG_HEAP_TRUNCATE		0x30
#define XLOG_HEAP_HOT_UPDATE	0x40
#define XLOG_HEAP_CONFIRM		0x50
#define XLOG_HEAP_LOCK			0x60
#define XLOG_HEAP_INPLACE		0x70
#define XLOG_HEAP_OPMASK		0x70
/*
 * When we insert 1st item on new page in INSERT, UPDATE, HOT_UPDATE,
 * or MULTI_INSERT, we can (and we do) restore entire page in redo.
 * This bit marks a record that writes the first item on a new page.
 */
#define XLOG_HEAP_INIT_PAGE		0x80
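
The bit layout of xl_info for a heap record can be illustrated with a small encode/decode sketch. The demo_* functions are hypothetical. XLR_INFO_MASK (0x0F) and XLR_RMGR_INFO_MASK (0xF0) are quoted from PostgreSQL's xlogrecord.h as best recalled and should be checked against the version in use.

```c
#include <assert.h>
#include <stdint.h>

#define XLR_INFO_MASK        0x0F   /* low 4 bits: flags of the XLOG layer */
#define XLR_RMGR_INFO_MASK   0xF0   /* high 4 bits: owned by the resource manager */

#define XLOG_HEAP_UPDATE     0x20
#define XLOG_HEAP_OPMASK     0x70
#define XLOG_HEAP_INIT_PAGE  0x80

/* Combine a heap opcode, the init-page bit and low-nibble flags. */
static uint8_t demo_make_heap_info(uint8_t opcode, int init_page,
                                   uint8_t xlog_flags)
{
    assert((opcode & ~XLOG_HEAP_OPMASK) == 0);
    assert((xlog_flags & ~XLR_INFO_MASK) == 0);
    return (uint8_t) (opcode | (init_page ? XLOG_HEAP_INIT_PAGE : 0) |
                      xlog_flags);
}

/* The 3-bit heap opcode, still shifted into the high nibble. */
static uint8_t demo_heap_opcode(uint8_t xl_info)
{
    return xl_info & XLOG_HEAP_OPMASK;
}

/* True if the record writes the first item on a new page. */
static int demo_heap_inits_page(uint8_t xl_info)
{
    return (xl_info & XLOG_HEAP_INIT_PAGE) != 0;
}

/* The flags stored in the low 4 bits. */
static uint8_t demo_xlog_flags(uint8_t xl_info)
{
    return xl_info & XLR_INFO_MASK;
}
```

An update that initializes its page and requests a consistency check would be encoded as 0x20 | 0x80 | 0x02 = 0xA2.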

IV. Record block headers

1. XLogRecordBlockHeader

/*
 * Header info for block data appended to an XLOG record.
 */
typedef struct XLogRecordBlockHeader
{
	uint8		id;				/* block reference ID */
	uint8		fork_flags;		/* fork within the relation, and flags */
	uint16		data_length;	/* number of payload bytes (not including page
								 * image or the XLogRecordBlockHeader itself) */

	/* If BKPBLOCK_HAS_IMAGE, an XLogRecordBlockImageHeader struct follows */
	/* If BKPBLOCK_SAME_REL is not set, a RelFileNode follows */
	/* BlockNumber follows */
} XLogRecordBlockHeader;

#define SizeOfXLogRecordBlockHeader (offsetof(XLogRecordBlockHeader, data_length) + sizeof(uint16))

BlockNumber is defined in block.h. It is a 32-bit unsigned integer whose valid values range from 0 to 0xFFFFFFFE.

typedef uint32 BlockNumber;
#define InvalidBlockNumber		((BlockNumber) 0xFFFFFFFF)
#define MaxBlockNumber			((BlockNumber) 0xFFFFFFFE)

As the diagram shows, XLogRecordBlockHeader may be followed by several optional parts:

  • XLogRecordBlockImageHeader: present when the record contains a full-page image (also called a backup block, used for full-page writes); described below
  • XLogRecordBlockCompressHeader: present when the page image is compressed and has a hole
  • RelFileNode (relfilenode.h): present when BKPBLOCK_SAME_REL is not set

2. XLogRecordBlockImageHeader

Additional header information that is present when a full-page image (backup block) is included, i.e. when BKPBLOCK_HAS_IMAGE is set.

/*
 * Additional header information when a full-page image is included
 * (i.e. when BKPBLOCK_HAS_IMAGE is set).
 *
 * The XLOG code is aware that PG data pages usually contain an unused "hole"
 * (free space) in the middle, which contains only zero bytes. Since the hole
 * is known to be all zeros, it is removed from the stored data (and is not
 * counted in the XLOG record's CRC either). Hence, the amount of block data
 * actually stored is BLCKSZ minus the hole size.
 *
 * In addition, when wal_compression is enabled, an attempt is made to
 * compress the full-page image with the PGLZ algorithm after the hole has
 * been removed. This reduces WAL volume at the cost of extra CPU.
 * In that case the hole length can no longer be derived by subtracting the
 * number of page image bytes from BLCKSZ, so it has to be stored as
 * additional information. If the hole does not exist, its length is assumed
 * to be 0 and no additional information is stored.
 * Note that if the number of bytes saved by compression is smaller than the
 * length of the additional information, the original page image is stored in
 * WAL instead of the compressed version.
 * Hence, when a page image is successfully compressed, the amount of block
 * data stored is less than BLCKSZ - hole size - size of the additional
 * information.
 */

typedef struct XLogRecordBlockImageHeader
{
	uint16		length;			/* number of page image bytes */
	uint16		hole_offset;	/* number of bytes before "hole" */
	uint8		bimg_info;		/* flag bits, see below */

	/*
	 * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED, an
	 * XLogRecordBlockCompressHeader struct follows.
	 */
} XLogRecordBlockImageHeader;

/* Information stored in bimg_info */
#define BKPIMAGE_HAS_HOLE		0x01	     /* page image has "hole" */
#define BKPIMAGE_IS_COMPRESSED		0x02	 /* page image is compressed */
#define BKPIMAGE_APPLY		0x04 	/* page image should be restored during replay */
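
The hole arithmetic can be made concrete with a sketch that rebuilds a page from an uncompressed image whose hole was removed. This is an illustration under assumptions (an 8 KB block size and no compression), not PostgreSQL's own restore routine. demo_hole_len, demo_restore_page and DEMO_BLCKSZ are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLCKSZ 8192   /* assumed block size */

/* For an uncompressed image, the hole is whatever is missing from BLCKSZ. */
static uint16_t demo_hole_len(uint16_t image_len)
{
    assert(image_len <= DEMO_BLCKSZ);
    return (uint16_t) (DEMO_BLCKSZ - image_len);
}

/*
 * Rebuild a full page: copy the bytes before the hole, zero-fill the hole,
 * then copy the bytes that followed the hole in the original page.
 */
static void demo_restore_page(uint8_t *page, const uint8_t *image,
                              uint16_t image_len, uint16_t hole_offset)
{
    uint16_t hole_len = demo_hole_len(image_len);

    assert(hole_offset <= image_len);

    memcpy(page, image, hole_offset);
    memset(page + hole_offset, 0, hole_len);
    memcpy(page + hole_offset + hole_len,
           image + hole_offset,
           (size_t) image_len - hole_offset);
}
```

With an image of 8092 bytes and hole_offset 50, bytes 50 through 149 of the rebuilt page are zero and everything else comes from the image.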

3. XLogRecordBlockCompressHeader

/*
 * Extra header information used when page image has "hole" and
 * is compressed.
 */
typedef struct XLogRecordBlockCompressHeader
{
	uint16		hole_length;	/* number of bytes in "hole" */
} XLogRecordBlockCompressHeader;

#define SizeOfXLogRecordBlockCompressHeader \
	sizeof(XLogRecordBlockCompressHeader)

4. RelFileNode

This structure is very simple

typedef struct RelFileNode
{
	Oid			spcNode;		/* tablespace */
	Oid			dbNode;			/* database */
	Oid			relNode;		/* relation */
} RelFileNode;

5. MaxSizeOfXLogRecordBlockHeader

The maximum size of a block header is the case in which every optional part is present, with all the sizes added up.

/*
 * Maximum size of the header for a block reference. This is used to size a
 * temporary buffer for constructing the header. 
*/
#define MaxSizeOfXLogRecordBlockHeader \
	(SizeOfXLogRecordBlockHeader + \
	 SizeOfXLogRecordBlockImageHeader + \
	 SizeOfXLogRecordBlockCompressHeader + \
	 sizeof(RelFileNode) + \
	 sizeof(BlockNumber))
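
How the optional parts add up for a given block reference can be sketched as follows. The byte sizes are assumptions derived from the struct definitions shown earlier (4-byte Oid and BlockNumber), and the BKPBLOCK_* values are quoted from PostgreSQL's xlogrecord.h as best recalled. demo_block_header_len and the DEMO_* constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits stored in fork_flags (verify against your xlogrecord.h). */
#define BKPBLOCK_HAS_IMAGE  0x10
#define BKPBLOCK_SAME_REL   0x80

/* Assumed on-disk sizes of each part, in bytes. */
#define DEMO_BLOCK_HDR      4   /* id + fork_flags + data_length */
#define DEMO_IMAGE_HDR      5   /* length + hole_offset + bimg_info */
#define DEMO_COMPRESS_HDR   2   /* hole_length */
#define DEMO_RELFILENODE   12   /* three 4-byte Oids */
#define DEMO_BLOCKNUMBER    4

/* Total header bytes written for one block reference. */
static size_t demo_block_header_len(uint8_t fork_flags,
                                    int compressed_with_hole)
{
    size_t len = DEMO_BLOCK_HDR;

    /* A compress header only makes sense when an image is present. */
    assert(!compressed_with_hole || (fork_flags & BKPBLOCK_HAS_IMAGE));

    if (fork_flags & BKPBLOCK_HAS_IMAGE)
    {
        len += DEMO_IMAGE_HDR;
        if (compressed_with_hole)
            len += DEMO_COMPRESS_HDR;
    }
    if (!(fork_flags & BKPBLOCK_SAME_REL))
        len += DEMO_RELFILENODE;

    return len + DEMO_BLOCKNUMBER;   /* the block number is always present */
}
```

Under these assumptions the smallest header is 8 bytes (same relation, no image) and the largest is 27 bytes, which corresponds to the sum in MaxSizeOfXLogRecordBlockHeader.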

V. Record data header: XLogRecordDataHeaderShort/Long

This is the header of the main data part. It comes in two forms: if the data length is less than 256 bytes, the short form is used and the length is stored in a single byte; otherwise the long form is used.

/*
 * These structs are currently not used in the code, they are here just for
 * documentation purposes.
 */
typedef struct XLogRecordDataHeaderShort
{
	uint8		id;				/* XLR_BLOCK_ID_DATA_SHORT */
	uint8		data_length;	/* number of payload bytes */
}			XLogRecordDataHeaderShort;

#define SizeOfXLogRecordDataHeaderShort (sizeof(uint8) * 2)
typedef struct XLogRecordDataHeaderLong
{
	uint8		id;				/* XLR_BLOCK_ID_DATA_LONG */
	/* followed by uint32 data_length, unaligned */
}			XLogRecordDataHeaderLong;

#define SizeOfXLogRecordDataHeaderLong (sizeof(uint8) + sizeof(uint32))
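
The choice between the two forms can be sketched as a small serializer. It is illustrative only: demo_write_data_header is a hypothetical helper, and the ID values 255 and 254 are quoted from PostgreSQL's xlogrecord.h as best recalled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XLR_BLOCK_ID_DATA_SHORT 255
#define XLR_BLOCK_ID_DATA_LONG  254

/*
 * Write the main-data header for data_len payload bytes into buf and
 * return the number of header bytes written: 2 for the short form,
 * 5 for the long form. buf must have room for at least 5 bytes.
 */
static size_t demo_write_data_header(uint8_t *buf, uint32_t data_len)
{
    assert(buf != NULL);

    if (data_len < 256)
    {
        buf[0] = XLR_BLOCK_ID_DATA_SHORT;
        buf[1] = (uint8_t) data_len;     /* length fits in one byte */
        return 2;
    }

    buf[0] = XLR_BLOCK_ID_DATA_LONG;
    /* The 4-byte length follows unaligned, in native byte order. */
    memcpy(buf + 1, &data_len, sizeof(data_len));
    return 1 + sizeof(data_len);
}
```

The long form uses memcpy because the 32-bit length starts at an odd offset, so it cannot be written through an aligned uint32 pointer.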

VI. Record data

Block data and main data are introduced together here because they are related.

By the content they store, XLOG records fall roughly into three categories:

  • Record for backup block: stores the full-page-write copy of a block, to guard against partially written pages;
  • Record for tuple data block (non-backup block): after the full-page write, records the tuple changes made to the corresponding page;
  • Record for checkpoint: when a checkpoint occurs, the checkpoint information (including the redo point) is recorded in the transaction log file.

Each category carries different header and data information, which can be read alongside the structures introduced above.


An earlier article, PG crash recovery (3): approaching the XLOG record (Hehuyi_In's blog, CSDN), also covers this and can be used as a reference.

References

PostgreSQL Technology Insider: An In-Depth Exploration of Transaction Processing, Chapter 4

PostgreSQL DBA (17) - XLOG record data internal structure (Jianshu)

PostgreSQL source code interpretation (109) - WAL #5 (related data structures) (Jianshu)

PostgreSQL xlog format: backup full page (yzs87's blog, CSDN)

PostgreSQL xlog format: checkpoint (yzs87's blog, CSDN)

PostgreSQL xlog format: no backup full page (ITPUB blog)

Copyright notice
This article was written by [Hehuyi_In]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/162/202206110248364730.html