PostgreSQL source code learning (24) -- transaction log ⑤ - writing log records to the WAL Buffer
2022-06-29 15:36:00 【Hehuyi_ In】
I. Overview of the write process

1. Write steps
Writing a log record to the WAL Buffer takes two steps:
- Reserve space: once a record has been assembled its length is known, so the required space can be calculated and reserved in the WAL Buffer up front. Reservation is protected by the spinlock XLogCtl->Insert.insertpos_lck. In other words, all processes that need to write WAL are mutually exclusive while they reserve space.
- Copy data: once space has been reserved, the copy itself can run concurrently. PG uses the WALInsertLocks to control concurrent copying. It declares NUM_XLOGINSERT_LOCKS (currently 8) WALInsertLocks, each consisting of a lightweight lock plus a log write position. Each process picks one of them, in effect at random, based on its own MyProc->pgprocno.
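The two steps can be sketched in isolation. The following is a minimal toy model, not PostgreSQL code: ToyWalBuffer, toy_reserve, toy_copy and toy_pick_slot are hypothetical names, the buffer is a fixed array with no wraparound, a busy-wait on a C11 atomic_flag stands in for the spinlock, and the mapping "process number modulo slot count" is an assumption about how a slot is derived from pgprocno. The per-slot lightweight lock itself is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NUM_INSERT_SLOTS 8      /* mirrors NUM_XLOGINSERT_LOCKS */
#define TOY_WAL_BUF_SIZE 1024   /* hypothetical buffer size */

typedef struct ToyWalBuffer
{
    atomic_flag insertpos_lck;          /* guards curr_pos only */
    size_t      curr_pos;               /* next unreserved byte in data[] */
    char        data[TOY_WAL_BUF_SIZE];
} ToyWalBuffer;

void toy_wal_init(ToyWalBuffer *buf)
{
    atomic_flag_clear(&buf->insertpos_lck);
    buf->curr_pos = 0;
    memset(buf->data, 0, sizeof(buf->data));
}

/* Assumed slot choice: spread processes over the slots by process number. */
int toy_pick_slot(int procno)
{
    return procno % NUM_INSERT_SLOTS;
}

/*
 * Step 1 (serialized): reserve len bytes and return their start offset,
 * or (size_t) -1 if the toy buffer has no room left.
 */
size_t toy_reserve(ToyWalBuffer *buf, size_t len)
{
    size_t start;

    while (atomic_flag_test_and_set_explicit(&buf->insertpos_lck,
                                             memory_order_acquire))
        ;                               /* spin until the lock is free */

    if (len > TOY_WAL_BUF_SIZE - buf->curr_pos)
        start = (size_t) -1;
    else
    {
        start = buf->curr_pos;
        buf->curr_pos += len;
    }

    atomic_flag_clear_explicit(&buf->insertpos_lck, memory_order_release);
    return start;
}

/*
 * Step 2 (can run concurrently): each writer copies into its own reserved
 * region, so no lock on the position counter is needed here.
 */
void toy_copy(ToyWalBuffer *buf, size_t start, const void *rec, size_t len)
{
    memcpy(buf->data + start, rec, len);
}
```

Because the regions handed out by toy_reserve never overlap, the copies may finish in any order; only the few instructions between taking and releasing the spinlock are serialized.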
2. WALInsertLocks lock
/*
 * PG declares NUM_XLOGINSERT_LOCKS (currently 8) locks for WAL insertion (WALInsertLock).
 * The larger the value, the more processes can insert concurrently, but the higher the CPU load.
 */
#define NUM_XLOGINSERT_LOCKS 8

/*
 * Each WALInsertLock consists of a lightweight lock plus a log write position.
 * To write a log record, a process must hold one WALInsertLock (chosen effectively at random; it does not matter which one).
 */
typedef struct
{
    // Lightweight lock. Once it is released, the log record has been written to the WAL Buffer.
    LWLock lock;
    // Progress of the current write into the WAL Buffer. Small records that do not cross a page
    // boundary do not update this value; it is usually updated when the record is long.
    // The value is read when WAL is flushed from memory to disk, to confirm that all writes
    // to the region being flushed have completed.
    XLogRecPtr insertingAt;
    // lastImportantAt contains the LSN of the last important WAL record inserted using a given lock.
    // Some records have nothing to do with data consistency and can be lost without harm;
    // such records do not update lastImportantAt.
    XLogRecPtr lastImportantAt;
} WALInsertLock;
Here we leave two questions:
- Why is the concurrency of data copying set to only 8, and not higher?
- How are conflicts between multiple processes copying data concurrently resolved?
The answer to both questions lies in the WaitXLogInsertionsToFinish function, which we will study in the next article.
In short, this function is called every time WAL is flushed to disk, and it has to iterate over all the WALInsertLocks, so NUM_XLOGINSERT_LOCKS should not be too large. The current code sets it to 8.
for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
{
...
}
II. The XLogInsertRecord function
As mentioned earlier, this function does two important things and then returns:
- It calls ReserveXLogInsertLocation to reserve space for the previously assembled XLOG record, obtaining StartPos (the start position) and EndPos (the end position).
- It calls CopyXLogRecordToWAL to copy the XLOG data into the WAL Buffer.
- Finally, it returns the record's EndPos, that is, the position up to which the log has been written.
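Besides a valid EndPos, XLogInsertRecord can also return InvalidXLogRecPtr, which tells the caller to rebuild the record and try again (this happens in the checks at the start of the function). A minimal model of that contract is sketched here, assuming a single process and ignoring the doPageWrites toggle; every toy_ name is hypothetical, and the decision rule "a page needs a full-page image if its LSN is not newer than the redo pointer" is a simplification used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ToyLsn;
#define TOY_INVALID_LSN ((ToyLsn) 0)

static ToyLsn shared_redo = 100;    /* authoritative value, like Insert->RedoRecPtr */
static ToyLsn cached_redo = 100;    /* this process's copy, like RedoRecPtr */
static ToyLsn insert_pos  = 1000;   /* current end of the toy log */

/* A checkpoint advances the shared redo pointer; cached copies go stale. */
void toy_checkpoint(ToyLsn new_redo)
{
    shared_redo = new_redo;
}

/*
 * "Assemble" step: decide, using the cached redo pointer, whether the page
 * needs a full-page image. If the image is skipped, report the page LSN
 * through fpw_lsn so the insert step can re-check the decision.
 * Returns 1 if a full-page image was included, 0 otherwise.
 */
int toy_assemble(ToyLsn page_lsn, ToyLsn *fpw_lsn)
{
    if (page_lsn <= cached_redo)
    {
        *fpw_lsn = TOY_INVALID_LSN;
        return 1;
    }
    *fpw_lsn = page_lsn;
    return 0;
}

/* "Insert" step: refresh the cached pointer, refuse if the decision is stale. */
ToyLsn toy_insert(ToyLsn fpw_lsn, ToyLsn rec_len)
{
    if (cached_redo != shared_redo)
        cached_redo = shared_redo;

    if (fpw_lsn != TOY_INVALID_LSN && fpw_lsn <= cached_redo)
        return TOY_INVALID_LSN;     /* caller must start over */

    insert_pos += rec_len;
    return insert_pos;
}

/* Caller side: assemble and insert until the insert is accepted. */
ToyLsn toy_log_change(ToyLsn page_lsn, ToyLsn rec_len,
                      int *attempts, int *with_fpi)
{
    ToyLsn end;
    ToyLsn fpw_lsn;

    *attempts = 0;
    do
    {
        *with_fpi = toy_assemble(page_lsn, &fpw_lsn);
        end = toy_insert(fpw_lsn, rec_len);
        (*attempts)++;
    } while (end == TOY_INVALID_LSN);

    return end;
}
```

In this model a retry only happens when a checkpoint has moved the redo pointer past the page's LSN between assembling and inserting, which matches the remark in the function that the slow path only occurs right after a checkpoint.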
The function starts with some checks
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
uint8 flags,
int num_fpi)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
bool inserted;
XLogRecord *rechdr = (XLogRecord *) rdata->data;
uint8 info = rechdr->xl_info & ~XLR_INFO_MASK;
bool isLogSwitch = (rechdr->xl_rmid == RM_XLOG_ID &&
info == XLOG_SWITCH);
XLogRecPtr StartPos;
XLogRecPtr EndPos;
bool prevDoPageWrites = doPageWrites;
…
START_CRIT_SECTION();
// A log segment switch takes the insertion locks exclusively, so other processes cannot reserve space during the switch
if (isLogSwitch)
WALInsertLockAcquireExclusive();
else
WALInsertLockAcquire();
// Check whether this process's copy of RedoRecPtr is out of date. If it is (this can only happen right after a checkpoint), we may have to go back to the caller and have it recompute everything, so this case is slower than the others.
if (RedoRecPtr != Insert->RedoRecPtr)
{
Assert(RedoRecPtr < Insert->RedoRecPtr);
RedoRecPtr = Insert->RedoRecPtr;
}
// Also check whether fullPageWrites or forcePageWrites is enabled
doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
if (doPageWrites &&
(!prevDoPageWrites ||
(fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr)))
{
/*
 * Oops, some buffer now needs to be backed up that the caller didn't
 * back up. Start over: release the lock and return InvalidXLogRecPtr,
 * so that the caller rebuilds the record with the full-page image and retries.
 */
WALInsertLockRelease();
END_CRIT_SECTION();
return InvalidXLogRecPtr;
}
Reserving space
/*
 * Reserve space for the record in the WAL. This also sets the xl_prev pointer.
 */
/*
 * For a log-switch record, the insert position may already be at the start of a
 * segment. In that case StartPos and EndPos are identical, no WAL record needs
 * to be written, and inserted is set to false.
 */
if (isLogSwitch)
inserted = ReserveXLogSwitch(&StartPos, &EndPos, &rechdr->xl_prev);
else
{
ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
&rechdr->xl_prev);
inserted = true;
}
Copying the data
// After reserving space, start copying the data. inserted is true unless this was a log-switch request that needed no record
if (inserted)
{
/*
 * Now that xl_prev has been filled in, calculate CRC of the record header.
 */
rdata_crc = rechdr->xl_crc;
COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
FIN_CRC32C(rdata_crc);
rechdr->xl_crc = rdata_crc;
/*
 * All the record data, including the header, is now ready to be
 * inserted. Copy the record in the space reserved (in the WAL Buffer).
 */
CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch, rdata,
StartPos, EndPos);
/*
 * Unless record is flagged as not important, update LSN of last
 * important record in the current slot (lastImportantAt). When holding
 * all locks (holdingAllLocks is true), just update the first one.
 */
if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
{
int lockno = holdingAllLocks ? 0 : MyLockNo;
WALInsertLocks[lockno].l.lastImportantAt = StartPos;
}
}
else // inserted is false: a log-switch request for which nothing had to be written
{
/*
 * This was an xlog-switch record, but the current insert location was
 * already exactly at the beginning of a segment, so there was no need
 * to do anything (there is nothing to copy).
 */
}
/*
 * Done! Let others know that we're finished by releasing the lock.
 */
WALInsertLockRelease();
MarkCurrentTransactionIdLoggedIfAny();
END_CRIT_SECTION();
…
/*
* Update our global variables
*/
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
…
return EndPos;
}
III. Space reservation function: ReserveXLogInsertLocation
- Reserves space of the right size for a WAL record (in the WAL Buffer).
- *StartPos is the start of the reserved region, *EndPos is one past its end (end+1), and *PrevPtr is the start of the previous record, which is used to set xl_prev.
- This part is critical to the performance of XLogInsert, because it is the only part that must run serially; everything else can proceed concurrently. It must therefore be kept as short as possible, since insertpos_lck can be heavily contended on a busy system.
- Note: the space calculation here must match the code in the data copy function CopyXLogRecordToWAL.
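One reason the two functions must agree is that the reservation counts only "usable" bytes, while the positions handed back are real WAL addresses that also include page headers. The sketch below shows the idea under simplifying assumptions: every page carries a header of the same size (the real code distinguishes a longer header on the first page of each segment), and the page and header sizes are illustrative values. The toy_ functions are hypothetical stand-ins for XLogBytePosToRecPtr, XLogBytePosToEndRecPtr and XLogRecPtrToBytePos, not their actual implementations.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE UINT64_C(8192)   /* illustrative page size */
#define TOY_PAGE_HDR  UINT64_C(24)     /* illustrative header size, same on every page */
#define TOY_USABLE    (TOY_PAGE_SIZE - TOY_PAGE_HDR)

/* Usable byte position -> address where a record may START. */
uint64_t toy_bytepos_to_ptr(uint64_t bytepos)
{
    uint64_t page = bytepos / TOY_USABLE;
    uint64_t off  = bytepos % TOY_USABLE;

    /* A record never starts inside a header, so always step over it. */
    return page * TOY_PAGE_SIZE + TOY_PAGE_HDR + off;
}

/* Usable byte position -> address where a record ENDS (one past its last byte). */
uint64_t toy_bytepos_to_endptr(uint64_t bytepos)
{
    uint64_t page = bytepos / TOY_USABLE;
    uint64_t off  = bytepos % TOY_USABLE;

    /* A record that fills a page exactly ends at the page boundary. */
    if (off == 0)
        return page * TOY_PAGE_SIZE;
    return page * TOY_PAGE_SIZE + TOY_PAGE_HDR + off;
}

/* Address -> usable byte position (inverse of both functions above). */
uint64_t toy_ptr_to_bytepos(uint64_t ptr)
{
    uint64_t page = ptr / TOY_PAGE_SIZE;
    uint64_t off  = ptr % TOY_PAGE_SIZE;

    if (off == 0)
        return page * TOY_USABLE;
    assert(off >= TOY_PAGE_HDR);        /* addresses inside a header are invalid */
    return page * TOY_USABLE + (off - TOY_PAGE_HDR);
}
```

Keeping the shared counter in usable bytes means the critical section is a plain addition; the header arithmetic runs after the lock has been released. The start and end conversions differ only at a page boundary, which is why the function uses two separate conversions for *StartPos and *EndPos.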

static void
ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos, XLogRecPtr *EndPos,
XLogRecPtr *PrevPtr)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
uint64 startbytepos;
uint64 endbytepos;
uint64 prevbytepos;
size = MAXALIGN(size);
/* All (non xlog-switch) records should contain data. */
Assert(size > SizeOfXLogRecord);
/*
 * This is the core of the function and the only part that really runs
 * serially, so it has to be fast.
 */
SpinLockAcquire(&Insert->insertpos_lck);
startbytepos = Insert->CurrBytePos;
endbytepos = startbytepos + size;
prevbytepos = Insert->PrevBytePos;
Insert->CurrBytePos = endbytepos;
Insert->PrevBytePos = startbytepos;
SpinLockRelease(&Insert->insertpos_lck);
*StartPos = XLogBytePosToRecPtr(startbytepos);
*EndPos = XLogBytePosToEndRecPtr(endbytepos);
*PrevPtr = XLogBytePosToRecPtr(prevbytepos);
/*
* Check that the conversions between "usable byte positions" and
* XLogRecPtrs work consistently in both directions.
*/
Assert(XLogRecPtrToBytePos(*StartPos) == startbytepos);
Assert(XLogRecPtrToBytePos(*EndPos) == endbytepos);
Assert(XLogRecPtrToBytePos(*PrevPtr) == prevbytepos);
}
IV. Data copy function: CopyXLogRecordToWAL
Copies the WAL record data into the space reserved in the WAL Buffer.

Function parameters:
- write_len: total length of the XLOG record, used for verification.
- isLogSwitch: whether the record is a log-switch record.
- rdata: the XLogRecData linked list that holds the XLOG data.
- StartPos: the position at which writing the XLOG record starts.
- EndPos: the end position of the XLOG record, used for verification.
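The core of the copy is a nested loop: the outer loop walks the fragment chain, and the inner loop splits a fragment whenever it does not fit on the current page. The following is a simplified sketch of that pattern, not the real function: the WAL is a flat array supplied by the caller, pages are 64 bytes with an 8-byte header whose first 4 bytes hold the remaining record length, and ToyRecData, toy_freespace and toy_copy_record are hypothetical names. The caller is assumed to pass a start position that lies after a page header and a buffer large enough for the record.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64u   /* illustrative page size */
#define TOY_PAGE_HDR  8u    /* illustrative header size, same on every page */

/* One fragment of a record; fragments are chained like XLogRecData. */
typedef struct ToyRecData
{
    const struct ToyRecData *next;
    const char *data;
    uint32_t    len;
} ToyRecData;

/* Bytes left on the page containing pos (0 if pos sits on a page boundary). */
static uint32_t toy_freespace(uint64_t pos)
{
    uint32_t off = (uint32_t) (pos % TOY_PAGE_SIZE);

    return off == 0 ? 0 : TOY_PAGE_SIZE - off;
}

/* Copy the whole chain starting at start; return the position after the record. */
uint64_t toy_copy_record(char *wal, uint64_t start,
                         const ToyRecData *rdata, uint32_t total_len)
{
    uint64_t pos = start;
    uint32_t freespace = toy_freespace(pos);
    uint32_t written = 0;

    for (; rdata != NULL; rdata = rdata->next)
    {
        const char *src = rdata->data;
        uint32_t    len = rdata->len;

        while (len > freespace)
        {
            /* Write what fits on this page. */
            memcpy(wal + pos, src, freespace);
            src += freespace;
            len -= freespace;
            written += freespace;
            pos += freespace;

            /* pos is now a page start: record how much is left, skip the header. */
            uint32_t rem = total_len - written;
            memcpy(wal + pos, &rem, sizeof(rem));
            pos += TOY_PAGE_HDR;
            freespace = toy_freespace(pos);
        }

        memcpy(wal + pos, src, len);
        pos += len;
        freespace -= len;
        written += len;
    }

    assert(written == total_len);
    return pos;
}
```

The returned position plays the role of CurrPos at the end of the real function: comparing it with the end position computed at reservation time is what detects a mismatch between the two calculations.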
static void
CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
XLogRecPtr StartPos, XLogRecPtr EndPos)
{
char *currpos;
int freespace;
int written;
XLogRecPtr CurrPos;
XLogPageHeader pagehdr;
/*
 * Get a pointer to the right place in the right WAL buffer to start
 * inserting to. This is the starting point of the copy.
 */
CurrPos = StartPos;
currpos = GetXLogBuffer(CurrPos);
freespace = INSERT_FREESPACE(CurrPos);
/*
* there should be enough space for at least the first field (xl_tot_len) on this page.
*/
Assert(freespace >= sizeof(uint32));
/* Copy record data. This is the core loop: it copies the data of each element in the rdata chain. */
written = 0;
while (rdata != NULL)
{
char *rdata_data = rdata->data;
int rdata_len = rdata->len;
/* Handles the case where the XLOG data still to be written is longer than the free space left on the current WAL Buffer page: write as much as fits on the current page, then switch to the next page. */
while (rdata_len > freespace)
{
/*
* Write what fits on this page, and continue on the next page.
*/
Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || freespace == 0);
memcpy(currpos, rdata_data, freespace);
rdata_data += freespace;
rdata_len -= freespace;
written += freespace;
CurrPos += freespace;
/*
 * Get a pointer to the start of the next page and set xlp_rem_len in its page header
 */
currpos = GetXLogBuffer(CurrPos);
pagehdr = (XLogPageHeader) currpos;
pagehdr->xlp_rem_len = write_len - written;
pagehdr->xlp_info |= XLP_FIRST_IS_CONTRECORD;
/* skip over the page header */
if (XLogSegmentOffset(CurrPos, wal_segment_size) == 0)
{
CurrPos += SizeOfXLogLongPHD;
currpos += SizeOfXLogLongPHD;
}
else
{
CurrPos += SizeOfXLogShortPHD;
currpos += SizeOfXLogShortPHD;
}
freespace = INSERT_FREESPACE(CurrPos);
}
Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || rdata_len == 0);
memcpy(currpos, rdata_data, rdata_len);
currpos += rdata_len;
CurrPos += rdata_len;
freespace -= rdata_len;
written += rdata_len;
rdata = rdata->next;
}
Assert(written == write_len);
…
if (CurrPos != EndPos)
elog(PANIC, "space reserved for WAL record does not match what was written");
}
References
《PostgreSQL Technology Insider: An In-Depth Exploration of Transaction Processing》, Chapter 4
https://blog.csdn.net/obvious__/article/details/119242661?spm=1001.2014.3001.5502
PostgreSQL's WAL log concurrent write mechanism - Zhihu
https://blog.csdn.net/asmartkiller/article/details/121375548
https://icode.best/i/12479444350651