Postgres-XL global snapshots and GTM code walkthrough (side branch)
2022-08-03 19:02:00 【mingjie】
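Related: 《Postgresql源码(18)PGPROC相关结构》《Postgresql源码(65)新快照体系Globalvis工作原理分析》《Postgresql快照优化Globalvis新体系分析(性能大幅增强)》《Postgresql源码(23)Clog使用的Slru页面淘汰机制》
(This first article takes the PG point of view; the next one takes the GTM point of view.) (The first part is a loose collection of concepts; the last part is a GDB walkthrough.)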
1 Concepts
1.1 Cluster MVCC
- Postgres-XL basically reuses PG's xmin, xmax, clog and snapshot. XL only extends PG's mechanism for allocating transaction IDs and makes snapshots available globally.
- When a user issues a DML statement to a cn, the cn obtains a global transaction ID (GXID) and a global transaction snapshot from GTM and sends them to the datanodes. Each dn uses the GXID and the snapshot from the cn to carry out its part of the work. In this way the dn share the same transaction context, and a transaction that runs on several cn and dn stays atomic and sees consistent visibility. At the end of the transaction, if the update involved more than one node, the coordinator implicitly commits it with the 2PC protocol. The coordinator tracks the global transaction state and reports it to GTM.
1.2 When GTM is contacted
- A cn talks to GTM when it:
- needs a new transaction ID
- needs a new transaction snapshot
- commits or aborts a transaction
- A dn talks to GTM for:
- vacuum
1.3 Visibility checks
- PG needs two key pieces of information for a visibility check to give the correct result:
- Transactions that are still running: the snapshot
- The state of transactions that are no longer running: clog or the tuple flag bits (short path)
- For XL, the visibility check needs GTM to provide:
- Globally and uniformly allocated transaction IDs
- A globally unified snapshot
- Every transaction start, end, first-phase commit and second-phase commit is reported to GTM, so GTM holds the global information and can generate global snapshots.
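The two inputs listed above (the snapshot, plus the commit state from clog or tuple flags) can be combined roughly as follows. This is a minimal sketch, not the PG or XL implementation: `DemoSnapshot`, `demo_xid_in_snapshot` and `demo_xid_visible` are hypothetical names, XID wraparound and subtransactions are ignored, and the `committed` flag stands in for the clog or hint-bit lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;   /* hypothetical stand-in for TransactionId / GXID */

/* Simplified snapshot: ignores XID wraparound and subtransactions. */
typedef struct {
    Xid        xmin;   /* every xid < xmin had finished when the snapshot was taken */
    Xid        xmax;   /* every xid >= xmax had not started yet */
    const Xid *xip;    /* xids in [xmin, xmax) that were still running */
    size_t     xcnt;   /* number of entries in xip */
} DemoSnapshot;

/* Input 1: was xid still in progress from the snapshot's point of view? */
static bool demo_xid_in_snapshot(const DemoSnapshot *snap, Xid xid)
{
    assert(snap != NULL && snap->xmin <= snap->xmax);
    if (xid < snap->xmin)
        return false;               /* finished before the snapshot */
    if (xid >= snap->xmax)
        return true;                /* started after the snapshot */
    for (size_t i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xid)
            return true;            /* listed as running */
    return false;
}

/* Input 2: the commit state, which PG reads from clog or tuple flag bits. */
static bool demo_xid_visible(const DemoSnapshot *snap, Xid xid, bool committed)
{
    return committed && !demo_xid_in_snapshot(snap, xid);
}
```

With a snapshot of xmin = 20251, xmax = 20260 and xip = {20251}, a committed xid 20240 is visible, while 20251 is not visible even if it commits later, because the snapshot still lists it as running.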
1.4 GTM interaction
- As the figure shows, when a cn starts a new transaction it asks GTM for a new transaction ID (GXID, global transaction id).
- If the isolation level is REPEATABLE READ, one snapshot is taken and used for the whole transaction.
- If the isolation level is READ COMMITTED, a fresh snapshot is fetched from GTM for every statement.
- The statement is then analyzed to decide which datanodes it has to go to, and it is transformed for each datanode if necessary.
- Note that the statement is delivered to the appropriate datanodes together with the GXID and the global snapshot, so that transaction identity and row visibility stay globally consistent.
- At the end of the transaction, if the updates in the transaction involve more than one dn, the coordinator issues PREPARE TRANSACTION for 2PC and then issues COMMIT. These steps are also reported to GTM, which tracks the status of each transaction in order to compute subsequent global snapshots.
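1.5 Upper-level interfaces provided by GTM
Connecting to GTM
- IsGTMConnected()
- InitGTM(): creates the connection and saves the connection information locally
- CloseGTM()
Obtaining a global transaction ID
- BeginTranGTM()
- BeginTranAutovacuumGTM()
Transaction notifications
- CommitTranGTM(): tells GTM that the transaction committed
- RollbackTranGTM(): tells GTM that the transaction rolled back
- StartPreparedTranGTM(): tells GTM that prepare is starting
- PrepareTranGTM(): tells GTM that prepare is complete
- CommitPreparedTranGTM(): tells GTM about the second-phase commit
Obtaining snapshots and GIDs
- GetSnapshotGTM(): gets a global snapshot
- GetGIDDataGTM(): gets the two-phase gid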
2 Postgres-XL modifications to the transaction handling functions
For the basics of the transaction handling functions, see 《Postgresql源码(60)事务系统总结》. Below are the changes PG-XL makes to them for the distributed case.
2.1 StartTransaction
Transaction state machine function. The difference is that PGXC does not support the Serializable isolation level.
2.2 CommitTransaction
- If the cn performed write operations in the transaction, it calls PrepareTransaction at commit time to carry out the first phase of the commit.
Prepare phase:
- After the prepare, the cn starts a new transaction and continues the commit as the second phase, using the same GXID.
- The cn then calls PreCommit_Remote to propagate the commit to the dn in 2PC fashion.
- When the cn has finished its own processing, it calls FinishPreparedTransaction to end the 2PC.
- Then CallGTMCallbacks invokes the callbacks that notify GTM, for example the global sequence manager.
Normal transaction commit:
- RecordTransactionCommit writes the XLOG.
End:
- AtEOXact_GlobalTxn asks GTM to commit the transaction.
- AtEOXact_Remote cleans up the transaction information.
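2.3 PrepareTransaction
- The cn calls PrePrepare_Remote to send the prepare command to the dn.
- The cn calls CallGTMCallbacks to tell GTM that the transaction has been prepared.
- The cn calls AtEOXact_GlobalTxn to tell GTM that the transaction can be prepared.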
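2.4 AbortTransaction
- The cn calls PreAbort_Remote to cancel the transaction on the dn.
- The cn calls FinishPreparedTransaction to clean up the local transaction.
- The cn calls CallGTMCallbacks to tell GTM that the transaction has rolled back.
- The cn calls AtEOXact_GlobalTxn to tell GTM to cancel the transaction.
- The cn calls AtEOXact_Remote to clean up.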
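Sections 2.2 to 2.4 together implement the coordinator side of an implicit two-phase commit. The generic shape of that protocol can be sketched as below. This is textbook 2PC under simplifying assumptions, not the Postgres-XL code path: `DemoNode` and `demo_two_phase_commit` are hypothetical, there is no GTM notification, and a node's answer to prepare is simulated by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEMO_COMMITTED, DEMO_ABORTED } DemoOutcome;

/* Hypothetical datanode handle: only records what the coordinator asked of it. */
typedef struct {
    bool can_prepare;   /* simulated answer to PREPARE TRANSACTION */
    bool prepared;
    bool committed;
    bool aborted;
} DemoNode;

static DemoOutcome demo_two_phase_commit(DemoNode *nodes, size_t n)
{
    assert(nodes != NULL || n == 0);
    bool all_prepared = true;

    /* Phase 1: ask every node to prepare; stop at the first refusal. */
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].can_prepare) {
            all_prepared = false;
            break;
        }
        nodes[i].prepared = true;
    }

    /* Phase 2: commit everywhere only if everyone prepared, else abort everywhere. */
    for (size_t i = 0; i < n; i++) {
        if (all_prepared) {
            nodes[i].committed = true;
        } else {
            nodes[i].prepared = false;
            nodes[i].aborted = true;
        }
    }
    return all_prepared ? DEMO_COMMITTED : DEMO_ABORTED;
}
```

The point of the split is that no node commits until every node has promised it can, which is what keeps a multi-dn update atomic.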
3 Postgres-XL snapshot data structure comparison
- GTM_SnapshotData is similar to SnapshotData; GTM manages the same snapshot data outside the cn and dn.
- GTM has no subtransaction data, because subtransactions are not supported.
- GTM does not need to store command ID data, because the cn that started the transaction stores it locally. The command ID can be handled locally on the cn without help from GTM. If it needs to be incremented on the cn involved or on another cn, that cn is notified and then uses it.
// PG
typedef struct SnapshotData
{
    SnapshotSatisfiesFunc satisfies;    /* tuple test function */
    TransactionId xmin;                 /* all XID < xmin are visible to me */
    TransactionId xmax;                 /* all XID >= xmax are invisible to me */
    TransactionId *xip;
    uint32      xcnt;                   /* # of xact ids in xip[] */
#ifdef PGXC /* PGXC_COORD */
    uint32      max_xcnt;               /* Max # of xact in xip[] */
#endif
    TransactionId *subxip;
    int32       subxcnt;                /* # of xact ids in subxip[] */
    bool        suboverflowed;          /* has the subxip array overflowed? */
    bool        takenDuringRecovery;    /* recovery-shaped snapshot? */
    bool        copied;                 /* false if it's a static snapshot */
    CommandId   curcid;                 /* in my xact, CID < curcid are visible */
    /*
     * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
     * snapshots.
     */
    uint32      speculativeToken;
    /*
     * Book-keeping information, used by the snapshot manager
     */
    uint32      active_count;           /* refcount on ActiveSnapshot stack */
    uint32      regd_count;             /* refcount on RegisteredSnapshots */
    pairingheap_node ph_node;           /* link in the RegisteredSnapshots heap */
    TimestampTz whenTaken;              /* timestamp when snapshot was taken */
    XLogRecPtr  lsn;                    /* position in the WAL stream when taken */
} SnapshotData;
// PGXL
typedef struct GTM_SnapshotData
{
    uint64      sn_snapid;
    GlobalTransactionId sn_xmin;
    GlobalTransactionId sn_xmax;
    uint32      sn_xcnt;
    GlobalTransactionId *sn_xip;
} GTM_SnapshotData;
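Because the two structs share xmin, xmax and the running-xid array, a GTM snapshot maps onto the PG one field by field. The sketch below shows one plausible way to do the copy, growing the destination array when the GTM list is longer than the current capacity. `DemoGtmSnapshot`, `DemoLocalSnapshot` and `demo_copy_snapshot` are hypothetical simplifications of the structs above, and using `max_xcnt` as the allocated capacity is an assumption based on its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t DemoXid;

/* Cut-down mirror of GTM_SnapshotData. */
typedef struct {
    uint64_t sn_snapid;
    DemoXid  sn_xmin;
    DemoXid  sn_xmax;
    uint32_t sn_xcnt;
    DemoXid *sn_xip;
} DemoGtmSnapshot;

/* Cut-down mirror of the PG-side fields that the copy fills in. */
typedef struct {
    DemoXid  xmin;
    DemoXid  xmax;
    DemoXid *xip;
    uint32_t xcnt;
    uint32_t max_xcnt;   /* assumed: allocated capacity of xip[] */
} DemoLocalSnapshot;

/* Copies src into dst, growing dst->xip if needed. Returns false on OOM. */
static bool demo_copy_snapshot(DemoLocalSnapshot *dst, const DemoGtmSnapshot *src)
{
    assert(dst != NULL && src != NULL);
    if (src->sn_xcnt > dst->max_xcnt) {
        DemoXid *grown = (DemoXid *) realloc(dst->xip,
                                             (size_t) src->sn_xcnt * sizeof(DemoXid));
        if (grown == NULL)
            return false;
        dst->xip = grown;
        dst->max_xcnt = src->sn_xcnt;
    }
    dst->xmin = src->sn_xmin;
    dst->xmax = src->sn_xmax;
    dst->xcnt = src->sn_xcnt;
    if (src->sn_xcnt > 0)
        memcpy(dst->xip, src->sn_xip, (size_t) src->sn_xcnt * sizeof(DemoXid));
    return true;
}
```

The subtransaction and command ID fields have no counterpart on the GTM side, so in this model they would simply be left to the local node.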
4 [Debugging] CN snapshot acquisition
Data preparation
-- run on cn1
psql -p50854 -h127.0.0.1 -Upgxc postgres
drop table clstr_tst;
CREATE TABLE clstr_tst (a SERIAL, b INT, c TEXT, d TEXT) DISTRIBUTE BY HASH (b);
INSERT INTO clstr_tst (b, c) VALUES (1, 'once');
INSERT INTO clstr_tst (b, c) VALUES (2, 'diez');
INSERT INTO clstr_tst (b, c) VALUES (3, 'treinta y uno');
INSERT INTO clstr_tst (b, c) VALUES (4, 'veintidos');
INSERT INTO clstr_tst (b, c) VALUES (5, 'tres');
INSERT INTO clstr_tst (b, c) VALUES (6, 'veinte');
INSERT INTO clstr_tst (b, c) VALUES (7, 'veintitres');
-- data distribution
-- dn
psql -p50856 -h127.0.0.1 -Upgxc postgres -c 'select * from clstr_tst'
1
2
5
6
psql -p50857 -h127.0.0.1 -Upgxc postgres -c 'select * from clstr_tst'
3
4
7
Debugging
-- run on cn1
psql -p50854 -h127.0.0.1 -Upgxc postgres
begin;
update clstr_tst set c = 'updated' where b = 5;
select txid_current();
20215
-- run on cn2
psql -p50855 -h127.0.0.1 -Upgxc postgres
update clstr_tst set c = 'updated' where b = 7;
select txid_current();
20221
-- debug on cn1
4.1 CN takes a local snapshot: global_snapshot_source=coordinator
GetSnapshotData
  // 【1】The snapshot source is configurable. The default is GTM; snapshots can also be generated locally, at the cost of global consistency.
  if (GlobalSnapshotSource == GLOBAL_SNAPSHOT_SOURCE_GTM)
    if (GetPGXCSnapshotData(snapshot, latest))
      return snapshot;
  // 【2】Local snapshot generation
  for (index = 0; index < numProcs; index++)
    ...
    // 【3】Read PGXACT->xmin
    xid = pgxact->xmin; /* fetch just once */
    if (TransactionIdIsNormal(xid) && NormalTransactionIdPrecedes(xid, globalxmin))
      globalxmin = xid;
    ...
    if (NormalTransactionIdPrecedes(xid, xmin))
      xmin = xid;
  // 【4】The loop is done and the snapshot is built; update PGXACT->xmin along the way
  if (!TransactionIdIsValid(MyPgXact->xmin))
    MyPgXact->xmin = TransactionXmin = xmin;
  // 【5】After adjustment by configuration parameters, xmin serves as the global cleanup horizon (the oldest point that may be vacuumed).
  ...
  RecentGlobalDataXmin = RecentGlobalXmin;
// cat /sys/devices/system/cpu/cpu1/cache/index0/coherency_line_size
// 64
Steps 【4】 and 【3】 cause a large number of cache line invalidations, which makes performance drop quickly under high concurrency: 《Postgresql快照优化Globalvis新体系分析(性能大幅增强)》.
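The local path in 4.1 boils down to one pass over the per-backend slots. A minimal sketch of that pass, assuming a hypothetical `DemoProc` array in place of PGXACT, plain integer comparison in place of the wraparound-aware `TransactionIdPrecedes`, and no locking:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoXid;
#define DEMO_INVALID_XID 0u

/* Hypothetical per-backend slot, standing in for PGXACT. */
typedef struct {
    DemoXid xid;    /* xid the backend is running, or DEMO_INVALID_XID */
    DemoXid xmin;   /* xmin of the backend's own snapshot, or DEMO_INVALID_XID */
} DemoProc;

typedef struct {
    DemoXid xmin;
    DemoXid xmax;
    DemoXid globalxmin;
    size_t  xcnt;       /* number of xids written to the caller's xip[] */
} DemoLocalSnapshot;

static DemoLocalSnapshot demo_build_snapshot(const DemoProc *procs, size_t nprocs,
                                             DemoXid latest_completed,
                                             DemoXid *xip, size_t xip_cap)
{
    DemoLocalSnapshot s;
    s.xmax = latest_completed + 1;
    s.xmin = s.xmax;
    s.globalxmin = s.xmax;
    s.xcnt = 0;

    for (size_t i = 0; i < nprocs; i++) {
        /* Track the oldest xmin advertised by any backend. */
        DemoXid xid = procs[i].xmin;
        if (xid != DEMO_INVALID_XID && xid < s.globalxmin)
            s.globalxmin = xid;

        /* Collect running xids below xmax and track the smallest one. */
        xid = procs[i].xid;
        if (xid == DEMO_INVALID_XID || xid >= s.xmax)
            continue;
        if (xid < s.xmin)
            s.xmin = xid;
        assert(s.xcnt < xip_cap);
        xip[s.xcnt++] = xid;
    }

    /* The snapshot's own xmin also holds back the cleanup horizon. */
    if (s.xmin < s.globalxmin)
        s.globalxmin = s.xmin;
    return s;
}
```

Every backend's slot is read on every snapshot, and each backend also writes its own xmin afterwards, which is where the cache line traffic mentioned above comes from.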
4.2 CN fetches a remote snapshot: global_snapshot_source=gtm
Scenario
s1: -----------begin(20251)---------------------------------------------
s2: ---------------------------------begin(20260)-----------------------
s3: --------------------------------------------------------debug---------
step1. GetSnapshotData
GetSnapshotData
  // 【1】The snapshot source is configurable. The default is GTM; snapshots can also be generated locally, at the cost of global consistency.
  if (GlobalSnapshotSource == GLOBAL_SNAPSHOT_SOURCE_GTM)
    if (GetPGXCSnapshotData(snapshot, latest))
      return snapshot;
step2. GetSnapshotDataFromGTM
GetPGXCSnapshotData(Snapshot snapshot, bool latest)
  GetSnapshotDataFromGTM(snapshot)
    // Reads ClusterMonitorCtl->reporting_recent_global_xmin
    reporting_xmin = ClusterMonitorGetReportingGlobalXmin();
    // GTM upper-level interface
    // {sn_snapid = 14190, sn_xmin = 20251, sn_xmax = 20260, sn_xcnt = 1, sn_xip = 0x13db6e0}
    gtm_snapshot = GetSnapshotGTM(GetCurrentTransactionIdIfAny(), canbe_grouped);
    // Taking the snapshot also updates the global cleanup horizon
    // ClusterMonitorCtl->gtm_recent_global_xmin = 20251
    RecentGlobalXmin = ClusterMonitorGetGlobalXmin(false);
    RecentGlobalDataXmin = RecentGlobalXmin;
    // Build the global snapshot
    SetGlobalSnapshotData(gtm_snapshot->sn_xmin, gtm_snapshot->sn_xmax, gtm_snapshot->sn_xcnt, gtm_snapshot->sn_xip, SNAPSHOT_DIRECT);
    // Build the PG snapshot from the global snapshot
    GetSnapshotFromGlobalSnapshot(snapshot);
      // set: snapshot->xmin
      // set: snapshot->xmax
      // set: snapshot->xcnt
      // compute: global_xmin (PG scans the xmin and xid of every PGXACT; PGXL uses ClusterMonitorCtl->gtm_recent_global_xmin directly)
      // update RecentGlobalXmin and RecentGlobalDataXmin
    // Update ClusterMonitor
    ClusterMonitorSyncGlobalStateUsingSnapshot(gtm_snapshot);
How globalxmin is computed: gtm_recent_global_xmin
The cluster monitor process wakes up every 5 seconds
while (!got_SIGTERM)
  ...
  oldestXmin = GetOldestXminInternal(NULL, 0, true, lastGlobalXmin);
  // 【Important】Send the locally observed oldestXmin to GTM and get back the global minimum, newOldestXmin, from GTM
  ReportGlobalXmin(oldestXmin, &newOldestXmin, &latestCompletedXid);
  ClusterMonitorSetGlobalXmin(newOldestXmin)
  ...
  // Extend CLOG so that any later operation that depends on RecentGlobalXmin finds the right slot in CLOG; details follow in the next section
  ExtendLogs(newOldestXmin);
  ...
  ClusterMonitorCtl->gtm_recent_global_xmin = newOldestXmin;
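The report step in the loop above is a "send my minimum, receive the cluster minimum" exchange. A minimal model of what the receiving side could compute is sketched here. `DemoNodeReport` and `demo_report_global_xmin` are hypothetical; the real GTM logic has to deal with things this sketch ignores, such as XID wraparound, disconnected or idle nodes, and keeping the horizon from moving backwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoXid;

/* Hypothetical per-node record kept by the side that receives the reports. */
typedef struct {
    bool    reported;   /* the node has reported at least once */
    DemoXid xmin;       /* latest oldestXmin reported by the node */
} DemoNodeReport;

/* Stores node `me`'s report and returns the minimum over all reporting nodes. */
static DemoXid demo_report_global_xmin(DemoNodeReport *nodes, size_t nnodes,
                                       size_t me, DemoXid my_oldest_xmin)
{
    assert(nodes != NULL && me < nnodes);
    nodes[me].reported = true;
    nodes[me].xmin = my_oldest_xmin;

    DemoXid global = my_oldest_xmin;
    for (size_t i = 0; i < nnodes; i++)
        if (nodes[i].reported && nodes[i].xmin < global)
            global = nodes[i].xmin;
    return global;
}
```

In this model a node's own progress does not advance the returned value while another node still reports an older xmin, which is why a long-running transaction on one node holds back cleanup everywhere.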
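5 CLOG extension
5.1 Basics
- A page is 8192 bytes and a byte is 8 bits; 2 bits hold one transaction status, so one page covers 8192 * 4 = 32768 transactions.
- Transactions are grouped 32 at a time, so a page has 1024 groups; the largest LSN of each group is recorded in group_lsn.
- With 1024 groups per page, 1024 uint64 values are needed per page to record the largest LSN of each group.
- The memory is allocated as one contiguous block: pointers at the head, page data at the tail, control information in between. Array size = number of pages.
- Number of pages = Min(128, Max(4, NBuffers / 512)), so at most 128 and at least 4.
- For example, with shared_buffers=128MB, NBuffers=16384 and the number of pages is 32.
- A page in CLOG is often called a SLOT.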
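The arithmetic above can be checked with a few lines of C. The constants mirror the numbers in the list (8192-byte pages, 2 bits per transaction, 32 transactions per LSN group); the `DEMO_` names and the two helper functions are hypothetical and only reproduce the calculation.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BLCKSZ              8192
#define DEMO_BITS_PER_XACT       2
#define DEMO_XACTS_PER_BYTE      4
#define DEMO_XACTS_PER_PAGE      (DEMO_BLCKSZ * DEMO_XACTS_PER_BYTE)                /* 32768 */
#define DEMO_XACTS_PER_LSN_GROUP 32
#define DEMO_LSNS_PER_PAGE       (DEMO_XACTS_PER_PAGE / DEMO_XACTS_PER_LSN_GROUP)   /* 1024 */

typedef struct {
    uint32_t page;       /* which CLOG page holds the xid */
    uint32_t byte;       /* byte offset inside that page */
    uint32_t shift;      /* bit shift of the 2-bit status inside that byte */
    uint32_t lsn_group;  /* index of the xid's LSN group inside the page */
} DemoClogPos;

/* Locate the status bits of xid. */
static DemoClogPos demo_clog_pos(uint32_t xid)
{
    DemoClogPos p;
    uint32_t idx = xid % DEMO_XACTS_PER_PAGE;   /* position inside the page */

    p.page = xid / DEMO_XACTS_PER_PAGE;
    p.byte = idx / DEMO_XACTS_PER_BYTE;
    p.shift = (xid % DEMO_XACTS_PER_BYTE) * DEMO_BITS_PER_XACT;
    p.lsn_group = idx / DEMO_XACTS_PER_LSN_GROUP;
    return p;
}

/* Number of CLOG pages kept in memory: Min(128, Max(4, NBuffers / 512)). */
static int demo_clog_buffers(int nbuffers)
{
    assert(nbuffers > 0);
    int n = nbuffers / 512;
    if (n < 4)
        n = 4;
    if (n > 128)
        n = 128;
    return n;
}
```

For xid 20215 from the debugging session this gives page 0, byte 5053, shift 6 and LSN group 631; xid 32768 is the first entry of page 1.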
5.2 How PG extends CLOG
Single-node PG:
void
ExtendCLOG(TransactionId newestXact)
{
    int pageno;
    // newestXact % CLOG_XACTS_PER_PAGE == 0 means the previous page is used up and a new page is needed.
    // newestXact == FirstNormalTransactionId (3) means the XID counter wrapped around, so the page must be extended as well.
    if (TransactionIdToPgIndex(newestXact) != 0 &&
        !TransactionIdEquals(newestXact, FirstNormalTransactionId))
        return;
    // newestXact / CLOG_XACTS_PER_PAGE: compute the slot (page) position.
    pageno = TransactionIdToPage(newestXact);
    LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
    ZeroCLOGPage(pageno, true);
        SimpleLruZeroPage
            // Pick a slot: 《Postgresql源码(23)Clog使用的Slru页面淘汰机制》
            SlruSelectLRUPage
                // If every buffer is in use, one has to be flushed out (the one with the smallest page_lru_count)
                SlruInternalWritePage
        // Write a CLOG_ZEROPAGE XLOG record
        WriteZeroPageXlogRec
    LWLockRelease(XactSLRULock);
}
PGXL:
void
ExtendCLOG(TransactionId newestXact)
{
    int pageno;
    // Transaction IDs may be allocated on other nodes, so the IDs the current node obtains are not consecutive.
    // Native PG allocates transaction IDs consecutively and calls ExtendCLOG every time.
    // So latestXid is added here to record the last XID covered on the current node.
    TransactionId latestXid;
    // newestXact / CLOG_XACTS_PER_PAGE: the page that has to exist
    pageno = TransactionIdToPage(newestXact);
    // Compute the last XID covered by the pages already allocated on the current node
    latestXid = (ClogCtl->shared->latest_page_number * CLOG_XACTS_PER_PAGE)
        + CLOG_XACTS_PER_PAGE - 1;
    // If pages up to xid 10000 were allocated last time and the xid needed now is 5000, the clog pages suffice; return directly.
    if (TransactionIdPrecedesOrEquals(newestXact, latestXid))
        return;
    // Reaching this point means there are not enough CLOG pages, but races must be considered:
    // take the lock and check again, because another process may have extended concurrently, in which case no further extension is needed.
    LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
    latestXid = (ClogCtl->shared->latest_page_number * CLOG_XACTS_PER_PAGE)
        + CLOG_XACTS_PER_PAGE - 1;
    if (TransactionIdPrecedesOrEquals(newestXact, latestXid))
    {
        LWLockRelease(CLogControlLock);
        return;
    }
    // There really are not enough pages: starting from the last allocated slot, latest_page_number, allocate pages one by one (+1) until the target page is reached.
    for (;;)
    {
        /* Zero the page and make an XLOG entry about it */
        int target_pageno = ClogCtl->shared->latest_page_number + 1;
        if (target_pageno > TransactionIdToPage(MaxTransactionId))
            target_pageno = 0;
        ZeroCLOGPage(target_pageno, true);
        if (target_pageno == pageno)
            break;
    }
    LWLockRelease(CLogControlLock);
}
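The effect of the PGXL variant is easiest to see by counting how many pages one call zeroes when the xid jumps ahead. The model below is a simplification: `demo_extend_clog` is hypothetical, it ignores the lock, the XLOG records and the wrap back to page 0, and it assumes that zeroing a page advances latest_page_number, as the loop above relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_XACTS_PER_PAGE 32768u   /* 8192 bytes * 4 transactions per byte */

/*
 * Model of the PGXL extension loop. *latest_page plays the role of
 * ClogCtl->shared->latest_page_number. Returns how many pages were zeroed.
 */
static uint32_t demo_extend_clog(uint32_t *latest_page, uint32_t newest_xid)
{
    assert(latest_page != NULL);
    uint32_t target = newest_xid / DEMO_XACTS_PER_PAGE;
    uint32_t latest_xid = *latest_page * DEMO_XACTS_PER_PAGE
                          + DEMO_XACTS_PER_PAGE - 1;
    uint32_t zeroed = 0;

    /* Already covered by an existing page: nothing to do. */
    if (newest_xid <= latest_xid)
        return 0;

    /* Fill every gap page up to and including the target page. */
    while (*latest_page != target) {
        (*latest_page)++;
        zeroed++;
    }
    return zeroed;
}
```

Starting from page 0, a request for xid 5000 zeroes nothing, while a jump to xid 100000 zeroes pages 1, 2 and 3 in one call. Native PG never sees such a jump, because it is called for every consecutive xid and only acts on page boundaries.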