On XID64 Slipping Again in PG 15
2022-06-30 21:59:00 [PostgreSQLChina]
I haven't written about a technical topic for several days, so today let's return to one. For anyone running PG or a database product derived from it, the biggest problem is a side effect of PG's append-style storage engine: the FROZEN (transaction ID freezing) problem that comes from managing the visibility of multiple row versions.
One night early this month, a friend's phone call woke me up. Since I'm not on anyone's database on-call list, a ringing phone rarely wakes me in the middle of the night. When I answered, it turned out to be a PG problem. The caller is an Oracle DBA with more than ten years of experience; the databases he maintains have become more and more varied, and naturally some PG problems land on his desk too. This was a PG database of more than 4 TB, in which some tables regularly receive bulk writes and updates. That day the business side suddenly reported a flood of errors, and investigation showed that the database had become unavailable because of the FROZEN problem. Although the cause had been found, he saw that FREEZE was progressing very slowly. Some large data maintenance jobs were also scheduled for the weekend, so the system would probably not be fully recovered by the end of the weekend. Then Monday's business peak would arrive, and the system would be finished.
To be honest, I am also a novice at PG operations; I only started learning it in recent years. Fortunately, I had studied the FROZEN problem. So I asked him about his AUTOVACUUM and work memory settings. Their database was running entirely on default values, so I suggested raising maintenance_work_mem to 2GB, setting autovacuum_work_mem to -1 (so that autovacuum workers use the maintenance_work_mem value), and increasing autovacuum_max_workers. Because their server was well provisioned, the problem was resolved soon after the parameters were changed. If you hit this problem on a large PG database, similar measures can help. If your server has enough CPU cores and enough physical memory, you can set these parameters even higher than I suggested, so that FREEZE finishes faster.
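When raising these parameters it helps to estimate the memory they can claim. The sketch below is a rough budget under stated assumptions, not an official formula: autovacuum_work_mem = -1 makes each worker fall back to maintenance_work_mem; the worst case is every worker running at once; and (an assumption about PostgreSQL 16 and earlier) VACUUM can use at most 1 GB for dead-tuple identifiers, each taking 6 bytes. The worker count of 6 is hypothetical.

```python
"""Rough autovacuum memory budget (sketch, assumptions labelled below).

Assumptions, not taken from the article:
- autovacuum_work_mem = -1 means a worker uses maintenance_work_mem instead.
- In PostgreSQL 16 and earlier, VACUUM stores dead-tuple IDs in at most 1 GB,
  whatever the setting; each tuple ID (ItemPointerData) is 6 bytes.
"""

MB = 1024 * 1024
DEAD_TUPLE_CAP_MB = 1024  # 1 GB ceiling for dead-tuple storage (PG <= 16)
TID_BYTES = 6             # size of one heap tuple identifier


def per_worker_mem_mb(autovacuum_work_mem_mb: int, maintenance_work_mem_mb: int) -> int:
    """Memory one autovacuum worker may use, in MB."""
    if autovacuum_work_mem_mb == -1:
        return maintenance_work_mem_mb
    return autovacuum_work_mem_mb


def worst_case_total_mb(workers: int, autovacuum_work_mem_mb: int,
                        maintenance_work_mem_mb: int) -> int:
    """Upper bound when every autovacuum worker is active at the same time."""
    return workers * per_worker_mem_mb(autovacuum_work_mem_mb, maintenance_work_mem_mb)


def dead_tuples_per_pass(autovacuum_work_mem_mb: int, maintenance_work_mem_mb: int) -> int:
    """Dead-tuple IDs one worker can collect before it must pause to vacuum indexes."""
    usable_mb = min(per_worker_mem_mb(autovacuum_work_mem_mb, maintenance_work_mem_mb),
                    DEAD_TUPLE_CAP_MB)
    return usable_mb * MB // TID_BYTES


if __name__ == "__main__":
    # The article's settings, with a hypothetical 6 autovacuum workers.
    print(worst_case_total_mb(6, -1, 2048))
    print(dead_tuples_per_pass(-1, 2048))
```

With these inputs the worst case is 12288 MB across all workers, and each worker collects about 179 million dead-tuple IDs per pass; under the 1 GB assumption, memory above 1 GB helps other maintenance operations such as index builds rather than dead-tuple collection.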
Frankly, compared with Oracle, operating PG is very easy. The trouble is that when people first adopt PG, they know little about how the database works internally, so some parameters are set unreasonably. Problems rarely appear while the database is small, but once it grows, all kinds of issues surface.
To implement MVCC, Postgres keeps multiple versions of each tuple. To manage row visibility across transactions, it must assign a unique transaction ID (XID) to every transaction that performs writes, and that XID is recorded in the rows the transaction writes. A query reading the table can see only rows written by transactions older than its own snapshot. The XID is 32 bits wide, so there are only about 4 billion of them, and on this circular space every XID has about 2 billion IDs before it and 2 billion after it. This means that if XIDs wrap around, the visibility algorithm can no longer give consistent answers, and data that should be visible becomes invisible.
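The "2 billion before, 2 billion after" rule comes from comparing XIDs with modular arithmetic. The sketch below models the logic of PostgreSQL's TransactionIdPrecedes() (in transam.c) in Python: the difference of two normal XIDs is reinterpreted as a signed 32-bit integer, and a negative result means "older". It is a simplified illustration, not the C source.

```python
"""Circular comparison of 32-bit XIDs, modelled on TransactionIdPrecedes().

Reserved XIDs 0, 1, 2 (invalid, bootstrap, frozen) are compared as plain
unsigned integers, as the C code does; normal XIDs use signed 32-bit wraparound.
"""

XID_MODULUS = 2 ** 32
FIRST_NORMAL_XID = 3


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value %= XID_MODULUS
    return value - XID_MODULUS if value >= 2 ** 31 else value


def xid_precedes(id1: int, id2: int) -> bool:
    """True if id1 is older than id2 on the circular XID space."""
    if id1 < FIRST_NORMAL_XID or id2 < FIRST_NORMAL_XID:
        return id1 < id2
    return to_int32(id1 - id2) < 0


print(xid_precedes(100, 200))                # older: plain case
print(xid_precedes(XID_MODULUS - 10, 5))     # older: 5 comes "after" the wrap
print(xid_precedes(100, 100 + 2 ** 31 + 1))  # not older: an XID over 2**31 ahead looks like the past
```

The last call shows the danger described above: once two XIDs are more than 2^31 apart, the newer one is judged to be in the past, which is exactly how visible rows can suddenly become invisible.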
To prevent this, the Postgres autovacuum background process, while vacuuming, marks all sufficiently old rows with a special XID called FrozenXID (FrozenTransactionId in the source code). Rows carrying FrozenXID are treated as belonging to the past for every transaction, so XID wraparound no longer affects them. Usually autovacuum does this job well, but if VACUUM is too slow to keep up with the rate at which write operations consume new XIDs, the XID age can approach the 2 billion mark. When fewer than 1 million transaction IDs remain, Postgres shuts down for maintenance to avoid data loss. During that time no new data can be written, and it may take an unpredictable amount of time to return to normal.
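The thresholds in this paragraph can be turned into a simple health check. The sketch below is hypothetical: it assumes the age value comes from `SELECT datname, age(datfrozenxid) FROM pg_database;`, uses 200 million as the default autovacuum_freeze_max_age, and takes the 1 million stop margin from the article (other releases may use a different margin).

```python
"""Classify a database's XID age (sketch using the article's figures).

Assumptions:
- age is the value of age(datfrozenxid) for the database.
- 200 million is the default autovacuum_freeze_max_age.
- The 1 million stop margin follows the article.
"""

WRAP_HORIZON = 2 ** 31                # about 2.1 billion XIDs of usable "past"
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000
STOP_MARGIN = 1_000_000               # per the article


def xids_left(age: int) -> int:
    """XIDs remaining before the wraparound horizon is reached."""
    return WRAP_HORIZON - age


def classify(age: int) -> str:
    """Map an XID age to the state described in the article."""
    if xids_left(age) < STOP_MARGIN:
        return "stop: no new XIDs until the database is vacuumed"
    if age > AUTOVACUUM_FREEZE_MAX_AGE:
        return "forced anti-wraparound autovacuum"
    return "ok"


for sample_age in (50_000_000, 300_000_000, 2 ** 31 - 500_000):
    print(sample_age, classify(sample_age))
```

The gap between the two thresholds is the time autovacuum has to finish freezing; on a large, write-heavy database with untuned parameters, that window can be used up, which is the situation described at the start of the article.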
If a PG database has many large, frequently written tables and its autovacuum parameters have not been tuned, the FROZEN problem is hard to avoid.
In fact, this problem closely resembles Oracle's SCN HEADROOM problem of a few years ago. In its early days PG probably felt that 32-bit XIDs were enough, even with the usable range halved by the visibility rules. Oracle likewise used a fixed-width (48-bit) SCN. To make sure the SCN would never wrap around and break read consistency, Oracle capped SCN growth at 16K per second. When this was designed, 16K transactions per second seemed more than enough, since servers of the time struggled to handle 100 transactions per second. As hardware kept improving, the old SCN was clearly no longer enough. Oracle spent a long time on the SCN HEADROOM problem: first it made the per-second SCN rate adjustable and raised its default value. The definitive fix for this pitfall left over from the 1980s was to expand the SCN to 64 bits.
Likewise, the definitive fix for PG's FROZEN problem is to expand the XID to 64 bits. This effort actually began about ten years ago, and many PG users believed 64-bit XIDs would arrive in PG 12; at the time I also saw plenty of discussion about PG 12 adding XID64. Yet here we are at PG 14, and the dream still hasn't come true. In fact, the Russian company Postgres Professional already uses 64-bit XIDs in its enterprise edition, based on an implementation by one of its hackers, Alexander Korotkov. According to their tests on some heavily loaded systems, 64-bit XIDs avoided a 50% performance drop during peak business hours. openGauss, which is also derived from PG, has implemented XID64 as well. So solving this problem does not seem especially difficult.
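A back-of-envelope calculation shows why 64 bits is considered a definitive fix. The sketch assumes a hypothetical sustained 10,000 write transactions per second, one XID per transaction, and a constant rate; it compares the roughly 2^31 XIDs a 32-bit design allows between freezes with a 2^63 bound for 64-bit XIDs.

```python
"""How long an XID space lasts at a sustained write rate (back-of-envelope).

Assumptions: one XID per write transaction and a constant rate. With 32-bit
XIDs about 2**31 IDs may separate the oldest unfrozen row from the newest
transaction; for 64-bit XIDs 2**63 is used as a comparable bound.
"""

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def days_until_exhausted(id_space: int, tps: float) -> float:
    """Days until id_space XIDs are consumed at tps transactions per second."""
    return id_space / tps / SECONDS_PER_DAY


def years_until_exhausted(id_space: int, tps: float) -> float:
    """Years until id_space XIDs are consumed at tps transactions per second."""
    return id_space / tps / SECONDS_PER_YEAR


tps = 10_000  # hypothetical sustained write transactions per second
print(f"32-bit: {days_until_exhausted(2 ** 31, tps):.1f} days without freezing")
print(f"64-bit: {years_until_exhausted(2 ** 63, tps):.2e} years")
```

At this rate a 32-bit design must finish freezing roughly every 2.5 days, while a 64-bit space would last about 29 million years, so freezing stops being an emergency.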
Around this time last year, news spread from the community that PG 15 would support XID64, meaning Alexander Korotkov's scheme would be formally merged into the mainline. But as the PG 15 release date approached, XID64 was missing from the official preview of PG 15's new features.
XID64 has become one of the two most frequently postponed features in the PG community. Why does it keep slipping? The answer is closely tied to PG's underlying storage engine. Every tuple carries XMIN and XMAX, both 32 bits wide, and widening them to 64 bits would inevitably require a large number of changes across the PG code base. The preferred approach is therefore to expand the XID while keeping the on-disk tuple layout unchanged. Alexander Korotkov's 64-bit XID scheme shows how firmly the PG core developers hold to this principle. First, the XID itself becomes 64 bits. Second, the on-disk tuple structure (HeapTupleHeader) is not modified: XMIN/XMAX remain 32 bits and are called ShortTransactionId, and the page header solves the rest of the 32-bit problem. XMIN/XMAX store the OFFSET part of the XID, while the PAGE HEADER stores the XID's BASE. Finally, a tuple in memory obtains the BASE part of its XID from the page header.
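The base-plus-offset idea can be illustrated with a small model. This is a hypothetical sketch, not PostgreSQL or Postgres Pro code: the Page class, its xid_base field, and the tuple representation are invented for illustration. Each page stores a 64-bit base once, tuples keep 32-bit offsets (standing in for ShortTransactionId), and full 64-bit XIDs are rebuilt when a tuple is read into memory.

```python
"""Hypothetical model of 64-bit XIDs stored as page base + 32-bit offsets.

Not real PostgreSQL/Postgres Pro code: Page, xid_base and the tuple tuples
are illustrative names. xmax = None stands for "no deleting transaction".
"""
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SHORT_XID_MAX = 2 ** 32 - 1  # largest offset a 32-bit field can hold


@dataclass
class Page:
    xid_base: int  # 64-bit base, stored once in the page header
    tuples: List[Tuple[int, Optional[int]]] = field(default_factory=list)

    def _to_short(self, xid64: int) -> int:
        """Convert a 64-bit XID to this page's 32-bit offset."""
        offset = xid64 - self.xid_base
        if not 0 <= offset <= SHORT_XID_MAX:
            raise ValueError("XID outside this page's 32-bit window; page would need rebasing")
        return offset

    def add_tuple(self, xmin64: int, xmax64: Optional[int] = None) -> None:
        """Store a tuple's xmin/xmax as 32-bit offsets, keeping the layout narrow."""
        short_xmin = self._to_short(xmin64)
        short_xmax = None if xmax64 is None else self._to_short(xmax64)
        self.tuples.append((short_xmin, short_xmax))

    def full_xids(self, index: int) -> Tuple[int, Optional[int]]:
        """Rebuild the 64-bit xmin/xmax of an in-memory tuple from the page base."""
        short_xmin, short_xmax = self.tuples[index]
        xmax = None if short_xmax is None else self.xid_base + short_xmax
        return self.xid_base + short_xmin, xmax


page = Page(xid_base=5_000_000_000)  # a base beyond the 32-bit range
page.add_tuple(5_000_000_123)
page.add_tuple(5_000_000_200, 5_000_000_300)
print(page.tuples)
print(page.full_xids(1))
```

The design keeps each tuple's header the same size; the cost is that all XIDs on one page must fall within a 2^32 window of its base, so a real implementation has to handle pages whose XIDs drift out of range.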
According to feedback from the PG community, this time compatibility issues between XID64 and upstream components could not be resolved before the PG 15 release, so XID64 has missed its users once again. Judging from community news, XID64 should definitely arrive in PG 16. I hope the PG community won't let it slip yet again.
The XID64 saga also shows how complex database development is. In an MIS project, changing the length of a key primary key would never take ten years, however complex the business logic involved. This time the PG community withdrew 64-bit XIDs because of a compatibility problem with a data replication component, which is a very rigorous approach. Database problems can cause application and data errors, and in the digital age the consequences of data errors can be quite serious.
Even with such a rigorous attitude, the PG community still cannot prevent some serious bugs from shipping. The recently discovered bug in PG 14.1 to PG 14.3, in which creating or rebuilding an index online (concurrently) can cause serious data errors, is one example. If it struck a critical business system, it could lead to business errors.
Database development demands great rigor. In the digital age, users who dare to run your database in their business systems are effectively tying their company's future to your product, so they have to be cautious. For this reason, I disagree with the idea that a team of twenty or thirty people can build a general-purpose relational database. A database company that truly takes responsibility for its users should have no fewer than thirty to fifty senior testers; otherwise it can only treat its users as testers.


This article is from the WeChat official account Open Source Software Alliance PostgreSQL Branch (kaiyuanlianmeng).