当前位置:网站首页>Distributed transaction principle and solution
Distributed transaction principle and solution
2022-06-24 10:27:00 【Juvenile deer】
Local transactions
In most scenarios , Our applications only need to operate a single database , The transaction in this case is called a local transaction (Local Transaction). Local affairs ACID The feature is that the database provides direct support . The local transaction application architecture is as follows :

stay JDBC Programming , We go through java.sql.Connection Object to open 、 Close or commit a transaction . The code is as follows :
Connection conn = ... // Get database connection
conn.setAutoCommit(false); // Open transaction
try{
//... Add, delete, modify and check sql
conn.commit(); // Commit transaction
}catch (Exception e) {
conn.rollback();// Transaction rollback
}finally{
conn.close();// Close links
}Typical scenario of distributed transaction
At present, the development of Internet is in full swing , The vast majority of companies have split and serviced their databases (SOA). under these circumstances , Completing a business function may need to span multiple services , Working with multiple databases . This involves distributed transactions , The resource to be operated is located on multiple resource servers , The application needs to ensure the operation of data for multiple resource servers , All or nothing , All or nothing . In essence , Distributed transaction is to ensure the data consistency of different resource servers .
Typical distributed transaction scenario
Cross-database transaction
Cross database transactions refer to , A certain function of an application needs to operate multiple libraries , Different business data are stored in different libraries . I have seen a relatively complex business , In a business at the same time 9 Databases . The following figure shows a service operating at the same time 2 The situation of a library :

Sub database and sub table
Generally, a database has a large amount of data, or it is expected that there will be a large amount of data in the future , Will be split horizontally , That is, sub database and sub table . Here's the picture , Will database B Split into 2 Databases :

For the case of sub database and sub table , General developers will use some database middleware to reduce sql Complexity of operation . Such as , about sql:insert into user(id,name) values (1," Zhang San "),(2," Li Si "). This article sql Is the syntax of the operation list Library , In case of single warehouse , Can guarantee the consistency of transactions .
But now because of the sub database and sub table , Developers want to 1 No. record insertion branch 1,2 No. record insertion branch 2. So database middleware should rewrite it as 2 strip sql, Insert two different sub databases , At this time, we need to ensure the success of both libraries , Or they all fail , So basically all database middleware are faced with the problem of distributed transaction .
As a service
Microservice architecture is a popular concept at present . For example, a case mentioned by the author above , An application operates at the same time 9 Databases , The application logic is very complex , It's a great challenge for developers , It should be split into separate services , To simplify business logic . After break up , Independent services through RPC Framework to make remote calls , To communicate with each other . The figure below shows a 3 The architecture that services call each other :

Service A To complete a function, you need to operate the database directly , Also call Service B and Service C, and Service B And at the same time 2 A database ,Service C Also operated a library . We need to ensure that these cross service operations on multiple databases are successful , Or they all fail , In fact, this is probably the most typical distributed transaction scenario .
Summary : In the distributed transaction scenario discussed above , Without exception, they operate multiple databases directly or indirectly . How to guarantee the ACID characteristic , For distributed transaction implementation , It's a very big challenge . meanwhile , The implementation of distributed transaction must also consider the problem of performance , If in order to guarantee strictly ACID characteristic , Resulting in a serious performance degradation , So for some businesses that require quick response , It's unacceptable .
X/Open DTP Model and XA standard
X/Open, That is, the present open group, It's an independent organization , Mainly responsible for the development of various industry technical standards . Distributed transaction processing (Distributed Transaction Processing, abbreviation DTP) for ,X/Open The following reference documents are provided :
DTP Reference model : <<Distributed Transaction Processing: Reference Model>>
DTP XA standard : << Distributed Transaction Processing: The XA Specification>>
DTP Model
constitute DTP Model 5 Two basic elements :
Applications (Application Program , abbreviation AP): Used to define transaction boundaries ( Define the beginning and end of a transaction ), And operate on resources within the transaction boundaries .
Explorer (Resource Manager, abbreviation RM): Such as a database 、 File system, etc , And provide access to resources .
Transaction manager (Transaction Manager , abbreviation TM): Responsible for assigning transaction unique identifier , Monitor the progress of transactions , And responsible for the submission of affairs 、 Roll back, etc .
Communication resource manager (Communication Resource Manager, abbreviation CRM): Control one TM Domain (TM domain) Inside or across TM Communication between distributed applications in the domain .
Communication protocol (Communication Protocol, abbreviation CP): Provide CRM Provide the underlying communication services between distributed application nodes .
XA standard
stay DTP In the local model instance , from AP、RMs and TM form , No other elements are needed .AP、RM and TM Between , We need to interact with each other , As shown in the figure below :

In this picture (1) Express AP-RM The interface of ,(2) Express AP-TM The interface of ,(3) Express RM-TM The interface of .
XA The main function of norms is , That's the definition RM-TM The interface of ,XA The norm is in addition to the definition of RM-TM Interface of interaction (XA Interface) outside , We also optimize the two-phase commit protocol .
Two-stage agreement (two-phase commit) Is in OSI TP In the standard ; stay DTP Reference model (<<Distributed Transaction Processing: Reference Model>>) in , Specifies that the commit of a global transaction uses two-phase commit agreement ; and XA standard (<< Distributed Transaction Processing: The XA Specification>>) It just defines the interface to be used in the two-phase commit protocol , That is to say RM-TM Interface of interaction , Because the participants in the two-phase submission process , Only TM and RMs.
stay XA There are two stages in the agreement :
- The transaction manager requires that each database involved in a transaction be pre committed (Precommit) This operation , And reflect whether you can submit .
- The transaction coordinator requires each database to commit data , Or roll back the data
Two phase submission agreement (2PC)
Two phase submission agreement (Two Phase Commit) Not in XA Put forward in the specification , however XA The specification optimizes it . And literally ,Two Phase Commit, That is to submit (commit) The process is divided into 2 Stages (Phase):
Stage 1:
TM Notifications RM Ready to commit their transaction branches . If RM Judge that your work can be submitted , Then make it persistent , Give again TM A positive reply ; If something else happens , Here TM All the answers are negative . After sending a negative reply and rolling back the work already done ,RM You can discard the transaction branch information .
With mysql Database, for example , In the first phase , The transaction manager issues prepare" Prepare to submit " request , After receiving the request, the database performs data modification and logging , After processing is completed, only the state of the transaction is changed to " You can submit ", Then return the result to the transaction manager .
Stage 2
TM According to the stage 1 each RM prepare Result , Decide whether to commit or roll back the transaction . If all RM all prepare success , that TM Inform all RM Submit ; If there is RM prepare Failure words , be TM Inform all RM Roll back your own transaction branch .
With mysql Database, for example , If all the databases in the first phase are prepare success , Then the transaction manager issues " Confirm the submission " request , The database server sends the transaction to " You can submit " Status changed to " Submit completed " state , Then return to answer . If an error occurs in any database operation in the first phase , The database manager did not receive a response , Transaction failure , Rollback all database transactions . The database server can't receive the second stage confirmation submission request , Will also put " You can submit " Back of business .


XA It's a resource level distributed transaction , Strong consistency , In the whole process of two-stage submission , Always hold the lock of resources .
TCC It's a business level distributed transaction , Final consistency , Will not always hold the lock of resources .
TCC( Compensation Affairs )
TCC It's a two-stage programming model for service , Every business service must implement Try,Confirm,Cancel Three methods , These three ways can correspond to SQL Transaction Lock,Commit,Rollback.
Compared to the two-phase commit ,TCC Solved several problems : Synchronous blocking , The timeout mechanism is introduced , Compensation after timeout , It doesn't lock the entire resource like a two-phase commit , Convert resources to business logic , The particle size becomes smaller .
Because of the compensation mechanism , It can be controlled by the business activity manager , Ensure data consistency .
Try Stage :Try It's just a preliminary operation , Make a preliminary confirmation , Its main responsibility is to complete all business checks , Reserve business resources .
Confirm Stage :Confirm Is in Try After the stage inspection is completed , Continue with the confirmation operation , Must satisfy idempotent operation , If Confirm Execution failed in , There will be transaction coordinators that trigger continuous execution , Until satisfied .
Cancel Cancel execution : stay Try Failed and released Try Resources reserved in the stage , It must also satisfy idempotence , Follow Confirm It's also possible to be constantly executed .
One place an order , An example of generating an order to deduct inventory :

So let's see , How to add our inventory deduction process TCC:

stay Try When , Will allow inventory services to be reserved N Stock for this order , Let the order service generate a “ Unconfirmed ” Order , These two reserved resources are generated at the same time .
stay Confirm When , Will be used in Try Reserved resources , stay TCC In the transaction mechanism, I think , If in Try Resources that can be normally reserved in the stage , So in Confirm Must be able to submit completely .

stay Try When , One side of the task failed to execute , Will perform Cancel Interface operation of , Will be in Try Release the resources reserved in the stage .
This is not the point TCC How transactions are implemented , The focus is on distributed transactions CAP+BASE The application of the theory .
tcc Transaction implementation : https://github.com/changmingxie/tcc-transaction
Two phase submission agreement (2PC) The problem is
Two phase commit does seem to provide atomic operations , But unfortunately , There are still several shortcomings in the two-stage submission :
1、 Synchronization blocking problem ( The biggest problem ).
In the two-phase commit scheme, the global transaction ACID characteristic , It depends on RM Of . A global transaction contains multiple independent transaction branches , Either this set of transaction branches is successful , Or they all fail . For each transaction branch ACID Together, features make up the ACID characteristic . That is to say, the support of a single transaction branch ACID The feature elevates a level to the category of distributed transactions . Even in local transactions , If it's sensitive to operation reading , We also need to set the transaction isolation level to SERIALIZABLE. And for distributed transactions , Even more so , The level of repeatable read isolation is not enough to guarantee the consistency of distributed transactions . If we use mysql To support XA Distributed transactions , Then it's best to set the transaction isolation level to SERIALIZABLE, However SERIALIZABLE( Serialization ) Is the highest of the four transaction isolation levels , It's also the lowest level of execution efficiency .
After the resources are ready , The resources in the resource manager are always blocked , Until the submission is complete , To release resources .
2、 A single point of failure .
Because of the importance of the coordinator , Once the coordinator TM failure , participants RM It will keep blocking . Especially in the second stage , Coordinator failed , Then all participants are still in the state of locking transaction resources , Cannot continue to complete the transaction .( If the coordinator dies , You can re elect a coordinator , But it can't solve the problem that participants are blocked due to coordinator downtime )
3、 Data inconsistency .
In phase II of phase II submission , When the coordinator sends commit After the request , There is a local network exception or sending commit The coordinator failed during the request , This will result in only a few participants receiving commit request , And in this part of the participants received commit The request is then executed commit operation , But the rest didn't receive commit The requested machine is unable to perform a transaction commit . So the whole distributed system appears the phenomenon of data inconsistency .
4. uncertainty
When the transaction manager sends commit after , And only one participant received commit, So when the participant and the transaction manager go down at the same time , The re elected transaction manager cannot determine whether the message was committed successfully .
There are synchronization blocks due to two-phase commit 、 Single point problem and other defects , therefore , The researchers improved on the two-phase commit , A three-phase commit is proposed .
Three stage submission agreement (Three-phase commit)
Three stage commit (3PC), It's a two-stage submission (2PC) Improved version .
Different from the two-stage submission is , There are two changes in the three-phase submission :
1、 Introduce timeout mechanism . At the same time, the timeout mechanism is introduced in both the coordinator and the participants .
2、 Insert a preparation stage in the first and second stages . It ensures that the states of participating nodes are consistent before the final submission stage . in other words , In addition to introducing a timeout mechanism ,3PC hold 2PC Once again, the preparation phase of the project is divided into two parts , In this way, there are three stages of submission CanCommit、PreCommit、DoCommit Three stages .

CanCommit Stage
3PC Of CanCommit The stage is actually the same as 2PC The preparation stage of is very similar to . The coordinator sends... To the participants commit request , Participants return if they can submit Yes Respond to , Otherwise return to No Respond to .
1. Business inquiry The coordinator sends... To the participants CanCommit request . Ask if the transaction commit operation can be performed . Then start waiting for the response from the participants .
2. Respond to feedback The participants received CanCommit After the request , Under normal circumstances , If it thinks it can execute the transaction smoothly , Then return to Yes Respond to , And get ready . Otherwise feedback No
PreCommit Stage
The coordinator decides whether the transaction can be remembered according to the response of the participants PreCommit operation . According to the response , There are two possibilities .
If the coordinator's feedback from all participants is Yes Respond to , Then the pre execution of the transaction will be executed .
1. Send pre submit request The coordinator sends... To the participants PreCommit request , And enter Prepared Stage .
2. Transaction pre commit Participant received PreCommit After the request , Will perform transaction operations , And will undo and redo Information is recorded in the transaction log .
3. Respond to feedback If the participant successfully performs the transaction operation , Then return to ACK Respond to , And start waiting for the final order .
If any of the participants sent No Respond to , Or wait for the timeout , None of the coordinators received a response from the participants , Then the interruption of the execution of the transaction .
1. Send interrupt request The coordinator sends... To all participants abort request .
2. Interrupt the business The participants received... From the coordinator abort After the request ( Or after the timeout , The request of the coordinator has not yet been received ), The interruption of the execution of a transaction .
doCommit Stage
In this phase, the real transaction commit , It can also be divided into the following two situations .
Case 1: Execute commit
1. Send submit request Coordinate to receive... Sent by participants ACK Respond to , Then he will go from pre submission to submission . And send it to all participants doCommit request .
2. Transaction submission Participant received doCommit After the request , Perform formal transaction submission . And release all transaction resources after transaction commit .
3. Respond to feedback After the transaction is committed , Send... To the coordinator Ack Respond to .
4. Complete the business The coordinator receives... From all participants ack After responding , Complete the business .
Case 2: Interrupt the business The coordinator did not receive the ACK Respond to ( It may be that the recipient sent it not ACK Respond to , It's also possible that the response timed out ), Then the interrupt transaction will be executed .
1. Send interrupt request The coordinator sends... To all participants abort request
2. Transaction rollback Participant received abort After the request , Take advantage of the undo Information to perform the rollback operation of the transaction , And release all transaction resources after rollback .
3. Feedback results After the participant completes the transaction rollback , Send... To the coordinator ACK news
4. Interrupt the business The coordinator received feedback from the participants ACK After message , The interruption of the execution of a transaction .
stay doCommit Stage , If the participant cannot receive the... From the coordinator in time doCommit perhaps rebort When asked , After the timeout , Transaction commit will continue .( In fact, this should be based on probability , When entering the third stage , Indicate that participants have received... In the second phase PreCommit request , Then the coordinator produces PreCommit The premise of the request is that he is , Received... From all participants CanCommit The responses are all Yes.( Once the participants have received PreCommit, It means that he knows that everyone actually agrees to modify ) therefore , In a word, it is , When entering the third stage , Due to network timeout and other reasons , Although participants did not receive commit perhaps abort Respond to , But he has reason to believe : The chances of a successful submission are great . )
2PC And 3PC The difference between
be relative to 2PC,3PC The main single point of failure to solve , And reduce congestion , Because once the participants can't receive the information from the coordinator in time , He will default to commit. Instead of holding transaction resources and blocking them all the time . But this mechanism also leads to data consistency problems , because , Because of the Internet , Sent by the coordinator abort The response was not received by the participants in time , Then the participant executes after the timeout commit operation . In this way, we will receive abort There is data inconsistency between the participants who command and perform the rollback .
I understand 2PC and 3PC after , We can find out , No matter two-phase commit or three-phase commit, it can't completely solve the problem of distributed consistency .
Local message table
The scheme of local message table was originally eBay Proposed ,eBay The whole scheme of :
https://queue.acm.org/detail.cfm?id=1394128
Local message table is the most widely used method in the industry , Its core idea is to split distributed transactions into local transactions for processing .

For local message queues , The core is to turn big business into small business , Let's use the above example to illustrate :
- When we go to create an order , We add a new local message table , Write the created order and inventory deduction to the local message table , In the same transaction ( Rely on database local transactions to ensure consistency ).
- Configure a scheduled task to poll the local transaction table , Scan this local transaction table , Send messages that have not been sent , Send to inventory service , When the inventory service receives the message , Inventory will be reduced , And write to the transaction table of the server , Update the status of the transaction table .
- The inventory server notifies the order service through scheduled tasks or directly , The order service updates the status in the local message table .
It should be noted here that , For some scanning tasks that fail to be sent , Will be resend , Therefore, the idempotency of the interface must be guaranteed . The local message queue is BASE theory , Is the final consistency model , It is applicable to the case that the requirements for consistency are not high .
RocketMQ Business
RocketMQ Distributed transaction is implemented in , It's actually an encapsulation of the local message table , Moved the local message table to MQ Inside .

Transaction message as an asynchronous assured transaction , Branch two transactions through MQ Asynchronous decoupling ,RocketMQ The design process of transaction message also draws on the two-stage commit theory .
The overall interaction process is shown in the figure below :

MQ Transactions are a layer of encapsulation of local message tables , Moved the local message table to MQ Inside , So it is also based on BASE theory , Is the ultimate consistency pattern , It is applicable to transactions that do not require strong consistency , meanwhile MQ Transactions asynchronize the entire process , It is also very suitable for high concurrency . This chapter does not explain in detail RocketMQ Business and RocketMq Characteristics of .
seata
Seata The three characters of
stay Seata In the framework of , There are three characters :
TC (Transaction Coordinator) - A business coordinator
Maintain the state of global and branch transactions , Drive global transaction commit or rollback .
TM (Transaction Manager) - Transaction manager
Define the scope of the global transaction : Start global transaction 、 Commit or roll back global transactions .
RM (Resource Manager) - Explorer
Manage resources for branch transactions , And TC Talk to register branch transactions and report the status of branch transactions , And drive branch transaction commit or rollback .
among ,TC For separately deployed Server Server side ,TM and RM For embedded in the application Client client .
stay Seata in , The life cycle of a distributed transaction is as follows :

1.TM request TC Start a global transaction .TC Will generate a XID As the number of the global transaction .XID, It will propagate in the invocation link of microservices , Ensure that the subtransactions of multiple microservices are associated together .
2.RM request TC Register local transaction as branch transaction of global transaction , Through global transactions XID Association .
3.TM request TC tell XID Whether the corresponding global transaction is committed or rolled back .
4.TC drive RM We will XID Commit or roll back the corresponding local transaction .
Design thinking
AT The core of the pattern is no intrusion into the business , It's an improved two-stage submission , The design idea is shown in the figure
The first stage
Business data and rollback logging are committed in the same local transaction , Release local locks and connection resources . The core is to the business sql To analyze , convert to undolog, And put it in storage at the same time , How is this done ? First throw out a concept DataSourceProxy Proxy data sources , Through the name, you can basically guess what operation it is , Specific analysis will be made later
Refer to official documentation : Seata AT Pattern

The second stage
Distributed transaction operation succeeded , be TC notice RM Delete asynchronously undolog

Distributed transaction operation failed ,TM towards TC Send rollback request ,RM Roger the coordinator TC Rollback request from , adopt XID and Branch ID Find the corresponding rollback log record , Generate reverse updates by rolling back records SQL And implement , To complete the rollback of the branch .

Overall execution process

Design highlights
Compared with other distributed transaction frameworks ,Seata There are several highlights of the architecture :
The application layer is based on SQL The analysis realizes automatic compensation , So as to minimize business intrusion ;
In distributed transactions TC( A business coordinator ) Independent deployment , Responsible for the registration of affairs 、 Roll back ;
Write isolation and read isolation are realized through global lock .
The problem is
Performance loss
One Update Of SQL, A global transaction is required xid obtain ( And TC Communications )、before image( analysis SQL, Query the database once )、after image( Query the database once )、insert undo log( Write a database )、before commit( And TC Communications , Judge lock conflict ), All these operations require a remote communication RPC, And it's synchronous . in addition undo log When writing blob The insertion performance of the field is also not high . Write each SQL It's going to cost so much , A rough estimate would increase 5 Times the response time .
Cost performance
For automatic compensation , All transactions need to be mirrored and persisted , But in the actual business scenario , This is the success rate , Or how many percentage of distributed transaction failures need to be rolled back ? Estimate according to the principle of 28 , in order to 20% The transaction is rolled back , Need to put 80% Increased response time for successful transactions 5 times , This cost is compared to whether it's worth letting the application develop a compensation transaction ?
Global lock
Hot data
comparison XA,Seata Although in a successful stage will release the database lock , But one stage is commit The determination of the front global lock also lengthens the occupation time of the data lock , The cost ratio is XA Of prepare How much lower needs to be tested according to the actual business scenario . The introduction of global locks enables isolation , But the problem is congestion , Reduce concurrency , Especially hot data , This problem will be more serious .
Rollback lock release time
Seata When rolling back , You need to delete the undo log, Then we can release TC Lock in memory , So if the second phase is rollback , It takes longer to release the lock .
The deadlock problem
Seata The introduction of global lock will increase the risk of deadlock , But if a deadlock occurs , Will keep retrying , Finally, wait for the global lock timeout , It's not elegant , It also extends the time of database lock possession .
Seata Is an open source distributed transaction solution , Committed to providing high-performance and easy-to-use distributed transaction services .Seata Will provide users with AT、TCC、SAGA and XA Transaction mode , Create a one-stop distributed solution for users .AT The mode is the first mode promoted by Ali , There are commercial versions on Alibaba cloud GTS(Global Transaction Service Global transaction services )
Official website :Seata
Source code : https://github.com/seata/seata
official Demo: https://github.com/seata/seata-samples
seata Supported distributed :Seata Introduction to minimalism
边栏推荐
- [EI分享] 2022年第六届船舶,海洋与海事工程国际会议(NAOME 2022)
- What are the characteristics of EDI local deployment and cloud hosting solutions?
- 2022全网最全最细的jmeter接口测试教程以及接口测试流程详解— JMeter测试计划元件(线程<用户>)
- 2. login and exit function development
- 【资源分享】2022年第五届土木,建筑与环境工程国际会议(ICCAEE 2022)
- 学习使用phpstripslashe函数去除反斜杠
- p5.js实现的炫酷交互式动画js特效
- 4.分类管理业务开发
- 5. dish management business development
- np. float32()
猜你喜欢

3. addition, deletion, modification and query of employees

Baidu online disk download has been in the process of requesting solutions

解决Deprecated: Methods with the same name as their class will not be constructors in报错方案

Uniapp implements the function of clicking to make a call

2.登陆退出功能开发

线程池的状态

Uniapp develops a wechat applet to display the map function, and click it to open Gaode or Tencent map.

Status of the thread pool

Outils de capture de paquets

Flink集群搭建以及企业级yarn集群搭建
随机推荐
uniapp开发微信小程序,显示地图功能,且点击后打开高德或腾讯地图。
学习使用php对字符串中的特殊符号进行过滤的方法
SQL sever基本数据类型详解
Practice sharing of packet capturing tool Charles
26.删除有序数组的重复项
How large and medium-sized enterprises build their own monitoring system
leetCode-面试题 16.06: 最小差
[IEEE publication] 2022 International Conference on service robots (iwosr 2022)
3. addition, deletion, modification and query of employees
Appium自动化测试基础 — 移动端测试环境搭建(一)
Common methods of thread scheduling
使用swiper左右轮播切换时,Swiper Animate的动画失效,怎么解决?
记录一下MySql update会锁定哪些范围的数据
dedecms模板文件讲解以及首页标签替换
线程的六种状态
leetCode-223: 矩形面积
numpy. logical_ and()
numpy.linspace()
JMeter接口测试工具基础— 取样器sampler(二)
Younger sister Juan takes you to learn JDBC --- 2-day sprint Day1