Raft-Based Consistency in etcd
2022-07-03 20:38:00 【Zhang quandan, Foxconn quality inspector】
etcd's consistency is based on Raft
Raft itself is a set of guiding principles. etcd follows them strictly and implements them in Go.
Many of etcd's features are in fact features of the Raft protocol.
Election method
- At initial startup, a node is in the Follower state and sets an election timeout. If it receives no heartbeat from a Leader within that period, the node starts an election: it switches itself to Candidate and sends a request to the other nodes in the cluster, asking them to vote for it as Leader.
- After receiving votes from more than half of the nodes in the cluster, the node becomes Leader. It then starts accepting and storing client data and synchronizing the log to the other Follower nodes.
- If no agreement is reached, the Candidate picks a random wait interval (150ms~300ms) and requests votes again. The Candidate accepted by more than half of the nodes in the cluster becomes Leader. A Leader of a newer term takes precedence over a Leader of an older term.
As the steps above show, the election strictly follows the Raft protocol.
Election method: heartbeats and terms
- The Leader maintains its position by periodically sending heartbeats to the Followers.
- At any time, if a Follower receives no heartbeat from the Leader within the election timeout, it also switches its state to Candidate and starts an election. Every election attempt increments the term (Term), so after a successful election the new Leader's term is larger than the previous Leader's term: by 1 if the first attempt succeeds.
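The election rules above can be sketched roughly as follows. This is a minimal illustration in Python, not etcd's Go implementation; the names `random_election_timeout` and `run_election` are made up for this example, and the 150ms~300ms range is the one quoted in the text.

```python
import random

# Range quoted in the text; real deployments configure their own values.
ELECTION_TIMEOUT_MS = (150, 300)


def random_election_timeout(rng: random.Random) -> int:
    """Pick a randomized timeout so that candidates rarely time out together."""
    low, high = ELECTION_TIMEOUT_MS
    return rng.randint(low, high)


def run_election(current_term: int, cluster_size: int, votes_from_peers: int):
    """Return (new_term, won) for a single election attempt.

    The candidate increments its term, votes for itself, and wins only
    if it holds votes from more than half of the cluster.
    """
    new_term = current_term + 1
    votes = 1 + votes_from_peers  # the candidate's own vote counts
    won = votes > cluster_size // 2
    return new_term, won


rng = random.Random(42)
timeout = random_election_timeout(rng)
term, won = run_election(current_term=3, cluster_size=5, votes_from_peers=2)
print(f"timeout={timeout}ms term={term} won={won}")
```

In a 5-node cluster the candidate needs 3 votes in total, so two votes from peers plus its own are enough. In a 4-node cluster, one peer vote plus its own (2 of 4) is not.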
Log replication
When the Leader receives a log entry (a transaction request) from a client, it first appends the entry to its local log and then synchronizes the Entry to the other Followers through heartbeats. Each Follower records the entry when it receives it and sends an ACK back to the Leader. Once a majority of the cluster (n/2+1 nodes, counting the Leader itself) has acknowledged the entry, the Leader marks it as committed, appends it to local disk, and notifies the client. In the next heartbeat the Leader tells all Followers to store the entry on their local disks as well.
Every write must go through the leader. A client may send the request to a follower, but when the follower receives it, its consistency module forwards the request to the leader for processing. The leader considers the write confirmed once more than half of the nodes have agreed.
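One way to picture the majority rule is to compute the highest log index that is already stored on more than half of the nodes. The sketch below is an illustration under simplifying assumptions: `committed_index` is a hypothetical helper, the list includes the leader's own position, and the additional Raft condition that a leader only commits entries from its own term this way is left out.

```python
from typing import Sequence


def committed_index(match_index: Sequence[int]) -> int:
    """Return the highest log index stored on a majority of nodes.

    match_index[i] is the highest index known to be stored on node i,
    with the leader itself included in the sequence.
    """
    majority = len(match_index) // 2 + 1
    ordered = sorted(match_index, reverse=True)
    # The majority-th largest value is stored on at least `majority` nodes.
    return ordered[majority - 1]


# Leader and two followers have index 10; two followers lag behind.
print(committed_index([10, 10, 8, 7, 10]))
# Only one node has index 10, so the commit point is lower.
print(committed_index([10, 9, 8, 3, 2]))
```

Sorting in descending order and taking the element at position `majority - 1` avoids counting acknowledgements entry by entry.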
Safety
Safety: a mechanism that ensures every node executes the same sequence of entries.
Suppose a Follower becomes unavailable while the current Leader is committing log entries, and that Follower is later elected Leader. The new Leader could then overwrite already committed entries with new ones, which would cause nodes to execute different sequences. Safety is the mechanism that guarantees an elected Leader always contains the previously committed log entries.
Suppose one follower in the cluster has fallen behind for a while: for example, the leader has written 10 log entries but the follower has only 8. If the leader then fails and drops out of the cluster, the lagging follower can still stand for election, but its log is 2 entries short of the leader's. If it became the new leader, that data would be lost. So when a node receives a vote request, it does not simply vote first come, first served. It also checks whether the requester is qualified to be leader: is its log at least as up to date as mine? If the candidate lags behind the voter, the voter refuses to vote for it.
This is the role of the leader's commit log position: it records the index of the entries confirmed by the previous term's leader. If a candidate comes canvassing for votes but its log falls short of that position, it is not qualified to become the new leader, which prevents data loss. (Strictly speaking, a Raft voter compares the candidate's last log term and index with its own rather than a commit index. Because every committed entry is stored on a majority of nodes, a candidate missing one cannot collect a majority of votes.)
Election Safety:
Each term (Term) can elect at most one Leader. If the vote is split, a new vote is needed.
Leader Completeness:
This refers to the completeness of the Leader's log. Once a log entry has been committed in Term1, the Leaders of all later terms (Term2, Term3, ...) must contain that entry. Raft ensures this during the election phase by comparing terms and indexes: a node grants its vote only if the requesting Candidate's last log Term is larger than its own, or the Term is the same and the Candidate's Index is at least as large. Otherwise it rejects the request.
After many elections, with different leaders in different terms, the new leader's log must be the most complete: it has to include the committed entries of all previous terms. How is this guaranteed? A candidate whose log is behind the current leader's committed log cannot become the new leader. Through this mechanism the new leader always holds the complete earlier log, which preserves the integrity of the data.
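The voting check described above (larger Term wins; with equal Terms the larger Index wins) can be written as a small predicate. This is a sketch with hypothetical parameter names, not etcd's actual code.

```python
def candidate_is_up_to_date(
    cand_last_term: int,
    cand_last_index: int,
    my_last_term: int,
    my_last_index: int,
) -> bool:
    """Decide whether a voter may grant its vote to a candidate.

    The comparison uses the term and index of the LAST entry in each log.
    """
    if cand_last_term != my_last_term:
        # A later term on the last entry means a more up-to-date log.
        return cand_last_term > my_last_term
    # Same term: the log that is at least as long is up to date.
    return cand_last_index >= my_last_index


# The lagging follower from the example: 8 entries against a voter with 10.
print(candidate_is_up_to_date(2, 8, 2, 10))   # vote refused
print(candidate_is_up_to_date(2, 10, 2, 10))  # vote may be granted
```

A longer log does not help a candidate whose last entry comes from an older term, which is why the term is compared first.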
Failure handling
1. Leader failure: the nodes that stop receiving heartbeats start a new election. When the old Leader recovers, its term is smaller, so it automatically becomes a Follower (and its log is overwritten by the new Leader's log).
Suppose a leader is cut off from the cluster by a split brain. The other nodes may hold a new election and choose a new leader. When the old leader rejoins the cluster, it sees that its index and term are smaller than the others', so it automatically demotes itself to follower, and its log is overwritten by the new leader's log.
2. Follower unavailable: this case is relatively easy to handle. Because the log content in the cluster always originates from the Leader, the node simply copies the log from the Leader again once it rejoins the cluster.
After a follower recovers, it picks up the entries it is missing when the leader sends heartbeats.
3. Multiple candidates: after a conflict, each candidate picks a random wait interval (150ms~300ms) and requests votes again. The candidate accepted by more than half of the nodes in the cluster becomes Leader.
With an even number of nodes the vote can split 2:2, so an even-sized cluster is not recommended.
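The arithmetic behind the advice on even-sized clusters can be checked directly. In this sketch, `quorum` and `tolerated_failures` are hypothetical helper names; the majority formula n/2+1 is the one used in the text.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return cluster_size - quorum(cluster_size)


for n in range(3, 7):
    print(f"nodes={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

A 4-node cluster needs 3 votes and tolerates only 1 failure, the same as a 3-node cluster, so the extra node adds cost and the possibility of a 2:2 split without adding fault tolerance.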
WAL (write-ahead log)
The WAL is binary. Once parsed, it yields the LogEntry data structure, which has the following fields.
- The first field is type: 0 means Normal and 1 means ConfChange (ConfChange means a change to etcd's own configuration is being synchronized, for example when a new node is added).
- The second field is term. Each term represents one leader's tenure; the term changes every time the leader changes.
- The third field is index, a strictly increasing sequence number that identifies the change.
- The fourth field is binary data, which stores the entire protobuf (pb) structure of the raft request object.
When data is written, etcd follows the Raft protocol: it writes a log entry first and writes the db second. Persistent storage holds the final state, and before that a log entry is written. This log is called the WAL, and entries are simply appended to it.
The log is a binary file that parses into a data structure, the LogEntry. A LogEntry has several important fields. The first is type, which says what kind of entry this is: a configuration-change entry (adding or removing nodes), or a normal entry that represents a data write.
The second is the leader's term, i.e. which term the leader node is in.
The third is index, a sequence number that increases by 1 with every change, that is, with every data write. It is used to track the leader's commit position, so we know which index is being written now and which data each index corresponds to.
The fourth field is data, which is the whole request. For example, to write a key-value pair key=value, the entire request is stored as data.
The above is the content of a WAL entry.
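The four fields can be modelled as a small record. The sketch below is an assumption-laden illustration: the binary layout produced by `encode` is invented for this example and is not etcd's actual WAL format (etcd serializes entries with protobuf), and the names `EntryType`, `LogEntry`, `encode` and `decode` are hypothetical.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class EntryType(IntEnum):
    NORMAL = 0       # an ordinary data write
    CONF_CHANGE = 1  # a cluster configuration change, e.g. adding a node


@dataclass(frozen=True)
class LogEntry:
    type: EntryType
    term: int    # tenure of the leader that created the entry
    index: int   # strictly increasing sequence number
    data: bytes  # the serialized request


# Made-up header: type (1 byte), then term, index and data length (8 bytes each).
_HEADER = struct.Struct(">BQQQ")


def encode(entry: LogEntry) -> bytes:
    """Serialize an entry so it can be appended to a binary log file."""
    header = _HEADER.pack(entry.type, entry.term, entry.index, len(entry.data))
    return header + entry.data


def decode(raw: bytes) -> LogEntry:
    """Parse one entry from the start of a byte string."""
    etype, term, index, size = _HEADER.unpack_from(raw)
    body = raw[_HEADER.size:_HEADER.size + size]
    return LogEntry(EntryType(etype), term, index, body)


entry = LogEntry(EntryType.NORMAL, term=2, index=9, data=b"put key=value")
roundtrip = decode(encode(entry))
print(roundtrip)
```

Because the encoded form is opaque bytes, a dump tool is needed to read it back as text, which is the job `decode` plays in this toy version.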
- The etcd source tree includes tools/etcd-dump-logs, which dumps the WAL as text for viewing and helps when analyzing the Raft protocol. (The WAL itself is binary, not text, so it cannot be read directly.)
- The Raft protocol itself does not care about application data, that is, the data part. Consistency is achieved entirely by synchronizing the WAL. Each node applies the data it receives from the leader to its local storage. Raft only cares about the synchronization state of the log; if the local storage implementation has a bug, for example it fails to apply the data correctly, the data can still end up inconsistent.
Raft does not care how your data is stored; it only guarantees that the log is consistent. How the data is stored is entirely up to etcd's implementation.
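The split between the replicated log and local storage can be illustrated with a toy state machine. This is a sketch under stated assumptions: a plain dict stands in for etcd's db, each entry is a hypothetical `(index, key, value)` tuple, and `apply_committed` is an invented helper name.

```python
def apply_committed(store: dict, log: list, applied_index: int, commit_index: int) -> int:
    """Apply committed but not yet applied entries to the local store.

    `log` holds (index, key, value) tuples ordered by index.
    Returns the new applied index.
    """
    for index, key, value in log:
        if applied_index < index <= commit_index:
            store[key] = value
            applied_index = index
    return applied_index


log = [(1, "a", "1"), (2, "b", "2"), (3, "a", "3")]

# Two replicas receive the same log and apply it at different paces.
replica_one, replica_two = {}, {}
applied_one = apply_committed(replica_one, log, applied_index=0, commit_index=3)
applied_two = apply_committed(replica_two, log, applied_index=0, commit_index=2)
applied_two = apply_committed(replica_two, log, applied_index=applied_two, commit_index=3)
print(replica_one == replica_two)
```

Both replicas converge because they apply the same entries in the same order. A replica whose apply step skipped or mangled an entry would diverge even though its log is identical, which is the bug scenario described above.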