Etcd introduction
2022-06-11 11:25:00 [Cotton wool]
Original article: Etcd Introduction
Etcd
Etcd is a distributed key-value store developed by CoreOS and based on Raft. It can be used for service discovery, shared configuration and consistency guarantees (for example, leader election for databases and distributed locks).
Main features of Etcd
- Basic key-value storage
- A watch (monitoring) mechanism
- Key expiration and renewal, used for monitoring and service discovery
- Atomic CAS (compare-and-swap) and CAD (compare-and-delete), used for distributed locks and leader election
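The last bullet is enough to build a simple lock. Acquiring means "create the key only if it does not exist yet" (CAS). Releasing means "delete the key only if it still holds my owner ID" (CAD). The following is a minimal sketch on a hypothetical in-memory Store, written only to show those semantics. It is not the etcd client API, and a production lock would also attach a TTL so that a crashed holder's key eventually expires.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrCompareFailed is returned when the compare step of CAS/CAD does not hold.
var ErrCompareFailed = errors.New("compare failed")

// Store is a hypothetical in-memory key-value store (not etcd's API).
type Store struct {
	mu   sync.Mutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: map[string]string{}} }

// CAS sets key to newVal only if its current value equals oldVal.
// An empty oldVal means "the key must not exist yet".
func (s *Store) CAS(key, oldVal, newVal string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, exists := s.data[key]
	if (oldVal == "" && exists) || (oldVal != "" && cur != oldVal) {
		return ErrCompareFailed
	}
	s.data[key] = newVal
	return nil
}

// CAD deletes key only if its current value equals expected.
func (s *Store) CAD(key, expected string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, exists := s.data[key]; !exists || cur != expected {
		return ErrCompareFailed
	}
	delete(s.data, key)
	return nil
}

// TryLock and Unlock build a mutual-exclusion lock from CAS and CAD.
func (s *Store) TryLock(name, owner string) bool { return s.CAS(name, "", owner) == nil }
func (s *Store) Unlock(name, owner string) bool  { return s.CAD(name, owner) == nil }

func main() {
	s := NewStore()
	fmt.Println(s.TryLock("/locks/job", "client-a")) // true: key created
	fmt.Println(s.TryLock("/locks/job", "client-b")) // false: already held
	fmt.Println(s.Unlock("/locks/job", "client-b"))  // false: not the owner
	fmt.Println(s.Unlock("/locks/job", "client-a"))  // true: released
}
```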
Raft-based consistency in Etcd
Leader election
- At startup every node is a follower with an election timeout. If it receives no heartbeat from the leader within that period, it starts an election. It switches to the candidate state, increments its term, votes for itself, and asks the other members of the cluster to vote for it as leader.
- A node that receives votes from more than half of the cluster becomes the leader. It then accepts client writes and replicates the log to the followers. If no candidate wins, each candidate waits a random interval (150 ms to 300 ms) and starts a new vote. The candidate accepted by more than half of the cluster becomes leader.
- The leader keeps its position by sending heartbeats to the followers at regular intervals.
- If at any time a follower receives no heartbeat from the leader within its election timeout, it likewise becomes a candidate and starts an election. Every new election increments the term (Term), so each newly elected leader has a larger term than the previous leader.
Log replication
- When the leader receives a log entry (a transaction request) from a client, it appends the entry to its local log.
- It then replicates the entry to the followers along with its heartbeats.
- Each follower records the entry and sends an ACK to the leader.
- Once a majority of the cluster (n/2+1 nodes, counting the leader itself) holds the entry, the leader marks it committed and applies it locally. It then replies to the client, and in the next heartbeat it tells all followers to commit the entry as well.
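The majority rule in the last step can be computed from how far each node's log is known to match the leader's. This sketch takes one match index per node (leader included) and returns the highest index stored on at least n/2+1 of them. It is a simplification: Raft additionally commits by counting replicas only for entries from the leader's current term.

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index present on at least n/2+1
// of the n nodes, given each node's match index (leader included).
// It assumes a non-empty input.
func commitIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...) // do not reorder the caller's slice
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	// In descending order, positions 0..n/2 all hold a value >= sorted[n/2],
	// so that index is stored on n/2+1 nodes.
	return sorted[len(sorted)/2]
}

func main() {
	// The leader has entries up to 5; followers have acknowledged 5, 3, 2 and 1.
	fmt.Println(commitIndex([]int{5, 5, 3, 2, 1})) // 3
	fmt.Println(commitIndex([]int{7, 4, 1}))       // 4
}
```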
Safety
Safety is the mechanism that makes every node execute the same sequence of commands. Suppose a follower is unavailable while the current leader commits some entries. That follower may later be elected leader. The new leader could then overwrite the committed entries with new ones, so different nodes would execute different sequences. Safety guarantees that any elected leader already contains every previously committed entry:
- Election Safety: at most one leader can be elected in each term.
- Leader Completeness: once an entry is committed in term Term1, the leaders of all later terms (Term2, Term3, ...) must contain that entry. Raft enforces this during elections by comparing logs. A node grants its vote only if the candidate's last log entry has a larger term than its own last entry, or the same term and an index at least as large. Otherwise it rejects the request.
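The vote-granting comparison for Leader Completeness fits in a few lines. The sketch below compares only the last entry of each log. The LogPos type and the function name are illustrative, and a real node would combine this check with the one-vote-per-term rule.

```go
package main

import "fmt"

// LogPos identifies the last entry of a node's log.
type LogPos struct {
	Term  int
	Index int
}

// candidateUpToDate reports whether a voter should consider the candidate's
// log at least as up-to-date as its own: compare the terms of the last
// entries first, and only on a tie compare their indexes.
func candidateUpToDate(candidate, voter LogPos) bool {
	if candidate.Term != voter.Term {
		return candidate.Term > voter.Term
	}
	return candidate.Index >= voter.Index
}

func main() {
	voter := LogPos{Term: 2, Index: 7}
	fmt.Println(candidateUpToDate(LogPos{Term: 3, Index: 4}, voter)) // true: newer term wins despite a shorter log
	fmt.Println(candidateUpToDate(LogPos{Term: 2, Index: 6}, voter)) // false: same term, missing entry 7
	fmt.Println(candidateUpToDate(LogPos{Term: 2, Index: 7}, voter)) // true: identical logs
}
```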
Failure handling
- Leader failure: the nodes that stop receiving heartbeats start a new election. When the old leader recovers, it sees a larger term and automatically becomes a follower. Its conflicting, uncommitted entries are overwritten by the new leader's log.
- Follower failure: this is relatively easy to handle. Log content in the cluster always flows from the leader, so when the follower rejoins the cluster it simply copies the log from the leader again.
- Multiple candidates: after a conflict (a split vote), each candidate waits a random interval (150 ms to 300 ms) and votes again. The candidate accepted by more than half of the cluster becomes leader.
WAL log
Etcd's Raft implementation makes heavy use of Go's CSP concurrency model and channels; the source code is worth reading for details. Here we only briefly analyze its WAL (write-ahead log).
| type | term | index | data |
|---|---|---|---|
WAL logs are binary. Once parsed, each record is the LogEntry structure above, where:
- The first field, type, has only two values in the version described here: 0 means Normal and 1 means ConfChange. ConfChange records synchronize changes to Etcd's own configuration, such as adding a new node. Newer etcd releases also define EntryConfChangeV2.
- The second field is term. Each term represents the tenure of one leader, and the term changes every time the leader changes.
- The third field is index, a strictly increasing sequence number that identifies each change.
- The fourth field, data, is binary: the whole protobuf-encoded raft request object is stored there. The Etcd source tree includes tools/etcd-dump-logs, which dumps the WAL as text and helps when analyzing the Raft protocol.
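To make the four fields concrete, here is a round trip through a hypothetical fixed-width binary layout (type, term, index, data length, data). The layout is invented for illustration. As far as we know, etcd itself stores each entry as a protobuf-encoded raftpb.Entry inside CRC-checked WAL records, so this code will not parse real WAL files.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

type EntryType uint8

const (
	EntryNormal     EntryType = 0 // ordinary client request
	EntryConfChange EntryType = 1 // cluster membership change
)

// Entry mirrors the four fields described above.
type Entry struct {
	Type  EntryType
	Term  uint64
	Index uint64
	Data  []byte // opaque to Raft
}

const headerLen = 1 + 8 + 8 + 4 // type | term | index | data length

// encode writes: type(1) | term(8) | index(8) | len(data)(4) | data.
func encode(e Entry) []byte {
	b := make([]byte, headerLen+len(e.Data))
	b[0] = byte(e.Type)
	binary.BigEndian.PutUint64(b[1:9], e.Term)
	binary.BigEndian.PutUint64(b[9:17], e.Index)
	binary.BigEndian.PutUint32(b[17:21], uint32(len(e.Data)))
	copy(b[headerLen:], e.Data)
	return b
}

// decode is the inverse of encode and rejects truncated input.
func decode(b []byte) (Entry, error) {
	if len(b) < headerLen {
		return Entry{}, errors.New("short header")
	}
	end := headerLen + int(binary.BigEndian.Uint32(b[17:21]))
	if len(b) < end {
		return Entry{}, errors.New("truncated data")
	}
	return Entry{
		Type:  EntryType(b[0]),
		Term:  binary.BigEndian.Uint64(b[1:9]),
		Index: binary.BigEndian.Uint64(b[9:17]),
		Data:  append([]byte(nil), b[headerLen:end]...),
	}, nil
}

func main() {
	raw := encode(Entry{Type: EntryNormal, Term: 2, Index: 17, Data: []byte("put key1 v1")})
	e, err := decode(raw)
	fmt.Printf("%d bytes -> type=%d term=%d index=%d data=%q err=%v\n",
		len(raw), e.Type, e.Term, e.Index, e.Data, err)
}
```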
Raft itself does not care about application data, that is, the data part. Consistency is achieved entirely by replicating the WAL. Each node applies the data it receives from the leader to its local storage, and Raft only tracks the replication state of the log. If the local storage has a bug, for example it fails to apply data correctly, the data can still become inconsistent.
Etcd v2 and v3
Etcd v2 and v3 are essentially two independent applications that share the same Raft protocol code. Their interfaces differ, their storage differs, and their data is isolated from each other. In other words, after upgrading from Etcd v2 to Etcd v3, the existing v2 data can still only be accessed through the v2 API. Data created through the v3 API can only be accessed through the v3 API.
Etcd v2
Etcd v2 is a pure in-memory implementation and does not write data to disk in real time. Its persistence mechanism is simple: the whole store is serialized to JSON and written to a file. In memory the data is a simple tree. For example, the following data is stored in Etcd as a tree in which /nodes has a child 1, whose children are name and ip:
```text
/nodes/1/name node1
/nodes/1/ip 192.168.1.1
```
The store keeps a global currentIndex that is incremented by 1 on every change.
A client calls the watch API by adding the wait parameter to a request. If the request contains a waitIndex smaller than currentIndex, Etcd searches the EventHistory table for an event whose index is greater than or equal to waitIndex and whose key matches the watched key. If such an event exists, it is returned immediately. If the history has no such event or the request has no waitIndex, the request is placed in the WatchHub, where every key has its own list of watchers. When a change occurs, the resulting event is added to EventHistory, and the watchers of that key are notified.
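The lookup-then-register flow above can be sketched as follows. This is a simplified model, not v2's code. Keys sit in a flat map instead of the node tree, watches match one exact key (no recursive flag), the history is unbounded, and nothing is safe for concurrent use.

```go
package main

import "fmt"

// Event is a simplified stand-in for a v2 change event.
type Event struct {
	Key   string
	Value string
	Index uint64
}

// Store sketches the v2 flow with a global currentIndex, an event
// history and a watcher hub keyed by path.
type Store struct {
	currentIndex uint64
	values       map[string]string
	history      []Event                 // EventHistory (unbounded here)
	hub          map[string][]chan Event // WatchHub: key -> waiting watchers
}

func NewStore() *Store {
	return &Store{values: map[string]string{}, hub: map[string][]chan Event{}}
}

// Set bumps currentIndex, records the event and wakes this key's watchers.
func (s *Store) Set(key, value string) {
	s.currentIndex++
	s.values[key] = value
	ev := Event{Key: key, Value: value, Index: s.currentIndex}
	s.history = append(s.history, ev)
	for _, ch := range s.hub[key] {
		ch <- ev // each channel has room for exactly this one event
	}
	delete(s.hub, key) // a plain (non-stream) v2 watch fires once
}

// Watch returns a channel that yields the first change to key with
// Index >= waitIndex; waitIndex == 0 means "wait for the next change".
func (s *Store) Watch(key string, waitIndex uint64) <-chan Event {
	ch := make(chan Event, 1)
	if waitIndex > 0 && waitIndex <= s.currentIndex {
		for _, ev := range s.history {
			if ev.Index >= waitIndex && ev.Key == key {
				ch <- ev // answered straight from history
				return ch
			}
		}
	}
	s.hub[key] = append(s.hub[key], ch)
	return ch
}

func main() {
	s := NewStore()
	s.Set("/nodes/1/name", "node1")             // index 1
	s.Set("/nodes/1/ip", "192.168.1.1")         // index 2
	fmt.Println(<-s.Watch("/nodes/1/name", 1))  // served from history
	w := s.Watch("/nodes/1/ip", 0)              // parked in the hub
	s.Set("/nodes/1/ip", "192.168.1.2")         // index 3 wakes it
	fmt.Println(<-w)
}
```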
A few details affect how this is used:
- EventHistory has a length limit of 1000 events. If a client stops for a long time and then watches again, the events relevant to its waitIndex may already have been evicted. In that case, those changes are lost.
- If a watcher blocks when it is being notified (each watcher's channel has a 100-event buffer), Etcd deletes the watcher outright. This interrupts the connection of the wait request, and the client has to reconnect.
- The Etcd store records an expiration time in each node and cleans up expired nodes with a timer.
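The first two limits can be reproduced in a small sketch. It has a history that keeps only the newest 1000 events, and a watcher hub that drops any watcher whose 100-slot buffer is full. The error value and type names are hypothetical; only the two limits come from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	historyCap  = 1000 // EventHistory length limit
	watcherBufN = 100  // per-watcher channel buffer
)

// ErrIndexCleared is a hypothetical name for "the event at waitIndex was evicted".
var ErrIndexCleared = errors.New("event index outdated and cleared")

type Event struct {
	Key   string
	Index uint64
}

// History keeps only the newest historyCap events.
type History struct{ events []Event }

func (h *History) Add(ev Event) {
	h.events = append(h.events, ev)
	if len(h.events) > historyCap {
		h.events = h.events[1:] // the oldest event is gone for good
	}
}

// Scan finds the first event for key at or after waitIndex.
func (h *History) Scan(key string, waitIndex uint64) (Event, bool, error) {
	if len(h.events) > 0 && waitIndex < h.events[0].Index {
		return Event{}, false, ErrIndexCleared
	}
	for _, ev := range h.events {
		if ev.Index >= waitIndex && ev.Key == key {
			return ev, true, nil
		}
	}
	return Event{}, false, nil
}

// Hub holds streaming watchers; a watcher whose buffer is full is dropped.
type Hub struct{ watchers map[string][]chan Event }

func (hub *Hub) Notify(ev Event) {
	kept := hub.watchers[ev.Key][:0]
	for _, ch := range hub.watchers[ev.Key] {
		select {
		case ch <- ev:
			kept = append(kept, ch)
		default:
			close(ch) // blocked watcher: its stream ends and the client must reconnect
		}
	}
	hub.watchers[ev.Key] = kept
}

func main() {
	h := &History{}
	for i := uint64(1); i <= 1500; i++ {
		h.Add(Event{Key: "/k", Index: i})
	}
	_, _, err := h.Scan("/k", 10)
	fmt.Println(err) // index 10 was evicted long ago

	hub := &Hub{watchers: map[string][]chan Event{}}
	hub.watchers["/k"] = []chan Event{make(chan Event, watcherBufN)} // nobody reads it
	for i := uint64(1); i <= watcherBufN+1; i++ {
		hub.Notify(Event{Key: "/k", Index: i})
	}
	fmt.Println(len(hub.watchers["/k"])) // 0: the slow watcher was removed
}
```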
This reveals some limitations of Etcd v2:
- Expiration times can only be set on individual keys, so it is hard to give several keys a consistent lifecycle.
- A watcher can only watch one key and its children (through the recursive parameter); it cannot watch several keys at once.
- Complete data synchronization is hard to achieve through the watch mechanism, because changes can be lost. Most users therefore use watch only to learn that something changed and then re-read the data with get. They do not rely entirely on watch events.
Etcd v3
The Etcd v3 store has two parts:
- An in-memory index, kvindex, built on Google's open-source Go btree library.
- A backend store. By design the backend can plug into different storage engines; currently it uses boltdb. Boltdb is a single-machine key-value store with transaction support, and Etcd's transactions are built on boltdb's. The key Etcd stores in boltdb is the revision, and the value is Etcd's own key-value pair. In other words, Etcd saves every version in boltdb, which is how it implements its multi-version mechanism.
For example, use etcdctl to write two records in one transaction (the empty first line inside the quotes means the transaction has no compare conditions):
```shell
etcdctl txn <<<'
put key1 "v1"
put key2 "v2"
'
```
Then update both records in another transaction:
```shell
etcdctl txn <<<'
put key1 "v12"
put key2 "v22"
'
```
boltdb now actually contains 4 records:
```text
rev={3 0}, key=key1, value="v1"
rev={3 1}, key=key2, value="v2"
rev={4 0}, key=key1, value="v12"
rev={4 1}, key=key2, value="v22"
```
A revision consists of two parts:
- main rev, which is incremented by one for each transaction.
- sub rev, which is incremented by one for each operation within the same transaction.
In the example above, the main rev of the first transaction is 3 and that of the second is 4. The first concern this mechanism raises is space, so Etcd provides commands and configuration options to control compaction, which discards old revisions.
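The revision numbering in this example can be reproduced with a slice standing in for boltdb. The starting revision of 2 is an assumption chosen so the output matches the records above. The revKey helper reflects our understanding of how etcd's mvcc layer encodes a revision as a boltdb key: an 8-byte big-endian main revision, a '_' byte, then an 8-byte big-endian sub revision. With that encoding, byte order equals revision order. Treat it as illustrative.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Revision is the (main, sub) pair described above.
type Revision struct{ Main, Sub int64 }

type Put struct{ Key, Value string }

// Record is one row of the boltdb stand-in.
type Record struct {
	Rev        Revision
	Key, Value string
}

// Backend is a slice standing in for boltdb; rows are appended in revision order.
type Backend struct {
	currentRev int64
	records    []Record
}

// Txn gives the whole transaction one new main revision and numbers
// its operations with sub revisions 0, 1, 2, ...
func (b *Backend) Txn(puts ...Put) {
	b.currentRev++
	for i, p := range puts {
		b.records = append(b.records, Record{Revision{b.currentRev, int64(i)}, p.Key, p.Value})
	}
}

// revKey encodes a revision so that byte order equals revision order
// (assumed layout: big-endian main, '_', big-endian sub).
func revKey(r Revision) []byte {
	k := make([]byte, 17)
	binary.BigEndian.PutUint64(k[0:8], uint64(r.Main))
	k[8] = '_'
	binary.BigEndian.PutUint64(k[9:17], uint64(r.Sub))
	return k
}

func main() {
	b := &Backend{currentRev: 2} // assume revisions 1 and 2 were used earlier
	b.Txn(Put{"key1", "v1"}, Put{"key2", "v2"})
	b.Txn(Put{"key1", "v12"}, Put{"key2", "v22"})
	for _, r := range b.records {
		fmt.Printf("rev={%d %d}, key=%s, value=%q (bolt key %x)\n",
			r.Rev.Main, r.Rev.Sub, r.Key, r.Value, revKey(r.Rev))
	}
}
```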
Etcd's on-disk layout means that querying data from boltdb requires a revision, whereas clients query values by key. Etcd's in-memory kvindex therefore stores the mapping between keys and revisions to speed up queries.
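Here is a minimal sketch of the two-step read path: kvindex maps a key to its revisions, and the backend maps a revision to a value. Both are plain Go maps here, whereas etcd's kvindex is a B-tree, which also supports ordered range scans. The Compact method illustrates the space concern raised earlier by discarding older revisions. In this sketch a read below the compaction point simply finds nothing, whereas etcd reports it as a compacted-revision error.

```go
package main

import "fmt"

type Revision struct{ Main, Sub int64 }

// Store pairs an in-memory key index with a revision-keyed backend.
type Store struct {
	currentRev int64
	index      map[string][]Revision // kvindex: key -> its revisions, oldest first
	backend    map[Revision]string   // boltdb stand-in: revision -> value
}

func NewStore(startRev int64) *Store {
	return &Store{currentRev: startRev, index: map[string][]Revision{}, backend: map[Revision]string{}}
}

// Txn writes key/value pairs under one new main revision.
func (s *Store) Txn(kvs ...[2]string) {
	s.currentRev++
	for i, kv := range kvs {
		rev := Revision{s.currentRev, int64(i)}
		s.index[kv[0]] = append(s.index[kv[0]], rev)
		s.backend[rev] = kv[1]
	}
}

// Get resolves key -> revision in the index, then revision -> value in
// the backend. atRev == 0 reads the latest version.
func (s *Store) Get(key string, atRev int64) (string, bool) {
	revs := s.index[key]
	for i := len(revs) - 1; i >= 0; i-- {
		if atRev == 0 || revs[i].Main <= atRev {
			return s.backend[revs[i]], true
		}
	}
	return "", false
}

// Compact drops history older than atRev, keeping for each key the newest
// revision at or below atRev so that reads at atRev still succeed.
func (s *Store) Compact(atRev int64) {
	for key, revs := range s.index {
		keep := 0
		for i, r := range revs {
			if r.Main <= atRev {
				keep = i
			}
		}
		for _, r := range revs[:keep] {
			delete(s.backend, r)
		}
		s.index[key] = revs[keep:]
	}
}

func main() {
	s := NewStore(2)
	s.Txn([2]string{"key1", "v1"}, [2]string{"key2", "v2"})
	s.Txn([2]string{"key1", "v12"}, [2]string{"key2", "v22"})
	fmt.Println(s.Get("key1", 0)) // v12 true
	fmt.Println(s.Get("key1", 3)) // v1 true
	s.Compact(4)
	fmt.Println(s.Get("key1", 3)) // prints an empty value and false: history before revision 4 is gone
}
```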
Next, let's analyze how the watch mechanism is implemented. Etcd v3 can watch a single fixed key and can also watch a range, which can be used to simulate watching a directory. A watcherGroup therefore holds two kinds of watchers. Key watchers are stored as a map from each key to its set of watchers. Range watchers are stored in an IntervalTree, so that the watchers for a given interval can be found efficiently.
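The key-versus-range split can be sketched as below. To stay short, range watchers are scanned linearly instead of being stored in an interval tree, so lookups are O(n) rather than logarithmic. The prefixEnd helper shows how a directory-style watch becomes a half-open range. Using "\x00" as an open upper bound follows etcd's convention for range ends; all other names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Interval is a half-open key range [Begin, End); End == "\x00" means "no upper bound".
type Interval struct{ Begin, End string }

func (iv Interval) Contains(key string) bool {
	return key >= iv.Begin && (iv.End == "\x00" || key < iv.End)
}

// prefixEnd returns the smallest key greater than every key with this prefix,
// so [prefix, prefixEnd(prefix)) covers a whole "directory".
func prefixEnd(prefix string) string {
	b := []byte(prefix)
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] < 0xff {
			b[i]++
			return string(b[:i+1])
		}
	}
	return "\x00" // prefix is all 0xff bytes: no finite upper bound
}

type rangeWatcher struct {
	iv Interval
	id int
}

type WatcherGroup struct {
	keyWatchers   map[string][]int // key -> watcher IDs
	rangeWatchers []rangeWatcher   // etcd: interval tree; here: linear scan
}

func NewWatcherGroup() *WatcherGroup { return &WatcherGroup{keyWatchers: map[string][]int{}} }

func (g *WatcherGroup) AddKey(id int, key string) {
	g.keyWatchers[key] = append(g.keyWatchers[key], id)
}

func (g *WatcherGroup) AddRange(id int, iv Interval) {
	g.rangeWatchers = append(g.rangeWatchers, rangeWatcher{iv, id})
}

// WatchersFor returns the IDs of every watcher that should see a change to key.
func (g *WatcherGroup) WatchersFor(key string) []int {
	ids := append([]int(nil), g.keyWatchers[key]...)
	for _, rw := range g.rangeWatchers {
		if rw.iv.Contains(key) {
			ids = append(ids, rw.id)
		}
	}
	sort.Ints(ids)
	return ids
}

func main() {
	g := NewWatcherGroup()
	g.AddKey(1, "/a")
	g.AddRange(2, Interval{"/dir/", prefixEnd("/dir/")}) // ["/dir/", "/dir0")
	g.AddRange(3, Interval{"/a", "/b"})
	fmt.Println(g.WatchersFor("/a"))     // [1 3]
	fmt.Println(g.WatchersFor("/dir/x")) // [2]
	fmt.Println(g.WatchersFor("/dir0"))  // []
}
```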
In addition, every WatchableStore has two watcherGroups, synced and unsynced. Watchers in synced have caught up with all data and are waiting for new changes. Watchers in unsynced lag behind the latest changes and are still catching up.
When Etcd receives a watch request from a client that carries a revision parameter, it compares that revision with the store's current revision. If the requested revision is greater than the current revision, the watcher goes into the synced group; otherwise it goes into the unsynced group. A request without a revision watches from the current revision and also goes into synced. Etcd also runs a background goroutine that keeps syncing the unsynced watchers and then migrates them to the synced group. Under this mechanism Etcd v3 supports watching from any revision, without v2's 1000-event history limit, provided that revision has not been compacted.
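The synced/unsynced split can be modeled in a single goroutine. A watcher whose start revision is already in the past goes to unsynced. A catch-up pass replays the missed history and promotes it, and from then on new changes are delivered directly. This sketch assumes full, uncompacted history. It calls SyncWatchers by hand where etcd runs a background goroutine, and its type and method names are hypothetical.

```go
package main

import "fmt"

type Change struct {
	Rev int64
	Key string
}

// Watcher records which revisions it has received; NextRev is the next
// revision it still needs (0 = "only future changes").
type Watcher struct {
	ID      int
	Key     string
	NextRev int64
	Got     []int64
}

// Store keeps full history (no compaction) and the two watcher groups.
type Store struct {
	currentRev int64
	history    []Change
	synced     map[int]*Watcher
	unsynced   map[int]*Watcher
}

func NewStore() *Store {
	return &Store{synced: map[int]*Watcher{}, unsynced: map[int]*Watcher{}}
}

// Watch files the watcher by comparing its start revision with currentRev.
func (s *Store) Watch(w *Watcher) {
	if w.NextRev == 0 || w.NextRev > s.currentRev {
		s.synced[w.ID] = w
	} else {
		s.unsynced[w.ID] = w
	}
}

// Put records a change; only synced watchers are notified directly.
func (s *Store) Put(key string) {
	s.currentRev++
	s.history = append(s.history, Change{s.currentRev, key})
	for _, w := range s.synced {
		if w.Key == key {
			w.Got = append(w.Got, s.currentRev)
			w.NextRev = s.currentRev + 1
		}
	}
}

// SyncWatchers is one pass of the catch-up loop: replay missed history to
// each unsynced watcher, then move it to the synced group.
func (s *Store) SyncWatchers() {
	for id, w := range s.unsynced {
		for _, c := range s.history {
			if c.Rev >= w.NextRev && c.Key == w.Key {
				w.Got = append(w.Got, c.Rev)
			}
		}
		w.NextRev = s.currentRev + 1
		delete(s.unsynced, id)
		s.synced[id] = w
	}
}

func main() {
	s := NewStore()
	s.Put("a") // rev 1
	s.Put("a") // rev 2
	s.Put("a") // rev 3
	w := &Watcher{ID: 1, Key: "a", NextRev: 2} // wants history from rev 2
	s.Watch(w)                                  // 2 <= 3: unsynced
	s.Put("a")                                  // rev 4, not delivered yet
	s.SyncWatchers()                            // catches up: 2, 3, 4
	s.Put("a")                                  // rev 5, delivered directly
	fmt.Println(w.Got)                          // [2 3 4 5]
}
```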
As mentioned earlier, Etcd v2 can block while notifying a client if the network is poor or the client reads slowly. The current connection is then closed outright, and the client has to reissue the request. To solve this, Etcd v3 keeps a queue of blocked watchers (called victims in the source) and retries them in a separate goroutine.
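Here is a sketch of that victim queue. A notification never blocks the writer. If a watcher's channel is full, the event is parked and the watcher is added to a victims list. A retry pass delivers parked events in order once the client has drained some. The sketch runs in one goroutine for clarity; the real implementation needs locking and its own retry goroutine, and the names here are hypothetical.

```go
package main

import "fmt"

// Watcher has a bounded channel that the client's stream drains.
type Watcher struct {
	Ch      chan int64
	pending []int64 // events that could not be delivered yet, in order
}

// Notifier parks blocked watchers in a victims list instead of closing them.
type Notifier struct{ victims []*Watcher }

// trySend attempts a non-blocking send.
func trySend(ch chan int64, rev int64) bool {
	select {
	case ch <- rev:
		return true
	default:
		return false
	}
}

// Notify never blocks: a full channel turns the watcher into a victim.
func (n *Notifier) Notify(w *Watcher, rev int64) {
	if len(w.pending) == 0 {
		if trySend(w.Ch, rev) {
			return
		}
		n.victims = append(n.victims, w) // first undelivered event: park the watcher
	}
	w.pending = append(w.pending, rev) // keep order behind earlier undelivered events
}

// RetryVictims is what a separate goroutine would run periodically.
func (n *Notifier) RetryVictims() {
	still := n.victims[:0]
	for _, w := range n.victims {
		for len(w.pending) > 0 && trySend(w.Ch, w.pending[0]) {
			w.pending = w.pending[1:]
		}
		if len(w.pending) > 0 {
			still = append(still, w)
		}
	}
	n.victims = still
}

func main() {
	w := &Watcher{Ch: make(chan int64, 2)}
	n := &Notifier{}
	for rev := int64(1); rev <= 4; rev++ {
		n.Notify(w, rev) // 1 and 2 fit; 3 and 4 are parked
	}
	fmt.Println(<-w.Ch, <-w.Ch, len(n.victims)) // 1 2 1
	n.RetryVictims()
	fmt.Println(<-w.Ch, <-w.Ch, len(n.victims)) // 3 4 0
}
```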
Etcd v3 also improves the expiration mechanism. The expiration time is set on a lease, and keys are associated with the lease. Multiple keys can share the same lease ID, which makes it easy to give them a uniform expiration time and to renew them in a batch.
Compared with Etcd v2, the major changes in Etcd v3 are:
- The API is provided as gRPC RPCs instead of v2's HTTP interface. Long-lived connections make it significantly more efficient. The drawback is that it is less convenient to use than before, especially where keeping a long connection open is inconvenient.
- The directory structure is gone and the store is pure key-value; users can simulate directories with prefix matching.
- Values are no longer kept in memory, so the same amount of memory can hold more keys.
- The watch mechanism is more stable, and complete data synchronization can basically be achieved through watch alone.
- Batch operations and transactions are provided. Users can implement v2's CAS with a batch transaction request, because transactions support if conditions.
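The if/then/else shape of a v3 transaction is what makes v2-style CAS a special case. The sketch below is an in-memory model that only supports value-equality compares and puts; etcd's real compares can also test versions and revisions. With the Go client, the same idea is written as Txn(ctx).If(...).Then(...).Else(...).Commit(). The Store, Cmp and Op types here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Cmp is a value-equality compare; comparing against "" matches a missing key.
type Cmp struct{ Key, Value string }

// Op is a put.
type Op struct{ Key, Value string }

type Store struct {
	mu sync.Mutex
	kv map[string]string
}

// Txn evaluates every compare and runs thenOps if all hold, elseOps
// otherwise, all under one lock so that no other write can interleave.
func (s *Store) Txn(ifs []Cmp, thenOps, elseOps []Op) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	ok := true
	for _, c := range ifs {
		if s.kv[c.Key] != c.Value {
			ok = false
			break
		}
	}
	ops := elseOps
	if ok {
		ops = thenOps
	}
	for _, op := range ops {
		s.kv[op.Key] = op.Value
	}
	return ok
}

// CAS is the v2 primitive expressed as a single-compare transaction.
func (s *Store) CAS(key, oldVal, newVal string) bool {
	return s.Txn([]Cmp{{key, oldVal}}, []Op{{key, newVal}}, nil)
}

func main() {
	s := &Store{kv: map[string]string{"key1": "v1", "key2": "v2"}}
	// Update both keys only if neither has changed since we read them.
	ok := s.Txn(
		[]Cmp{{"key1", "v1"}, {"key2", "v2"}},
		[]Op{{"key1", "v12"}, {"key2", "v22"}},
		[]Op{{"conflicts", "1"}},
	)
	fmt.Println(ok, s.kv["key1"], s.kv["key2"]) // true v12 v22
	fmt.Println(s.CAS("key1", "v1", "v13"))      // false: key1 is now v12
}
```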
Reference: https://feisky.gitbooks.io/kubernetes/content/components/etcd.html