Etcd storage, watch and expiration mechanism
2022-07-04 12:54:00 【Zhang quandan, Foxconn quality inspector】
etcd v3 storage, watch, and expiration mechanism
When etcd stores data, the store is divided into two main parts. The first part is the kvstore index, which lives in memory. Any database needs an index, and etcd builds its index in memory so that lookups are fast.
The second part is a real database on the back end. By default etcd uses boltdb, an open-source embedded key-value database. When you store any data in the etcd store, etcd updates the in-memory index and, at the same time, writes the data to disk through boltdb. (Final data persistence is based on boltdb.)
What is the watchablestore? etcd supports watching: you can listen for events on an object. The watchers for those objects are organized in the watchablestore.
Storage mechanism
The etcd v3 store has two parts. One is the in-memory index, kvindex, which is built on Google's open-source Golang B-tree implementation. The other is the back-end storage. By design, the backend can be connected to a variety of storage systems; it currently uses boltdb.
boltdb is a single-node KV store with transaction support, and etcd transactions are implemented on top of boltdb transactions. In boltdb, the key that etcd stores is the revision, and the value is etcd's own key-value pair. In other words, etcd saves every version in boltdb, which is how it implements its multi-version mechanism.
Every piece of data has a revision, which is effectively its version information. A revision consists of two parts: the main rev, which increases by one for each transaction, and the sub rev, which increases by one for each operation within the same transaction.
Let's get a key. The -wjson option prints the details, including the create revision and the mod revision. Every object stored in etcd has a version; in this example the current revision is 3.
The revision is like an auto-incrementing value for the whole cluster: whenever any data in the cluster is modified, the revision increases. It is split into a main part and a sub part. If you start a transaction, all write operations in that transaction share the same main revision, and each operation inside it gets its own sub revision.
For k8s, we use the main revision most of the time; the sub revision is of little use.
Looking back at k8s objects, each one has a resourceVersion, which can be understood as an optimistic lock. This resourceVersion corresponds one-to-one with the object's mod revision in etcd. When data in etcd changes, the mod revision changes, and when k8s reads an object it uses the mod revision as the resourceVersion.
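The main/sub split can be sketched in a few lines of Python. This is only an illustration of the numbering rule described above: `Revision` and `RevisionAllocator` are hypothetical names, not etcd types, and the counter starting at 0 is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True, order=True)
class Revision:
    main: int  # bumped once per write transaction
    sub: int   # bumped once per operation inside that transaction


class RevisionAllocator:
    """Hypothetical helper, not an etcd type."""

    def __init__(self) -> None:
        self.current_main = 0  # assumed starting point for this sketch

    def begin_txn(self, op_count: int) -> List[Revision]:
        # The whole transaction shares one main revision...
        self.current_main += 1
        # ...and each operation inside it gets its own sub revision.
        return [Revision(self.current_main, sub) for sub in range(op_count)]


alloc = RevisionAllocator()
single_put = alloc.begin_txn(op_count=1)
three_puts = alloc.begin_txn(op_count=3)
print(single_put)
print(three_puts)
```

The three operations of the second transaction share the same main value and differ only in sub, which is why a client that only looks at main revisions (as k8s mostly does) sees the transaction as one step.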
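To make the optimistic-lock idea concrete, here is a minimal sketch of a compare-then-write against the mod revision. `VersionedStore`, `ConflictError` and the `expected_mod_revision` parameter are names invented for this example; they are not the k8s or etcd API.

```python
class ConflictError(Exception):
    """Raised when the caller's view of the object is stale."""


class VersionedStore:
    def __init__(self):
        self.revision = 0
        self.data = {}  # key -> (value, mod_revision)

    def get(self, key):
        return self.data[key]

    def put(self, key, value, expected_mod_revision=None):
        current = self.data.get(key)
        if expected_mod_revision is not None:
            actual = current[1] if current else 0
            if actual != expected_mod_revision:
                # Someone else wrote in between: refuse the write.
                raise ConflictError(f"expected {expected_mod_revision}, found {actual}")
        self.revision += 1
        self.data[key] = (value, self.revision)
        return self.revision


store = VersionedStore()
store.put("pod/a", "v1")
_, seen = store.get("pod/a")  # the reader remembers the mod revision it saw
store.put("pod/a", "v2", expected_mod_revision=seen)  # succeeds

conflict = False
try:
    # A second writer still holding the old revision is rejected.
    store.put("pod/a", "v3", expected_mod_revision=seen)
except ConflictError:
    conflict = True
print(conflict, store.get("pod/a"))
```

No lock is held between the read and the write; the stale writer simply fails and has to re-read and retry, which is what "optimistic" means here.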
- etcd provides commands and configuration options to control compaction, and also supports put operation parameters to precisely control the number of historical versions kept for a key.
- The in-memory kvindex stores the mapping between keys and revisions and is used to speed up queries.
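The split between the in-memory index and the revision-keyed backend can be sketched as follows. This is a simplified model under stated assumptions: plain dicts stand in for the B-tree and for boltdb, revisions are single integers (main only), and `MiniMVCC` and its methods are hypothetical names.

```python
class MiniMVCC:
    def __init__(self):
        self.rev = 0
        self.index = {}    # in memory: key -> list of revisions (stand-in for the B-tree)
        self.backend = {}  # "on disk": revision -> (key, value) (stand-in for boltdb)

    def put(self, key, value):
        self.rev += 1
        self.index.setdefault(key, []).append(self.rev)
        self.backend[self.rev] = (key, value)
        return self.rev

    def get(self, key, at_rev=None):
        at_rev = self.rev if at_rev is None else at_rev
        # The index answers "which revision?", the backend answers "which value?".
        candidates = [r for r in self.index.get(key, []) if r <= at_rev]
        if not candidates:
            return None
        return self.backend[candidates[-1]][1]

    def compact(self, rev):
        # Per key, keep the newest revision <= rev plus everything newer.
        for key, revs in self.index.items():
            old = [r for r in revs if r <= rev]
            for r in old[:-1]:
                del self.backend[r]
            self.index[key] = old[-1:] + [r for r in revs if r > rev]


s = MiniMVCC()
s.put("y", "1")
s.put("y", "5")
s.put("y", "9")
latest = s.get("y")
historic = s.get("y", at_rev=1)
s.compact(2)
after_compact = s.get("y", at_rev=1)
print(latest, historic, after_compact)
```

Before compaction every old version is still readable; after `compact(2)` the version written at revision 1 is gone, so a read at that revision finds nothing.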
etcd data storage process
A client initiates a write request. Suppose the request is sent to a follower. It is eventually handed to the consistency module, which checks whether this node is the leader. If it is the leader, it handles the request directly; if not, it forwards the request to the leader.
When the request reaches the leader, some pre-checks are performed. The first is quota: etcd has a configured limit on data size, so it checks whether this request still fits. The second is rate limiting: as a server, too many frequent writes could overwhelm it. The third is authentication. The fourth is a packet-size check: if the request exceeds 1.5M, the write is rejected.
If a packet is too large, the repeated confirmations, the later indexing, and the queries all become very expensive, and etcd performance can plummet. Because a write has to be confirmed several times and synchronized across nodes, large payloads make data synchronization very inefficient. So etcd imposes this limit to stop data from growing without bound. In k8s it is hard to write a yaml larger than 1.5M, so a yaml cannot grow indefinitely either; it is subject to this limit.
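The four pre-checks can be pictured as a simple gate in front of the write path. The sketch below follows the order given above; `precheck`, its parameters and the 2 GiB quota value are assumptions made for illustration, not etcd's actual code.

```python
MAX_REQUEST_BYTES = int(1.5 * 1024 * 1024)  # the 1.5M limit mentioned above
QUOTA_BYTES = 2 * 1024 ** 3                 # assumed quota for this sketch


def precheck(request, db_size, tokens_available, authorized):
    """Return "ok" or the reason the write is refused."""
    payload = len(request["key"]) + len(request["value"])
    if db_size + payload > QUOTA_BYTES:  # 1. quota
        return "quota exceeded"
    if tokens_available <= 0:            # 2. rate limit
        return "rate limited"
    if not authorized:                   # 3. authentication
        return "permission denied"
    if payload > MAX_REQUEST_BYTES:      # 4. packet size
        return "request too large"
    return "ok"


small = {"key": b"y", "value": b"9"}
big = {"key": b"y", "value": b"x" * (2 * 1024 * 1024)}

small_result = precheck(small, db_size=0, tokens_available=10, authorized=True)
big_result = precheck(big, db_size=0, tokens_available=10, authorized=True)
print(small_result, big_result)
```

Rejecting the oversized request here is cheap; letting it through would mean replicating, logging and indexing it on every member.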
After these pre-checks, the request is sent to the main module, kvserver. kvserver sees that the request is a write, for example y=9, and sends it to the consistency module. The consistency module is really just the implementation of the raft protocol, and it does two things: leader election and log replication. etcd keeps a raft log in memory, which is simply a data structure. The entry is first written to the unstable part of that log, recording that there is a piece of data to write. Next, the request is written to the local wal log, that is, the y=9 entry is written locally. This write must eventually reach disk, but hitting the disk on every single write would be too inefficient. The wal log is actually persisted by fsync: the write first lands in a buffer, and fsync then periodically flushes the buffered entries to disk.
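The "write to a buffer first, then fsync" idea looks roughly like this. It is a toy append-only log, not etcd's wal format: `TinyWAL`, the 4-byte length prefix and the file name are all assumptions for the sketch.

```python
import os
import tempfile


class TinyWAL:
    def __init__(self, path):
        self.file = open(path, "ab")  # buffered, append-only
        self.unsynced = 0

    def append(self, entry):
        # Length-prefixed record; at this point it may only be in the buffer.
        self.file.write(len(entry).to_bytes(4, "big") + entry)
        self.unsynced += 1

    def sync(self):
        # Push the user-space buffer to the OS, then force it onto the disk.
        self.file.flush()
        os.fsync(self.file.fileno())
        flushed, self.unsynced = self.unsynced, 0
        return flushed

    def close(self):
        self.file.close()


workdir = tempfile.mkdtemp()
wal_path = os.path.join(workdir, "wal.log")
wal = TinyWAL(wal_path)
wal.append(b"y=9")
wal.append(b"z=1")
flushed = wal.sync()  # one fsync covers both entries
size_on_disk = os.path.getsize(wal_path)
wal.close()
print(flushed, size_on_disk)
```

Batching several appends behind one `sync()` is the point: the expensive disk flush is paid once per batch instead of once per write.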
While the leader writes the wal log, another goroutine takes the same message and sends it to the other followers as an append message. When a follower receives the request, it does the same thing: it writes the operation to its own wal log.
After writing, the follower sends a response back to the leader. Once the leader finds that more than half of the members have confirmed the log entry, it considers the data committed. It then updates its data structures, including the match index; every write has its own increasing index.
At the same time, in the apply phase, the request asks the state machine to record the data. The state machine is built on the mvcc module, the multi-version concurrency control module. After the majority confirmation, the data is written into mvcc, first into the tree index, that is, the in-memory index.
etcd data consistency
The leader keeps a matchindex. Suppose that in the first term a and b were written, and then the cluster moved to the second term, in which many more changes happened. The wal log now holds entries such as a, b, c, d, e, f, g; there are 8 changes here. Each change has its own index in etcd. The leader maintains a matchindex that records up to which index each member is consistent with it. Say the leader is a and the followers are b and c. A committed log entry is one confirmed by more than half of the members. In this three-member cluster, more than half have confirmed index 7, so the current match index is 7. Index 8 has not been confirmed by a majority, so it is not yet a committed index. If a re-election happens and c's log is behind the leader's commit index, making c the new leader could lose data.
Why record the index? It ensures that only a member with the latest data can become the new leader, so c cannot. By tracking how far the latest committed log index has advanced, etcd guarantees that a new leader always contains all confirmed data, and that is how data consistency is ensured.
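The "more than half confirmed" rule can be computed directly from the per-member match indexes. In this sketch the leader a has written index 8 and b has confirmed 7, as in the example above; the value 6 for c is an assumption, and the real raft rule that the entry must also belong to the current term is left out.

```python
def commit_index(match_index):
    """Highest log index stored on a majority of members."""
    ordered = sorted(match_index.values(), reverse=True)
    quorum = len(ordered) // 2 + 1
    # The quorum-th largest value is present on at least `quorum` members.
    return ordered[quorum - 1]


match_index = {"a": 8, "b": 7, "c": 6}  # c's value is assumed for illustration
committed = commit_index(match_index)
print(committed)
```

Index 8 exists only on the leader, so it does not count as committed until at least one follower confirms it.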
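The reason c cannot win an election can be shown with the standard raft vote rule: a member only votes for a candidate whose log is at least as up to date as its own. The sketch below compares (term, index) pairs of the last log entry; the concrete numbers and the helper names `grants_vote` and `can_win` are assumptions for illustration.

```python
def grants_vote(voter_last, candidate_last):
    # Compare the last entry's (term, index); tuple order matches the rule
    # "higher term wins, then higher index".
    return candidate_last >= voter_last


def can_win(candidate, last_entries, alive):
    votes = sum(
        1
        for voter in alive
        if voter == candidate
        or grants_vote(last_entries[voter], last_entries[candidate])
    )
    return votes > len(last_entries) // 2  # majority of the full cluster


# Last (term, index) per member; the old leader a has failed.
last_entries = {"a": (2, 8), "b": (2, 7), "c": (2, 6)}
alive = {"b", "c"}

b_wins = can_win("b", last_entries, alive)
c_wins = can_win("c", last_entries, alive)
print(b_wins, c_wins)
```

b holds everything up to the committed index 7 and gets c's vote, while c is refused by b and is left with only its own vote, so committed data survives the leader change.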
Watch Mechanism
Besides the basic read and write functions described above, etcd also provides a watch mechanism.
Watch types
There are two types of watch. One watches a specific key and is called a key watcher. The other adds --prefix and, through prefix matching, watches changes to all keys that begin with, say, a slash; this is called a range watcher.
The etcd v3 watch mechanism supports watching a fixed key and also watching a range (which can be used to simulate watching a directory structure). So a watcherGroup contains two kinds of watchers: key watchers, whose data structure maps each key to a set of watchers, and range watchers, whose data structure is an IntervalTree, which makes it easy to find the watchers that match an interval.
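A watcher group with both kinds of watchers might be modelled like this. Note the swap: a plain list scanned linearly stands in for the IntervalTree, keys are ASCII strings rather than bytes, and `WatcherGroup`, `prefix_end` and the watcher ids are hypothetical names.

```python
from collections import defaultdict


def prefix_end(prefix):
    # Smallest string greater than every key with this prefix
    # (assumes a non-empty ASCII prefix).
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


class WatcherGroup:
    def __init__(self):
        self.key_watchers = defaultdict(set)  # key -> watcher ids
        self.range_watchers = []              # (start, end, watcher id), end exclusive

    def add_key(self, key, watcher_id):
        self.key_watchers[key].add(watcher_id)

    def add_prefix(self, prefix, watcher_id):
        self.range_watchers.append((prefix, prefix_end(prefix), watcher_id))

    def matching(self, key):
        hits = set(self.key_watchers.get(key, ()))
        # Linear scan here; an interval tree makes this lookup logarithmic.
        for start, end, watcher_id in self.range_watchers:
            if start <= key < end:
                hits.add(watcher_id)
        return hits


group = WatcherGroup()
group.add_key("/a", "w1")    # key watcher
group.add_prefix("/", "w2")  # range watcher on everything under "/"

print(group.matching("/a"), group.matching("/b"), group.matching("x"))
```

A prefix watch is just a range watch on [prefix, prefix_end(prefix)), which is why one range structure is enough to cover both directory-style and arbitrary range watches.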
How watch requests are served
How does etcd serve a watch request? This is the watchablestore mentioned earlier. It sets aside a region of memory to serve watches, and it has two groups: the sync group and the unsync group.
When you only want the latest data and do not pass a revision, the data in memory can be sent to you directly. Such a watch belongs to the sync group.
If the watch carries a historical revision, that information is not in memory and has to be loaded from the db, so the request is sent to the unsync group. etcd then starts a background goroutine to synchronize the data. Once the data has been synchronized into memory, the watch is moved to the sync group and the data is sent to you.
In other words, every WatchableStore contains two watcherGroups, synced and unsynced. The former means the watchers in the group have caught up and are waiting for new changes; the latter means the watchers in the group still lag behind the latest changes and are catching up.
When etcd receives a watch request from a client that carries a revision parameter, it compares the requested revision with the store's current revision. If the requested revision is greater than the current revision, the watcher is put in the synced group; otherwise it goes into the unsynced group.
Meanwhile, etcd starts a background goroutine that keeps syncing the unsynced watchers and then migrates them to the synced group.
With this mechanism, etcd v3 supports watching from any version, without the v2 limit of a 1000-entry history event table (provided, of course, that the history has not been compacted).
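The placement rule and the catch-up step can be sketched as below. This is a single-threaded toy: `sync_once` plays the role of the background goroutine, a list stands in for the history kept in the backend, and all class and field names are assumptions.

```python
class WatchableStore:
    def __init__(self):
        self.current_rev = 0
        self.events = []    # (revision, key, value): stand-in for history in the db
        self.synced = []    # watchers that are caught up
        self.unsynced = []  # watchers that still need history replayed

    def put(self, key, value):
        self.current_rev += 1
        event = (self.current_rev, key, value)
        self.events.append(event)
        for w in self.synced:  # only synced watchers get live events
            if w["key"] == key and event[0] >= w["next_rev"]:
                w["received"].append(event)

    def watch(self, key, start_rev=0):
        w = {"key": key, "next_rev": start_rev, "received": []}
        if start_rev == 0 or start_rev > self.current_rev:
            self.synced.append(w)    # nothing to replay
        else:
            self.unsynced.append(w)  # history must be loaded first
        return w

    def sync_once(self):
        # One pass of what the background goroutine does.
        for w in self.unsynced:
            w["received"] += [
                e for e in self.events
                if e[1] == w["key"] and e[0] >= w["next_rev"]
            ]
            w["next_rev"] = self.current_rev + 1
        self.synced += self.unsynced
        self.unsynced = []


store = WatchableStore()
for value in ("1", "2", "3"):
    store.put("y", value)

live = store.watch("y")               # no revision: synced immediately
hist = store.watch("y", start_rev=2)  # historical revision: unsynced
was_unsynced = hist in store.unsynced
store.sync_once()
store.put("y", "4")
print(live["received"], hist["received"])
```

The historical watcher first receives the replayed events for revisions 2 and 3, then joins the synced group and sees revision 4 at the same time as the live watcher.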
Here are two windows: one performs updates and the other watches. As put operations keep changing the value, the watch window on the left keeps receiving notifications, one put event at a time.
-wjson shows the details of the current data.
You can see that the key and value are base64-encoded (not encrypted), and that the revision at creation is recorded, so you can see the state of the object when it was created. This information is still stored in etcd: although the value has since been changed, the old version is still in the db.
You can see the revision changes recorded in order. If other writers change values in between, the revision seen here can jump.
Close the watch window on the left and reopen it without any revision, which means: notify me of new events starting from the current version. Since the objects have not changed at this point, there is no notification until new changes arrive.
You can also start the watch from revision 4, and you will see that all the changes since then are sent to you.
The above shows etcd's multi-version changes.
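In the JSON output the key and value fields are base64 strings, so they need decoding before they are readable. The snippet below decodes a hand-written sample; the sample document is illustrative only (a key "a" with value "b"), not output captured from a real cluster, and `decode_kvs` is a hypothetical helper.

```python
import base64
import json

# Hand-written sample in the shape of `etcdctl get ... -w json` output.
sample = (
    '{"header":{"revision":3},'
    '"kvs":[{"key":"YQ==","create_revision":2,"mod_revision":3,'
    '"version":2,"value":"Yg=="}],"count":1}'
)


def decode_kvs(raw):
    decoded = []
    for kv in json.loads(raw).get("kvs", []):
        decoded.append({
            "key": base64.b64decode(kv["key"]).decode(),
            "value": base64.b64decode(kv["value"]).decode(),
            "create_revision": kv["create_revision"],
            "mod_revision": kv["mod_revision"],
        })
    return decoded


kvs = decode_kvs(sample)
print(kvs)
```

The create_revision stays fixed for the life of the key, while mod_revision moves forward with every change to it.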
etcd Common operations