当前位置:网站首页>An in-depth analysis of the election mechanism in kubernetes
An in-depth analysis of the election mechanism in kubernetes
2022-06-29 22:38:00 【Hermokrates】
Overview
stay Kubernetes Of kube-controller-manager , kube-scheduler, And the use of Operator The underlying implementation of controller-rumtime Both support the... In the highly available system leader The election , This article will understand controller-rumtime ( The underlying implementation is client-go) Medium leader Election to kubernetes controller How is it implemented .
Background
Running kube-controller-manager when , Yes, there are some parameters provided to cm Conduct leader Used in elections , Please refer to the official documents Parameters To understand the relevant parameters .
--leader-elect Default: true
--leader-elect-renew-deadline duration Default: 10s
--leader-elect-resource-lock string Default: "leases"
--leader-elect-resource-name string Default: "kube-controller-manager"
--leader-elect-resource-namespace string Default: "kube-system"
--leader-elect-retry-period duration Default: 2s
...
I think the election action of these components is passed etcd On going , But the back is right controller-runtime When studying , It is found that its related... Is not configured etcd Related parameters , This has aroused curiosity about the electoral mechanism . With this curiosity, I searched for something about kubernetes The election of , This is what the official website introduces , The following is a popular summary of the official instructions .simple leader election with kubernetes
Read the article to learn ,kubernetes API It provides an election mechanism , As long as the container running in the cluster , Can achieve the election function .
Kubernetes API Complete the election action by providing two properties
- ResourceVersions: Every API Object is the only one ResourceVersion
- Annotations: Every API Objects can be applied to these key Annotate
notes : Such elections will increase APIServer The pressure of the . That's right etcd It will have an impact
So with this information , Let's take a look , stay Kubernetes In the cluster , Who is the cm Of leader( The cluster we provide has only one node , So this node is leader)
stay Kubernetes All are enabled in leader The election service will generate a EndPoint , In this EndPoint There will be the above mentioned label(Annotations) To identify who is leader.
$ kubectl get ep -n kube-system
NAME ENDPOINTS AGE
kube-controller-manager <none> 3d4h
kube-dns 3d4h
kube-scheduler <none> 3d4h
Here we use kube-controller-manager For example , Take a look at this EndPoint Any information
[[email protected] ~]# kubectl describe ep kube-controller-manager -n kube-system
Name: kube-controller-manager
Namespace: kube-system
Labels: <none>
Annotations: control-plane.alpha.kubernetes.io/leader:
{
"holderIdentity":"master-machine_06730140-a503-487d-850b-1fe1619f1fe1","leaseDurationSeconds":15,"acquireTime":"2022-06-27T15:30:46Z","re...
Subsets:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal LeaderElection 2d22h kube-controller-manager master-machine_76aabcb5-49ff-45ff-bd18-4afa61fbc5af became leader
Normal LeaderElection 9m kube-controller-manager master-machine_06730140-a503-487d-850b-1fe1619f1fe1 became leader
It can be seen that Annotations: control-plane.alpha.kubernetes.io/leader: Which is marked node yes leader.
election in controller-runtime
controller-runtime of leader The part of the election is pkg/leaderelection below , in total 100 Line code , Let's see what we've done ?
You can see , Only some options for creating resource locks are provided here
type Options struct {
// stay manager Startup time , Decide whether to hold an election
LeaderElection bool
// Use that resource lock The default is rent lease
LeaderElectionResourceLock string
// The namespace where the election takes place
LeaderElectionNamespace string
// This attribute will decide to hold leader The name of the lock resource
LeaderElectionID string
}
adopt NewResourceLock You can see , This is the way to go client-go/tools/leaderelection below , And this leaderelection There is also a example To learn how to use it .
adopt example You can see , The entrance to the election is a RunOrDie() Function of
// Here we have one lease lock , The comments say that they are willing to exist in the cluster lease Less monitoring
lock := &resourcelock.LeaseLock{
LeaseMeta: metav1.ObjectMeta{
Name: leaseLockName,
Namespace: leaseLockNamespace,
},
Client: client.CoordinationV1(),
LockConfig: resourcelock.ResourceLockConfig{
Identity: id,
},
}
// Start the election cycle
leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
Lock: lock,
// Here you must ensure that the lease you own is invoked cancel() Terminate before , Otherwise there will still be one loop Running
ReleaseOnCancel: true,
LeaseDuration: 60 * time.Second,
RenewDeadline: 15 * time.Second,
RetryPeriod: 5 * time.Second,
Callbacks: leaderelection.LeaderCallbacks{
OnStartedLeading: func(ctx context.Context) {
// Fill in your code here ,
// usually put your code
run(ctx)
},
OnStoppedLeading: func() {
// Clean up your... Here lease
klog.Infof("leader lost: %s", id)
os.Exit(0)
},
OnNewLeader: func(identity string) {
// we're notified when new leader elected
if identity == id {
// I just got the lock
return
}
klog.Infof("new leader elected: %s", identity)
},
},
})
Come here , We learned about the concept of a lock and how to start a lock , Look at the below ,client-go All provide those locks .
In the code tools/leaderelection/resourcelock/interface.go Defines a lock abstraction ,interface Provides a common interface , For locking leader Resources used in elections .
type Interface interface {
// Get Return to the election record
Get(ctx context.Context) (*LeaderElectionRecord, []byte, error)
// Create Create a LeaderElectionRecord
Create(ctx context.Context, ler LeaderElectionRecord) error
// Update will update and existing LeaderElectionRecord
Update(ctx context.Context, ler LeaderElectionRecord) error
// RecordEvent is used to record events
RecordEvent(string)
// Identity Returns the identification of the lock
Identity() string
// Describe is used to convert details on current resource lock into a string
Describe() string
}
So the implementation of this abstract interface is , Implemented resource lock , We can see ,client-go Four kinds of resource locks are provided
- leaselock
- configmaplock
- multilock
- endpointlock
leaselock
Lease yes kubernetes Control the passage in the plane ETCD To achieve a Leases Resources for , Mainly to provide a control mechanism for distributed lease . Relevant to this API The description of can be referred to :Lease .
stay Kubernetes In the cluster , We can use the following command to view the corresponding lease
$ kubectl get leases -A
NAMESPACE NAME HOLDER AGE
kube-node-lease master-machine master-machine 3d19h
kube-system kube-controller-manager master-machine_06730140-a503-487d-850b-1fe1619f1fe1 3d19h
kube-system kube-scheduler master-machine_1724e2d9-c19c-48d7-ae47-ee4217b27073 3d19h
$ kubectl describe leases kube-controller-manager -n kube-system
Name: kube-controller-manager
Namespace: kube-system
Labels: <none>
Annotations: <none>
API Version: coordination.k8s.io/v1
Kind: Lease
Metadata:
Creation Timestamp: 2022-06-24T11:01:51Z
Managed Fields:
API Version: coordination.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:spec:
f:acquireTime:
f:holderIdentity:
f:leaseDurationSeconds:
f:leaseTransitions:
f:renewTime:
Manager: kube-controller-manager
Operation: Update
Time: 2022-06-24T11:01:51Z
Resource Version: 56012
Self Link: /apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager
UID: 851a32d2-25dc-49b6-a3f7-7a76f152f071
Spec:
Acquire Time: 2022-06-27T15:30:46.000000Z
Holder Identity: master-machine_06730140-a503-487d-850b-1fe1619f1fe1
Lease Duration Seconds: 15
Lease Transitions: 2
Renew Time: 2022-06-28T06:09:26.837773Z
Events: <none>
Let's take a look leaselock The implementation of the ,leaselock Will implement the abstraction as a resource lock
type LeaseLock struct {
// LeaseMeta Is an attribute similar to other resource types , contain name ns And other things about lease Properties of
LeaseMeta metav1.ObjectMeta
Client coordinationv1client.LeasesGetter // Client Is to provide informer The function of
// lockconfig Including the above through describe What you see Identity And recoder Used to record changes to resource locks
LockConfig ResourceLockConfig
// lease Namely API Medium Lease resources , You can refer to the above API Use
lease *coordinationv1.Lease
}
Let's take a look leaselock Implemented those methods ?
Get
Get It's from spec Returns the record of the election
func (ll *LeaseLock) Get(ctx context.Context) (*LeaderElectionRecord, []byte, error) {
var err error
ll.lease, err = ll.Client.Leases(ll.LeaseMeta.Namespace).Get(ctx, ll.LeaseMeta.Name, metav1.GetOptions{
})
if err != nil {
return nil, nil, err
}
record := LeaseSpecToLeaderElectionRecord(&ll.lease.Spec)
recordByte, err := json.Marshal(*record)
if err != nil {
return nil, nil, err
}
return record, recordByte, nil
}
// It can be seen that this resource is returned spec The values filled in
func LeaseSpecToLeaderElectionRecord(spec *coordinationv1.LeaseSpec) *LeaderElectionRecord {
var r LeaderElectionRecord
if spec.HolderIdentity != nil {
r.HolderIdentity = *spec.HolderIdentity
}
if spec.LeaseDurationSeconds != nil {
r.LeaseDurationSeconds = int(*spec.LeaseDurationSeconds)
}
if spec.LeaseTransitions != nil {
r.LeaderTransitions = int(*spec.LeaseTransitions)
}
if spec.AcquireTime != nil {
r.AcquireTime = metav1.Time{
spec.AcquireTime.Time}
}
if spec.RenewTime != nil {
r.RenewTime = metav1.Time{
spec.RenewTime.Time}
}
return &r
}
Create
Create Is in kubernetes Try to create a lease in the cluster , You can see ,Client Namely API Of the corresponding resources provided REST client , The result will be Kubernetes Create this in the cluster Lease
func (ll *LeaseLock) Create(ctx context.Context, ler LeaderElectionRecord) error {
var err error
ll.lease, err = ll.Client.Leases(ll.LeaseMeta.Namespace).Create(ctx, &coordinationv1.Lease{
ObjectMeta: metav1.ObjectMeta{
Name: ll.LeaseMeta.Name,
Namespace: ll.LeaseMeta.Namespace,
},
Spec: LeaderElectionRecordToLeaseSpec(&ler),
}, metav1.CreateOptions{
})
return err
}
Update
Update Is an update Lease Of spec
func (ll *LeaseLock) Update(ctx context.Context, ler LeaderElectionRecord) error {
if ll.lease == nil {
return errors.New("lease not initialized, call get or create first")
}
ll.lease.Spec = LeaderElectionRecordToLeaseSpec(&ler)
lease, err := ll.Client.Leases(ll.LeaseMeta.Namespace).Update(ctx, ll.lease, metav1.UpdateOptions{
})
if err != nil {
return err
}
ll.lease = lease
return nil
}
RecordEvent
RecordEvent It is a record of events occurring during the election , Now let's go back to the previous part stay kubernetes View in cluster ep Information can be seen when event in became leader Events , Here is what will happen event Add to meta-data in .
func (ll *LeaseLock) RecordEvent(s string) {
if ll.LockConfig.EventRecorder == nil {
return
}
events := fmt.Sprintf("%v %v", ll.LockConfig.Identity, s)
subject := &coordinationv1.Lease{ObjectMeta: ll.lease.ObjectMeta}
// Populate the type meta, so we don't have to get it from the schema
subject.Kind = "Lease"
subject.APIVersion = coordinationv1.SchemeGroupVersion.String()
ll.LockConfig.EventRecorder.Eventf(subject, corev1.EventTypeNormal, "LeaderElection", events)
}
Here we have a general understanding of what a resource lock is , Other kinds of resource locks are implemented in the same way , There is not much more to be said here ; Let's take a look at the election process .
election workflow
The code entry of the election is in leaderelection.go , Here will continue the above example Analyze the whole election process downward .
As we saw earlier, the entrance to the election is a RunOrDie() Function of , So continue to learn from here . Get into RunOrDie, Actually, there are only a few lines , I got a general understanding of RunOrDie The client that will use the provided configuration to start the election , Then it will block , until ctx sign out , Or stop holding leader Lease of .
func RunOrDie(ctx context.Context, lec LeaderElectionConfig) {
le, err := NewLeaderElector(lec)
if err != nil {
panic(err)
}
if lec.WatchDog != nil {
lec.WatchDog.SetLeaderElection(le)
}
le.Run(ctx)
}
Look at the below NewLeaderElector What did you do ? You can see ,LeaderElector It's a structure , This is just creating him , This structure provides everything we need in the election (LeaderElector Namely RunOrDie Created election client ).
func NewLeaderElector(lec LeaderElectionConfig) (*LeaderElector, error) {
if lec.LeaseDuration <= lec.RenewDeadline {
return nil, fmt.Errorf("leaseDuration must be greater than renewDeadline")
}
if lec.RenewDeadline <= time.Duration(JitterFactor*float64(lec.RetryPeriod)) {
return nil, fmt.Errorf("renewDeadline must be greater than retryPeriod*JitterFactor")
}
if lec.LeaseDuration < 1 {
return nil, fmt.Errorf("leaseDuration must be greater than zero")
}
if lec.RenewDeadline < 1 {
return nil, fmt.Errorf("renewDeadline must be greater than zero")
}
if lec.RetryPeriod < 1 {
return nil, fmt.Errorf("retryPeriod must be greater than zero")
}
if lec.Callbacks.OnStartedLeading == nil {
return nil, fmt.Errorf("OnStartedLeading callback must not be nil")
}
if lec.Callbacks.OnStoppedLeading == nil {
return nil, fmt.Errorf("OnStoppedLeading callback must not be nil")
}
if lec.Lock == nil {
return nil, fmt.Errorf("Lock must not be nil.")
}
le := LeaderElector{
config: lec,
clock: clock.RealClock{
},
metrics: globalMetricsFactory.newLeaderMetrics(),
}
le.metrics.leaderOff(le.config.Name)
return &le, nil
}
LeaderElector Is the established election client ,
type LeaderElector struct {
config LeaderElectionConfig // The configuration of this , Contains some time parameters , health examination
// recoder Related properties
observedRecord rl.LeaderElectionRecord
observedRawRecord []byte
observedTime time.Time
// used to implement OnNewLeader(), may lag slightly from the
// value observedRecord.HolderIdentity if the transition has
// not yet been reported.
reportedLeader string
// clock is wrapper around time to allow for less flaky testing
clock clock.Clock
// lock observedRecord
observedRecordLock sync.Mutex
metrics leaderMetricsAdapter
}
You can see Run The implemented election logic is passed in when initializing the client Three callback
func (le *LeaderElector) Run(ctx context.Context) {
defer runtime.HandleCrash()
defer func() {
// Execute... On exit callbacke Of OnStoppedLeading
le.config.Callbacks.OnStoppedLeading()
}()
if !le.acquire(ctx) {
return
}
ctx, cancel := context.WithCancel(ctx)
defer cancel()
go le.config.Callbacks.OnStartedLeading(ctx) // At election , perform OnStartedLeading
le.renew(ctx)
}
stay Run Called in acquire, This is Through one loop To call tryAcquireOrRenew, until ctx Pass on the end signal
func (le *LeaderElector) acquire(ctx context.Context) bool {
ctx, cancel := context.WithCancel(ctx)
defer cancel()
succeeded := false
desc := le.config.Lock.Describe()
klog.Infof("attempting to acquire leader lease %v...", desc)
// jitterUntil Is a function that executes timing func() Is the logic of timed tasks
// RetryPeriod Is the cycle interval
// JitterFactor Is the retry factor , It is similar to the coefficient in the delay queue (duration + maxFactor * duration)
// sliding Whether logic is calculated in time
// Context passing
wait.JitterUntil(func() {
succeeded = le.tryAcquireOrRenew(ctx)
le.maybeReportTransition()
if !succeeded {
klog.V(4).Infof("failed to acquire lease %v", desc)
return
}
le.config.Lock.RecordEvent("became leader")
le.metrics.leaderOn(le.config.Name)
klog.Infof("successfully acquired lease %v", desc)
cancel()
}, le.config.RetryPeriod, JitterFactor, true, ctx.Done())
return succeeded
}
In fact, the election action here is tryAcquireOrRenew in , Let's take a look tryAcquireOrRenew;tryAcquireOrRenew Is trying to get one leader lease , If you have already obtained , Renew the lease ; Otherwise, the lease can be obtained true, conversely false
func (le *LeaderElector) tryAcquireOrRenew(ctx context.Context) bool {
now := metav1.Now() // Time
leaderElectionRecord := rl.LeaderElectionRecord{
// Build an election record
HolderIdentity: le.config.Lock.Identity(), // The identity of the elector ,ep Related to host name
LeaseDurationSeconds: int(le.config.LeaseDuration / time.Second), // Default 15s
RenewTime: now, // Recapture time
AcquireTime: now, // For the time
}
// 1. from API Gets or creates a recode, If you can get it, you already have a lease , Instead, create a new lease
oldLeaderElectionRecord, oldLeaderElectionRawRecord, err := le.config.Lock.Get(ctx)
if err != nil {
if !errors.IsNotFound(err) {
klog.Errorf("error retrieving resource lock %v: %v", le.config.Lock.Describe(), err)
return false
}
// The action of creating a lease is to create a corresponding resource, This lock Namely leaderelection Four types of locks provided ,
// Look at you runOrDie What lock is passed in the initialization of
if err = le.config.Lock.Create(ctx, leaderElectionRecord); err != nil {
klog.Errorf("error initially creating leader election record: %v", err)
return false
}
// Here we have obtained or created the lease , Then record some of its properties ,LeaderElectionRecord
le.setObservedRecord(&leaderElectionRecord)
return true
}
// 2. Get records to check identity and time
if !bytes.Equal(le.observedRawRecord, oldLeaderElectionRawRecord) {
le.setObservedRecord(oldLeaderElectionRecord)
le.observedRawRecord = oldLeaderElectionRawRecord
}
if len(oldLeaderElectionRecord.HolderIdentity) > 0 &&
le.observedTime.Add(le.config.LeaseDuration).After(now.Time) &&
!le.IsLeader() {
// No leader, Conduct HolderIdentity Compare , Plus time , It's not time to run , Jump out of
klog.V(4).Infof("lock is held by %v and has not yet expired", oldLeaderElectionRecord.HolderIdentity)
return false
}
// 3. We will try to update . ad locum leaderElectionRecord Set to default . Let's correct it before updating .
if le.IsLeader() {
// That means leader, Fix his time
leaderElectionRecord.AcquireTime = oldLeaderElectionRecord.AcquireTime
leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions
} else {
// LeaderTransitions Is refers to leader adjustment ( Change to other ) Several times , If it is ,
// It is a change , Keep the original value
// conversely , be +1
leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions + 1
}
// Update when done APIServer Lock resources in , That is, update the attribute information of the corresponding resource
if err = le.config.Lock.Update(ctx, leaderElectionRecord); err != nil {
klog.Errorf("Failed to update lock: %v", err)
return false
}
// setObservedRecord Through a new record To update the lock record
// Operation is safe , Locking ensures that the critical area can be locked by only one thread / Process operation
le.setObservedRecord(&leaderElectionRecord)
return true
}
summary
Come here , Already fully aware of the use of kubernetes What is the process of the election ; Let's briefly review , Above leader Electing all the steps :
- The preferred service to be created is that of the service leader, The lock can be
lease,endpointAnd other resources - It's already leader Instances of will continue to be leased , The default value for leases is 15 second (
leaseDuration);leader Update the lease time when the lease expires (renewTime). - Other follower, Continuously check the existence of the corresponding resource lock , If there is already leader, Then check
renewTime, If the rental time is exceeded (), It means leader There is a problem and the election needs to be restarted , Until there is follower Upgrade to leader. - In order to avoid resources being preempted ,Kubernetes API Used
ResourceVersionTo avoid being repeatedly modified ( If the version number is inconsistent with the requested version number , It means that it has been modified , that APIServer Will return an error )
Reference
Kubernetes Implementation principle of concurrency control and data consistency
边栏推荐
- Nacos-配置中心基本使用
- leetcode:91. Decoding method [DFS + memorization]
- Grep tool
- 从第三次技术革命看企业应用三大开发趋势
- Processing of error b6267342 reported by AIX small machine in production environment
- mysql备份数据库linux
- Three development trends of enterprise application viewed from the third technological revolution
- The client can connect to remote MySQL
- 关于深度学习的概念理解(笔记)
- Can the flick CDC be used for incremental synchronization from Oracle to MySQL
猜你喜欢
MySQL lock common knowledge points & summary of interview questions

Vs2013 how to make the program run on other computers

Basic use of Nacos configuration center

What are the software testing methods and technical knowledge points?

Kr-gcn: an interpretable recommendation system based on knowledge aware reasoning

26 years old, 0 basic career change software test, from 3K to 16K monthly salary, a super complete learning guide compiled by me

Processing of error b6267342 reported by AIX small machine in production environment

Can you be a coder if you don't learn English well? Stop worrying and learn it first

一键式文件共享软件Jirafeau

《天天数学》连载54:二月二十三日
随机推荐
[php8+oracle11g+windows environment without tools] Intranet / no network /win10/php connecting to Oracle database instance
联通入库|需要各地联通公司销售其产品的都需要先入总库
Simple understanding of why to rewrite hashcode and equals methods at the same time
Spark集群安装
IFLYTEK AI learning machine summer new product launch AI + education depth combination to create a new height of products
正如以往我们对于互联网的看法一样,我们对于互联网的认识开始变得深度
Dynamic planning learning (continuous update)
The child component of a single data flow modifies the value of the parent component
为什么在局域网(ERP服务器)共享文件夹上拷贝文件时导致全局域英特网断网
If you master these 28 charts, you will no longer be afraid to be asked about TCP knowledge during the interview
【无工具搭建PHP8+oracle11g+Windows环境】内网/无网络/Win10/PHP连接oracle数据库实例
华为7年经验的软件测试总监,给所有想转行学软件测试的同学的几个建议
Detailed description of gaussdb (DWS) complex and diverse resource load management methods
短视频平台搭建,淡入淡出 支持左滑右滑轮播图
static关键字续、继承、重写、多态
Realizing deep learning framework from zero -- LSTM from theory to practice [theory]
Low code, end-to-end, one hour to build IOT sample scenarios, and the sound network released lingfalcon Internet of things cloud platform
Huawei's software testing director with 7 years' experience, several suggestions for all students who want to switch to software testing
Summary of basic concepts of moosefs
Numpy array creation