Etcd: building a highly available etcd cluster
2022-07-05 16:53:00 【Zhang quandan, Foxconn quality inspector】
On the surface etcd is simple to use: put, get, and watch are enough to make data flow through the whole system. Running it in production raises many more questions, though: first, how to secure it; second, how to build a highly available etcd cluster; third, how to back up the data. All of these bear directly on the safety of the whole cluster.
Important etcd member parameters

ETCD_NAME: the name of the node, unique within the cluster
There are several kinds of parameters. The first are the core, most basic ones, which relate to members. Every etcd member has its own name, and the default is default, so if you build an etcd cluster without setting this parameter, the member is called default.
ETCD_NAME="etcd-1"
ETCD_NAME="etcd-2"
ETCD_NAME="etcd-3"
ETCD_DATA_DIR: Data directory
etcd's data ultimately lands on disk, by default in a directory named <member name>.etcd.
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS: Listening address for cluster (peer) communication
ETCD_LISTEN_CLIENT_URLS: Listening address for client access
etcd supports two kinds of URL. Because etcd is clustered, communication between members goes over the peer URL, while requests that clients send to the etcd server go over the client URL. These are two different kinds of request carrying different kinds of data, so they are served on different ports.
Keeping them apart also makes peer traffic easier to protect. It may deserve a higher priority, and later network tuning can target the peer port specifically to safeguard that traffic at the network level.
So the two are isolated: client traffic goes to the client URL and peer traffic goes to the peer URL.
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.31.71:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.31.71:2379"
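The two listening addresses described in this section are set in the same style as the advertised ones. A minimal sketch, assuming the same member IP and the same ports (2380 for peers, 2379 for clients) as the example values; these two lines are illustrative and not taken from the original configuration:

```shell
# Assumed values: same IP and ports as the advertised URLs in the example.
ETCD_LISTEN_PEER_URLS="https://192.168.31.71:2380"      # member-to-member traffic
ETCD_LISTEN_CLIENT_URLS="https://192.168.31.71:2379"    # client requests
```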
Important etcd cluster parameters

ETCD_INITIAL_CLUSTER_STATE: the state in which the member joins the cluster. new means a new cluster; existing means joining an existing cluster.
In other words, it decides whether to create a new cluster or to start an etcd instance that joins one that already exists.
ETCD_INITIAL_CLUSTER_TOKEN: the token used to initialize the cluster
ETCD_INITIAL_ADVERTISE_PEER_URLS: the peer address advertised to the rest of the cluster
ETCD_ADVERTISE_CLIENT_URLS: the client address advertised to clients
An advertised address is the address a member publishes so that its peers and clients know where to reach it.
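As an illustration of how these fit together, the sketch below assembles the member list that a new three-node cluster would start with. ETCD_INITIAL_CLUSTER is etcd's setting for that list but does not appear in the original; the addresses for etcd-2 and etcd-3 and the token value are hypothetical.

```shell
# Join "name=peer-url" pairs with commas, the format etcd expects
# for its initial-cluster setting.
join_members() {
  result=""
  for pair in "$@"; do
    if [ -z "$result" ]; then
      result="$pair"
    else
      result="$result,$pair"
    fi
  done
  printf '%s\n' "$result"
}

# Member names follow the article; the .72 and .73 addresses are hypothetical.
ETCD_INITIAL_CLUSTER="$(join_members \
  "etcd-1=https://192.168.31.71:2380" \
  "etcd-2=https://192.168.31.72:2380" \
  "etcd-3=https://192.168.31.73:2380")"
ETCD_INITIAL_CLUSTER_STATE="new"            # "existing" when joining a running cluster
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"   # hypothetical token value

printf 'ETCD_INITIAL_CLUSTER="%s"\n' "$ETCD_INITIAL_CLUSTER"
```

Every member of the new cluster would carry the same list, state, and token, differing only in its own name and URLs.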
Security-related etcd parameters

Since peer and client traffic use two separate ports, each port needs its own security. The most common approach with etcd is mutual TLS, that is, two-way TLS: the client verifies the server when it connects, and the server verifies the client in turn.
Each port has its own TLS configuration parameters, such as the cert, the key, and the client CRL.
To access a TLS-enabled etcd, you have to bring the key, the server certificate, and the CA certificate with you.
--cert-file=/opt/etcd/ssl/server.pem \
--key-file=/opt/etcd/ssl/server-key.pem \
--peer-cert-file=/opt/etcd/ssl/server.pem \
--peer-key-file=/opt/etcd/ssl/server-key.pem \
--trusted-ca-file=/opt/etcd/ssl/ca.pem \
--peer-trusted-ca-file=/opt/etcd/ssl/ca.pem \
With the above parameters, you can set up the cluster .
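On the client side, the same files are passed to etcdctl. The wrapper below is a sketch, assuming the certificate paths from the flags above are also readable on the client and that the member at 192.168.31.71:2379 is reachable; the function name etcdctl_tls is made up for this example.

```shell
# Wrap etcdctl so every call carries the endpoint and the TLS material.
etcdctl_tls() {
  ETCDCTL_API=3 etcdctl \
    --endpoints="https://192.168.31.71:2379" \
    --cacert=/opt/etcd/ssl/ca.pem \
    --cert=/opt/etcd/ssl/server.pem \
    --key=/opt/etcd/ssl/server-key.pem \
    "$@"
}

# Usage (needs a running cluster):
#   etcdctl_tls endpoint health
#   etcdctl_tls put foo bar
```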
Disaster preparedness

A cluster built with these parameters has several members running on different nodes. If one node breaks, the other nodes still hold the data, so nothing is lost.
That keeps the data safe in most cases, but extreme situations remain. If all members disappear together, the data is gone, and that is unacceptable.
For example, suppose etcd runs as pods and the pods have no data disk mounted from outside. If a pod is evicted because it wrote too much data, its data is lost with it; if every etcd pod is evicted, all the data is lost.
Losing that data can be a very serious problem, because etcd stores all the important information of the kubernetes cluster.
Take the calico CNI plug-in as an example. It can have its own etcd, which stores all IP allocation information for the current cluster. Suppose 100,000 pods are running in the cluster and this information is lost: the record of which IP went to which pod is gone. When you start a new pod, you no longer know whether its IP has already been assigned elsewhere, so the same IP may well be handed out twice, to a pod on another node or on the same node.
The consequences are severe. IP allocation across the whole cluster becomes a mess, and two pods may end up fighting over one IP. A user who meant to reach one service lands on another. Nobody can live with that outcome.
So etcd data safety is very important. Besides protecting the data with multiple replicas, we also need regular backups.
The advantage of a backup is that even if the instances go down, you can restore from it. The restore is not fully up to date: you lose whatever was written between the time the backup was taken and the time of the failure, so the more frequent the backups, the less data is lost.
etcd itself provides a snapshot command that creates a snapshot of the cluster data, and a restore command that replays the snapshot to recover the data.
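The snapshot and restore commands can be wrapped for regular use roughly as follows. This is a sketch under assumptions: the function names and the timestamped file naming are made up for this example, and the endpoint and TLS flags are left out for brevity. Newer etcd releases also ship a separate etcdutl tool for restores, which may be preferred there.

```shell
# Save a snapshot of the keyspace to a timestamped file in the given directory
# and print the file name on success.
backup_etcd() {
  backup_dir="$1"
  snapshot="${backup_dir}/etcd-$(date +%Y%m%d-%H%M%S).db"
  ETCDCTL_API=3 etcdctl snapshot save "$snapshot" && printf '%s\n' "$snapshot"
}

# Restore a snapshot file ($1) into a fresh data directory ($2).
# etcd must afterwards be started with that directory as its data dir.
restore_etcd() {
  ETCDCTL_API=3 etcdctl snapshot restore "$1" --data-dir "$2"
}

# Usage (needs a running cluster for the backup):
#   backup_etcd /var/backups/etcd
#   restore_etcd /var/backups/etcd/<snapshot file> /var/lib/etcd/restored.etcd
```

Running backup_etcd from a scheduler such as cron is one way to get the regular backups described above; the interval then bounds how much data a restore can lose.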
How do you build a highly available kubernetes cluster? It comes down to how etcd is made highly available, how the apiserver is made highly available, and how the controller manager and the scheduler are handled.
That covers high-availability management for etcd.
etcd capacity management and defragmentation
Capacity management

etcd offers some guidance: a single object should not exceed 1.5 MB. Large objects make synchronization and in-memory snapshots expensive and drag down the performance of the whole etcd, so objects should be kept small.
etcd's default capacity is 2 GB, and more than 8 GB is not recommended. Production systems are generally set to 8 GB.
Cleaning up disk fragments with defrag
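The capacity is set in bytes through etcd's backend quota option. A small sketch of deriving the 8 GB value mentioned above; the variable name QUOTA_BYTES is an assumption for this example:

```shell
# 8 GiB expressed in bytes, the unit the quota setting uses.
QUOTA_BYTES=$((8 * 1024 * 1024 * 1024))

# As a command-line flag:
echo "--quota-backend-bytes=${QUOTA_BYTES}"
# As an environment-file entry:
echo "ETCD_QUOTA_BACKEND_BYTES=\"${QUOTA_BYTES}\""
```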

In this test the etcd storage quota was set to 16 MB and data was written continuously. Once the quota was exceeded, writes failed with an error: I had blown up the cluster, and no more data could be written.

The cluster is now in the alarm state. It has run out of space, and on a cluster in the alarm state no write operation can succeed.
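Whether a cluster is in this state can be checked with etcdctl's alarm list command. A minimal sketch, assuming the alarm is reported under the name NOSPACE and leaving out endpoint and TLS flags; the function name is made up for this example:

```shell
# Succeed if any member currently reports a NOSPACE alarm.
has_nospace_alarm() {
  ETCDCTL_API=3 etcdctl alarm list | grep -q 'NOSPACE'
}

# Usage (needs a running cluster):
#   if has_nospace_alarm; then echo "cluster is out of space"; fi
```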

You can clean up the disk with defrag, which does free some disk space. The no-space alarm is still raised, however, so writes still fail at this point.

So the alarm has to be disarmed first, and then writes succeed again. If you operate etcd you will run into this sooner or later: writes fill up the disk, the db exceeds its quota, and the cluster enters the alarm state. After cleaning up the data, you must disarm the alarm before you can continue writing.
defrag cleans up disk fragments, but it depends on another operation first: the compact command. etcd is a multi-version store. In bolt db every key carries version information, so a lot of historical versions accumulate. Much of that history is never used, so we want it trimmed. The compact command lets you specify a revision, and all versions before that revision are cleared, which frees space.
defrag then defragments the disk. Heavy fragmentation leads to low disk utilization, and running defrag fixes that.
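Put together, recovery from a full quota runs compact, then defrag, then alarm disarm. The function below is a sketch of that sequence, assuming etcdctl's JSON status output contains a "revision" field and leaving out endpoint and TLS flags; the name reclaim_space is made up for this example.

```shell
# Recover a member that hit its space quota: compact, defragment, disarm.
reclaim_space() {
  # Read the current revision from the endpoint status JSON.
  rev="$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json \
        | grep -o '"revision":[0-9]*' | grep -o '[0-9][0-9]*' | head -n 1)"
  [ -n "$rev" ] || return 1

  # Drop all history older than the current revision.
  ETCDCTL_API=3 etcdctl compact "$rev" || return 1
  # Hand the space freed by compaction back to the filesystem.
  ETCDCTL_API=3 etcdctl defrag || return 1
  # Clear the alarm so that writes are accepted again.
  ETCDCTL_API=3 etcdctl alarm disarm
}

# Usage (needs a running cluster):
#   reclaim_space
```

Compacting to the current revision discards all history; a cluster that needs older revisions would compact to an earlier one instead.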
Later versions of etcd support automatic compaction. Older versions needed a lot of manual operations work: the disk would fill up again and again, and someone had to log in, find the cause, and run compact by hand. Newer versions can perform the compact operation automatically.
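Automatic compaction is switched on through etcd's auto-compaction settings. A sketch in the same environment-file style used earlier; the one-hour retention is a hypothetical value, not a recommendation from the original:

```shell
# Assumed example values: keep one hour of history, compact periodically.
ETCD_AUTO_COMPACTION_MODE="periodic"
ETCD_AUTO_COMPACTION_RETENTION="1h"
```

Compaction alone does not shrink the db file on disk, so defrag is still needed to return the freed space.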