Building a highly available etcd cluster
2022-07-05 16:53:00 [Zhang quandan, Foxconn quality inspector]
In production, etcd itself is simple to use: a handful of commands such as put, get and watch are enough to keep data flowing. Running etcd in production, however, raises many problems. First, how do you secure it? Second, how do you build a highly available etcd cluster? Third, how do you back up its data? All of these bear directly on the safety of the whole cluster.
Important etcd member parameters
ETCD_NAME: the node name, which must be unique within the cluster
The parameters fall into several groups. The first group is the core, member-level parameters. Every etcd member has its own name, and the default is "default", so if you start an etcd member without setting any parameters, that member is called default.
ETCD_NAME="etcd-1"
ETCD_NAME="etcd-2"
ETCD_NAME="etcd-3"
ETCD_DATA_DIR: the data directory
This is where etcd ultimately persists its data on disk. If it is not set, the directory is named after the member: <member name>.etcd.
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS: Cluster communication listening address
ETCD_LISTEN_CLIENT_URLS: Client access listening address
etcd distinguishes two kinds of URLs. Because etcd runs as a cluster, traffic between members goes over the peer URLs, while requests that clients send to the etcd server go over the client URLs. The two kinds of requests carry different data, so they are served on different ports (by convention, 2380 for peers and 2379 for clients).
Keeping them apart also means peer traffic, which may deserve higher priority, can be treated separately: later network tuning can target the peer port to protect cluster replication at the network level.
In short, the two are isolated: client traffic uses the client URLs and peer traffic uses the peer URLs.
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.31.71:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.31.71:2379"
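Important etcd cluster parameters
ETCD_INITIAL_CLUSTER_STATE: the state to use when joining the cluster; "new" creates a new cluster, and "existing" joins an existing one
In other words, this setting decides whether to bootstrap a new cluster or start an etcd instance that joins a cluster that is already running.
ETCD_INITIAL_CLUSTER_TOKEN: the token used when initializing the cluster; giving each cluster its own token keeps clusters built from similar configurations from being confused with one another
ETCD_INITIAL_ADVERTISE_PEER_URLS: the peer address this member advertises to the other members
ETCD_ADVERTISE_CLIENT_URLS: the client address this member advertises to clients
What is an advertised address? The listen URLs are what the local etcd process binds to, while the advertised URLs are what this member tells other members and clients to use to reach it, so the advertised URLs must be reachable from those peers and clients.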
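For a three-member cluster, the cluster-level settings can be sketched as follows. Two assumptions: the IPs of etcd-2 and etcd-3 (192.168.31.72 and .73) and the token value are hypothetical, and the sketch also emits ETCD_INITIAL_CLUSTER, which is not in the list above but is the variable etcd reads during static bootstrap to learn the name and peer URL of every initial member.

```python
from typing import Dict

# Hypothetical member IPs: only 192.168.31.71 appears in the article.
MEMBERS: Dict[str, str] = {
    "etcd-1": "192.168.31.71",
    "etcd-2": "192.168.31.72",
    "etcd-3": "192.168.31.73",
}


def initial_cluster(members: Dict[str, str], peer_port: int = 2380) -> str:
    # etcd expects comma-separated name=peer-URL pairs, one per initial member.
    return ",".join(f"{name}=https://{ip}:{peer_port}" for name, ip in members.items())


def cluster_env(members: Dict[str, str], token: str, joining: bool = False) -> Dict[str, str]:
    return {
        "ETCD_INITIAL_CLUSTER": initial_cluster(members),
        # Every member of one cluster uses the same token.
        "ETCD_INITIAL_CLUSTER_TOKEN": token,
        # "new" bootstraps a fresh cluster; "existing" joins a running one.
        "ETCD_INITIAL_CLUSTER_STATE": "existing" if joining else "new",
    }


for key, value in cluster_env(MEMBERS, token="etcd-cluster").items():  # hypothetical token
    print(f'{key}="{value}"')
```

When a member joins an existing cluster, ETCD_INITIAL_CLUSTER must list the running members plus the new one, and the new member is normally registered first with etcdctl member add.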
etcd security parameters
Since there are two ports, peer and client, each of them needs to be secured. The security mechanism etcd most commonly uses is mutual TLS (mTLS): when a client accesses the server, it verifies the server's certificate, and the server in turn verifies the client's certificate.
Each port has its own TLS parameters, such as which certificate, which key, and which client certificate revocation list (CRL) to use.
To access a TLS-enabled etcd, a client must therefore present its certificate and key and trust the CA certificate.
--cert-file=/opt/etcd/ssl/server.pem \
--key-file=/opt/etcd/ssl/server-key.pem \
--peer-cert-file=/opt/etcd/ssl/server.pem \
--peer-key-file=/opt/etcd/ssl/server-key.pem \
--trusted-ca-file=/opt/etcd/ssl/ca.pem \
--peer-trusted-ca-file=/opt/etcd/ssl/ca.pem \
--client-cert-auth=true \
--peer-client-cert-auth=true
With the parameters above, the cluster can be set up.
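On the client side, the certificate, key and CA are passed to etcdctl. The sketch below only builds the command line; it does not run it. It assumes the article's /opt/etcd/ssl paths and reuses server.pem as the client certificate, which works only if that certificate also permits client authentication; etcdctl_tls_argv is a hypothetical helper, while --endpoints, --cacert, --cert, --key and endpoint health are standard etcdctl (v3 API) options and commands.

```python
import shlex
from typing import List

# Paths follow the article's /opt/etcd/ssl layout. Reusing server.pem as the
# client certificate is an assumption: it only works if that certificate is
# also valid for client authentication.
CA_FILE = "/opt/etcd/ssl/ca.pem"
CERT_FILE = "/opt/etcd/ssl/server.pem"
KEY_FILE = "/opt/etcd/ssl/server-key.pem"


def etcdctl_tls_argv(endpoints: List[str], *command: str, cacert: str = CA_FILE,
                     cert: str = CERT_FILE, key: str = KEY_FILE) -> List[str]:
    """Build an etcdctl (v3 API) command line for a mutual-TLS endpoint."""
    return [
        "etcdctl",
        "--endpoints=" + ",".join(endpoints),
        "--cacert=" + cacert,  # CA used to verify the server's certificate
        "--cert=" + cert,      # client certificate presented to the server
        "--key=" + key,        # private key matching the client certificate
        *command,
    ]


argv = etcdctl_tls_argv(["https://192.168.31.71:2379"], "endpoint", "health")
print(shlex.join(argv))
# subprocess.run(argv, check=True) would execute it where etcdctl is installed.
```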
Disaster preparedness
A cluster built with these parameters has several members running on different nodes. If one node fails, the other nodes still hold the data, so nothing is lost.
In that situation most of the data is safe, but there are extreme cases: if all members disappear at once, the data is gone, and that is unacceptable.
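The protection that multiple members provide has a precise limit. etcd replicates through Raft, so the cluster keeps accepting writes only while a majority (quorum) of members is up. The arithmetic below is standard Raft majority math, sketched in Python; it is not taken from the article.

```python
def quorum(members: int) -> int:
    """Votes needed for a Raft majority in a cluster of `members` nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Members that can fail while a majority is still available."""
    return members - quorum(members)


for n in (1, 3, 4, 5, 7):
    print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Four members tolerate no more failures than three, and six no more than five, which is why odd sizes such as 3 or 5 are the usual choice.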
For example, suppose etcd runs as pods and the pods do not mount their data on an external disk. If a pod writes too much data for some reason and gets evicted, its data goes with it; if all of etcd's pods are evicted, all of the data is lost.
Losing this data can cause very serious problems, because etcd stores all of the important information of the Kubernetes cluster.
Take the Calico CNI plugin as an example. It can have its own etcd, in which it stores the IP allocation information for the whole cluster. If that information is lost while the cluster is running, say, 100,000 pods, the record of which IP was assigned to which pod disappears. When a new pod starts, there is no way to know which IPs are already in use elsewhere, so the same IP may well be handed out again, to a pod on another node or on the current node.
The consequences are severe: the cluster's IP assignments become a mess, two pods may end up fighting over the same IP, and a user who meant to reach one service may be routed to a different one. Nobody can accept that outcome.
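To make this failure mode concrete, here is a toy allocator whose only state is a pod-to-IP map, standing in for the allocation records a CNI plugin might keep in etcd. It is hypothetical and deliberately naive (the first free address wins); Calico's real IPAM works differently. It shows why losing that state produces duplicate IPs and why restoring it from a backup avoids them. The 10.244.0.0/24 range is an arbitrary example.

```python
import ipaddress
from typing import Dict, Optional


class ToyIpam:
    """Hypothetical first-free-address allocator; its whole state is `assigned`."""

    def __init__(self, cidr: str, assigned: Optional[Dict[str, str]] = None) -> None:
        self.pool = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        self.assigned = dict(assigned or {})  # pod name -> IP

    def allocate(self, pod: str) -> str:
        used = set(self.assigned.values())
        for ip in self.pool:
            if ip not in used:
                self.assigned[pod] = ip
                return ip
        raise RuntimeError("address pool exhausted")


ipam = ToyIpam("10.244.0.0/24")
ipam.allocate("pod-a")  # 10.244.0.1
ipam.allocate("pod-b")  # 10.244.0.2

# The store holding `assigned` is lost: a fresh allocator knows nothing.
lost = ToyIpam("10.244.0.0/24")
print(lost.allocate("pod-c") == ipam.assigned["pod-a"])  # True: duplicate IP

# Restoring the map from a backup keeps allocations unique.
restored = ToyIpam("10.244.0.0/24", ipam.assigned)
print(restored.allocate("pod-c"))  # 10.244.0.3
```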
So etcd data safety is critical. Besides protecting the data with multiple replicas, you also need regular backups.
The benefit of backups is that even if every instance goes down, you can still restore from a backup. The restored data is not fully up to date: everything written between the moment the backup was taken and the moment of the failure is lost, so the more often you back up, the less data you lose.
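The loss window described above is simply the time between the newest usable backup and the failure. The sketch below computes it for two hypothetical backup schedules; it assumes every backup succeeded and restores completely, which real operations should verify rather than assume.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def data_loss_window(backups: Sequence[datetime], failure: datetime) -> Optional[timedelta]:
    """How much recent history is lost if we restore the newest usable backup.

    Returns None when no backup predates the failure, i.e. nothing to restore.
    """
    usable = [b for b in backups if b <= failure]
    if not usable:
        return None
    return failure - max(usable)


day = datetime(2022, 7, 5)
hourly = [day + timedelta(hours=h) for h in range(24)]
every_15_min = [day + timedelta(minutes=15 * i) for i in range(96)]
failure = day + timedelta(hours=12, minutes=40)

print(data_loss_window(hourly, failure))        # 0:40:00
print(data_loss_window(every_15_min, failure))  # 0:10:00
```

In the worst case a failure lands just before the next scheduled backup, so the backup interval is an upper bound on the loss window.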
etcd itself provides a snapshot command that takes a snapshot of the cluster data, and a restore command that replays the information in a snapshot to recover the data.
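A periodic backup job typically names each snapshot with a timestamp, saves it with etcdctl snapshot save, and prunes old files. This is a minimal sketch: the backup directory and retention count are hypothetical, the helpers only build the command and choose files to delete, and a TLS cluster would also need the --cacert, --cert and --key flags.

```python
import shlex
from datetime import datetime
from pathlib import PurePosixPath
from typing import List

BACKUP_DIR = PurePosixPath("/var/backups/etcd")  # hypothetical backup location


def snapshot_path(now: datetime) -> PurePosixPath:
    # A sortable timestamp in the file name makes rotation a plain sort.
    return BACKUP_DIR / f"snapshot-{now:%Y%m%d-%H%M%S}.db"


def snapshot_save_argv(endpoint: str, path: PurePosixPath) -> List[str]:
    # `etcdctl snapshot save` reads from a single endpoint.
    return ["etcdctl", f"--endpoints={endpoint}", "snapshot", "save", str(path)]


def snapshots_to_prune(names: List[str], keep: int) -> List[str]:
    """Names of the oldest snapshots beyond the newest `keep` ones."""
    ordered = sorted(names)  # timestamped names sort chronologically
    return ordered[:-keep] if keep > 0 else ordered


path = snapshot_path(datetime(2022, 7, 5, 16, 53))
print(shlex.join(snapshot_save_argv("https://192.168.31.71:2379", path)))
# subprocess.run(snapshot_save_argv(...), check=True) would run it, e.g. from cron.
```

For the restore side, etcd 3.5 and later provide etcdutl snapshot restore (the etcdctl variant is deprecated there). A restore writes a fresh data directory, so it is run once per member with that member's own name and peer URL before the members are started again.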
Building a highly available Kubernetes cluster really comes down to how etcd is made highly available, how the apiserver is made highly available, and how the controller manager and the scheduler are handled.
That covers etcd high-availability management.
etcd capacity management and defragmentation
Capacity management
etcd offers some guidance here: a single object should not exceed 1.5 MB (which matches the default --max-request-bytes limit of 1.5 MiB). When objects are large, the overhead of replicating them and of taking in-memory snapshots becomes very high and degrades etcd's overall performance, so objects should be kept small.
etcd's default storage quota (--quota-backend-bytes) is 2 GB, and going above 8 GB is not recommended; production systems are commonly set to 8 GB.
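These limits are easy to check before writing data or rolling out a config. The sketch encodes the values quoted above as constants; the helper names are hypothetical, and the request check is optimistic because a real request also carries the key and protocol overhead.

```python
from typing import List

MIB = 1024 * 1024
GIB = 1024 * MIB

# etcd defaults and guidance: --max-request-bytes defaults to 1.5 MiB,
# --quota-backend-bytes defaults to 2 GiB, and more than 8 GiB is discouraged.
MAX_REQUEST_BYTES = int(1.5 * MIB)
DEFAULT_QUOTA_BYTES = 2 * GIB
SUGGESTED_MAX_QUOTA_BYTES = 8 * GIB


def fits_in_request(value: bytes) -> bool:
    # Optimistic: a real request also carries the key and protocol overhead.
    return len(value) <= MAX_REQUEST_BYTES


def quota_warnings(quota_backend_bytes: int) -> List[str]:
    """Check a --quota-backend-bytes value; 0 means "use etcd's default"."""
    quota = DEFAULT_QUOTA_BYTES if quota_backend_bytes == 0 else quota_backend_bytes
    warnings = []
    if quota > SUGGESTED_MAX_QUOTA_BYTES:
        warnings.append(f"quota of {quota / GIB:.1f} GiB exceeds the suggested 8 GiB")
    return warnings


print(fits_in_request(b"x" * (2 * MIB)))  # False
print(quota_warnings(16 * GIB))  # ['quota of 16.0 GiB exceeds the suggested 8 GiB']
```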
Cleaning up disk fragmentation with defrag
In this test, etcd's storage quota was set to 16 MB and data was then written continuously. Once the quota was exceeded, writes failed with an error: the cluster was full, and after exceeding the quota no more data could be written.
The cluster is now in an alarm state: an incident has occurred and there is no space left. For a cluster in the alarm state, no write operation can succeed.
You can run defrag to clean up the disk, and this does reclaim some disk space, but the NOSPACE alarm is still raised, so writes will keep failing.
So you must first disarm the alarm (etcdctl alarm disarm); only then will writes succeed. Anyone operating etcd will run into this sooner or later: data keeps growing until the database exceeds its quota, the cluster enters the alarm state, and after cleaning up the data you have to disarm the alarm before you can write again.
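The sequence described above can be modeled with a toy backend. This is a simplification, not etcd's code: sizes are abstract units (think of the 16 MB test in MB), and it only captures the ordering that matters here, namely that compaction and defrag free space but writes keep failing until the NOSPACE alarm is disarmed. On a real cluster the corresponding commands are etcdctl compact <revision>, etcdctl defrag, etcdctl alarm list and etcdctl alarm disarm.

```python
class QuotaExceeded(Exception):
    pass


class ToyBackend:
    """Toy model of the NOSPACE alarm workflow (not etcd's implementation).

    `live` is data still referenced; `free_pages` is space released by
    compaction that stays inside the db file until a defrag.
    """

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.live = 0
        self.free_pages = 0
        self.nospace_alarm = False

    @property
    def db_size(self) -> int:
        return self.live + self.free_pages

    def put(self, size: int) -> None:
        if self.nospace_alarm:
            raise QuotaExceeded("alarm NOSPACE is active")
        if self.db_size + size > self.quota:
            self.nospace_alarm = True  # once raised, the alarm blocks all writes
            raise QuotaExceeded("database space exceeded")
        self.live += size

    def compact(self, reclaim: int) -> None:
        # Old revisions are dropped; their pages become free but stay in the file.
        reclaim = min(reclaim, self.live)
        self.live -= reclaim
        self.free_pages += reclaim

    def defrag(self) -> None:
        # Defrag shrinks the file; it does NOT clear the alarm.
        self.free_pages = 0

    def disarm(self) -> None:
        # Corresponds to `etcdctl alarm disarm`.
        self.nospace_alarm = False


db = ToyBackend(quota=16)
for _ in range(16):
    db.put(1)
try:
    db.put(1)
except QuotaExceeded as err:
    print("write rejected:", err)

db.compact(reclaim=10)
db.defrag()
try:
    db.put(1)  # still rejected: the alarm outlives the cleanup
except QuotaExceeded as err:
    print("after defrag:", err)

db.disarm()
db.put(1)  # succeeds once the alarm is cleared
print("db size:", db.db_size)  # 7
```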
defrag cleans up disk fragmentation, but it has to be combined with another operation: the compact command. etcd is a multi-version (MVCC) store; in BoltDB every key carries version information, so a lot of historical versions accumulate. Much of that history is often never used again, so we want to compress it. etcd supports this through the compact command: you specify a revision, and all superseded versions older than that revision are removed (each key keeps its value as of that revision), which saves space.
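Which versions survive a compaction is easiest to see in a toy model. The sketch below is a simplified, in-memory stand-in for etcd's MVCC layer, not its implementation: every put gets a new global revision, deletes are left out, and compact(rev) keeps each key's value as of rev plus anything newer. Reads below the compacted revision fail, much as etcd rejects reads of compacted revisions.

```python
from collections import defaultdict
from typing import Optional


class ToyMvcc:
    """Toy multi-version key-value store in the spirit of etcd's MVCC layer."""

    def __init__(self) -> None:
        self.rev = 0
        self.compacted = 0
        self.history = defaultdict(list)  # key -> [(revision, value), ...]

    def put(self, key: str, value: str) -> int:
        self.rev += 1
        self.history[key].append((self.rev, value))
        return self.rev

    def get(self, key: str, rev: Optional[int] = None) -> Optional[str]:
        rev = self.rev if rev is None else rev
        if rev < self.compacted:
            raise ValueError(f"revision {rev} has been compacted")
        visible = [value for r, value in self.history.get(key, []) if r <= rev]
        return visible[-1] if visible else None

    def compact(self, rev: int) -> None:
        """Drop versions superseded at `rev`; keep each key's value as of `rev`."""
        if not self.compacted <= rev <= self.rev:
            raise ValueError("compaction revision out of range")
        for key, versions in list(self.history.items()):
            older = [v for v in versions if v[0] <= rev]
            newer = [v for v in versions if v[0] > rev]
            self.history[key] = older[-1:] + newer
        self.compacted = rev

    def version_count(self) -> int:
        return sum(len(versions) for versions in self.history.values())


store = ToyMvcc()
for i in range(5):
    store.put("/config", f"v{i}")  # revisions 1..5
store.put("/other", "x")           # revision 6
print(store.version_count())       # 6
store.compact(4)
print(store.version_count())       # 3: /config@4, /config@5, /other@6
print(store.get("/config"))        # v4
```

Compaction only makes space reusable inside the database file; returning it to the filesystem is the job of defrag.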
Then run defrag to defragment the disk. Heavy fragmentation leaves disk space poorly utilized, and a defrag takes care of that.
Older etcd versions required compaction to be done manually, which demanded a lot of operations work: disks frequently filled up, and we had to go online, find the cause, and run compact by hand. Later etcd versions can perform compaction automatically.