
A Keepalived high-availability incident made me learn it all over again

2022-07-04 14:32:00 InfoQ

Last time we ran into a MySQL failure. This time I hit another strange problem:

The virtual IP (VIP) managed by Keepalived, our high-availability component, kept drifting between nodes. MySQL therefore switched between master and slave over and over, and master-slave replication eventually broke.

I could not reproduce the Keepalived problem. I did study Keepalived in depth, though, and ran many experiments on its core configuration parameters. Wukong will walk you through how Keepalived works and why it provides high availability.

The explanation is split into three parts: Part 1, Part 2 and Part 3.

Part 1 covers:
  • How Keepalived forwards data traffic.
  • How Keepalived elects a master.
  • Keepalived's load-balancing algorithms.

Part 2 covers:
  • Keepalived's routing rules.
  • How Keepalived monitors services.
  • How Keepalived fails over.
  • An analysis of Keepalived's architecture.

Part 3 covers:
  • Keepalived configuration in detail.
  • Deploying Keepalived in practice.

1. Overview of Keepalived and LVS

1.1 Keepalived overview
Most people think of Keepalived as part of a high-availability architecture that keeps a service from going down, but it does much more than that. Keepalived is a relatively lightweight high-availability solution on Linux. "Lightweight" here is relative to components such as Heartbeat. Heartbeat is more complete and more specialized, but it is harder to install and deploy, whereas Keepalived needs only a single configuration file. Most companies choose Keepalived as their high-availability component.
1.2 LVS overview
Keepalived began as an open-source project written in C by Alexandre Cassen. Its main goals were to simplify LVS configuration and to make LVS more stable. In short, Keepalived extends and enhances LVS.
LVS (Linux Virtual Server) is an open-source load-balancing project led by Dr. Zhang Wensong. LVS has since been merged into the Linux kernel as a module.
LVS is mainly used for load balancing. For example, when a web client wants to reach backend services, the request first passes through the LVS scheduler. The scheduler then decides which backend server gets the request, using a preset algorithm.
1.3 How LVS works
The basic principle of LVS is shown in the figure below:
Figure: Basic principle of LVS
The core function of LVS is load balancing. There are several kinds of load-balancing techniques:
  • DNS round-robin resolution.
  • Client-side scheduling.
  • Application-layer scheduling.
  • IP-address-based scheduling.
The most efficient of these is IP-address-based scheduling, which simply forwards each request to the corresponding IP address and port. LVS implements IP load balancing with the IPVS module, which is the core software of an LVS cluster.
The LVS load balancer exposes a virtual IP (VIP), and clients only know this VIP in advance. A client sends its request to the VIP, and the LVS load balancer forwards it to one of the backend servers, called Real Servers. The LVS load-balancing algorithm decides which one, for example by round-robin or by weight.
All backend servers provide the same functionality, so the result is the same whichever server handles the request. The client therefore does not care how many backend servers there are; it only cares which VIP it connects to.
After a backend server processes a request, how does the response get back to the client? That depends on the LVS mode. LVS has three working modes: NAT, TUN and DR. The routing mechanism is covered in Part 2.

2. How Keepalived forwards traffic

Keepalived provides load balancing and high availability for Linux systems. Its load-balancing capability comes from IPVS (IP Virtual Server), the module of the LVS project in the Linux kernel.

Keepalived runs on Linux and starts the LVS service in the kernel to create a virtual server. Suppose, for example, that we start Keepalived on two servers. LVS exposes a single virtual IP (VIP), but only one Keepalived instance takes over that VIP, so client requests reach only the Master Keepalived node. This node can be configured with the IP addresses and ports of several real servers, and it spreads the traffic across them with a load-scheduling algorithm. The other node, the Backup Keepalived, stays on standby and receives no traffic.

3. How Keepalived elects a master

So how do the two Keepalived services above choose one of them as the Master node?
Keepalived usually runs on one active and one standby server, or on one active and several standby servers. All of these servers follow the VRRP protocol.
3.1 The VRRP protocol
VRRP stands for Virtual Router Redundancy Protocol. It is a fault-tolerance protocol designed to remove the single point of failure of a LAN's router. Suppose all traffic used to go through a single router. If that router fails, the whole forwarding path breaks and the service becomes unavailable.

The main functions of the VRRP protocol:
  • A virtual router and a virtual IP.
  • The Master broadcasts ARP messages.
  • The Backups elect a new Master.
Now suppose we configure several routers: one master and several backups. Each router has its own IP address, and together they form a router group. One router acts as Master and the others act as Backup. The group presents itself as a single virtual router with its own IP address, the Virtual IP (VIP).

Clients only need to access this virtual IP. When the master router fails, the backup routers elect a new master, which continues to route traffic for clients. This makes the routing function highly available.

Once VRRP is enabled, the routers elect a master according to their configured priorities. The router with the highest priority becomes the Master router, and the others become Backup routers.
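The election rule can be sketched as a toy model in Python. This is a simplification, not Keepalived's code. It picks the Master directly from a list, whereas real VRRP routers reach the same answer by exchanging advertisements. The router names and IP addresses are hypothetical. The tie-break rule (equal priority, the higher primary IP wins) comes from the VRRP specification.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Router:
    name: str
    ip: str        # primary IP address (hypothetical values below)
    priority: int  # configured VRRP priority


def elect_master(routers: list[Router]) -> Router:
    """Highest priority wins; on a tie, the higher primary IP address wins."""
    return max(routers, key=lambda r: (r.priority, IPv4Address(r.ip)))


def roles(routers: list[Router]) -> dict[str, str]:
    """Map each router name to the role it ends up in."""
    master = elect_master(routers)
    return {r.name: "MASTER" if r is master else "BACKUP" for r in routers}


if __name__ == "__main__":
    group = [
        Router("r1", "192.168.56.11", 100),
        Router("r2", "192.168.56.12", 90),
        Router("r3", "192.168.56.13", 100),
    ]
    # r1 and r3 tie at 100; r3 has the higher IP, so it becomes MASTER.
    print(roles(group))
```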

The Master router sends VRRP advertisements to the Backup routers. These messages tell the Backups that the Master is working normally, so there is no need to elect a new one.

The communication between Master and Backup is simple: it is just a heartbeat. It works differently from Eureka's heartbeat, though. In Eureka, clients periodically send heartbeats to the Eureka registry. In Keepalived, the Master periodically sends heartbeats to the Backups. Each Backup runs a timer that watches for these advertisements. If none arrives within the timeout, the Backup assumes the Master has gone down and a new Master is elected by priority. The new Master then starts sending periodic VRRP advertisements to the Backup routers. (On Eureka heartbeats, see "Emperor Taizong of the Tang Dynasty takes microservice 'heartbeats' to the extreme!")

VRRP makes the system more available and avoids outages caused by a single point of failure. When a router fails, nobody has to change network settings by hand to reach the new Master router. In the figure below, a Backup switches to Master.
The election is configured mainly through the vrrp_instance and vrrp_script sections.
3.2 vrrp_instance configuration
Three Keepalived parameters matter most here:
  • state: either MASTER or BACKUP.
  • priority: the node's priority, in the range [1-255].
  • nopreempt: non-preemptive mode. Without it, a node with a higher priority takes over the Master role as soon as it is up. With it, the node does not take the Master role away from a running Master.
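vrrp_instance VI_1 {
 # This node starts as BACKUP
 state BACKUP
 # Hypothetical NIC name and router ID; adjust to your environment
 interface eth0
 virtual_router_id 51
 # Priority is 100
 priority 100
 # Non-preemptive mode
 nopreempt
 # The VIP managed by this instance
 virtual_ipaddress {
  192.168.56.88
 }
}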
Suppose one node is set to MASTER and the other to BACKUP. When the MASTER fails, the BACKUP becomes the new MASTER. When the old MASTER recovers, it preempts and becomes MASTER again. It takes back the VIP traffic and causes an unnecessary switchover. To avoid this, set both Keepalived nodes to BACKUP and configure the higher-priority one with nopreempt.
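The difference between preemptive and non-preemptive mode comes down to one decision, made when a node comes back. Below is a minimal sketch that counts how often the VIP moves when the higher-priority node fails and then recovers. The function names are hypothetical.

```python
def takes_over(local_priority: int, master_priority: int, nopreempt: bool) -> bool:
    """Does a node that just came back grab the Master role from a live Master?"""
    if nopreempt:
        # Non-preemptive: never take the VIP away from a working Master.
        return False
    return local_priority > master_priority


def vip_moves(high_priority: int, low_priority: int, nopreempt: bool) -> int:
    """Count VIP moves when the higher-priority Master fails and later recovers."""
    moves = 1  # the lower-priority Backup takes over when the Master fails
    if takes_over(high_priority, low_priority, nopreempt):
        moves += 1  # the recovered node preempts and pulls the VIP back
    return moves


if __name__ == "__main__":
    print(vip_moves(100, 90, nopreempt=False))  # 2: fail over, then fail back
    print(vip_moves(100, 90, nopreempt=True))   # 1: the VIP stays put after recovery
```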
3.3 vrrp_script configuration
The priority can also be raised or lowered dynamically through vrrp_script:
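vrrp_script restart_mysql {
 # Check MySQL (and restart the container if needed); the exit code reports health
 script "/usr/local/keepalived/restart_mysql.sh"
 # Run the script every 5 seconds
 interval 5
 # Lower the priority by 20 while the script fails
 weight -20
}
# Attach it inside the vrrp_instance block with: track_script { restart_mysql }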
This block runs a script on a schedule, and the script checks whether the MySQL service is healthy. Because the script is custom, you decide its return values yourself. My script returns 0 if MySQL is healthy and 1 otherwise.
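The article does not show restart_mysql.sh. As a hypothetical stand-in, here is a minimal Python check that uses the same exit-code convention: 0 means healthy, 1 means failed. It assumes MySQL listens on 127.0.0.1:3306 and only tests whether that TCP port accepts connections. It does not restart anything, and a production check would usually also run a query.

```python
import socket

# Hypothetical target: a local MySQL on the default port.
MYSQL_HOST = "127.0.0.1"
MYSQL_PORT = 3306


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable, etc.
        return False


def main() -> int:
    # Same convention as the article: 0 when MySQL looks healthy, 1 otherwise.
    return 0 if port_accepts_connections(MYSQL_HOST, MYSQL_PORT) else 1
```

To use it as the script target, do three things: add import sys, end the file with sys.exit(main()) under an if __name__ == "__main__": guard, and make the file executable with a #!/usr/bin/env python3 shebang. Keepalived only looks at the exit code.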
When weight is positive
When the script returns 0 (service healthy), the priority becomes priority + weight. Otherwise, the configured priority is kept.
Switchover strategy:
  • If the vrrp_script check fails on the MASTER node and the MASTER's priority is lower than the BACKUP's priority + weight, a switchover occurs.
  • If the check succeeds on the MASTER node and the MASTER's priority + weight is greater than the BACKUP's priority + weight, no switchover occurs.
When weight is negative
When the script returns a non-zero value (service abnormal), the priority becomes priority - |weight|. Otherwise, the configured priority is kept.

Switchover strategy:
  • If the vrrp_script check fails on the MASTER node and the MASTER's priority - |weight| is lower than the BACKUP's priority, a switchover occurs.
  • If the check succeeds on the MASTER node and the MASTER's priority is greater than the BACKUP's priority, no switchover occurs.

Note: the adjusted priority is kept within the range [1, 254].
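The priority arithmetic for both signs of weight, including the [1, 254] bound, fits in a few lines. This sketch encodes the rules described above under hypothetical function names. switchover() ignores the equal-priority tie-break and assumes the Backup is allowed to preempt.

```python
MIN_PRIORITY, MAX_PRIORITY = 1, 254


def effective_priority(priority: int, weight: int, check_ok: bool) -> int:
    """Priority after applying a vrrp_script weight, kept within [1, 254]."""
    if weight > 0 and check_ok:
        adjusted = priority + weight  # positive weight: bonus while healthy
    elif weight < 0 and not check_ok:
        adjusted = priority + weight  # negative weight: subtracts |weight| on failure
    else:
        adjusted = priority           # otherwise the configured priority stands
    return max(MIN_PRIORITY, min(MAX_PRIORITY, adjusted))


def switchover(master_prio: int, master_ok: bool,
               backup_prio: int, backup_ok: bool, weight: int) -> bool:
    """True if the Backup's effective priority is strictly higher than the Master's."""
    return (effective_priority(backup_prio, weight, backup_ok)
            > effective_priority(master_prio, weight, master_ok))


if __name__ == "__main__":
    # Negative weight, Master 100 fails: 100 - 20 = 80 < 90, so the Backup takes over.
    print(switchover(100, False, 90, True, weight=-20))  # True
    # Positive weight, both healthy: 100 + 20 = 120 > 90 + 20 = 110, no switchover.
    print(switchover(100, True, 90, True, weight=20))    # False
```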

An example:

Both Keepalived instances are configured with state BACKUP. On server node1, Keepalived has priority 100 and runs in non-preemptive mode. On node2, it has priority 90 and runs in preemptive mode.

node1 has the higher configured priority, so it becomes the Master node. When the Master detects a MySQL failure, it lowers its priority from 100 to 80. node2, whose priority is 90, now sees VRRP advertisements with a lower priority than its own and becomes the new Master. node1 becomes a BACKUP node. When node1 detects that MySQL has recovered, its priority returns to the configured 100, but it does not preempt.

As the figure below shows, Keepalived on node1 restarted MySQL successfully and its priority returned to 100. Even so, node1 did not become MASTER; it stayed in the BACKUP state.

node2 is still the MASTER and periodically sends VRRP advertisements to node1, as shown in the figure below:

If MySQL on node2 goes down, node2's priority drops from 90 to 70. Even so, no switchover occurs, because we configured node1 not to take over. To switch to node1, you have to stop Keepalived on node2 yourself. This is covered in the failover section of Part 2.
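The node1/node2 example can be replayed step by step. This is a simplified model. It assumes the same weight of -20 on both nodes and reduces each VRRP exchange to a single comparison. The Node class, next_master() and run_scenario() are hypothetical names.

```python
from dataclasses import dataclass

WEIGHT = -20  # the vrrp_script weight from the example, applied on both nodes


@dataclass
class Node:
    name: str
    base_priority: int
    nopreempt: bool
    mysql_ok: bool = True

    @property
    def priority(self) -> int:
        # Negative weight: |WEIGHT| is subtracted only while the MySQL check fails.
        return self.base_priority + (0 if self.mysql_ok else WEIGHT)


def next_master(master: Node, other: Node) -> Node:
    """The other node takes the VIP only if it outranks the Master and may preempt."""
    if other.priority > master.priority and not other.nopreempt:
        return other
    return master


def run_scenario() -> list[str]:
    node1 = Node("node1", 100, nopreempt=True)
    node2 = Node("node2", 90, nopreempt=False)
    master, backup = node1, node2  # node1 starts as Master (higher priority)
    history = [master.name]

    steps = [
        (node1, False),  # MySQL on node1 fails: 100 -> 80
        (node1, True),   # MySQL on node1 recovers: back to 100
        (node2, False),  # MySQL on node2 fails: 90 -> 70
    ]
    for node, ok in steps:
        node.mysql_ok = ok
        new_master = next_master(master, backup)
        if new_master is not master:
            master, backup = new_master, master
        history.append(master.name)
    return history


if __name__ == "__main__":
    print(run_scenario())  # ['node1', 'node2', 'node2', 'node2']
```

The last entry matches the text: node2 keeps the VIP at priority 70 because node1 will not preempt. Stopping Keepalived on node2 by hand is not modelled here.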

4. Keepalived's load-balancing mechanism

4.1 Forwarding mechanism
To understand Keepalived's load-balancing mechanism, you first need to understand IPVS (IP Virtual Server).

IPVS is not part of Keepalived itself; it is an external module that Keepalived builds on. It lets a single IP address be served by multiple servers, which makes an IP-based load-balancing cluster possible. IPVS ships as part of LVS, and LVS is part of the Linux kernel, so Keepalived can use LVS features directly on Linux. LVS creates a virtual IP (VIP), and client requests reach this VIP first. LVS then picks a server node from the cluster and forwards the traffic to it, and that node handles the request.
As shown in the figure:
  • Keepalived runs the LVS routing process (LVS Router) in user space. The Keepalived instance in the MASTER role is called the Active Router, and the one in the BACKUP role is called the Backup Router. Only the Active Router does any work; the other routers are on standby.
  • The Active Router and the Backup Router switch roles through the VRRP protocol.
  • The Active Router starts the LVS service in the kernel to create a virtual server. This virtual server has a virtual IP (VIP), for example 192.168.56.88 in the configuration below.
  • The Active Router also builds the IPVS table (server list), which records the addresses and running state of the backend servers. The load balancer picks an available server from this list for each request.
  • These backend servers are configured in Keepalived's virtual_server section. The example below has three real_server entries, one for each backend server.
virtual_server 192.168.56.88 80 {
 delay_loop 6
 lb_algo rr
 lb_kind NAT
 protocol TCP
 # Server 1
 real_server 192.168.56.11 80 {
  TCP_CHECK {
   connect_timeout 10
  }
 }
 # Server 2
 real_server 192.168.56.12 80 {
  TCP_CHECK {
   connect_timeout 10
  }
 }
 # Server 3
 real_server 192.168.56.13 80 {
  TCP_CHECK {
   connect_timeout 10
  }
 }
}
4.2 Load-scheduling algorithms
The lb_algo field in the configuration sets the load-scheduling algorithm. It can be rr, wrr, lc, wlc, sh, dh and so on; rr and wrr are the most common.
  • rr (Round-Robin): every server is treated equally and receives requests in turn.
  • wrr (Weighted Round-Robin): the higher a server's weight, the more requests it receives. Servers with weaker hardware, for example, can be given lower weights.
  • lc (Least-Connection): requests go to the server with the fewest active connections. IPVS tracks the connection counts dynamically in its table.
  • wlc (Weighted Least-Connection): requests are distributed according to both weight and number of connections.
  • sh (Source Hashing): the server is chosen by looking up the request's source IP address in a static hash table. This algorithm is mainly used for LVS routers in front of firewalls.
  • dh (Destination Hashing): the server is chosen by looking up the request's destination IP address in a static hash table. This algorithm is mainly used for caching proxy servers.
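To make the algorithms concrete, here are small Python sketches of rr, wrr, lc and sh. The wrr class follows the interleaved weighted round-robin pseudocode published in the LVS project's scheduling documentation. The sh function swaps in a SHA-256 hash of the client IP for IPVS's static hash table. It therefore shows the "same client, same server" property, but not the kernel's exact mapping. All names are hypothetical.

```python
import hashlib
from functools import reduce
from itertools import cycle
from math import gcd
from typing import Iterator, Optional


def round_robin(servers: list[str]) -> Iterator[str]:
    """rr: hand out servers in a fixed rotation."""
    return cycle(servers)


class WeightedRoundRobin:
    """wrr: interleaved weighted round-robin, after the LVS pseudocode."""

    def __init__(self, weights: dict[str, int]) -> None:
        self.servers = list(weights)
        self.weights = [weights[s] for s in self.servers]
        self.step = reduce(gcd, self.weights)  # gcd of all weights
        self.max_weight = max(self.weights)
        self.i = -1   # index of the last server chosen
        self.cw = 0   # current weight threshold

    def pick(self) -> Optional[str]:
        if self.max_weight <= 0:
            return None  # no server is allowed to receive traffic
        n = len(self.servers)
        while True:
            self.i = (self.i + 1) % n
            if self.i == 0:
                # Completed a pass: lower the threshold; wrap back to the max weight.
                self.cw -= self.step
                if self.cw <= 0:
                    self.cw = self.max_weight
            if self.weights[self.i] >= self.cw:
                return self.servers[self.i]


def least_connection(active: dict[str, int]) -> str:
    """lc: the server with the fewest active connections."""
    return min(active, key=active.__getitem__)


def source_hash(client_ip: str, servers: list[str]) -> str:
    """sh: the same client IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


if __name__ == "__main__":
    wrr = WeightedRoundRobin({"A": 4, "B": 3, "C": 2})
    print([wrr.pick() for _ in range(9)])  # ['A', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C']
    print(least_connection({"A": 12, "B": 3, "C": 7}))  # B
```

With weights A=4, B=3, C=2, one full wrr cycle splits nine requests 4:3:2. The picks are interleaved rather than sent to each server in a burst.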

5. Summary

Keepalived is a high-availability, high-performance component that is widely used in clustered environments. Understanding how it works underneath also teaches you many principles common to high availability and load balancing.

This article covered Keepalived's IPVS features. Keepalived starts a virtual server with a virtual IP (VIP) that receives client requests, then forwards the traffic to real servers according to a load-scheduling algorithm.

Keepalived is usually deployed as one active node with one or more standby nodes. Master election is controlled through the state, priority, nopreempt and weight fields.

In the next article, we'll look at how a real server returns data to the client after processing a request, which involves the LVS routing rules. We'll also cover Keepalived's core monitoring and failover features, both of which are worth exploring in depth.

Link to the original article: https://mp.weixin.qq.com/s/LgqSqxBiK25wmwrsmPa83w

Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/185/202207041232515056.html
