
MHA High Availability Configuration and Failover

2022-06-26 14:31:00

Contents

1 MHA Concepts
- 1.1 MHA Components
- 1.2 MHA Features
2 Building MySQL + MHA
3 Fault Simulation
- 3.2 Troubleshooting steps

1 MHA Concepts

MHA (Master High Availability) is a mature software suite for failover and master-slave replication in MySQL high-availability environments.
MHA was created to solve MySQL's single point of failure.
During a MySQL failover, MHA can complete the automatic failover within 0-30 seconds.
During failover, MHA preserves data consistency as far as possible, achieving true high availability.

1.1 MHA Components

MHA Node (data node)
MHA Node runs on every MySQL server.
MHA Manager (management node)
MHA Manager can be deployed on a dedicated machine to manage multiple master-slave clusters, or it can be deployed on one of the slave nodes.
MHA Manager periodically checks the master node. When the master fails, it automatically promotes the slave with the most recent data to be the new master and then points all the other slaves to the new master. The entire failover process is completely transparent to the application.

1.2 MHA Features

During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost.
With semi-synchronous replication, the risk of data loss is greatly reduced: as long as at least one slave has received the latest binary log events, MHA can apply them to all the other slaves, keeping the data consistent on every node.
MHA currently supports a one-master, multi-slave architecture and needs at least three servers, i.e. one master and two slaves.

2 Building MySQL + MHA

Experiment outline
1. MHA architecture
Database installation
One master and two slaves
MHA setup

2. Fault simulation
Simulate a failure of the master
The candidate master becomes the new master
The failed original master is repaired and rejoins MHA as a slave

Environment preparation

Host	Operating system	IP address	Packages / software / tools
MHA manager node server	CentOS7	192.168.16.16	MHA node component, MHA manager component
Master node server	CentOS7	192.168.16.18	mysql-boost-5.7.20.tar.gz, MHA node component
Slave1 node server	CentOS7	192.168.16.20	mysql-boost-5.7.20.tar.gz, MHA node component
Slave2 node server	CentOS7	192.168.16.22	mysql-boost-5.7.20.tar.gz, MHA node component

2.1 On all servers, turn off the system firewall and SELinux

systemctl stop firewalld
systemctl disable firewalld
setenforce 0
Turn off the firewall and SELinux on all four servers (setenforce 0 only lasts until the next reboot).

2.2 Change the hostnames of the Master (192.168.16.18), Slave1 (192.168.16.20) and Slave2 (192.168.16.22) nodes

hostnamectl set-hostname mysql1    # on Master (192.168.16.18)
su -

hostnamectl set-hostname mysql2    # on Slave1 (192.168.16.20)
su -

hostnamectl set-hostname mysql3    # on Slave2 (192.168.16.22)
su -

Set the hostnames of the three MySQL node servers, then run su - so that the new hostname takes effect in the current shell.
su without arguments switches to the root user by default.
su - also loads that user's login shell environment.

2.3 Edit the main configuration file /etc/my.cnf on the Master, Slave1 and Slave2 nodes

Master node (192.168.16.18)

vim /etc/my.cnf  # edit the MySQL configuration file on the Master
[mysqld]
server-id = 1  # any value, but it must be different on each of the three MySQL servers
log_bin = master-bin  # enable the binary log
log-slave-updates = true  # also write updates received through replication to this server's binary log

systemctl restart mysqld  # restart the service
ln -s /usr/local/mysql/bin/mysql /usr/sbin/
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin/  # symlink the mysql and mysqlbinlog commands into /usr/sbin so the system can find them

Slave1 and Slave2 nodes

vim /etc/my.cnf
[mysqld]
server-id = 2 				# server-id must differ on all three servers (e.g. 2 on Slave1, 3 on Slave2)
log_bin = master-bin
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index

systemctl restart mysqld

# Create the two symlinks on each of the Master, Slave1 and Slave2 nodes
ln -s /usr/local/mysql/bin/mysql /usr/sbin/
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin/
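To catch a duplicated server-id before replication is configured, the values from the three files can be compared with a small shell sketch like the one below. It assumes each my.cnf contains a single `server-id = N` line; `server_id_of` and `server_ids_unique` are hypothetical helper names, not part of MySQL or MHA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the server-id configured in a my.cnf-style file.
# Assumes one "server-id = N" line; an inline "# ..." comment is dropped.
server_id_of() {
  awk -F= '/^[[:space:]]*server[-_]id[[:space:]]*=/ {
    v = $2
    sub(/#.*/, "", v)            # drop an inline comment
    gsub(/[[:space:]]/, "", v)   # trim whitespace
    print v
  }' "$1"
}

# Hypothetical helper: succeed only if all server-ids given as arguments are distinct.
server_ids_unique() {
  [ -z "$(printf '%s\n' "$@" | sort | uniq -d)" ]
}
```

For example, after copying the three files to one machine as master.cnf, slave1.cnf and slave2.cnf (names chosen for illustration), run `server_ids_unique "$(server_id_of master.cnf)" "$(server_id_of slave1.cnf)" "$(server_id_of slave2.cnf)" || echo "duplicate server-id"`.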

2.4 Configure MySQL replication with one master and two slaves

2.4.1 Create the replication and MHA accounts on all database nodes

mysql -uroot -p
grant replication slave on *.* to 'myslave'@'192.168.16.%' identified by '123';		# replication account used by the slaves
grant all privileges on *.* to 'mha'@'192.168.16.%' identified by 'manager';		# account used by MHA manager

grant all privileges on *.* to 'mha'@'Mysql1' identified by 'manager';				# in case the slaves connect to the master by hostname
grant all privileges on *.* to 'mha'@'Mysql2' identified by 'manager';
grant all privileges on *.* to 'mha'@'Mysql3' identified by 'manager';
flush privileges;

2.4.2 View the binary log file and position on the Master node

show master status;

Example output (your binary log file name and position may differ, so note down your own values):
+-------------------+----------+--------------+------------------+-------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-bin.000001 |     1215 |              |                  |                   |
+-------------------+----------+--------------+------------------+-------------------+

mysql -e "show master status;"  # view the master status directly from the shell
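Copying File and Position by hand is error-prone. As a sketch, the vertical output of `show master status\G` can be turned into the CHANGE MASTER statement used in the next step. The replication user `myslave` and password `123` are the values from this article; `change_master_sql` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "show master status\G" output on stdin and print
# a CHANGE MASTER TO statement pointing at the host given as $1.
change_master_sql() {
  local host=$1 status file pos
  status=$(cat)
  file=$(printf '%s\n' "$status" | awk '$1 == "File:" { print $2 }')
  pos=$(printf '%s\n' "$status" | awk '$1 == "Position:" { print $2 }')
  # Fail if either field is missing (e.g. binary logging is disabled).
  [ -n "$file" ] && [ -n "$pos" ] || return 1
  printf "change master to master_host='%s',master_user='myslave',master_password='123',master_log_file='%s',master_log_pos=%s;\n" \
    "$host" "$file" "$pos"
}
```

For example, run `mysql -uroot -p -e "show master status\G" | change_master_sql 192.168.16.18` on the master and paste the printed statement into the mysql client on each slave. The same approach works in section 3.2 with the new master's IP.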

2.4.3 Start replication on the Slave1 and Slave2 nodes

change master to master_host='192.168.16.18',master_user='myslave',master_password='123',master_log_file='master-bin.000001',master_log_pos=1215; 

start slave;

2.4.4 Check the replication status on the Slave1 and Slave2 nodes

mysql -e "show slave status\G" | awk '/Running:/{print}'  # both the IO and SQL threads must show Yes, which means replication is working

Slave_IO_Running: Yes
Slave_SQL_Running: Yes
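The same check can be scripted so that it returns a proper exit status, which is handy in a monitoring or setup script. This is a sketch: `replication_ok` is a hypothetical name, and it matches the exact field names `Slave_IO_Running` and `Slave_SQL_Running` printed by `show slave status\G` in MySQL 5.7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "show slave status\G" output on stdin and succeed
# only if both the IO thread and the SQL thread report "Yes".
replication_ok() {
  awk '
    $1 == "Slave_IO_Running:"  { io = $2 }
    $1 == "Slave_SQL_Running:" { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }'
}
```

Usage on a slave: `mysql -uroot -p -e "show slave status\G" | replication_ok && echo "replication OK"`. Matching the whole field name also avoids accidentally matching `Slave_SQL_Running_State`.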

2.4.5 Set both slaves to read-only mode

set global read_only=1;

2.4.6 Insert data to test replication

## Insert a row on the Master and check that it is replicated ##
create database test_db;
use test_db;
create table test(id int);
insert into test(id) values (1);

2.5 Verify master-slave replication

Create a database on the Master, then list the databases on the slaves to confirm that it has been replicated

mysql -e "create database test_test;"
mysql -e "show databases;"
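Instead of comparing the two `show databases` listings by eye, a sketch like the following prints every database present on the master but missing on a slave. It assumes each listing was saved to a file with one name per line (for example with `mysql -N -e "show databases;" > master.txt`, where `-N` suppresses the column header); `missing_on_slave` is a hypothetical helper and needs bash for process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 = master's database list, $2 = slave's database list.
# Prints the names that appear only in the master's list.
missing_on_slave() {
  # comm needs sorted input; -23 keeps lines unique to the first file.
  comm -23 <(sort "$1") <(sort "$2")
}
```

An empty result means every database on the master is also present on the slave.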

2.6 Install the MHA software

2.6.1 Install the MHA dependencies on all servers, starting with the EPEL repository

yum install epel-release --nogpgcheck -y

yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN

# Install the MHA packages. The version depends on the operating system; for CentOS 7.4, version 0.57 must be used.
# Install the node component on all servers first, and only then install the manager component on the MHA manager node, because the manager component depends on the node component.

# Put the mha4mysql-node-0.57.tar.gz package in the /opt directory
cd /opt
tar zxvf mha4mysql-node-0.57.tar.gz
cd mha4mysql-node-0.57
perl Makefile.PL
make && make install

2.6.2 Install the manager component on the MHA manager node

# Put the mha4mysql-manager-0.57.tar.gz package in the /opt directory

cd /opt
tar zxvf mha4mysql-manager-0.57.tar.gz
cd mha4mysql-manager-0.57
perl Makefile.PL
make && make install

After the manager component is installed, the following tools are generated under /usr/local/bin:

masterha_check_ssh	Check MHA's SSH configuration
masterha_check_repl	Check the MySQL replication status
masterha_manager	Script that starts the manager
masterha_check_status	Check the current MHA running status
masterha_master_monitor	Check whether the master is down
masterha_master_switch	Control failover (automatic or manual)
masterha_conf_host	Add or remove configured server entries
masterha_stop	Stop the manager

The node component also generates several scripts under /usr/local/bin (they are normally triggered by MHA Manager scripts and need no manual operation):

save_binary_logs	Save and copy the master's binary logs
apply_diff_relay_logs	Identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog	Remove unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs	Purge relay logs (without blocking the SQL thread)

2.7 Configure passwordless SSH authentication on all servers

(1) On the manager node, configure passwordless authentication to all database nodes
ssh-keygen -t rsa 				# press Enter at every prompt
ssh-copy-id 192.168.16.18
ssh-copy-id 192.168.16.20
ssh-copy-id 192.168.16.22

(2) On mysql1, configure passwordless authentication to the database nodes mysql2 and mysql3
ssh-keygen -t rsa
ssh-copy-id 192.168.16.20
ssh-copy-id 192.168.16.22

(3) On mysql2, configure passwordless authentication to the database nodes mysql1 and mysql3
ssh-keygen -t rsa
ssh-copy-id 192.168.16.18
ssh-copy-id 192.168.16.22

(4) On mysql3, configure passwordless authentication to the database nodes mysql1 and mysql2
ssh-keygen -t rsa
ssh-copy-id 192.168.16.18
ssh-copy-id 192.168.16.20
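Because each node copies its key to every other node, the repetitive ssh-copy-id calls can be generated from the node list. A minimal sketch, assuming the three database IPs used in this article; `DB_NODES` and `peers_of` are hypothetical names.

```shell
#!/usr/bin/env bash
# Database node IPs from this article (adjust for your environment).
DB_NODES="192.168.16.18 192.168.16.20 192.168.16.22"

# Hypothetical helper: print every database node except the one given as $1.
# Pass any other value (e.g. "manager") to get all three nodes.
peers_of() {
  local self=$1 ip
  for ip in $DB_NODES; do
    if [ "$ip" != "$self" ]; then
      echo "$ip"
    fi
  done
}
```

For example, on mysql1 after `ssh-keygen -t rsa`: `for ip in $(peers_of 192.168.16.18); do ssh-copy-id "$ip"; done`. On the manager node, `peers_of manager` yields all three database nodes.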

2.8 Configure MHA on the manager node

(1) On the manager node, copy the sample scripts to the /usr/local/bin directory
cp -rp /opt/mha4mysql-manager-0.57/samples/scripts /usr/local/bin
# After copying, there are four executable files
ll /usr/local/bin/scripts/
----------------------------------------------------------------------------------------------------------
master_ip_failover  		# VIP management script used during automatic failover
master_ip_online_change 	# VIP management script used during online (manual) switchover
power_manager 				# script that powers off the failed host
send_report 				# script that sends an alert after a failover
----------------------------------------------------------------------------------------------------------

(2) Copy the VIP management script for automatic switching to /usr/local/bin; here the master_ip_failover script is used to manage the VIP and failover
cp /usr/local/bin/scripts/master_ip_failover /usr/local/bin

(3) Modify it as follows (delete the original content, paste the script below and adjust the VIP-related parameters; entering :set paste in vim before pasting prevents the pasted text from being garbled by auto-indentation)
vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
############################# Added section #########################################
my $vip   = '192.168.16.200';    # the VIP address
my $brdc  = '192.168.16.255';    # broadcast address of the VIP subnet (used only by the commented-out ip/arping variant)
my $ifdev = 'ens33';             # NIC the VIP is bound to
my $key   = '1';                 # alias number of the virtual interface (ens33:1)
my $ssh_start_vip = "/sbin/ifconfig $ifdev:$key $vip";    # expands to: /sbin/ifconfig ens33:1 192.168.16.200
my $ssh_stop_vip  = "/sbin/ifconfig $ifdev:$key down";    # expands to: /sbin/ifconfig ens33:1 down
my $exit_code     = 0;                                    # default exit status 0
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
##################################################################################
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {
        # Remove the VIP from the old master.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # Bring the VIP up on the new master.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

## A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

## A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}


(4) Create the MHA configuration directory and copy the sample configuration file; here app1.cnf is used to manage the MySQL node servers
mkdir /etc/masterha
cp /opt/mha4mysql-manager-0.57/samples/conf/app1.cnf /etc/masterha

vim /etc/masterha/app1.cnf						# delete the original content, paste the following and adjust the node server IP addresses
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/usr/local/mysql/data
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=manager
ping_interval=1
remote_workdir=/tmp
repl_password=123
repl_user=myslave
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.16.20 -s 192.168.16.22
shutdown_script=""
ssh_user=root
user=mha

[server1]
hostname=192.168.16.18
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.16.20
port=3306

[server3]
hostname=192.168.16.22
port=3306

----------------------------------------------------------------------------------------------------------
[server default]
manager_log=/var/log/masterha/app1/manager.log		# manager log
manager_workdir=/var/log/masterha/app1		# manager working directory
master_binlog_dir=/usr/local/mysql/data/		# where the master stores its binlogs; this path must match the binlog path configured on the master so that MHA can find them
master_ip_failover_script=/usr/local/bin/master_ip_failover		# switch script for automatic failover, i.e. the script above
master_ip_online_change_script=/usr/local/bin/master_ip_online_change		# switch script for manual (online) switchover
password=manager			# password of the MySQL monitoring user created earlier
ping_interval=1				# interval in seconds between pings to the master; the default is 3 seconds, and failover starts after three attempts without a response
remote_workdir=/tmp			# where the remote MySQL servers save binlogs when a switch occurs
repl_password=123			# password of the replication user
repl_user=myslave			# name of the replication user
report_script=/usr/local/send_report		# script that sends an alert after a switch
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.16.20 -s 192.168.16.22	# IP addresses of the slaves used to double-check the master
shutdown_script=""			# script that shuts down the failed host after a failure (mainly to power it off and prevent split-brain; not used here)
ssh_user=root				# SSH login user
user=mha					# MySQL monitoring user

[server1]
hostname=192.168.16.18
port=3306

[server2]
hostname=192.168.16.20
port=3306
candidate_master=1
# Mark this host as a candidate master: after a master-slave switch, this slave is promoted to master, even if it is not the most up-to-date slave in the cluster

check_repl_delay=0
# By default, if a slave is more than 100 MB of relay logs behind the master, MHA will not choose it as the new master, because recovering that slave would take a long time. With check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful together with candidate_master=1, because it ensures that the candidate becomes the new master during the switch.

[server3]
hostname=192.168.16.22
port=3306
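Since `--remove_dead_master_conf` (used when starting the manager below) rewrites app1.cnf after a failover, it can help to list which hosts the file currently contains. A sketch, assuming the `[serverN]` / `hostname=` layout shown above without inline comments; `mha_hosts` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "serverN hostname" for every [serverN] section
# of an MHA configuration file. [server default] is skipped.
mha_hosts() {
  awk '
    /^\[server[0-9]+\]/ { sec = substr($0, 2, length($0) - 2) }   # strip [ ]
    sec != "" && /^hostname=/ { sub(/^hostname=/, ""); print sec, $0 }
  ' "$1"
}
```

For example, running `mha_hosts /etc/masterha/app1.cnf` before and after the fault simulation in section 3 should show 192.168.16.18 disappearing from the list.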

2.9 For the first setup, manually bring up the virtual IP on the Master node

/sbin/ifconfig ens33:1 192.168.16.200/24

2.10 On the manager node, test passwordless SSH authentication; if everything is normal, the output ends with successfully.

masterha_check_ssh -conf=/etc/masterha/app1.cnf

2.11 On the manager node, test the MySQL master-slave connections

If the output ends with MySQL Replication Health is OK, replication is normal.

masterha_check_repl -conf=/etc/masterha/app1.cnf

2.12 Start MHA on the manager node

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

----------------------------------------------------------------------------------------------------------
--remove_dead_master_conf: after a master-slave switch, the IP of the old master is removed from the configuration file.
manager_log: where the log is stored (set in app1.cnf; the command above also redirects the manager's console output to the same file).
--ignore_last_failover: by default, if MHA detects consecutive failures less than 8 hours apart, it does not fail over again; this restriction avoids a ping-pong effect. After a switch, MHA records it in the app1.failover.complete file, and if that file is still in the directory at the next switch, the switch is refused unless the file was deleted after the first switch. This parameter tells MHA to ignore the file generated by the last switch; it is set here for convenience.
----------------------------------------------------------------------------------------------------------
● Running a program with &: its output still goes to the terminal; Ctrl+C (SIGINT) does not affect it; closing the session sends SIGHUP, which terminates the program.
● Running a program with nohup: its output goes to nohup.out by default; Ctrl+C (SIGINT) terminates it; closing the session sends SIGHUP, which it ignores.
● Combining nohup and & (nohup ./test &): the program is immune to both SIGINT and SIGHUP.

# Check the MHA status; the current master should be the Mysql1 node

masterha_check_status --conf=/etc/masterha/app1.cnf
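To use the status check in a script, the master address can be extracted from its output. This sketch assumes a running manager prints a line of the form `app1 (pid:12345) is running(0:PING_OK), master:192.168.16.18`; if your MHA version formats it differently, adjust the pattern. `current_master` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read masterha_check_status output on stdin and print
# the master address; prints nothing if the manager is not in PING_OK state.
current_master() {
  sed -n 's/.*is running(0:PING_OK), master:\([^ ]*\).*/\1/p'
}
```

Usage: `masterha_check_status --conf=/etc/masterha/app1.cnf | current_master`. An empty result is a sign the manager is not running normally.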

# Check the MHA log; it should also show that the current master is 192.168.16.18

cat /var/log/masterha/app1/manager.log | grep "current master"

# Check whether the VIP 192.168.16.200 exists on Mysql1. The VIP does not disappear when MHA is stopped on the manager node.
ifconfig
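Whether a host holds the VIP can also be checked non-interactively. This sketch reads `ip -4 addr show` output (used here instead of ifconfig because its `inet ADDR/PREFIX` lines are easy to parse) and succeeds if the given address is present; `has_vip` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ip -4 addr show" output on stdin and succeed if
# the IPv4 address given as $1 is assigned to any interface.
has_vip() {
  awk -v vip="$1" '
    $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
    END { exit !found }'
}
```

For example, `ssh 192.168.16.18 ip -4 addr show | has_vip 192.168.16.200 && echo "VIP on mysql1"`. After the failover in section 3, the same check against 192.168.16.20 should succeed.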

# To stop the manager service, use the following command.
masterha_stop --conf=/etc/masterha/app1.cnf
Alternatively, stop it by killing its process ID.

3 Fault Simulation

# On the manager node, watch the log
tail -f /var/log/masterha/app1/manager.log

# On the Master node Mysql1, stop the MySQL service
systemctl stop mysqld
 or 
pkill -9 mysql

# After one successful automatic switch, the MHA manager process exits. MHA automatically modifies app1.cnf and removes the failed mysql1 node. Check that mysql2 has taken over the VIP.
ifconfig

How the candidate master is chosen during failover:
1. Slaves are normally ranked by how far they have replicated (position/GTID); when their data differ, the slave closest to the master becomes the candidate.
2. When the data are identical, the candidate master is chosen in the order the servers appear in the configuration file.
3. When a weight is set (candidate_master=1), that slave is forced to be the candidate master.
(1) By default, if that slave is more than 100 MB of relay logs behind the master, it is not selected even with the weight set.
(2) With check_repl_delay=0, it is still forced to be the candidate even if it is far behind.
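The selection rules above can be modelled roughly as follows. This is a simplified, hypothetical sketch, not MHA's actual Perl code: each input line describes one slave as `name position candidate_master check_repl_delay bytes_behind`, positions are assumed to be comparable (same binlog file), and 100 MB is the relay-log lag limit described above.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of new-master selection (not MHA's code).
# stdin: one slave per line, in configuration-file order:
#   name  position  candidate_master  check_repl_delay  bytes_behind
select_new_master() {
  awk -v limit=$((100 * 1024 * 1024)) '
    NF == 0 { next }                      # skip blank lines
    {
      n++
      name[n] = $1; pos[n] = $2; cand[n] = $3; delay[n] = $4; behind[n] = $5
      if (n == 1 || $2 > max) max = $2    # highest replicated position
    }
    END {
      # Rule 3: a weighted slave wins unless it lags more than 100 MB
      # while check_repl_delay is still enabled.
      for (i = 1; i <= n; i++)
        if (cand[i] == 1 && (delay[i] == 0 || behind[i] <= limit)) { print name[i]; exit }
      # Rules 1 and 2: the most up-to-date slave, first in config order on ties.
      for (i = 1; i <= n; i++)
        if (pos[i] == max) { print name[i]; exit }
    }'
}
```

For example, `printf 'mysql2 1000 1 1 209715200\nmysql3 1745 0 1 0\n' | select_new_master` skips the weighted but lagging mysql2 and picks mysql3; changing mysql2's check_repl_delay column to 0 makes it win instead.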

3.2 Troubleshooting steps

1. Repair MySQL
systemctl restart mysqld

2. Repair master-slave replication
# On the current master Mysql2, view the binary log file and position
show master status;

# On the original master mysql1, start replication (use the File and Position values shown on Mysql2)
change master to master_host='192.168.16.20',master_user='myslave',master_password='123',master_log_file='master-bin.000001',master_log_pos=1745;

start slave;


3. On the manager node, edit the configuration file app1.cnf (add the mysql1 record back, because it is removed automatically when the failure is detected)
vi /etc/masterha/app1.cnf
......
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.16.18 -s 192.168.16.22
......
[server1]
hostname=192.168.16.20
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.16.18
port=3306

[server3]
hostname=192.168.16.22
port=3306

4. Start MHA on the manager node
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &



# If the script fails with character or format errors (for example because it was pasted from Windows with CRLF line endings), convert it to Unix format
dos2unix /usr/local/bin/master_ip_failover
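Before running dos2unix, you can check whether a script actually contains Windows line endings. A minimal sketch; `has_crlf` is a hypothetical helper and uses bash's `$'\r'` quoting to match a literal carriage return.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file given as $1 contains any
# carriage return, i.e. Windows (CRLF) line endings.
has_crlf() {
  grep -q $'\r' "$1"
}
```

Usage: `has_crlf /usr/local/bin/master_ip_failover && dos2unix /usr/local/bin/master_ip_failover`.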