
An accident caused by a MySQL misoperation that even 「high availability」 couldn't withstand!

This is Wukong's 152nd original article.

Official website: www.passjava.cn

Hello, I'm Wukong.

Last time, we deployed MySQL in high-availability mode for our project: MySQL dual-master mode + Keepalived. Simply put, there are two MySQL master nodes, and Keepalived is installed on both hosts to monitor the state of MySQL. When it detects a problem, it restarts MySQL, and clients automatically connect to the MySQL instance on the other machine.

For details, see the article Wukong wrote earlier: Hands-on MySQL High-Availability Architecture.

This time I'll walk through an accident we ran into in that project.

The outline of this article is as follows:

The scene of the accident

Environment: test environment
Time: 10:30 in the morning
Reported by: the test team. Everything had blown up, and after a preliminary investigation our R&D colleagues suspected a database problem.

Then we started looking for the cause. Since I had deployed this cluster environment myself, I knew it best, so I took on the investigation.

System deployment diagram

First, here is the system's deployment diagram, so you can follow along.

The two databases are deployed on node55 and node56. Each one is both master and slave of the other, hence the name "dual master".

Two Keepalived instances are also deployed on node55 and node56, each monitoring the state of the MySQL container on its own host.

Causes of the errors and how they were fixed

① My first thought was: don't we have Keepalived for high availability? Even if MySQL goes down, Keepalived should restart it automatically. And even if one instance fails to restart, isn't the other one still available?
② Then I logged in to both MySQL servers to check the state of the MySQL containers with docker ps. Neither MySQL container was in the list, which meant the containers were not running.

③ How could that be? I had installed Keepalived as the high-availability component. Could Keepalived have died too?
④ I quickly checked Keepalived with systemctl status keepalived and found that both instances were running normally.

⑤ What? Keepalived was running fine, and it restarts MySQL every few seconds. Maybe I had just run docker ps during the brief window when the MySQL container was down? I ran another command, docker ps -a, which lists all containers regardless of state. It showed that MySQL had started and then exited, so MySQL really was being restarted over and over.
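A container stuck in a restart loop is easy to miss with a single docker ps, because it is only "Up" for a moment. The following is a minimal sketch that flags containers which are not currently up. It assumes the input was produced with `docker ps -a --format '{{.Names}}\t{{.Status}}'`, which gives one tab-separated name/status pair per line. The function name `not_running` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Status strings look like "Up 3 seconds", "Exited (1) 5 seconds ago"
# or "Restarting (1) 2 seconds ago"; the number in brackets is the exit code.
_STATUS_RE = re.compile(r"^(Up|Exited|Restarting|Created|Dead|Removing)\b(?:\s*\((\d+)\))?")


def not_running(ps_output: str) -> List[Tuple[str, str, Optional[int]]]:
    """Return (name, state, exit_code) for every container that is not Up."""
    problems = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        match = _STATUS_RE.match(status.strip())
        state = match.group(1) if match else "Unknown"
        code = int(match.group(2)) if match and match.group(2) else None
        if state != "Up":
            problems.append((name, state, code))
    return problems
```

A container that keeps showing up as Exited or Restarting with a non-zero exit code, while the supervisor reports itself healthy, is the pattern described above.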

⑥ So although Keepalived did restart the MySQL containers, MySQL itself was broken, and in that case Keepalived cannot deliver high availability.
⑦ So how do we fix it? First, see what error MySQL reports. I viewed the container log with docker logs <container id>. Here is the most recent log:

⑧ The log says that the file mysql-bin.index does not exist. This file is used for master-slave replication and is configured in my.cnf.
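When the container log is long, it helps to pull out only the error lines. The sketch below assumes that MySQL's error-log lines carry an `[ERROR]` marker and that `docker logs --tail N <id>` is available. The helper names are hypothetical. Because `docker logs` forwards the container's stderr to its own stderr, the sketch combines both streams.

```python
import subprocess
from typing import List, Sequence


def error_lines(log_text: str, markers: Sequence[str] = ("[ERROR]",)) -> List[str]:
    """Keep only the log lines that contain one of the given markers."""
    return [line for line in log_text.splitlines() if any(m in line for m in markers)]


def container_errors(container_id: str, tail: int = 200) -> List[str]:
    """Fetch the last `tail` log lines of a container and keep the error lines."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), container_id],
        capture_output=True, text=True, check=False,
    )
    # MySQL logs to stderr inside the container, so look at both streams.
    return error_lines(result.stdout + result.stderr)
```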

With this configuration, MySQL writes its binary logs (binlogs), which master-slave replication relies on, to multiple mysql-bin.xxx files under /var/lib/mysql/log. It also maintains an index file, mysql-bin.index, which records which binlog files currently exist.

The content of mysql-bin.index looks like this:

/var/lib/mysql/log/mysql-bin.000001

The suffix of mysql-bin.000001 is a sequence number. There is another pitfall hidden here, which I'll come back to later.
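To make the index format concrete, here is a small sketch. It reads the contents of mysql-bin.index (one binlog path per line) and extracts each file's sequence number. The function names are hypothetical and the parsing is based only on the format shown above.

```python
import os
from typing import List


def binlog_files(index_text: str) -> List[str]:
    """Return the binlog file names (without directory) listed in mysql-bin.index."""
    return [os.path.basename(line.strip()) for line in index_text.splitlines() if line.strip()]


def binlog_seq(name: str) -> int:
    """Extract the sequence number from a name such as mysql-bin.000001."""
    _, _, suffix = name.rpartition(".")
    return int(suffix)
```

For the index shown above, `binlog_files` yields `["mysql-bin.000001"]`, and `binlog_seq("mysql-bin.000001")` yields 1.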

⑨ The error says mysql-bin.index is missing. I checked, and it really was gone! Whatever made the file disappear, the first step was to recreate the log folder; MySQL then generates the index file automatically.

Solution: run the following commands to create the folder and grant permissions.

mkdir log
chmod -R 777 log
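The same fix can also be run as a pre-start check. Here is a minimal sketch, assuming the binlog directory path is known (for example /var/lib/mysql/log). It uses the same 777 mode as the chmod above. `ensure_binlog_dir` is a hypothetical helper, not part of MySQL or Keepalived.

```python
import os


def ensure_binlog_dir(path: str, mode: int = 0o777) -> bool:
    """Recreate the binlog directory if it is missing; return True if it was created.

    The explicit chmod is needed because os.makedirs applies the process
    umask to the mode it is given.
    """
    created = not os.path.isdir(path)
    os.makedirs(path, exist_ok=True)
    os.chmod(path, mode)
    return created
```

Running it before the MySQL container starts means a missing directory gets fixed, not turned into a restart loop.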

⑩ Once the log directory existed on both servers, Keepalived restarted the MySQL containers automatically. I then checked the replication status (SHOW SLAVE STATUS) on one of the nodes, node56, and, sure enough, there was another error.

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

A few key fields stand out:

Slave_IO_Running: No means the replication I/O thread is not running. This I/O thread runs on the slave: it requests the master's binlog and writes what it receives to the local relay log file. If it is not running, replication on the slave is not working.
Master_Log_File: mysql-bin.000014 means the binlog file currently being replicated is 000014. Earlier we saw that mysql-bin.index on node56 lists only 000001. Because 000014 is not in the index file at all, an error is reported.
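The check described above can be automated. The sketch below makes two assumptions: the output of `SHOW SLAVE STATUS\G` has been captured as text with one `Key: value` pair per line, and the file names listed in the master's mysql-bin.index are already known. It splits each line on the first colon only, because error messages contain colons too. Both function names are hypothetical.

```python
from typing import Dict, Iterable


def parse_vertical(output: str) -> Dict[str, str]:
    """Parse vertical (\\G) output such as SHOW SLAVE STATUS\\G into a dict."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only: Last_IO_Error itself contains colons.
        key, sep, value = line.strip().partition(":")
        if sep and key and not key.startswith("*"):
            fields[key.strip()] = value.strip()
    return fields


def missing_from_index(status: Dict[str, str], index_files: Iterable[str]) -> bool:
    """True if the binlog the slave asks for is absent from the master's index."""
    return status.get("Master_Log_File") not in set(index_files)
```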

This comes down to how master-slave replication works. Here is a diagram:

The slave starts two threads: an I/O thread and an SQL thread.

The I/O thread requests the binlog from the master and writes what it receives to the local relay log file.

The master starts a dump thread that sends the binlog to the slave's I/O thread.

The SQL thread reads the events in the relay log, parses them into SQL statements, and executes them one by one.
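The flow in the four steps above can be illustrated with a toy, in-process model. This is only an illustration of the data path, not how MySQL implements replication. Each "thread" is a Python thread, and the network and relay log are FIFO queues. An event is a hypothetical `("SET", key, value)` tuple standing in for a binlog event.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple

# A stand-in for a binlog event: ("SET", key, value).
Event = Tuple[str, str, int]


def replicate(master_binlog: List[Event]) -> Dict[str, int]:
    """Toy model of dump thread -> I/O thread -> relay log -> SQL thread."""
    network: "queue.Queue[Optional[Event]]" = queue.Queue()    # master -> slave
    relay_log: "queue.Queue[Optional[Event]]" = queue.Queue()  # slave's relay log
    slave_data: Dict[str, int] = {}

    def dump_thread() -> None:  # on the master: ship binlog events
        for event in master_binlog:
            network.put(event)
        network.put(None)  # sentinel so the toy run terminates

    def io_thread() -> None:  # on the slave: receive and append to relay log
        while (event := network.get()) is not None:
            relay_log.put(event)
        relay_log.put(None)

    def sql_thread() -> None:  # on the slave: replay relay-log events in order
        while (event := relay_log.get()) is not None:
            op, key, value = event
            if op == "SET":
                slave_data[key] = value

    threads = [threading.Thread(target=fn) for fn in (dump_thread, io_thread, sql_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return slave_data
```

Because each stage preserves order, the slave ends up in the same state as the master. In the accident, the slave's equivalent of the `network` stage broke: the I/O thread asked for a binlog the master no longer had.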

So what we need to do is re-specify which binlog file to replicate from, and from which position.

Solution:

Check the binlog status on the master, node55.

Write down these two values: File=mysql-bin.000001, Position=117748. (There's another pitfall here: lock the tables first, then read these two values, and only unlock the tables after the slave has started replicating.)

The commands are as follows:

FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
UNLOCK TABLES;
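Keep in mind that FLUSH TABLES WITH READ LOCK is released when the session that took it ends. The locking session must therefore stay open until the slave has been started. If you capture the SHOW MASTER STATUS row as tab-separated text (the way `mysql -N -B` prints rows), the following minimal sketch pulls out File and Position. The function name is hypothetical.

```python
from typing import Tuple


def parse_master_status(row: str) -> Tuple[str, int]:
    """Return (File, Position) from one tab-separated SHOW MASTER STATUS row."""
    fields = row.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"unexpected SHOW MASTER STATUS row: {row!r}")
    return fields[0], int(fields[1])
```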

Then, on the slave node56, re-specify the binlog file and position to replicate from:

# Stop replication on the slave
STOP SLAVE;
# Set the binlog file and position to replicate from
CHANGE MASTER TO
  MASTER_HOST='10.2.1.55',
  MASTER_PORT=3306,
  MASTER_USER='vagrant',
  MASTER_PASSWORD='vagrant',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=117748;
# Start replication
START SLAVE;
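The same statement has to be issued again for the reverse direction (node55 as slave), so it can be convenient to generate it from the values read on the master. Here is a minimal sketch, assuming the default sql_mode where backslash escapes are enabled in string literals. `change_master_sql` is a hypothetical helper.

```python
def change_master_sql(host: str, port: int, user: str, password: str,
                      log_file: str, log_pos: int) -> str:
    """Build the STOP / CHANGE MASTER TO / START sequence for a slave."""
    def quote(value: str) -> str:
        # Escape backslashes and single quotes for a MySQL string literal
        # (assumes NO_BACKSLASH_ESCAPES is not enabled).
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "STOP SLAVE;\n"
        f"CHANGE MASTER TO MASTER_HOST={quote(host)}, MASTER_PORT={int(port)}, "
        f"MASTER_USER={quote(user)}, MASTER_PASSWORD={quote(password)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, MASTER_LOG_POS={int(log_pos)};\n"
        "START SLAVE;"
    )
```

For example, `change_master_sql("10.2.1.55", 3306, "vagrant", "vagrant", "mysql-bin.000001", 117748)` produces the same settings as the statement above.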

When I checked again, no more errors were reported, and the I/O thread was running.


Then I repeated the same steps in the other direction, with node55 as the slave and node56 as the master. The status looked normal, and connecting to the databases with Navicat also worked fine. I reported back to the test team, and the job was done.

I seem to have forgotten one question: why was the log folder wiped out?

Why did it happen?

I asked around whether anyone had deleted the /var/lib/mysql/log directory at the time. Nobody would delete that directory casually.

But I noticed that the parent directory of log, /var/lib/mysql, contains many other folders, such as xxcloud and xxcenter. These are the names of several databases in our project. Every folder in this directory shows up in Navicat as a database, one-to-one, as shown in the figure below. So log shows up as a database too.
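The one-to-one mapping described above can be reproduced by listing the subdirectories of the data directory. This matches how MySQL 5.x presents schemas. Versions with a data dictionary (MySQL 8.0) may behave differently, so treat this as an assumption about the version used here. The function name is hypothetical.

```python
import os
from typing import List


def apparent_databases(datadir: str) -> List[str]:
    """Subdirectories of datadir, i.e. what MySQL 5.x would list as databases."""
    return sorted(entry.name for entry in os.scandir(datadir) if entry.is_dir())
```

Plain files such as binlogs or ibdata1 in the data directory do not appear; only directories do, which is why the log folder looked like just another database.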

Could someone have dropped the log database from Navicat? Very likely!

Sure enough, during a migration and upgrade, a colleague noticed that the old system had no log database, so he cleaned it up. Dropping the log database also deleted the log folder. Finally, the whole chain of events made sense! Honestly, I hadn't thought about this risk with the log directory when I set things up. Yes, the blame is mine ~

Improvements

Actually, the migration should not have used a full overwrite sync; syncing database by database would not have dropped the log database. Still, it is a bit odd for the log database to appear there at all. Can we keep it from showing up?

We just need to keep the log directory out of the data directory (currently /var/lib/mysql).

Dongge's suggestion: keep the log files separate from the database data files:

datadir = /var/lib/mysql/data

log_bin = /var/lib/mysql/log/mysql-bin
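A small sanity check along these lines could catch the problem before deployment. The idea is to flag a log_bin setting whose directory is a subdirectory of datadir, since such a subdirectory shows up as a database. The sketch assumes that both values are absolute paths and that log_bin is a base name (directory plus file prefix), as in the configuration above. The function name is hypothetical.

```python
import os


def creates_fake_database(datadir: str, log_bin: str) -> bool:
    """True if log_bin puts binlogs in a *subdirectory* of datadir.

    Binlogs written directly into datadir are plain files and do not show
    up as a database; a subdirectory of datadir does. Paths are assumed to
    be absolute.
    """
    binlog_dir = os.path.abspath(os.path.dirname(log_bin))
    datadir = os.path.abspath(datadir)
    return binlog_dir != datadir and os.path.commonpath([binlog_dir, datadir]) == datadir
```

With the old layout (datadir /var/lib/mysql, binlogs in /var/lib/mysql/log) the check returns True. With the suggested layout (datadir /var/lib/mysql/data) it returns False.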

Another question: is our high availability really that high?

At the very least, there were no timely alerts. MySQL went down and I didn't even know; we only found out because the test team reported it.

Can we detect MySQL anomalies in time?

Here we could use Keepalived's email notification feature, or a log-based alerting system. That is something to improve in the future.
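Whichever channel delivers the alert (Keepalived email, or a log alarm system), something has to decide what counts as abnormal. This sketch only covers that decision. It assumes the fields of `SHOW SLAVE STATUS` have already been fetched into a dict and flags any replication thread that is not "Yes", plus any non-empty last error. Sending the message is left out, and the function name is hypothetical.

```python
from typing import Dict, List


def replication_alerts(status: Dict[str, str]) -> List[str]:
    """Turn SHOW SLAVE STATUS fields into human-readable alert messages."""
    alerts = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        state = status.get(thread, "missing")
        if state != "Yes":  # "No", "Connecting" or absent all need attention
            alerts.append(f"{thread} is {state}")
    for err in ("Last_IO_Error", "Last_SQL_Error"):
        if status.get(err):
            alerts.append(f"{err}: {status[err]}")
    return alerts
```

Run periodically on both nodes, a check like this would have flagged the broken replication before the test team noticed.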

END -
