An accident caused by a MySQL misoperation that even "high availability" couldn't withstand!
2022-06-24 14:40:00 【Wukong Chat Architecture】
Keep creating, keep growing! This is day 9 of my participation in the Juejin "Daily New Plan · June Writing Challenge".
This is Wukong's 152nd original article.
Official website: www.passjava.cn
Hello, I'm Wukong.
Last time, our project got its MySQL high-availability deployment ready: MySQL dual-master mode + Keepalived. Simply put, there are two MySQL master nodes, and Keepalived is installed on both hosts to monitor the state of MySQL. Once a problem is detected, Keepalived restarts MySQL, and clients automatically connect to the MySQL on the other machine.
For details, see my earlier article: Hands-on MySQL High-Availability Architecture.
This time I'll go over an accident we ran into on that project.
Here is the outline of this article:
The scene of the accident
- Environment: test environment
- Time: 10:30 in the morning
- Reported by: the test team, where things had blown up. After a preliminary investigation, R&D colleagues suspected a database problem.
Then start looking for reasons . Because this cluster environment is deployed by me , So if I came to check, I was familiar with it .
System deployment diagram
First, let's go over the system's deployment diagram so that the rest is easier to follow.
The two databases are deployed on nodes node55 and node56. Each one is both master and slave of the other, which is why this setup is called dual master.
Two Keepalived instances are also deployed on node55 and node56, each monitoring the state of its local MySQL container.
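Keepalived itself only schedules a health check and reacts to its exit status; the restart logic lives in whatever check script it is given. As a minimal sketch of the check-and-restart behaviour described above (not the actual script used in this project), the decision logic might look like this in Python. The function name check_mysql and the two injected callables are hypothetical; in a real deployment they would wrap something like docker inspect and docker start.

```python
from typing import Callable


def check_mysql(is_running: Callable[[], bool],
                restart: Callable[[], None]) -> int:
    """Hypothetical Keepalived check: return 0 if MySQL is healthy, 1 otherwise.

    If the container is down, try one restart and check again. A non-zero
    result is what a Keepalived vrrp_script treats as a failed check.
    """
    if is_running():
        return 0
    restart()
    return 0 if is_running() else 1


if __name__ == "__main__":
    # Simulate the accident: the restart "succeeds" but MySQL exits again at once.
    print(check_mysql(is_running=lambda: False, restart=lambda: None))  # prints 1
```

The simulated case mirrors this accident: restarts keep happening, yet the check never passes, because restarting cannot fix a MySQL that fails during startup.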
Causes of the errors and how they were fixed
- ① My first thought was: don't we have Keepalived for high availability? Even if MySQL goes down, Keepalived should restart it automatically, and even if one node fails to restart, the other one should still be usable, right?
- ② Then I logged in to the two MySQL servers and checked the state of the MySQL containers with docker ps. Neither MySQL container appeared in the list, which meant the containers were not running.
- ③ Impossible! I had installed Keepalived for high availability. Could Keepalived have died too?
- ④ I quickly checked Keepalived by running systemctl status keepalived, and both Keepalived instances were running normally.
- ⑤ What? Keepalived is fine, and it restarts MySQL every few seconds. Maybe I just missed the short window in which the MySQL container was up? I ran another command, docker ps -a, which lists all containers including stopped ones. It showed the MySQL container starting and then exiting, so MySQL really was being restarted over and over.
- ⑥ In other words, Keepalived kept restarting the MySQL container, but MySQL itself had a problem, and in that case Keepalived cannot keep it highly available.
- ⑦ So how to fix it? First find out what MySQL is complaining about. View the container log with docker logs <container id> and look at the most recent entries:
- ⑧ The log says the mysql-bin.index file does not exist. This file is used for master-slave replication and is configured in my.cnf.
With this configuration, once master-slave replication is running, multiple mysql-bin.xxx files are generated under the /var/lib/mysql/log directory, along with a mysql-bin.index index file that records which binlog files currently exist.
The content of mysql-bin.index is as follows:
/var/lib/mysql/log/mysql-bin.000001
Note that mysql-bin.000001 carries a sequence number. There's another pitfall here, which I'll come back to later.
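Since mysql-bin.index is just a plain-text list of binlog paths, one per line, a quick way to catch this kind of breakage is to read the index and check that every file it lists actually exists. A minimal Python sketch; the function names are hypothetical, and resolving relative entries against the index file's own directory is a simplifying assumption.

```python
import os
from typing import List


def read_binlog_index(index_path: str) -> List[str]:
    """Return the binlog entries listed in a mysql-bin.index file, in order."""
    with open(index_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def missing_binlogs(index_path: str) -> List[str]:
    """Return index entries whose binlog files are not present on disk."""
    # Assumption: relative entries are resolved against the index's directory.
    base = os.path.dirname(os.path.abspath(index_path))
    missing = []
    for entry in read_binlog_index(index_path):
        path = entry if os.path.isabs(entry) else os.path.join(base, entry)
        if not os.path.exists(path):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    index = "/var/lib/mysql/log/mysql-bin.index"
    if not os.path.exists(index):
        print(f"{index} is missing")
    else:
        print("missing binlogs:", missing_binlogs(index))
```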
⑨ The error says mysql-bin.index is missing. I checked, and sure enough, it wasn't there! Whatever made the file disappear, the first step was to recreate the log folder; MySQL would then generate the file for us automatically.
Solution: run the following commands to create the folder and grant permissions.
mkdir log
chmod -R 777 log
⑩ With the log directory in place on both servers, Keepalived automatically restarted the MySQL containers for us. I then checked the replication status of MySQL on one of the nodes, node56, and, hmm, it was reporting an error:
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
Several key pieces of information stand out:
- Slave_IO_Running: No. The replication I/O thread is not running. This I/O thread runs on the replica: it requests the binlog from the master and writes what it receives into the local relay log. If it is not running, replication on the replica is broken.
- Master_Log_File: mysql-bin.000014. This is the binlog file the replica is currently asking for. Earlier we saw that the recreated mysql-bin.index lists only mysql-bin.000001; 000014 is not in the index file at all, hence the error.
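Both checks above (is the I/O thread running, and is the requested binlog file actually in the master's index) are easy to script. Below is a minimal Python sketch that parses the vertical output of SHOW SLAVE STATUS\G and reports both problems. The function names are hypothetical, and the sample text is illustrative, built from the fields quoted above.

```python
import os
from typing import Dict, List


def parse_vertical_status(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines from SHOW SLAVE STATUS\\G output."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # partition() splits on the first colon only, so values containing
        # colons (such as Last_IO_Error) are kept intact.
        key, sep, value = line.strip().partition(":")
        if sep and key and not key.startswith("*"):
            fields[key.strip()] = value.strip()
    return fields


def diagnose(status: Dict[str, str], master_index: List[str]) -> List[str]:
    """List replication problems; master_index holds the master's index entries."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread not running")
    wanted = status.get("Master_Log_File", "")
    listed = {os.path.basename(entry) for entry in master_index}
    if wanted and wanted not in listed:
        problems.append(f"{wanted} is not listed in the master's binlog index")
    return problems


SAMPLE = """\
*************************** 1. row ***************************
               Slave_IO_Running: No
              Slave_SQL_Running: Yes
                Master_Log_File: mysql-bin.000014
"""

if __name__ == "__main__":
    status = parse_vertical_status(SAMPLE)
    print(diagnose(status, ["/var/lib/mysql/log/mysql-bin.000001"]))
```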
This comes down to how master-slave replication works. Here is a diagram:
- The replica creates two threads: an I/O thread and an SQL thread;
- The I/O thread requests the binlog files from the master and writes the binlog it receives into the local relay log;
- The master creates a dump thread that sends the binlog to the replica's I/O thread;
- The SQL thread reads the events in the relay log and replays them as SQL statements, one by one.
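To make the division of labour between the three threads concrete, here is a toy in-memory model in Python. It is only an illustration, not how MySQL is implemented: the class names are made up, and the replica tracks its read position as a list index, whereas real MySQL tracks a binlog file name plus a byte offset (exactly the pair that broke in this accident).

```python
from collections import deque
from typing import Deque, List


class Master:
    def __init__(self) -> None:
        self.binlog: List[str] = []  # events the master has written

    def dump(self, pos: int) -> List[str]:
        """Dump thread: send every binlog event from position `pos` onward."""
        return self.binlog[pos:]


class Replica:
    def __init__(self, master: Master) -> None:
        self.master = master
        self.read_pos = 0                      # how far the I/O thread has read
        self.relay_log: Deque[str] = deque()   # fetched but not yet applied
        self.applied: List[str] = []           # statements the SQL thread executed

    def io_thread_step(self) -> None:
        """I/O thread: request new events and append them to the relay log."""
        events = self.master.dump(self.read_pos)
        self.relay_log.extend(events)
        self.read_pos += len(events)

    def sql_thread_step(self) -> None:
        """SQL thread: replay relay-log events one by one, in order."""
        while self.relay_log:
            self.applied.append(self.relay_log.popleft())


if __name__ == "__main__":
    master = Master()
    replica = Replica(master)
    master.binlog += ["INSERT a", "INSERT b"]
    replica.io_thread_step()
    replica.sql_thread_step()
    print(replica.applied, replica.read_pos)  # prints ['INSERT a', 'INSERT b'] 2
```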
So the fix is to tell the replica again which log file to sync from, and from which position.
Solution:
Check the binlog status on the master, node55.
Write down these two values: File=mysql-bin.000001, Position=117748. (There's another pitfall here: lock the tables first, then read these two values, and unlock the tables only after replication has been started on the replica.)
The specific commands are as follows:
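FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
# Keep this session open: the read lock is released when the session disconnects.
# Run UNLOCK TABLES only after START SLAVE has been issued on the replica.
UNLOCK TABLES;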
Then, on the replica node56, re-specify the log file and position to sync from:
# Stop replication on the replica
STOP SLAVE;
# Set the binlog file and position to sync from
CHANGE MASTER TO MASTER_HOST='10.2.1.55',MASTER_PORT=3306,MASTER_USER='vagrant',MASTER_PASSWORD='vagrant',MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=117748;
# Start replication
START SLAVE;
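Since this has to be redone on both nodes, it can help to generate the statement from the values read off SHOW MASTER STATUS instead of typing them by hand. A minimal Python sketch; the function name change_master_sql is hypothetical, and the file-name check (a base name followed by a numeric suffix of at least six digits, as in mysql-bin.000001) is an assumption meant to catch typos such as a dropped zero.

```python
import re

# Assumed binlog naming: base name, a dot, then at least six digits.
_BINLOG_NAME = re.compile(r"^[^/\s]+\.\d{6,}$")


def _quote(value: str) -> str:
    """Quote a MySQL string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_sql(host: str, port: int, user: str, password: str,
                      log_file: str, log_pos: int) -> str:
    """Build a CHANGE MASTER TO statement from SHOW MASTER STATUS values."""
    if not _BINLOG_NAME.match(log_file):
        raise ValueError(f"suspicious binlog file name: {log_file!r}")
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={_quote(host)},MASTER_PORT={int(port)},"
        f"MASTER_USER={_quote(user)},MASTER_PASSWORD={_quote(password)},"
        f"MASTER_LOG_FILE={_quote(log_file)},MASTER_LOG_POS={int(log_pos)};"
    )


if __name__ == "__main__":
    print(change_master_sql("10.2.1.55", 3306, "vagrant", "vagrant",
                            "mysql-bin.000001", 117748))
```

With node55's values it prints the same CHANGE MASTER TO statement as above. The read lock on the master still has to be held until START SLAVE has run on the replica.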
Checking again, there was no error this time, and the I/O thread was running.
Then I repeated the same steps in the other direction, with node55 as the replica and node56 as the master, and the status looked normal there too. After that, I connected to the databases with Navicat and everything worked. I reported back to the test team, and the fix was done.
I seem to have forgotten one question: why was the log folder wiped out?
Why did it happen?
I asked whether anyone had deleted the /var/lib/mysql/log directory at the time. Nobody would delete that directory casually.
But I noticed that log's parent directory, /var/lib/mysql, contained many other folders, such as xxcloud and xxcenter. These are the names of several databases in our project. Every folder in this directory shows up in Navicat as a database, one to one, as shown in the figure below. So log showed up as a database too.
Could someone have dropped the log database in Navicat? Very likely!
Sure enough, during a migration and upgrade, a colleague noticed that the old system had no log database, so he cleaned it up. Dropping the log database also deleted the log folder. Okay, mystery solved! In fact, I hadn't considered this problem with the log directory in the early stages. That's right, this one's on me.
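The root cause was that every subdirectory of the MySQL data directory was exposed as a database in this setup (as in MySQL 5.7, where a database is implemented as a directory). A minimal Python sketch for spotting such stray directories before someone "cleans them up"; the function name and the list of known schemas are hypothetical and would have to match your own system.

```python
import os
from typing import Iterable, List


def unexpected_schema_dirs(datadir: str, known_schemas: Iterable[str]) -> List[str]:
    """Return subdirectories of datadir that are not known schemas, sorted."""
    known = set(known_schemas)
    return sorted(
        name for name in os.listdir(datadir)
        if os.path.isdir(os.path.join(datadir, name)) and name not in known
    )


if __name__ == "__main__":
    # Hypothetical schema list: the system schemas plus the project's own.
    known = ["mysql", "performance_schema", "sys", "xxcloud", "xxcenter"]
    print(unexpected_schema_dirs("/var/lib/mysql", known))
```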
Improvements
Actually, the database sync should not have been done as a full overwrite; syncing individual databases would not have dropped the log database. That said, it's a little odd for log to show up as a database at all. Can we keep it from appearing?
We just need to keep the log directory out of the MySQL data directory (currently /var/lib/mysql).
Dongge's suggestion: keep the log files separate from the database data files:
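[mysqld]
datadir = /var/lib/mysql/data
# log_bin takes a base name, not a directory: this writes mysql-bin.000001, mysql-bin.index, ... into /var/lib/mysql/log
log_bin = /var/lib/mysql/log/mysql-bin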
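Dongge's rule can be checked mechanically: the problem only arises when the binlog directory is a subdirectory of datadir. A minimal Python sketch with hypothetical function names; it treats the log_bin value as a path plus base name, so the binlog directory is its dirname.

```python
import os


def is_inside(path: str, directory: str) -> bool:
    """True if `path` is `directory` itself or lies anywhere beneath it."""
    p = os.path.realpath(path)
    d = os.path.realpath(directory)
    return os.path.commonpath([p, d]) == d


def binlog_dir_shows_as_schema(datadir: str, log_bin: str) -> bool:
    """True if the binlog directory is a strict subdirectory of datadir.

    Binlogs sitting directly in datadir are fine; a separate folder *inside*
    datadir is what showed up as the 'log' database in this accident.
    """
    binlog_dir = os.path.realpath(os.path.dirname(log_bin))
    data = os.path.realpath(datadir)
    return binlog_dir != data and is_inside(binlog_dir, data)


if __name__ == "__main__":
    print(binlog_dir_shows_as_schema("/var/lib/mysql", "/var/lib/mysql/log/mysql-bin"))
    print(binlog_dir_shows_as_schema("/var/lib/mysql/data", "/var/lib/mysql/log/mysql-bin"))
```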
Another question: is our high availability really that high?
At the very least, there was no timely alert. The MySQL database went down and we didn't know; we only found out through feedback from the test team.
Can we detect MySQL anomalies in time?
Here we could use Keepalived's email notification feature, or a log-based alerting system. This is what needs to be improved next.
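One signal that would have caught this accident early is the restart loop itself: Keepalived restarting MySQL every few seconds. A minimal sliding-window detector in Python, assuming the check script can call it each time it restarts MySQL; the class name and thresholds are hypothetical, and the alert channel (Keepalived email, a log alerting system) is left out.

```python
from collections import deque
from typing import Deque


class RestartLoopDetector:
    """Report a crash loop when restarts exceed a limit within a time window.

    Hypothetical helper: call record_restart() each time the health check
    restarts MySQL, and send an alert when it returns True.
    """

    def __init__(self, max_restarts: int, window_seconds: float) -> None:
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._restarts: Deque[float] = deque()

    def record_restart(self, now: float) -> bool:
        """Record a restart at time `now` (seconds); True means 'alert'."""
        self._restarts.append(now)
        # Forget restarts that have slid out of the window.
        while now - self._restarts[0] > self.window_seconds:
            self._restarts.popleft()
        return len(self._restarts) > self.max_restarts


if __name__ == "__main__":
    import time

    detector = RestartLoopDetector(max_restarts=3, window_seconds=60)
    # Keepalived restarting MySQL every few seconds, as in this accident.
    start = time.time()
    for i in range(5):
        if detector.record_restart(start + 5 * i):
            print(f"ALERT: MySQL restarted {i + 1} times within 60 s")
```

A sliding window, rather than a plain counter, keeps occasional restarts spread over days from triggering alerts while still flagging a tight loop within a minute.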
- END -