iptables causes heartbeat split-brain
2022-07-04 11:02:00 【Brother Xing plays with the clouds】
Running heartbeat in a production environment requires attention to many details. A small oversight can leave heartbeat unable to fail over, or cause split-brain. This post describes a case in which iptables caused split-brain.
Primary: 192.168.3.218
192.168.4.218 (heartbeat IP)
usvr-218 (hostname)
Standby: 192.168.3.128
192.168.4.128 (heartbeat IP)
usvr-128 (hostname)
Symptom: after heartbeat is started on the primary, the VIP comes up on 218. When heartbeat is then started on the standby, the VIP also comes up on 128. At this point split-brain has occurred, and access to the service becomes abnormal.
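One way to confirm this symptom is to check on each node whether the VIP is currently configured. The following is a minimal sketch, not part of the original troubleshooting: the helper name holds_vip and the address 192.168.3.100 are assumptions, since the actual VIP is not given here. The function reads the output of `ip -o -4 addr show` on standard input.

```shell
#!/bin/sh
# holds_vip VIP
# Reads `ip -o -4 addr show` output on stdin and succeeds (exit 0)
# if VIP is configured on any interface. The name holds_vip is hypothetical.
holds_vip() {
    awk -v vip="$1" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }  # field 4 is address/prefix
        END { exit !found }
    '
}

# Example call (192.168.3.100 is a placeholder VIP, not from the original setup):
#   ip -o -4 addr show | holds_vip 192.168.3.100 && echo "VIP is on this node"
```

If the example call prints its message on both 218 and 128 at the same time, both nodes hold the VIP and the cluster is in split-brain.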
Troubleshooting:
1. Check the logs on the primary and the standby
Log on primary 218 (excerpt):
heartbeat[27330]: 2015/01/27_09:05:29 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:31 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:33 WARN: node usvr-128: is dead
heartbeat[27330]: 2015/01/27_09:05:33 info: Cancelling pending standby operation
heartbeat[27330]: 2015/01/27_09:05:33 info: Dead node usvr-128 gave up resources.
heartbeat[27330]: 2015/01/27_09:05:33 info: all clients are now resumed
heartbeat[27330]: 2015/01/27_09:05:33 ERROR: lowseq cannnot be greater than ackseq
heartbeat[27330]: 2015/01/27_09:05:33 info: hist->ackseq =74575, old_ackseq=0
heartbeat[27330]: 2015/01/27_09:05:33 info: hist->lowseq =74576, hist->hiseq=74824, send_cluster_msg_level=1
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown: Master Control process died.
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27330 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27334 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27335 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27336 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27337 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
Log on standby 128 (excerpt):
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: bound receive socket to device: eth0
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: set SO_REUSEPORT(w)
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: started on port 694 interface eth0 to 192.168.4.218
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ping heartbeat started.
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 27 10:11:35 heartbeat: [15999]: info: Local status now set to: 'up'
Jan 27 10:11:35 heartbeat: [15999]: info: Link 192.168.3.1:192.168.3.1 up.
Jan 27 10:11:35 heartbeat: [15999]: info: Status update for node 192.168.3.1: status ping
Jan 27 10:13:35 heartbeat: [15999]: WARN: node usvr-218: is dead
Jan 27 10:13:35 heartbeat: [15999]: info: Comm_now_up(): updating status to active
Jan 27 10:13:35 heartbeat: [15999]: info: Local status now set to: 'active'
Jan 27 10:13:35 heartbeat: [15999]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,498)
Jan 27 10:13:35 heartbeat: [15999]: WARN: No STONITH device configured.
Jan 27 10:13:35 heartbeat: [15999]: WARN: Shared disks are not protected.
Jan 27 10:13:35 heartbeat: [15999]: info: Resources being acquired from localsv218.
As shown above, each side concludes that the other node is dead and takes over the VIP, which results in split-brain.
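The "is dead" warnings can be pulled out of a long log with a small filter. The sketch below is an illustration only: the helper name dead_nodes is made up, and the log path depends on how logging is configured for heartbeat on your system (/var/log/ha-log in the usage line is an assumption).

```shell
#!/bin/sh
# dead_nodes LOGFILE
# Prints the name of every node that the log reports as dead, one per line.
# It matches lines such as "WARN: node usvr-128: is dead".
dead_nodes() {
    sed -n 's/.*WARN: node \([^:]*\): is dead.*/\1/p' "$1" | sort -u
}

# Example call (the log path is an assumption):
#   dead_nodes /var/log/ha-log
```

If the primary lists the standby and the standby lists the primary, each node has declared the other dead, which is the pattern seen in the logs above.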
2. The initial conclusion was that the primary and standby could not communicate, or that the network was delayed. Could unsynchronized clocks be the cause? A small clock difference has little effect on heartbeat, but a large one is bound to cause problems, so the clocks on both sides were synchronized.
/usr/sbin/ntpdate ntp.api.bz && hwclock -w
echo "0 23 * * * root (/usr/sbin/ntpdate ntp.api.bz && hwclock -w) > /dev/null 2>&1" >> /etc/crontab
3. After the clocks were synchronized, the same errors still appeared in the logs. Checking the configuration files on the primary and standby again turned up no problems. The one thing that stood out was that both the primary and the standby run a firewall. heartbeat communicates over UDP port 694 by default, so UDP port 694 has to be allowed through the firewall.
On primary 218, add:
/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.128 --dport 694 -m comment --comment "heartbeat-slave" -j ACCEPT
On standby 128, add:
/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.218 --dport 694 -m comment --comment "heartbeat-master" -j ACCEPT
Notes: 1. If the firewall policy is strict, the heartbeat IPs must also be allowed through; otherwise the UDP communication will still fail.
2. The inbound interface (-i) must be the network card that carries the heartbeat IP.
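A related point about rule order: -A appends to the end of the INPUT chain, so if an earlier rule already drops or rejects the packet, the appended ACCEPT is never reached. Assuming that applies to your rule set, one option is to insert the rule at the top with -I instead. The sketch below only prints the command so it can be reviewed before running; the helper name heartbeat_rule is hypothetical, and eth0 is taken from the commands above.

```shell
#!/bin/sh
# heartbeat_rule PEER_IP [IFACE]
# Prints an iptables command that accepts heartbeat traffic (UDP 694)
# from PEER_IP, inserted as rule 1 of INPUT. IFACE defaults to eth0.
heartbeat_rule() {
    peer=$1
    iface=${2:-eth0}
    printf '/sbin/iptables -I INPUT 1 -i %s -p udp -s %s --dport 694 -j ACCEPT\n' "$iface" "$peer"
}

# Example: print the rule for the primary (peer is the standby heartbeat IP)
#   heartbeat_rule 192.168.4.128
```

Printing rather than executing keeps the helper side-effect free; once the output looks right, it can be run as root.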
Once the firewall is configured, the primary and standby communicate normally. Under normal conditions the primary node holds the VIP. When the primary node goes down or its heartbeat service stops, the standby node takes over the VIP.
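To double-check that the rule is actually loaded, the output of `iptables -S INPUT` can be scanned for an ACCEPT rule covering UDP port 694 from the peer. This is a rough sketch under assumptions: the helper name allows_heartbeat is made up, and the match relies on the rule text looking like the commands above (the exact `-S` output can differ between iptables versions).

```shell
#!/bin/sh
# allows_heartbeat PEER_IP
# Reads `iptables -S INPUT` output on stdin and succeeds (exit 0) if some
# rule accepts UDP port 694 from PEER_IP.
allows_heartbeat() {
    awk -v peer="$1" '
        (index($0, "-s " peer "/32 ") || index($0, "-s " peer " ")) &&
        /-p udp/ && /--dport 694 / && /-j ACCEPT/ { ok = 1 }
        END { exit !ok }
    '
}

# Example call on primary 218 (run as root):
#   iptables -S INPUT | allows_heartbeat 192.168.4.128 && echo "rule present"
```

This only confirms that the rule exists, not that it is reached before a DROP or REJECT rule; watching the traffic with `tcpdump -ni eth0 udp port 694` on each node is a more direct check that heartbeat packets arrive.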