iptables Causes Heartbeat Split-Brain
2022-07-04 11:02:00 【Brother Xing plays with the clouds】
Running heartbeat in production takes care: a small oversight can leave heartbeat unable to fail over, or cause split-brain. This post walks through a split-brain case caused by iptables.
Master: 192.168.3.218
192.168.4.218 (heartbeat IP)
usvr-218 (hostname)
Standby: 192.168.3.128
192.168.4.128 (heartbeat IP)
usvr-128 (hostname)
Symptom: after heartbeat is started on the master, the VIP comes up on 218. When heartbeat is then started on the standby, the VIP comes up on 128 as well. The cluster is now in split-brain, and access fails.
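One way to confirm this symptom is to check each node for the VIP. The following is a minimal sketch, assuming the iproute2 `ip` tool is installed. The address 192.168.3.200 is a placeholder (the real VIP is not given in this post) and `holds_vip` is an illustrative helper name.

```shell
# holds_vip VIP: succeed if the `ip -o -4 addr show` output on stdin lists VIP.
holds_vip() {
    # Match "inet <VIP>/<prefix>"; -F keeps the dots in the address literal.
    grep -F -- "inet $1/" >/dev/null
}

# Usage on each node (192.168.3.200 is a placeholder VIP):
#   ip -o -4 addr show | holds_vip 192.168.3.200 && echo "VIP is bound here"
```

If the check succeeds on both 218 and 128 at the same time, the cluster is in split-brain.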
Troubleshooting:
1. Check the logs on the master and the standby
Log on master 218 (excerpt):
heartbeat[27330]: 2015/01/27_09:05:29 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:31 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:33 WARN: node usvr-128: is dead
heartbeat[27330]: 2015/01/27_09:05:33 info: Cancelling pending standby operation
heartbeat[27330]: 2015/01/27_09:05:33 info: Dead node usvr-128 gave up resources.
heartbeat[27330]: 2015/01/27_09:05:33 info: all clients are now resumed
heartbeat[27330]: 2015/01/27_09:05:33 ERROR: lowseq cannnot be greater than ackseq
heartbeat[27330]: 2015/01/27_09:05:33 info: hist->ackseq =74575, old_ackseq=0
heartbeat[27330]: 2015/01/27_09:05:33 info: hist->lowseq =74576, hist->hiseq=74824, send_cluster_msg_level=1
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown: Master Control process died.
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27330 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27334 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27335 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27336 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27337 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
Log on standby 128 (excerpt):
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: bound receive socket to device: eth0
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: set SO_REUSEPORT(w)
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: started on port 694 interface eth0 to 192.168.4.218
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ping heartbeat started.
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 27 10:11:35 heartbeat: [15999]: info: Local status now set to: 'up'
Jan 27 10:11:35 heartbeat: [15999]: info: Link 192.168.3.1:192.168.3.1 up.
Jan 27 10:11:35 heartbeat: [15999]: info: Status update for node 192.168.3.1: status ping
Jan 27 10:13:35 heartbeat: [15999]: WARN: node usvr-218: is dead
Jan 27 10:13:35 heartbeat: [15999]: info: Comm_now_up(): updating status to active
Jan 27 10:13:35 heartbeat: [15999]: info: Local status now set to: 'active'
Jan 27 10:13:35 heartbeat: [15999]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,498)
Jan 27 10:13:35 heartbeat: [15999]: WARN: No STONITH device configured.
Jan 27 10:13:35 heartbeat: [15999]: WARN: Shared disks are not protected.
Jan 27 10:13:35 heartbeat: [15999]: info: Resources being acquired from localsv218.
As the logs show, each node decides that its peer is dead and takes over the VIP, which causes the split-brain.
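The "is dead" lines can be pulled out of a long log with a small filter. This is a sketch only: `dead_nodes` is an illustrative helper name, and the log path in the usage comment is an assumption (the actual location depends on the `logfile` setting in ha.cf).

```shell
# dead_nodes [FILE...]: print each node name that a heartbeat log reports
# as dead, once per node. Reads stdin when no file is given.
dead_nodes() {
    sed -n 's/.*WARN: node \([^:]*\): is dead.*/\1/p' "$@" | sort -u
}

# Usage (the path is an assumption, adjust it to your ha.cf):
#   dead_nodes /var/log/ha-log
```

Run on both nodes, this makes the pattern easy to see: 218 reports usvr-128 as dead while 128 reports usvr-218 as dead.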
2. The initial conclusion is that heartbeat communication between the master and the standby is failing or delayed. Could unsynchronized clocks be the cause? A small clock difference has little effect on heartbeat, but a large one is bound to cause problems, so synchronize the time on both nodes.
/usr/sbin/ntpdate ntp.api.bz&&hwclock -w
echo "0 23 * * * root /usr/sbin/ntpdate ntp.api.bz&&hwclock -w > /dev/null 2>&1" >>/etc/crontab
3. After syncing the clocks, the same errors still appear in the logs. Rechecking the configuration files on both nodes turns up no problems. The one thing that stands out is that both the master and the standby run a firewall. heartbeat communicates over UDP port 694 by default, so allow UDP 694 through the firewall.
On master 218, add:
/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.128 --dport 694 -m comment --comment "heartbeat-slave" -j ACCEPT
On standby 128, add:
/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.218 --dport 694 -m comment --comment "heartbeat-master" -j ACCEPT
Notes: 1. If the firewall policy is strict, the heartbeat IPs must be allowed through, otherwise UDP communication will still fail.
2. The inbound interface (-i) must be the NIC that carries the heartbeat IP.
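To confirm that the rule is actually loaded, the `iptables-save` output can be filtered for it. This is a sketch under assumptions: `has_heartbeat_rule` is an illustrative helper name, and it only checks that a matching ACCEPT rule exists in the INPUT chain, not where it sits in the chain.

```shell
# has_heartbeat_rule PEER_IP: succeed if the iptables-save output on stdin
# contains an INPUT rule accepting UDP port 694 from PEER_IP.
has_heartbeat_rule() {
    peer=$1
    # Each grep narrows the candidate rules by one fixed token.
    grep -- '^-A INPUT ' |
        grep -F -- "-s $peer" |
        grep -F -- '-p udp' |
        grep -F -- '--dport 694' |
        grep -F -- '-j ACCEPT' >/dev/null
}

# Usage on master 218 (needs root):
#   iptables-save | has_heartbeat_rule 192.168.4.128 && echo "rule present"
```

Because `-A` appends to the end of the chain, also make sure the new rule sits above any blanket DROP or REJECT rule in INPUT, otherwise the packets never reach it.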
With the firewall configured, the master and the standby communicate normally. Under normal conditions the master holds the VIP. When the master goes down or its heartbeat service stops, the standby takes over the VIP.