iptables Causing Heartbeat Split-Brain
2022-07-04 10:55:00 【星哥玩云】
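Running Heartbeat in production involves many details that are easy to get wrong. A small oversight can leave Heartbeat unable to fail over, or can cause split-brain. This post describes a split-brain caused by iptables.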
Master: 192.168.3.218
Heartbeat IP: 192.168.4.218
Hostname: usvr-218
Backup: 192.168.3.128
Heartbeat IP: 192.168.4.128
Hostname: usvr-128
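For reference, here is what a minimal /etc/ha.d/ha.cf for this pair might look like, sketched as a hypothetical shell helper `gen_hacf` that prints the file. Only some values come from the backup's log further down: ucast over eth0, UDP port 694, the ping node 192.168.3.1, the ipfail client, and the node names. The logfile path, the timers and auto_failback are assumptions. The initdead of 120 s is only consistent with the two-minute gap in that log.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal ha.cf for one node of this pair.
# From the logs: ucast on eth0, udpport 694, ping 192.168.3.1, ipfail,
# node names usvr-218 / usvr-128.
# ASSUMED (not in the original post): logfile path, keepalive, deadtime,
# initdead and auto_failback values.
gen_hacf() {
  peer_hb_ip=$1        # heartbeat IP of the OTHER node
  hb_iface=${2:-eth0}  # interface that carries heartbeat traffic
  cat <<EOF
logfile /var/log/ha-log
keepalive 2
deadtime 30
initdead 120
udpport 694
ucast ${hb_iface} ${peer_hb_ip}
ping 192.168.3.1
respawn hacluster /usr/lib64/heartbeat/ipfail
auto_failback on
node usvr-218
node usvr-128
EOF
}
```

On usvr-128 you would run `gen_hacf 192.168.4.218 > /etc/ha.d/ha.cf`, and on usvr-218 you would run `gen_hacf 192.168.4.128`. Each node then sends unicast UDP to the peer's port 694, so those packets must pass the peer's INPUT chain.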
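Symptom: after Heartbeat was started on the master, the VIP came up on 218. When Heartbeat was then started on the backup, the VIP also came up on 128. Both nodes held the VIP at once, which is split-brain, and access to the service became unreliable.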
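One way to confirm this symptom is to check on each node whether the VIP is configured locally. If both nodes report it, you have split-brain. The sketch below uses a hypothetical helper that reads `ip -o -4 addr show` output on stdin, so it can be tested without touching real interfaces. The post does not give the VIP, so `<VIP>` in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the IPv4 address $1 appears in
# `ip -o -4 addr show` output read from stdin, exit 1 otherwise.
holds_vip() {
  awk -v vip="$1" '
    # In `ip -o -4 addr show` output, field 4 is ADDRESS/PREFIX,
    # e.g. 192.168.3.218/24
    { split($4, a, "/"); if (a[1] == vip) found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

Usage, assuming ssh access between the nodes: `ssh usvr-218 ip -o -4 addr show | holds_vip <VIP>`, then the same for usvr-128. If both succeed, the VIP is active on both nodes.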
Troubleshooting:
1. Check the logs on both the master and the backup.
Log on the master (218), excerpt:
heartbeat[27330]: 2015/01/27_09:05:29 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:31 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:33 WARN: node usvr-128: is dead
heartbeat[27330]: 2015/01/27_09:05:33 info: Cancelling pending standby operation
heartbeat[27330]: 2015/01/27_09:05:33 info: Dead node usvr-128 gave up resources.
heartbeat[27330]: 2015/01/27_09:05:33 info: all clients are now resumed
heartbeat[27330]: 2015/01/27_09:05:33 ERROR: lowseq cannnot be greater than ackseq
heartbeat[27330]: 2015/01/27_09:05:33 info: hist->ackseq =74575, old_ackseq=0
heartbeat[27330]: 2015/01/27_09:05:33 info: hist->lowseq =74576, hist->hiseq=74824, send_cluster_msg_level=1
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown: Master Control process died.
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27330 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27334 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27335 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27336 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27337 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
Log on the backup (128), excerpt:
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: bound receive socket to device: eth0
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: set SO_REUSEPORT(w)
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: started on port 694 interface eth0 to 192.168.4.218
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ping heartbeat started.
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 27 10:11:35 heartbeat: [15999]: info: Local status now set to: 'up'
Jan 27 10:11:35 heartbeat: [15999]: info: Link 192.168.3.1:192.168.3.1 up.
Jan 27 10:11:35 heartbeat: [15999]: info: Status update for node 192.168.3.1: status ping
Jan 27 10:13:35 heartbeat: [15999]: WARN: node usvr-218: is dead
Jan 27 10:13:35 heartbeat: [15999]: info: Comm_now_up(): updating status to active
Jan 27 10:13:35 heartbeat: [15999]: info: Local status now set to: 'active'
Jan 27 10:13:35 heartbeat: [15999]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,498)
Jan 27 10:13:35 heartbeat: [15999]: WARN: No STONITH device configured.
Jan 27 10:13:35 heartbeat: [15999]: WARN: Shared disks are not protected.
Jan 27 10:13:35 heartbeat: [15999]: info: Resources being acquired from localsv218.
As the logs show, each node declared the other node dead and took over the VIP, which produced the split-brain.
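To spot this pattern quickly on both nodes, the relevant lines can be extracted with standard text tools. The sketch below defines two hypothetical functions that accept file arguments or stdin. The log location depends on the `logfile`/`use_logd` settings in ha.cf, so `/var/log/ha-log` in the usage line is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers for triaging heartbeat logs (file args or stdin).

# Print each distinct node name that this node declared dead.
dead_peers() {
  sed -n 's/.*WARN: node \([^:]*\): is dead.*/\1/p' "$@" | sort -u
}

# Count "Message hist queue is filling up" errors. The history queue holds
# sent messages awaiting the peer's acknowledgement, so a full queue
# suggests the acknowledgements are not getting back.
hist_queue_errors() {
  grep -c 'Message hist queue is filling up' "$@"
}
```

Run `dead_peers /var/log/ha-log` on each node. If usvr-218 reports usvr-128 and usvr-128 reports usvr-218, each side believes the other is gone.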
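2. My first suspicion was that the nodes could not communicate, or that network latency was to blame. Could unsynchronized clocks be the cause? Clock differences usually have little effect on Heartbeat, but a large offset is bound to cause problems, so I synchronized the clocks on both nodes: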
/usr/sbin/ntpdate ntp.api.bz && hwclock -w
echo "0 23 * * * root /usr/sbin/ntpdate ntp.api.bz > /dev/null 2>&1 && hwclock -w > /dev/null 2>&1" >> /etc/crontab
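Before forcing a sync, you can measure how far apart the two clocks actually are. The sketch below assumes key-based ssh between the nodes. `clock_skew`, `check_peer_clock` and the 5-second default threshold are hypothetical choices. The ssh round trip and whole-second resolution make this a coarse check only. As an alternative, `ntpdate -q ntp.api.bz` reports the offset against the NTP server without setting the clock.

```shell
#!/bin/sh
# Hypothetical helpers: measure clock skew between this node and its peer.

# Absolute difference in seconds between two Unix timestamps.
clock_skew() {
  skew=$(( $2 - $1 ))
  if [ "$skew" -lt 0 ]; then skew=$(( -skew )); fi
  echo "$skew"
}

# Compare local time with the peer's time over ssh (ASSUMED: key-based ssh).
# Usage: check_peer_clock HOST [MAX_SECONDS]
# Returns 1 if the skew exceeds MAX_SECONDS, 2 if ssh fails.
check_peer_clock() {
  peer=$1
  max=${2:-5}
  local_ts=$(date +%s)
  remote_ts=$(ssh "$peer" date +%s) || return 2
  skew=$(clock_skew "$local_ts" "$remote_ts")
  if [ "$skew" -gt "$max" ]; then
    echo "WARN: clock skew ${skew}s with $peer"
    return 1
  fi
  echo "OK: clock skew ${skew}s with $peer"
}
```

For example, run `check_peer_clock usvr-218` on usvr-128.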
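3. After the clocks were synchronized, the same errors still appeared in the logs. I checked the configuration files on both nodes again and found nothing wrong. The one remaining factor was that both nodes ran a firewall. Heartbeat was configured to communicate over UDP port 694, so I opened UDP port 694 in the firewall.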
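On the master (218), add:
/sbin/iptables -I INPUT -i eth0 -p udp -s 192.168.4.128 --dport 694 -m comment --comment "heartbeat-slave" -j ACCEPT
On the backup (128), add:
/sbin/iptables -I INPUT -i eth0 -p udp -s 192.168.4.218 --dport 694 -m comment --comment "heartbeat-master" -j ACCEPT
-I inserts the rule at the top of the INPUT chain. If the chain already ends with a catch-all REJECT or DROP rule (as in the default RHEL/CentOS ruleset), a rule appended with -A lands after it and never matches.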
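Notes:
1. If the firewall policy is strict, traffic from the peer's heartbeat IP must be explicitly allowed, otherwise UDP communication will still fail.
2. The ingress interface given with -i must be the NIC that carries the heartbeat IP.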
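The two rules differ only in the peer address and comment, so they can be wrapped in a small helper that avoids adding a duplicate when re-run. In the sketch below, `allow_heartbeat` and the `DRY_RUN` variable are hypothetical. `iptables -C` (check whether a rule exists) needs iptables 1.4.11 or later; on older systems inspect `iptables -S INPUT` instead. The helper uses `-I` so the rule lands ahead of any catch-all REJECT or DROP rule.

```shell
#!/bin/sh
# Hypothetical helper: allow heartbeat UDP from the peer's heartbeat IP.
# Usage: allow_heartbeat PEER_HB_IP IFACE COMMENT [PORT]
# With DRY_RUN set, only print the command instead of running it.
allow_heartbeat() {
  peer=$1
  iface=$2
  comment=$3
  port=${4:-694}
  set -- INPUT -i "$iface" -p udp -s "$peer" --dport "$port" \
         -m comment --comment "$comment" -j ACCEPT
  if [ -n "${DRY_RUN:-}" ]; then
    echo "iptables -I $*"
    return 0
  fi
  # -C succeeds if an identical rule already exists; otherwise insert it
  # at the top of INPUT (needs iptables >= 1.4.11 for -C).
  iptables -C "$@" 2>/dev/null || iptables -I "$@"
}
```

On usvr-218, run `allow_heartbeat 192.168.4.128 eth0 heartbeat-slave`. On usvr-128, run `allow_heartbeat 192.168.4.218 eth0 heartbeat-master`. Rules added this way are lost on reboot. On RHEL/CentOS 6-style systems, `service iptables save` writes them to /etc/sysconfig/iptables.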
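After this firewall change, the two nodes communicated normally. In normal operation the master holds the VIP. When the master goes down or its heartbeat service is stopped, the backup takes over the VIP.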
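To confirm that heartbeat packets are now being accepted, you can watch the packet counter on the commented rule. Heartbeat sends a keepalive at the `keepalive` interval, so the counter should rise steadily. The sketch below uses hypothetical helpers that parse `iptables -L INPUT -n -v -x` output, where a `-m comment` rule shows its comment as `/* ... */`. As a lower-level check, `tcpdump -ni eth0 udp port 694` shows whether packets reach the interface at all, before iptables filters them.

```shell
#!/bin/sh
# Hypothetical helper: print the packet count of the first INPUT rule whose
# comment is $1, reading `iptables -L INPUT -n -v -x` output on stdin.
hb_rule_packets() {
  awk -v tag="/* $1 */" 'index($0, tag) { print $1; exit }'
}

# Hypothetical check: sample the counter twice and succeed if it grew.
# Usage: hb_counter_growing COMMENT [SECONDS]   (needs root for iptables)
hb_counter_growing() {
  before=$(iptables -L INPUT -n -v -x | hb_rule_packets "$1")
  sleep "${2:-10}"
  after=$(iptables -L INPUT -n -v -x | hb_rule_packets "$1")
  [ -n "$before" ] && [ -n "$after" ] && [ "$after" -gt "$before" ]
}
```

For example, on usvr-218 run `hb_counter_growing heartbeat-slave && echo "heartbeat packets from usvr-128 are arriving"`.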