Heartbeat split-brain caused by iptables
2022-07-04 10:55:00 【星哥玩云】
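There are many details to watch when running heartbeat in production; a small oversight can leave heartbeat unable to fail over, or cause split-brain. This post describes a split-brain caused by iptables.
Primary: 192.168.3.218
192.168.4.218 heartbeat IP
usvr-218 hostname
Standby: 192.168.3.128
192.168.4.128 heartbeat IP
usvr-128 hostname
Symptom: after heartbeat is started on the primary, the VIP comes up on 218; when heartbeat is then started on the standby, the VIP comes up on 128 as well. The cluster is now in split-brain and client access fails.
Troubleshooting:
1. Check the logs on the primary and the standby
Log on primary 218 (excerpt):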
heartbeat[27330]: 2015/01/27_09:05:29 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:31 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)
heartbeat[27330]: 2015/01/27_09:05:33 WARN: node usvr-128: is dead
heartbeat[27330]: 2015/01/27_09:05:33 info: Cancelling pending standby operation
heartbeat[27330]: 2015/01/27_09:05:33 info: Dead node usvr-128 gave up resources.
heartbeat[27330]: 2015/01/27_09:05:33 info: all clients are now resumed
heartbeat[27330]: 2015/01/27_09:05:33 ERROR: lowseq cannnot be greater than ackseq
heartbeat[27330]: 2015/01/27_09:05:33 info: hist->ackseq =74575, old_ackseq=0
heartbeat[27330]: 2015/01/27_09:05:33 info: hist->lowseq =74576, hist->hiseq=74824, send_cluster_msg_level=1
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown: Master Control process died.
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27330 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27334 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27335 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27336 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27337 with SIGTERM
heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
Log on standby 128 (excerpt):
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: bound receive socket to device: eth0
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: set SO_REUSEPORT(w)
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: started on port 694 interface eth0 to 192.168.4.218
Jan 27 10:11:35 heartbeat: [15999]: info: glib: ping heartbeat started.
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 27 10:11:35 heartbeat: [15999]: info: Local status now set to: 'up'
Jan 27 10:11:35 heartbeat: [15999]: info: Link 192.168.3.1:192.168.3.1 up.
Jan 27 10:11:35 heartbeat: [15999]: info: Status update for node 192.168.3.1: status ping
Jan 27 10:13:35 heartbeat: [15999]: WARN: node usvr-218: is dead
Jan 27 10:13:35 heartbeat: [15999]: info: Comm_now_up(): updating status to active
Jan 27 10:13:35 heartbeat: [15999]: info: Local status now set to: 'active'
Jan 27 10:13:35 heartbeat: [15999]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,498)
Jan 27 10:13:35 heartbeat: [15999]: WARN: No STONITH device configured.
Jan 27 10:13:35 heartbeat: [15999]: WARN: Shared disks are not protected.
Jan 27 10:13:35 heartbeat: [15999]: info: Resources being acquired from localsv218.
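As the logs show, each node decides that the other node is dead and takes over the VIP, which produces the split-brain.
2. The first guess was that the two nodes could not communicate, or that the network was delayed. Could unsynchronized clocks be the cause? Clock skew normally has little effect on heartbeat, but a large difference is bound to cause problems, so I synchronized the clocks on both nodes.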
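To see at a glance which peers a node has declared dead, the log can be filtered for the "is dead" warnings. The following is a minimal sketch, assuming the log lines look like the excerpts above; the function name dead_nodes and the log path in the usage note are illustrative assumptions, not part of heartbeat.

```shell
# Print the unique node names that a heartbeat log declares dead.
# Reads log text on stdin and matches lines such as
#   "... WARN: node usvr-128: is dead"
dead_nodes() {
    sed -n 's/.*WARN: node \([^:]*\): is dead.*/\1/p' | sort -u
}
```

For example, `dead_nodes < /var/log/ha-log` could be run on each node (the actual path depends on the logfile setting in ha.cf). If each node lists the other as dead while both are in fact running, the heartbeat link itself is the suspect.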
/usr/sbin/ntpdate ntp.api.bz&&hwclock -w
echo "0 23 * * * root /usr/sbin/ntpdate ntp.api.bz&&hwclock -w > /dev/null 2>&1" >>/etc/crontab
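3. After the clocks were synchronized, the same errors still appeared in the logs. I checked the configuration files on both nodes again and found nothing wrong. The only factor left was that both nodes run a firewall. Since heartbeat is configured to communicate over UDP port 694, I allowed UDP port 694 through the firewall.
On primary 218, add: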
/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.128 --dport 694 -m comment --comment "heartbeat-slave" -j ACCEPT
On standby 128, add:
/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.218 --dport 694 -m comment --comment "heartbeat-master" -j ACCEPT
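Notes: 1. If the firewall policy is strict, traffic from the peer's heartbeat IP must be allowed, otherwise the UDP communication will still fail.
2. The input interface (-i) must be the NIC that carries the heartbeat IP.
With the firewall configured, the primary and the standby communicate normally. Under normal conditions the primary holds the VIP; when the primary goes down, or the heartbeat service on the primary is stopped, the standby takes over the VIP.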
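Once the rules are added, it is worth confirming that they are actually present in the INPUT chain. The following is a minimal sketch, assuming the rule listing comes from `iptables -S INPUT`; the function name has_heartbeat_rule is a hypothetical helper. It only checks that a matching ACCEPT rule exists, not where it sits in the chain. Because -A appends to the end of the chain, an earlier DROP or REJECT rule that matches the same packets would still block the heartbeat traffic.

```shell
# Succeed if the rule listing on stdin contains an INPUT rule that
# ACCEPTs UDP port 694 from the given peer address.
# $1: peer heartbeat IP (e.g. 192.168.4.128 when run on the primary)
has_heartbeat_rule() {
    peer="$1"
    awk -v peer="$peer" '
        /^-A INPUT / && / -p udp / && / --dport 694 / && / -j ACCEPT/ {
            # Look for "-s <peer>" or "-s <peer>/32" among the rule fields.
            for (i = 1; i < NF; i++)
                if ($i == "-s" && ($(i+1) == peer || $(i+1) == peer "/32"))
                    found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

On primary 218 this could be used as `iptables -S INPUT | has_heartbeat_rule 192.168.4.128 && echo ok` (root is required to list the rules); on standby 128 the peer address would be 192.168.4.218.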