SYN retransmission caused by IPVS
2022-06-28 05:38:00 【Mrpre】
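(This post covers SYN retransmission caused by IPVS itself; it has nothing to do with PAWS on the RS.)
When sysctl_conn_reuse_mode is 0, any incoming packet that matches an existing IPVS session is forwarded according to that session, whether the session is in TIME_WAIT or any other state. This causes a problem: if clients use short-lived connections, most IPVS sessions sit in TIME_WAIT, and when a client sends a SYN that hits such a session, the SYN simply moves the session to SYN_RECV and no RS is rescheduled. In particular, if the weight of that RS has been set to 0, the new SYN is still forwarded to the weight-0 RS, which is unreasonable.
When sysctl_conn_reuse_mode is 1, IPVS forces a reschedule under certain conditions. A forced reschedule means releasing the current session and running load balancing again.
The key point is that sysctl_conn_reuse_mode = 1 means forced rescheduling, which is the opposite of what "reuse" suggests. Keep this in mind.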
conn_reuse_mode - INTEGER
1 - default
Controls how ipvs will deal with connections that are detected
port reuse. It is a bitmap, with the values being:
0: disable any special handling on port reuse. The new
connection will be delivered to the same real server that was
servicing the previous connection. This will effectively
disable expire_nodest_conn.
bit 1: enable rescheduling of new connections when it is safe.
That is, whenever expire_nodest_conn and for TCP sockets, when
the connection is in TIME_WAIT state (which is only possible if
you use NAT mode).
bit 2: it is bit 1 plus, for TCP connections, when connections
are in FIN_WAIT state, as this is the last state seen by load
balancer in Direct Routing mode. This bit helps on adding new
real servers to a very busy cluster.
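To make the bitmap concrete, here is a minimal userspace sketch of the documented state check for TCP. It is loosely modeled on what the kernel's is_new_conn_expected() decides, but the enum and function names are hypothetical stand-ins, and the sketch deliberately ignores controlled connections and any states the documentation above does not mention.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the IPVS TCP states discussed in the text. */
enum model_tcp_state {
    MODEL_ESTABLISHED,
    MODEL_FIN_WAIT,
    MODEL_TIME_WAIT,
};

/*
 * Returns true when a SYN that hits an existing session in `state` may be
 * treated as a new connection (and therefore rescheduled), following the
 * documented meaning of conn_reuse_mode:
 *   0       -> never, the old session is always reused
 *   value 1 -> sessions in TIME_WAIT
 *   value 2 -> value 1 plus sessions in FIN_WAIT
 */
static bool model_new_conn_expected(int conn_reuse_mode,
                                    enum model_tcp_state state)
{
    if (conn_reuse_mode == 0)
        return false;
    if (state == MODEL_TIME_WAIT)
        return true;
    if ((conn_reuse_mode & 2) && state == MODEL_FIN_WAIT)
        return true;
    return false;
}
```

With this model, a SYN on a TIME_WAIT session is a rescheduling candidate for any non-zero mode, while a SYN on a FIN_WAIT session only qualifies when the value 2 is set.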
Let's look at how IPVS handles a SYN packet:
/* Check if it's for virtual services, look it up, and send it on its way... */
static unsigned int
ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int af)
{
....
conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
//reuse mode is enabled, the packet is not a fragment, the request is a SYN, and a session already exists
if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) {
bool uses_ct = false, resched = false;
//if the RS weight is 0 and sysctl_expire_nodest_conn is set, reschedule: with a weight of 0 this request should no longer be forwarded to that RS
if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
unlikely(!atomic_read(&cp->dest->weight))) {
resched = true;
uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
} else if (is_new_conn_expected(cp, conn_reuse_mode)) {
//otherwise (the RS weight is non-zero), the session can still be rescheduled as long as its state is TIME_WAIT or FIN_WAIT
uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
if (!atomic_read(&cp->n_control)) {
resched = true;
} else {
/* Do not reschedule controlling connection that uses conntrack while it is still referenced by controlled connection(s). */
resched = !uses_ct;
}
}
if (resched) {
if (!atomic_read(&cp->n_control))
ip_vs_conn_expire_now(cp);
__ip_vs_conn_put(cp);
if (uses_ct)
return NF_DROP;
cp = NULL;
}
}
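The logic:
- Rescheduling is only considered when a non-fragmented SYN hits an existing session.
- If the weight of the session's RS is 0 (it was changed) and sysctl_expire_nodest_conn is set, the connection is rescheduled and the old IPVS session is forcibly expired. This is a given: once the weight is 0, the request should no longer be forwarded to that RS.
- If condition 2 does not hold, the connection is still rescheduled when the session is in a state where a new connection is expected (is_new_conn_expected), unless it is a controlling connection that uses conntrack and is still referenced by controlled connections. That exception concerns FTP and is not expanded here.
In short, with conn_reuse_mode enabled IPVS reschedules whenever it can, and rescheduling means expiring the original IPVS session.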
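The branches in the ip_vs_in() excerpt can be condensed into a small decision function. The following is a userspace sketch only: the struct, enum, and function names are hypothetical, the inputs stand for the results of the kernel checks quoted above, and the cp->dest NULL check is omitted for brevity.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for a SYN that hits an existing session. */
enum syn_verdict {
    SYN_FORWARD_OLD_SESSION, /* keep the session, same RS */
    SYN_RESCHEDULE,          /* expire the session, pick an RS again */
    SYN_DROP,                /* expire the session, drop the SYN */
};

/* Inputs mirror the conditions tested in the ip_vs_in() excerpt. */
struct syn_ctx {
    int  conn_reuse_mode;
    bool expire_nodest_conn;
    int  dest_weight;
    bool new_conn_expected; /* stands for is_new_conn_expected() */
    int  n_control;         /* controlled connections referencing cp */
    bool uses_ct;           /* stands for ip_vs_conn_uses_conntrack() */
};

static enum syn_verdict old_syn_verdict(const struct syn_ctx *c)
{
    bool resched = false, uses_ct = false;

    /* Before the patches, nothing happens unless reuse mode is set. */
    if (!c->conn_reuse_mode)
        return SYN_FORWARD_OLD_SESSION;

    if (c->expire_nodest_conn && c->dest_weight == 0) {
        resched = true;
        uses_ct = c->uses_ct;
    } else if (c->new_conn_expected) {
        uses_ct = c->uses_ct;
        resched = (c->n_control == 0) ? true : !uses_ct;
    }

    if (!resched)
        return SYN_FORWARD_OLD_SESSION;
    return uses_ct ? SYN_DROP : SYN_RESCHEDULE;
}
```

The last line is where the trouble starts: whenever the session is rescheduled and conntrack is in use, the verdict is a drop rather than a clean reschedule.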
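There is a problem with the logic above, though. If IPVS runs in masquerading mode and relies on iptables for SNAT, uses_ct is true (nf_conntrack runs before IPVS). The IPVS session is expired, but the current SYN is dropped as well; the point of returning NF_DROP is to get nf_conntrack to delete its own conntrack entry.
The root cause is that with SNAT done through iptables there are two sessions for one connection: one owned by the IPVS module and one owned by nf_conntrack. IPVS cannot control the latter directly, so it drops the SYN and returns NF_DROP to trigger the release of the nf_conntrack session.
As a result the SYN has to be retransmitted (the initial SYN retransmission timeout on Linux is typically 1 second). Only when the second SYN arrives is a new session created, which runs counter to the whole purpose of rescheduling.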
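The problem is easy to reproduce. On the client, run curl http://vip:vport/ --local-port 60362 to pin the source port, use any HTTP server such as nginx as the RS, and issue the request twice in a row. Note that the client needs net.ipv4.tcp_tw_reuse enabled, and possibly net.ipv4.tcp_tw_recycle as well on older kernels (it was removed in Linux 4.12), so that the port can be reused while the previous connection is still in TIME_WAIT.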
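If curl is not convenient, the same experiment can be approximated with a small client that binds to a fixed local port and times connect(). This is a sketch under assumptions: the function name timed_connect_ms is made up for illustration, it only measures the handshake and sends no HTTP request, and whether the second connection is actually delayed depends on the kernel version and the IPVS/iptables setup described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/*
 * Connects to ip:port from a fixed local port, closes the connection,
 * and returns the time connect() took in milliseconds, or -1 on error.
 */
static long timed_connect_ms(const char *ip, int port, int local_port)
{
    struct sockaddr_in local, remote;
    struct timespec t0, t1;
    int one = 1;
    int fd;

    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &remote.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow bind() to the port while an old socket is in TIME_WAIT. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons((unsigned short)local_port);
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (connect(fd, (struct sockaddr *)&remote, sizeof(remote)) < 0) {
        close(fd);
        return -1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    return (t1.tv_sec - t0.tv_sec) * 1000L +
           (t1.tv_nsec - t0.tv_nsec) / 1000000L;
}
```

Calling the function twice in a row against the VIP with the same local port (for example 60362) and comparing the two durations shows the effect: a second handshake that takes roughly a second longer than the first suggests the first SYN was dropped and retransmitted.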
Two IPVS patches fix or mitigate the problem described above:
- http://patchwork.ozlabs.org/project/netfilter-devel/patch/[email protected]/
- http://patchwork.ozlabs.org/project/netfilter-devel/patch/[email protected]/
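The first patch fixes the case where a SYN arriving on a session in TIME_WAIT or another closed state gets dropped. Its core change is the new ip_vs_conn_uses_old_conntrack check: if the packet's nf_conntrack entry is unconfirmed, the connection can be rescheduled without dropping the SYN.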
static unsigned int
ip_vs_in_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *state)
{
/* Check if the packet belongs to an existing connection entry */
cp = INDIRECT_CALL_1(pp->conn_in_get, ip_vs_conn_in_get_proto,
ipvs, af, skb, &iph);
if (!iph.fragoffs && is_new_conn(skb, &iph) && cp) {
int conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
bool old_ct = false, resched = false;
if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
unlikely(!atomic_read(&cp->dest->weight))) {
resched = true;
old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
} else if (conn_reuse_mode &&
is_new_conn_expected(cp, conn_reuse_mode)) {
old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
if (!atomic_read(&cp->n_control)) {
resched = true;
} else {
/* Do not reschedule controlling connection that uses conntrack while it is still referenced by controlled connection(s). */
resched = !old_ct;
}
}
if (resched) {
if (!old_ct)
cp->flags &= ~IP_VS_CONN_F_NFCT;
if (!atomic_read(&cp->n_control))
ip_vs_conn_expire_now(cp);
__ip_vs_conn_put(cp);
if (old_ct)
return NF_DROP;
cp = NULL;
}
}
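The code above also contains an improvement: regardless of whether reuse mode is enabled, if the RS weight is 0 (and expire_nodest_conn is set) the session is forcibly expired. This is reasonable, since a connection should not stay pinned to an RS whose weight is already 0.
One more point is critical. A confirmed nf_conntrack entry is one whose first packet (normally the SYN) has already been forwarded out to the RS. In the scenario above, the nf_conntrack entry should also be in TIME_WAIT and must have seen several exchanges, so in theory it is confirmed. Doesn't that make the IPVS check ineffective, so that NF_DROP is still returned and the SYN is still retransmitted?
The answer lies in nf_conntrack's own session logic, specifically how it handles a SYN that arrives in TIME_WAIT.
nf_conntrack runs before IPVS. As shown below, if the first connection completed its three-way handshake and four-way close, old_state is necessarily TIME_WAIT. nf_conntrack then releases the current entry and returns NF_REPEAT so that the packet re-enters the hook, and the newly created nf_conntrack entry is unconfirmed. The call path is nf_conntrack_in -> nf_conntrack_tcp_packet.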
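The patched branches can be condensed the same way as the original ones. Again this is only a userspace sketch with hypothetical names; old_ct stands for the result of ip_vs_conn_uses_old_conntrack(), and the cp->dest NULL check and the clearing of IP_VS_CONN_F_NFCT are left out.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for a SYN that hits an existing session. */
enum patched_verdict {
    PATCHED_FORWARD_OLD_SESSION, /* keep the session, same RS */
    PATCHED_RESCHEDULE,          /* expire the session, pick an RS again */
    PATCHED_DROP,                /* expire the session, drop the SYN */
};

/* Inputs mirror the conditions tested in the ip_vs_in_hook() excerpt. */
struct patched_ctx {
    int  conn_reuse_mode;
    bool expire_nodest_conn;
    int  dest_weight;
    bool new_conn_expected; /* stands for is_new_conn_expected() */
    int  n_control;         /* controlled connections referencing cp */
    bool old_ct;            /* packet carries a confirmed conntrack */
};

static enum patched_verdict patched_syn_verdict(const struct patched_ctx *c)
{
    bool resched = false, old_ct = false;

    /* The weight-0 check no longer depends on conn_reuse_mode. */
    if (c->expire_nodest_conn && c->dest_weight == 0) {
        resched = true;
        old_ct = c->old_ct;
    } else if (c->conn_reuse_mode && c->new_conn_expected) {
        old_ct = c->old_ct;
        resched = (c->n_control == 0) ? true : !old_ct;
    }

    if (!resched)
        return PATCHED_FORWARD_OLD_SESSION;
    /* Only a confirmed (old) conntrack entry still forces a drop. */
    return old_ct ? PATCHED_DROP : PATCHED_RESCHEDULE;
}
```

Compared with the earlier logic, two things change in this model: a weight of 0 triggers rescheduling even with conn_reuse_mode set to 0, and the SYN is dropped only when the packet is attached to an old, confirmed conntrack entry.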
/* Returns verdict for packet, or -1 for invalid. */
int nf_conntrack_tcp_packet(struct nf_conn *ct,
struct sk_buff *skb,
unsigned int dataoff,
enum ip_conntrack_info ctinfo,
const struct nf_hook_state *state)
{
...
if (!nf_ct_is_confirmed(ct) && !tcp_new(ct, skb, dataoff, th))
return -NF_ACCEPT;
spin_lock_bh(&ct->lock);
old_state = ct->proto.tcp.state;
dir = CTINFO2DIR(ctinfo);
index = get_conntrack_index(th);
new_state = tcp_conntracks[dir][index][old_state];
tuple = &ct->tuplehash[dir].tuple;
switch (new_state) {
case TCP_CONNTRACK_SYN_SENT:
if (old_state < TCP_CONNTRACK_TIME_WAIT)
break;
/* RFC 1122: "When a connection is closed actively,
 * it MUST linger in TIME-WAIT state for a time 2xMSL
 * (Maximum Segment Lifetime). However, it MAY accept
 * a new SYN from the remote TCP to reopen the connection
 * directly from TIME-WAIT state, if..."
 * We ignore the conditions because we are in the
 * TIME-WAIT state anyway.
 *
 * Handle aborted connections: we and the server
 * think there is an existing connection but the client
 * aborts it and starts a new one.
 */
if (((ct->proto.tcp.seen[dir].flags
| ct->proto.tcp.seen[!dir].flags)
& IP_CT_TCP_FLAG_CLOSE_INIT)
|| (ct->proto.tcp.last_dir == dir
&& ct->proto.tcp.last_index == TCP_RST_SET)) {
/* Attempt to reopen a closed/aborted connection. Delete this connection and look up again. */
spin_unlock_bh(&ct->lock);
/* Only repeat if we can actually remove the timer. Destruction may already be in progress in process context and we must give it a chance to terminate. */
if (nf_ct_kill(ct))
return -NF_REPEAT;
return NF_DROP;
}
fallthrough;
......
}
In other words, the confirmed check in IPVS shown earlier relies on nf_conntrack being able, when it sees a SYN, to release its old session and create a new one.
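The interplay can be illustrated with a tiny userspace model of the conntrack side. All names here are hypothetical, only the TIME_WAIT case from the excerpt is modeled, and the path where nf_ct_kill() fails and the packet is dropped is left out.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of conntrack TCP states. */
enum ct_state { CT_SYN_SENT, CT_ESTABLISHED, CT_TIME_WAIT };

struct ct_entry {
    enum ct_state state;
    bool confirmed;  /* first packet has already left the box */
    bool close_init; /* a close was seen (IP_CT_TCP_FLAG_CLOSE_INIT) */
};

/*
 * Models what conntrack does with a SYN before IPVS sees the packet.
 * Returns true if the SYN reaches IPVS attached to an old, confirmed
 * entry, which is the case where IPVS would still return NF_DROP.
 */
static bool model_syn_has_old_ct(struct ct_entry *ct)
{
    if (ct->state == CT_TIME_WAIT && ct->close_init) {
        /*
         * Corresponds to nf_ct_kill() plus NF_REPEAT: the old entry
         * is removed and the repeated lookup creates a fresh entry
         * that is not confirmed yet.
         */
        ct->state = CT_SYN_SENT;
        ct->confirmed = false;
        ct->close_init = false;
    }
    return ct->confirmed;
}
```

In this model a SYN that lands on a closed TIME_WAIT entry always arrives at IPVS with a fresh unconfirmed entry, so the old-conntrack check is false and the SYN can be rescheduled instead of dropped.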