A series of problems caused by IPVS connection reuse in Kubernetes
2022-06-24 15:14:00 · imroc
This article is excerpted from the author's Kubernetes learning notes.
Background
There is a long-discussed bug in the Kubernetes community (#81775): when a client initiates a large number of new TCP connections to a Service, some of the new connections are forwarded to Pods that are Terminating or already fully destroyed, causing persistent packet loss (the error is no route to host). The root cause is connection reuse in the kernel's ipvs module. This article breaks the problem down in detail.
A brief introduction to conn_reuse_mode
Before getting into the cause, let's first introduce the conn_reuse_mode kernel parameter. It was introduced by the following two patches:
- 2015: d752c364571743d696c2a54a449ce77550c35ac5
- 2016: f719e3754ee2f7275437e61a6afd520181fdd43b
Its purpose is:
- When a `client ip:client port` tuple is reused and matches an `ip_vs_conn` entry in TIME_WAIT state, reschedule the connection so that connections are spread more evenly across the real servers (rs), improving performance.
- If the mode is 0, the rs of the old `ip_vs_conn` is reused instead, which makes the distribution across rs more unbalanced.
So conn_reuse_mode set to 0 means ipvs connection reuse is enabled, and 1 means it is disabled. A bit counter-intuitive, isn't it? The naming is indeed controversial.
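To check which mode a node is running in, read the parameter from procfs. Below is a minimal Go sketch, assuming the standard procfs mount point; the file only exists once the ip_vs kernel module is loaded:

```go
// Print the current value of the ipvs conn_reuse_mode sysctl.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	data, err := os.ReadFile("/proc/sys/net/ipv4/vs/conn_reuse_mode")
	if err != nil {
		fmt.Fprintln(os.Stderr, "ip_vs module not loaded or procfs unavailable:", err)
		os.Exit(1)
	}
	// 0: reuse the old ip_vs_conn (and its real server) when a
	//    client ip:port is recycled; 1: reschedule to a real server.
	fmt.Println("conn_reuse_mode =", strings.TrimSpace(string(data)))
}
```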
The bug with conn_reuse_mode=1
The intent of turning this kernel parameter on (conn_reuse_mode=1) is to improve new-connection performance, but the actual result is a dramatic performance drop: in real tests, connections per second fell from about 30,000 to 1,500. This also shows that some kernel community patches are not rigorously performance-tested.
Turning this parameter on actually means ipvs does not reuse connections when forwarding: each new connection is rescheduled to an rs and gets a new ip_vs_conn. But the implementation has a problem: when a new connection is created (a SYN packet arrives), if its client ip:client port matches an old ipvs connection (in TIME_WAIT state) and conntrack is in use, the first SYN packet is dropped, and the connection is only established after the retransmission (1s later). This makes connection-setup performance plummet.
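The visible symptom of the dropped SYN is connect latency jumping to roughly one second, the kernel's initial SYN retransmission timeout. The following sketch (the target address is a hypothetical Service IP) just times repeated TCP connects and flags the slow ones; it demonstrates the symptom, not the kernel mechanism:

```go
// Dial a service repeatedly and flag connects that take about a
// second, which on affected kernels with conn_reuse_mode=1 suggests
// the first SYN was dropped and had to be retransmitted.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	const target = "10.0.0.10:80" // hypothetical Service VIP:port

	for i := 0; i < 100; i++ {
		start := time.Now()
		conn, err := net.DialTimeout("tcp", target, 3*time.Second)
		elapsed := time.Since(start)
		if err != nil {
			fmt.Printf("#%d: connect failed after %v: %v\n", i, elapsed, err)
			continue
		}
		conn.Close()
		if elapsed > 900*time.Millisecond {
			// Roughly one SYN retransmission interval.
			fmt.Printf("#%d: slow connect %v (likely retransmitted SYN)\n", i, elapsed)
		}
	}
}
```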
The Kubernetes community also found this bug, so when kube-proxy uses the ipvs forwarding mode it sets conn_reuse_mode to 0 by default to avoid the problem. See PR #71114 and issue #70747 for details.
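What kube-proxy does amounts to writing the sysctl at startup. Below is a minimal sketch of that effect (not kube-proxy's actual code, which goes through its own sysctl helper); it must run as root on a node with ip_vs loaded:

```go
// Set net.ipv4.vs.conn_reuse_mode to 0, enabling ipvs connection
// reuse, which is what kube-proxy in ipvs mode does by default.
package main

import (
	"log"
	"os"
)

func main() {
	const path = "/proc/sys/net/ipv4/vs/conn_reuse_mode"
	if err := os.WriteFile(path, []byte("0"), 0o644); err != nil {
		log.Fatalf("failed to set conn_reuse_mode: %v", err)
	}
	log.Println("net.ipv4.vs.conn_reuse_mode = 0 (ipvs connection reuse enabled)")
}
```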
Problems caused by conn_reuse_mode=0
Because Kubernetes wants to avoid the conn_reuse_mode=1 performance problem, kube-proxy in ipvs mode sets conn_reuse_mode to 0 at startup, i.e. it relies on ipvs connection reuse. But ipvs connection reuse has two problems:
- As long as a `client ip:client port` matches an existing `ip_vs_conn` (i.e. reuse occurs), the packet is forwarded straight to the corresponding rs regardless of that rs's current state, even if the rs has weight 0 (its matched connection is usually in TIME_WAIT state). An rs whose connections are in TIME_WAIT is usually a destroyed Pod in Terminating status, so forwarding to it inevitably fails (see the diagnostic sketch after this list).
- Under high concurrency a lot of reuse happens, so new connections are not scheduled to an rs but forwarded straight to the rs of the reused connection. As a result, many new connections get "pinned" to a subset of the rs.
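To see whether a node is in this state, inspect the kernel's ipvs connection table. Below is a minimal diagnostic sketch, assuming the standard /proc/net/ip_vs_conn layout (addresses and ports appear in hex there), that counts total and TIME_WAIT entries per real server; a large TIME_WAIT pool pointing at a Terminating Pod's IP is exactly what gets reused under conn_reuse_mode=0:

```go
// Summarize /proc/net/ip_vs_conn by destination (real server).
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/net/ip_vs_conn")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	perDest := map[string]int{}
	timeWait := map[string]int{}

	sc := bufio.NewScanner(f)
	sc.Scan() // skip the header line
	for sc.Scan() {
		// Fields: Pro FromIP FPrt ToIP TPrt DestIP DPrt State Expires ...
		fields := strings.Fields(sc.Text())
		if len(fields) < 8 {
			continue
		}
		dest, state := fields[5]+":"+fields[6], fields[7]
		perDest[dest]++
		if state == "TIME_WAIT" {
			timeWait[dest]++
		}
	}
	for dest, n := range perDest {
		fmt.Printf("rs %s (hex): %d conns, %d in TIME_WAIT\n", dest, n, timeWait[dest])
	}
}
```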
The symptoms a workload may observe include:
- Connection errors during rolling updates. While the service being accessed is rolled, some Pods are newly created and some are destroyed; when ipvs connection reuse kicks in, traffic is forwarded to a destroyed Pod and the connection fails (no route to host).
- Uneven load during rolling updates. Because reused connections are never rescheduled, new connections also get "pinned" to some of the Pods.
- Newly scaled-out Pods receive little traffic. For the same reason, many new connections get "pinned" to the Pods that existed before the scale-out.
Workarounds
Now that we know the cause, how can we avoid the problem in ipvs forwarding mode? Let's consider north-south and east-west traffic separately.
North-south traffic
- Use an LB that goes straight to Pods. North-south traffic is usually exposed through NodePort: the load balancer in front sends traffic to a NodePort, and from there it is forwarded to backend Pods via ipvs. Many cloud vendors now support LB-to-Pod direct forwarding; in that mode the load balancer forwards requests directly to Pods without passing through NodePort, so there is no ipvs forwarding at all and the problem is avoided at the traffic entry layer.
- Forward via an ingress. Deploy an ingress controller in the cluster (such as nginx ingress). When traffic reaches the ingress and is forwarded onward (to Pods inside the cluster), it does not go through the Service; it is forwarded directly to the Service's corresponding `Pod IP:Port`, bypassing ipvs. If the ingress controller itself is also deployed in the LB-to-Pod direct mode described above, the effect is even better.
East-west traffic
Calls between services inside the cluster (east-west traffic) still go through ipvs forwarding by default. For workloads with such high-concurrency scenarios, consider using a service mesh (such as istio) to manage traffic: inter-service forwarding is then done by the sidecar proxy and no longer passes through ipvs.
The ultimate solution: fixing the kernel
The bug that makes conn_reuse_mode=1 degrade performance so sharply has been fixed in TencentOS-kernel, the open-source kernel provided by Tencent Cloud; see the corresponding PR #17. The solution on TKE is to use this kernel patch and then disable ipvs connection reuse (conn_reuse_mode=1). This resolves the whole series of problems caused by ipvs connection reuse, and it has been verified in large-scale production.
However, that fix was not merged directly into the upstream Linux community. Two related patches have since been merged into the Linux mainline kernel (since v5.9), fixing the bugs of conn_reuse_mode=0 and conn_reuse_mode=1 respectively; one of them draws on the idea behind Tencent Cloud's fix. See k8s issue #93297 for details.
If you run a v5.9 or later kernel, you should in theory be free of the problems described in this article. Since v5.9+ kernels have these bugs fixed, kube-proxy no longer needs to set the conn_reuse_mode kernel parameter explicitly, which is what PR #102122 does. Note, however, that the community patches have not yet been validated in large-scale production, so adopting them carries some risk.
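The gate introduced by PR #102122 essentially comes down to a kernel version check. Here is a simplified sketch of that logic (not kube-proxy's actual code; version parsing is reduced to major.minor):

```go
// Decide whether conn_reuse_mode needs to be set, based on whether
// the running kernel is at least 5.9.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// leadingInt parses the run of digits at the start of s.
func leadingInt(s string) (int, error) {
	i := 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	return strconv.Atoi(s[:i])
}

func main() {
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease") // e.g. "5.15.0-91-generic"
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	parts := strings.SplitN(strings.TrimSpace(string(raw)), ".", 3)
	major, err := leadingInt(parts[0])
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse kernel version:", err)
		os.Exit(1)
	}
	minor := 0
	if len(parts) > 1 {
		minor, _ = leadingInt(parts[1])
	}
	if major > 5 || (major == 5 && minor >= 9) {
		fmt.Println("kernel >= 5.9: conn_reuse_mode can be left at its default")
	} else {
		fmt.Println("kernel < 5.9: keep conn_reuse_mode=0 to avoid the SYN-drop bug")
	}
}
```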