A Look at TCP TIME_WAIT
2022-07-26 21:34:00 【Brother Xing plays with the clouds】
Background
Recently a colleague was load-testing a service with ab. When QPS hit a ceiling, he suspected the load-generating machine was the problem and asked to borrow one of my test machines. I took the opportunity to work out how the load generator itself could become the bottleneck in a load test: besides network I/O and the machine's performance, I also considered the network protocol.
Load testing is not the subject of this article, though. Later analysis showed that my colleague had been overthinking it: the bottleneck was on the server side.
While analyzing possible bottlenecks on the load generator, a conjecture about the TCP TIME_WAIT state caught my interest. I had brushed against this state in earlier troubleshooting but never understood it well, so I decided to take the time to dig in and test my conjecture.
TCP state transitions
We all know TCP's three-way handshake and four-way close. They sound simple, but on an unreliable physical network any step can fail. To make sure data is still delivered reliably, TCP implementations include a great deal of handling for these failure cases.
State analysis
First, let's use a diagram to recall the TCP state transitions.
At first glance, with so many states and lines running in every direction, the diagram is a little bewildering. On closer inspection, though, there is a pattern.
First, the diagram can be divided into three parts: the upper half is connection establishment, the lower left is the active close, and the lower right is the passive close.
Taking them in turn: connection establishment is the familiar three-way handshake, except that the diagram also shows the server's LISTEN state; the active close and the passive close together make up the four-way close.
Checking connection state
On Linux we usually use netstat to view the state of network connections. We can also use the more efficient ss (Socket Statistics) in place of netstat.
Both tools list the state of each socket connection, so a simple tally is enough to analyze the server's network state at that moment.
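The tally mentioned above can be sketched in a few lines of Python. This is a minimal sketch, assuming the text comes from `ss -tan`, where the first column of each row is the connection state; the function name `count_states` and the sample rows are made up for illustration and are not taken from the machines in this article.

```python
from collections import Counter


def count_states(ss_output: str) -> Counter:
    """Tally the first column (connection state) of `ss -tan`-style text."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, which starts with "State".
        if not fields or fields[0] == "State":
            continue
        counts[fields[0]] += 1
    return counts


# Illustrative sample only, not real output from the article's machines.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:8090        0.0.0.0:*
ESTAB      0      0      10.0.0.1:8090       10.0.0.2:51234
TIME-WAIT  0      0      10.0.0.1:8090       10.0.0.2:51235
TIME-WAIT  0      0      10.0.0.1:8090       10.0.0.2:51236
"""

if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(state, n)
```

On a live machine the same function could be fed the captured output of `ss -tan` instead of the sample string.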
TIME_WAIT
Definition
As the diagram above shows, the side that actively closes a TCP connection passes through the TIME_WAIT state. If we use curl on a machine to request a URL, which creates a TCP connection, tools such as ss will show a connection lingering in TIME_WAIT for some time afterwards.
So TIME_WAIT is the following state: after the four-way close, the two sides no longer exchange messages, but the side that closed actively keeps the connection around, unavailable for reuse, for a period of time.
So what is the point of maintaining such a state?
Why it exists
As mentioned above, TCP implementations include many countermeasures for a messy network, and TIME_WAIT is one of them, introduced to handle specific failure cases.
To understand why TIME_WAIT is necessary, let's assume it did not exist and see what could go wrong. Call the two ends of the TCP connection A and B, with A being the side that closes actively.
- In the four-way close, A sends FIN, B replies with ACK, B then sends its own FIN, and A replies with ACK and closes the connection. If A's final ACK is lost, B will assume A never received its FIN and will retransmit it. Without TIME_WAIT, A would no longer hold any state for the connection; on receiving a packet for a connection that does not exist, A would reply with RST, and B would see the connection end abnormally. Here TIME_WAIT ensures that a full-duplex TCP connection terminates cleanly.
- We also know that the IP layer beneath TCP cannot guarantee the order in which packets are delivered. Suppose both sides have finished the close and the network quadruple (src/dst ip/port) has been released, while a delayed packet destined for B is still in the network. If A's application immediately creates a new connection with the same quadruple, then when the delayed packet reaches B, B will take it for a packet A has just sent on the new connection. Here TIME_WAIT ensures that stray packets in the network expire before the quadruple is reused.
For these two reasons, the TIME_WAIT state is well worth having.
Determining the duration
With the reasons clear, the duration of TIME_WAIT is easy to understand. It is set mainly with the second case in mind: all of the connection's packets still in the network must expire after the connection is closed.
Expiry brings up another concept: the Maximum Segment Lifetime (MSL). It is the maximum time a TCP segment can exist in the internetwork. Its value is set by the TCP implementation, and segments older than this are discarded.
TIME_WAIT is maintained by A, the side that closes actively, so consider from A's point of view the longest time after which it might still receive a packet from the old connection. A packet that A has just sent can live for up to one MSL. When it reaches B, B has already closed the connection and replies with RST, and that RST takes at most another MSL to reach A. So A only needs to stay in TIME_WAIT for 2MSL to guarantee that every packet of the connection has vanished from the network.
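RFC 793 defines MSL as 2 minutes, but the value varies across Unix implementations. On Linux systems such as the CentOS we commonly use, a connection stays in TIME_WAIT for a fixed 60 s (the kernel constant TCP_TIMEWAIT_LEN), which corresponds to an MSL of 30 s. Note that /proc/sys/net/ipv4/tcp_fin_timeout, although often cited for this purpose, sets the timeout for orphaned connections in FIN_WAIT_2 and does not change the TIME_WAIT duration.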
The "strange" behavior of ab
Conjecture
From the above, we know that because of TIME_WAIT, every actively closed connection has to be kept for 2MSL (60 s), so its network quadruple is also frozen for 60 s. By default about 30,000 port numbers can be allocated on our machine (see /proc/sys/net/ipv4/ip_local_port_range).
So when we use curl to send requests to the server, each request, as a client, uses one local port number. Spreading all the port numbers over 60 s means we should be limited to about 500 QPS; beyond that, the system has no more port numbers to allocate.
Yet when load-testing with ab at 4,000 QPS for several minutes, the load generator kept working normally, and ss showed not a single connection in the TIME_WAIT state.
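The 500 QPS figure is simply the number of usable ports divided by the time each port stays frozen. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and it assumes every connection goes to the same server IP and port, so that each new connection needs a distinct local port.

```python
def max_new_connections_per_second(ephemeral_ports: int,
                                   time_wait_seconds: float) -> float:
    """Upper bound on new short connections per second to one destination.

    Each actively closed connection keeps its local port frozen for
    `time_wait_seconds`, so at steady state at most `ephemeral_ports`
    connections can be opened within any window of that length.
    """
    if time_wait_seconds <= 0:
        raise ValueError("time_wait_seconds must be positive")
    return ephemeral_ports / time_wait_seconds


if __name__ == "__main__":
    # The article's figures: about 30,000 ports and 2MSL = 60 s.
    print(max_new_connections_per_second(30_000, 60))
```

The bound applies per destination because TIME_WAIT freezes a quadruple, not a port; a client talking to many different servers could reuse the same local port for each of them.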
Analysis
At first I thought ab was using connection reuse or a similar technique, but a closer look at the ss output showed that the local port number kept changing. What was going on?
So I started a simple service on a test machine, listening on port 8090, then ran the load from another machine while capturing packets with tcpdump.
It turned out that the first FIN packet was always sent by the server; in other words, ab never closed the connection actively.
A look at the server confirmed it: a large number of connections in TIME_WAIT.
But because the port the server listens on is shared by all of its connections, these TIME_WAIT connections have no significant impact on the server; they just consume some system resources.
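The check made on the capture, namely which side sent the first FIN, can be expressed as a small function. This is only a sketch: the `Packet` record and the sample trace are hypothetical simplifications of what one might extract from a tcpdump capture, not tcpdump's actual output format and not the real capture from this test.

```python
from typing import Iterable, NamedTuple, Optional


class Packet(NamedTuple):
    """Hypothetical, simplified record of one captured TCP segment."""
    ts: float          # capture timestamp in seconds
    src: str           # sender as "ip:port"
    dst: str           # receiver as "ip:port"
    flags: frozenset   # TCP flags set on the segment, e.g. {"FIN", "ACK"}


def active_closer(packets: Iterable[Packet]) -> Optional[str]:
    """Return the endpoint that sent the earliest FIN, or None if there is none.

    That endpoint is the one performing the active close, so it is the
    one that will end up in TIME_WAIT.
    """
    for pkt in sorted(packets, key=lambda p: p.ts):
        if "FIN" in pkt.flags:
            return pkt.src
    return None


# Made-up trace of one short connection in which the server closes first.
CLIENT, SERVER = "10.0.0.2:51234", "10.0.0.1:8090"
TRACE = [
    Packet(0.000, CLIENT, SERVER, frozenset({"SYN"})),
    Packet(0.001, SERVER, CLIENT, frozenset({"SYN", "ACK"})),
    Packet(0.002, CLIENT, SERVER, frozenset({"ACK"})),
    Packet(0.010, SERVER, CLIENT, frozenset({"FIN", "ACK"})),
    Packet(0.011, CLIENT, SERVER, frozenset({"FIN", "ACK"})),
    Packet(0.012, SERVER, CLIENT, frozenset({"ACK"})),
]

if __name__ == "__main__":
    print(active_closer(TRACE))
```

In the made-up trace the server's FIN comes first, which matches what the capture showed for ab: the TIME_WAIT connections pile up on the server, not on the load generator.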
Summary
Of course, under high concurrency, too many TIME_WAIT connections still put considerable pressure on the server, since maintaining that many sockets costs resources. For how to deal with too many TIME_WAIT connections, see "A complete guide to solving the TIME_WAIT problem of TCP short connections".
The TCP connection is the most basic concept in network programming. Depending on the usage scenario, we generally divide connections into "long connections" and "short connections". Their pros and cons are not covered in detail here; interested readers can look them up on Google. This article focuses on how to solve the TIME_WAIT problem of TCP short connections.
The biggest advantage of short connections is convenience, especially for scripting languages: since a script's process exits once it finishes executing, its connections are essentially all short. The biggest disadvantage is that short connections tie up a lot of system resources, such as local ports and socket handles. The reason is simple: the TCP protocol layer has no notion of long or short connections, so either kind goes through the same connection establishment -> data transfer -> connection close process.
Normally, after a TCP client closes a connection, the connection enters the TIME_WAIT state, which usually lasts 1 to 4 minutes. With a small number of connections, 1 to 4 minutes is not long and has no effect on the system. But if a large number of short connections are created in a short time (for example within 1 s), the client's operating system may run out of socket ports and handles, and the system can no longer initiate new connections.
For example, suppose we establish 1,000 short connections per second (very common in Web scenarios, for instance when every request accesses memcached) and that TIME_WAIT lasts 1 minute. Then 60,000 short connections are created within 1 minute, and because TIME_WAIT lasts 1 minute, all of them remain in TIME_WAIT for that minute and are not released. Linux's default local port range is net.ipv4.ip_local_port_range = 32768 61000, fewer than 30,000 ports, so in this scenario new requests fail because no local port is available.
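The numbers in this example can be checked with a short calculation. The sketch below is illustrative: both function names are made up, the range string is the default quoted above, and the model assumes all connections go to the same destination and that each closed connection holds its port for the full TIME_WAIT period.

```python
from typing import Optional


def parse_port_range(value: str) -> int:
    """Return the number of ports in an ip_local_port_range value.

    The sysctl holds two whitespace-separated integers, low and high,
    and both bounds are inclusive.
    """
    low, high = (int(part) for part in value.split())
    return high - low + 1


def seconds_until_exhaustion(ports: int,
                             new_connections_per_second: float,
                             time_wait_seconds: float) -> Optional[float]:
    """Seconds until local ports run out, or None if they never do.

    Ports never run out if the connections created during one TIME_WAIT
    period fit in the port range; otherwise the range is used up after
    ports / rate seconds, before the first port has been released.
    """
    if new_connections_per_second * time_wait_seconds <= ports:
        return None
    return ports / new_connections_per_second


if __name__ == "__main__":
    ports = parse_port_range("32768 61000")
    print("available ports:", ports)
    print("needed within one TIME_WAIT period:", 1000 * 60)
    print("seconds until exhaustion:", seconds_until_exhaustion(ports, 1000, 60))
```

Under these assumptions the range holds 28,233 ports against a demand of 60,000, so the client would run dry well before the first minute is over.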
This problem can be addressed in the following ways:
1) Switch to long connections. This is costly, though: too many long connections can cause performance problems on the server, and scripting languages such as PHP need software such as a proxy to achieve long connections;
2) Modify net.ipv4.ip_local_port_range to widen the range of available ports. This only eases the problem and does not solve it at the root;
3) Set the SO_LINGER socket option in the client program;
4) Enable the tcp_tw_recycle and tcp_timestamps options on the client machine (note that tcp_tw_recycle was removed in Linux 4.12);
5) Enable the tcp_tw_reuse and tcp_timestamps options on the client machine;
6) Set tcp_max_tw_buckets to a very small value on the client machine.
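As an illustration of option 3), the sketch below sets SO_LINGER with linger enabled and a zero timeout. The article does not say which SO_LINGER values were used, so this particular setting is an assumption, and both function names are made up. With this setting, close() aborts the connection with an RST instead of the normal FIN exchange, so the socket does not enter TIME_WAIT; the price is that any unsent data is discarded and the peer sees a connection reset.

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; } as laid out on Linux.
_LINGER_FORMAT = "ii"


def enable_abortive_close(sock: socket.socket) -> None:
    """Turn linger on with a zero timeout, so close() sends RST, not FIN."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(_LINGER_FORMAT, 1, 0))


def linger_settings(sock: socket.socket) -> tuple:
    """Read back the (l_onoff, l_linger) pair currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(_LINGER_FORMAT))
    return struct.unpack(_LINGER_FORMAT, raw)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_abortive_close(s)
    print(linger_settings(s))
    s.close()
```

Because an abortive close bypasses the protections TIME_WAIT provides, it is best treated as a last resort and verified against real traffic, in line with the caution about these options in this article.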
While solving the short-connection problem between PHP and Memcached, we mainly verified approaches 3), 4), 5) and 6), using basic functional verification and code review rather than performance stress testing. In production, therefore, watch how the business behaves; if you see packet loss, dropped connections, or connection failures, check whether these options are the cause.
Although material on these methods can be found on Google, most of it is vague and copied from elsewhere, so it has little reference value. While locating and handling these problems we ran into doubts and difficulties that took time to resolve; the following summarizes the relevant experience.
Only by understanding the principles better can we find root causes faster; I will keep consolidating my networking knowledge.