An everyday example
The example here may not be entirely accurate, but it is easy to understand.
Say we are shipping something by courier. If the item is too big, it has to be split into several parcels.
If the recipient has received only some of the parcels, the item is incomplete. In network transmission, this situation is called a half packet.
Only once every parcel has arrived is the item (the transmitted information) complete. With a half packet the data cannot be parsed yet; the receiver has to wait until all the parts have arrived.
That raises a question: how does the receiver know that everything has arrived? We will get to that later.
Another example: Chinese New Year is coming and I want to send a watch to each of the elders in my family. Watches are small and my relatives all live in the same village, so I pack all the gifts into one parcel to save on shipping.
Data that should have gone out as several separate packets is sent as one. In network transmission, this is called a sticky packet.
These examples should give you a feel for sticky packets and half packets. Next, let's look at what actually happens on the network.
What actually happens
Sticky packets and half packets occur only with TCP; UDP does not have this problem. The reason is that TCP is stream-oriented, with no boundaries between pieces of data, whereas UDP preserves message boundaries.
If you are familiar with the TCP and UDP header formats, you will know that the TCP header carries no message length while the UDP header does, which is another sign that TCP is a stream.
That is why I said the example above is not entirely accurate: real parcels have clear boundaries, while TCP is like running water, with no clear boundaries at all.
TCP also has the concept of a send buffer, where data written by the application accumulates before it goes out. UDP in effect has no such concept, because each send goes out as its own datagram.
Suppose the data written to TCP in one go is larger than the send buffer. A complete message then has to be split into two or more smaller pieces, which can produce a half packet: the receiver gets incomplete data and cannot parse it.
If the data written in one go is smaller than the send buffer, it may be sent together with other messages. That is a sticky packet.
In that case the receiver cannot parse the data as is either; it first has to split it into the individual messages.
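To make the "running water" picture concrete, here is a small simulation. It does not open a real socket; the function name tcp_like_delivery and the read sizes are made up for illustration. It only shows that once messages are joined into one byte stream, the pieces the receiver reads need not line up with the messages that were written.

```python
def tcp_like_delivery(messages, read_sizes):
    """Join messages into one byte stream, then cut it at arbitrary read sizes.

    This is a simulation of stream semantics, not real TCP.
    """
    stream = b"".join(messages)  # the boundaries between messages are lost here
    chunks, pos = [], 0
    for size in read_sizes:
        chunks.append(stream[pos:pos + size])
        pos += size
    if pos < len(stream):
        chunks.append(stream[pos:])  # whatever is left arrives in a final read
    return chunks


sent = [b"ABC", b"EF"]
print(tcp_like_delivery(sent, [2, 3]))  # [b'AB', b'CEF']  half packet, then sticky
print(tcp_like_delivery(sent, [4, 1]))  # [b'ABCE', b'F']
print(tcp_like_delivery(sent, [5]))     # [b'ABCEF']       two messages stuck together
```

The only thing the receiver can rely on is the order of the bytes, which is why the application has to add its own framing.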
On the subject of sticky packets and half packets, I have also seen people bring up the MTU (maximum transmission unit): if the data sent is larger than the MTU it gets split, which leads to half packets.
Personally I think there is something wrong with that explanation. Put simply, UDP has to respect the MTU too, right? So why does UDP never produce half packets?
Next, let's look at how to solve sticky packets and half packets.
How do we solve sticky packets and half packets?
Sticky packets: the idea is clear enough, just split the data apart. The question is how to split it. For example, we can fix the length: stipulate that every packet is 10 bytes and cut the stream every 10 bytes, so that each piece can be parsed.
Half packets: a half packet is incomplete information, so we have to wait until all of it has arrived. When we recognize that a packet is incomplete, we hold on to it without processing it, and process it once the data is complete. The key question is how we know the data is complete. The fixed length mentioned above is one answer, and there are other, better ones, so let's move on.
In practice there are three common solutions to the sticky packet and half packet problems: fixed length, delimiters, and a fixed-length field plus content.
To keep the illustration simple, the examples below are described in characters rather than in binary bits or other units.
Fixed length
This one is very simple. Say we need to send two packets, ABC and EF. Without any handling, the receiver may well get AB and CEF, or ABCE and F, and so on.
So we fix the length: every message is 3 characters long, and a message whose actual data is shorter than 3 is padded with empty characters.
The messages we send are therefore ABC, and EF followed by one padding character.
What arrives might be ABCE, followed later by F and the padding character.
But we always parse 3 characters at a time. The first read gives us ABCE; we parse the first 3 characters, ABC, and keep the E. When we try to parse the next 3, we find that fewer than 3 are available, so we leave them alone and wait.
Later F and the padding character arrive. Now there are 3 characters available, so we go on to parse EF plus its padding.
This solves both sticky packets and half packets.
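The walk-through above can be sketched in a few lines of Python. This is only an illustration of the buffering idea and not Netty's code; the names FixedLengthDecoder, encode and PAD are hypothetical, and using a zero byte as the "empty character" is an assumption.

```python
PAD = b"\x00"  # assumed padding byte standing in for the "empty character"


def encode(payload, frame_length=3):
    """Pad a payload to exactly frame_length bytes (hypothetical helper)."""
    if len(payload) > frame_length:
        raise ValueError("payload longer than the fixed frame length")
    return payload.ljust(frame_length, PAD)


class FixedLengthDecoder:
    """Buffers incoming bytes and emits frames of exactly frame_length bytes."""

    def __init__(self, frame_length):
        self.frame_length = frame_length
        self.buffer = bytearray()

    def feed(self, data):
        self.buffer.extend(data)
        frames = []
        # Cut off complete frames; anything shorter stays buffered for later.
        while len(self.buffer) >= self.frame_length:
            frames.append(bytes(self.buffer[:self.frame_length]))
            del self.buffer[:self.frame_length]
        return frames


decoder = FixedLengthDecoder(3)
wire = encode(b"ABC") + encode(b"EF")  # b'ABCEF\x00'
print(decoder.feed(wire[:4]))  # [b'ABC']      the E is kept in the buffer
print(decoder.feed(wire[4:]))  # [b'EF\x00']   completed once F and the padding arrive
```

Note that stripping the padding on the receiving side is ambiguous if real payloads can end with the padding byte, which is one more reason the fixed-length approach is rigid.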
The corresponding implementation in Netty is FixedLengthFrameDecoder, a class that implements fixed-length decoding.
Its core logic is what I described above, and the source code is simple.
Advantage of fixed length: simple.
Disadvantage: a fixed length is rigid and hard to extend. If it is set large enough to cover every business scenario, space is wasted, because shorter messages have to be padded.
Delimiter
This one should also be easy to understand. Take the two packets ABC and EF again: after writing ABC I insert a semicolon, giving ABC; and EF is handled the same way, giving EF;.
This draws boundaries in the unbounded TCP stream and so solves sticky packets and half packets. Since TCP has no boundaries, the application draws the line itself.
The corresponding implementation in Netty is DelimiterBasedFrameDecoder. I will not paste the source here because it is a bit long, but the idea is simple.
Keep scanning the incoming data. Once a delimiter is recognized, the data before it is complete, so parse that data, then carry on scanning what follows.
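The scanning loop just described might look like the following sketch. It is not Netty's implementation; DelimiterDecoder is a hypothetical name, and the semicolon delimiter is taken from the example above.

```python
class DelimiterDecoder:
    """Buffers incoming bytes and emits one frame per delimiter found."""

    def __init__(self, delimiter=b";"):
        self.delimiter = delimiter
        self.buffer = bytearray()

    def feed(self, data):
        self.buffer.extend(data)
        frames = []
        while True:
            index = self.buffer.find(self.delimiter)
            if index < 0:
                break  # no complete frame yet, keep waiting
            frames.append(bytes(self.buffer[:index]))
            # Drop the frame and its delimiter, then keep scanning the rest.
            del self.buffer[:index + len(self.delimiter)]
        return frames


decoder = DelimiterDecoder()
print(decoder.feed(b"AB"))    # []         half packet, nothing to emit yet
print(decoder.feed(b"C;EF"))  # [b'ABC']   EF is still waiting for its delimiter
print(decoder.feed(b";"))     # [b'EF']
```

A production decoder would normally also cap the buffer size, so that a peer that never sends a delimiter cannot make the buffer grow without bound.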
Advantages of delimiters: simple, and no wasted space.
Disadvantage: the content itself has to be processed so that the delimiter never appears inside it, which would cause confusion. That means scanning the data once before sending in order to escape it. Alternatively, encode the data with Base64; any character outside the 64-character Base64 alphabet (and its = padding) can then serve as the delimiter.
Delimiters are widely used in the industry. Redis, for example, uses line breaks (\r\n) as its separator.
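As a small illustration of the Base64 idea mentioned above, the sketch below encodes each payload before appending the delimiter, so a semicolon inside the payload can no longer be mistaken for a boundary. The helper names frame and unframe are hypothetical, and unframe assumes it is given a stream that ends on a complete frame.

```python
import base64

DELIMITER = b";"  # ';' is not part of the Base64 alphabet, so it is safe here


def frame(payload):
    """Base64-encode the payload, then terminate it with the delimiter."""
    return base64.b64encode(payload) + DELIMITER


def unframe(stream):
    """Split a stream of complete frames and decode each one."""
    parts = stream.split(DELIMITER)
    # The last element is the empty remainder after the final delimiter.
    return [base64.b64decode(part) for part in parts[:-1]]


wire = frame(b"A;B") + frame(b"EF")
print(wire)           # b'QTtC;RUY=;'
print(unframe(wire))  # [b'A;B', b'EF']
```

The price is size: Base64 turns every 3 bytes into 4 characters, so the data on the wire grows by roughly a third.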
Fixed-length field + content
This is easy to understand too. For example, the protocol stipulates that the first 4 positions store the length of the content, which allows the content itself to vary in length.
Take the two packets ABC and EF again: ABC is sent as a length field holding 3 followed by ABC, and EF as a length field holding 2 followed by EF.
The parsing process is: first read 4 positions. If fewer than 4 have been received so far, wait. Once there are 4, parse them to get the length, 3, then read the next 3 positions. Again, if fewer than 3 are available, wait; if there are enough, parse them, and that yields one complete packet.
Then read the next 4 positions, which parse to 2, and in the same way read 2 more positions, which parse to EF.
In other words, parse the fixed-length field first to get the length of the content that follows, then read the content according to that length, which yields a complete message.
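The parsing steps above can be sketched as follows. This is a minimal illustration rather than Netty's code; LengthFieldDecoder and encode are hypothetical names, and the header layout (a 4-byte unsigned big-endian integer) is an assumption chosen to match the "4 positions" in the text.

```python
import struct

HEADER = struct.Struct(">I")  # assumed layout: 4-byte unsigned big-endian length


def encode(payload):
    """Prefix the payload with its length (hypothetical helper)."""
    return HEADER.pack(len(payload)) + payload


class LengthFieldDecoder:
    """Reads the length field first, then waits for that many content bytes."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, data):
        self.buffer.extend(data)
        frames = []
        while True:
            if len(self.buffer) < HEADER.size:
                break  # the length field itself is incomplete, wait
            (length,) = HEADER.unpack_from(self.buffer, 0)
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # the content is incomplete, wait
            frames.append(bytes(self.buffer[HEADER.size:end]))
            del self.buffer[:end]
        return frames


decoder = LengthFieldDecoder()
wire = encode(b"ABC") + encode(b"EF")  # 13 bytes in total
print(decoder.feed(wire[:2]))   # []         not even a full length field yet
print(decoder.feed(wire[2:9]))  # [b'ABC']   2 bytes of the next header stay buffered
print(decoder.feed(wire[9:]))   # [b'EF']
```

A real decoder would usually also reject length values above some maximum, so that a corrupt or malicious header cannot make it wait for gigabytes of data.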
The corresponding implementation in Netty is LengthFieldBasedFrameDecoder. I will not paste the source here because it is a bit long.
Advantages of fixed-length field + content: message boundaries can be located precisely from the fixed field, and there is no need to scan for or escape any characters.
Disadvantage: the length field is hard to size. Make it large and space is wasted, since every message carries it; make it small and it may not be enough.
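To put rough numbers on that trade-off, the sketch below computes, for a few field widths, the largest content length the field can express and the share of a small message taken up by the field itself. The function name length_field_limits is hypothetical, and the field is assumed to be an unsigned integer.

```python
def length_field_limits(field_bytes, payload_len):
    """Return (largest expressible payload length, header share of the message)."""
    max_payload = 2 ** (8 * field_bytes) - 1  # assumes an unsigned length field
    overhead = field_bytes / (field_bytes + payload_len)
    return max_payload, overhead


# For the 3-byte payload ABC from the example above:
for field_bytes in (1, 2, 4):
    max_payload, overhead = length_field_limits(field_bytes, payload_len=3)
    print(field_bytes, max_payload, f"{overhead:.0%}")
# 1 255 25%
# 2 65535 40%
# 4 4294967295 57%
```

For tiny messages a 4-byte field costs more than the content itself, while a 1-byte field cannot describe anything longer than 255 bytes.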
Summary
OK, let's summarize.
Because TCP is a stream-oriented protocol and uses buffers to improve transmission efficiency, sticky packets and half packets can occur.
To deal with this, we shape the messages ourselves: agree on a fixed message length, embed a delimiter, or use a fixed-length field plus content. These three common approaches solve the sticky packet and half packet problems.
All three have ready-made implementation classes in Netty that can be used directly: FixedLengthFrameDecoder, DelimiterBasedFrameDecoder, and LengthFieldBasedFrameDecoder.
I suggest experimenting with them; it will give you a clearer understanding.
Finally
OK, that is all I have to say about sticky packets and half packets. I have not analyzed the source code here; the thinking behind it matters more.
Reference material
Original text:
Why does network I/O block?
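Copyright notice from the original site
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207071607471224.html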