Do you really understand sticky packets and half packets? Understand them in 3 minutes

2022-07-07 18:17:00 InfoQ

A simple analogy

The example here may not be perfectly apt, but it is easy to understand.
Say we are shipping something by courier. If the item is too big, it has to be split into several parcels.
When the recipient has received only some of the parcels, the item is incomplete. In network transmission, this situation is called a half packet.
Only once all the parcels have arrived is the item (the transmitted message) complete. So a half packet cannot be parsed into complete data; the receiver has to wait until everything has arrived.
So here is the question: how do we know that all the parcels have arrived? We will come back to that later.
Another example: Chinese New Year is coming, and I want to send gifts to my relatives, a watch for each elder. Watches are small and my relatives all live in the same village, so I pack all the gifts into one parcel and mail it, which saves on postage.
Data that should have gone out as several packets is sent as one. In network transmission, this is called a sticky packet.
After these examples you should have some feel for sticky packets and half packets. Next, let's look at what actually happens on the network.

What really happens

Sticky packets and half packets occur only with TCP; UDP does not have this problem. The reason is that TCP is stream-oriented and has no boundaries between pieces of data, whereas UDP preserves message boundaries.
Anyone familiar with the TCP and UDP header formats will know that the TCP header has no message length field while the UDP header does, which also reflects why TCP is a stream.
That is why I said the example above is not quite apt: in real life parcels have boundaries, but TCP is like running water, with no clear boundaries.
TCP also has the concept of a send buffer. UDP does not really have one in the same sense, since each datagram is sent as its own unit.
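The running-water behaviour can be observed with a small experiment. The following is a minimal sketch, not a definitive test: it uses Python's socket.socketpair() as a local stand-in for a TCP connection (an assumption made to keep the example self-contained), and the function name send_then_read and the 4-byte read size are made up for illustration. Two messages are written with two separate send calls, but the reader only sees a sequence of bytes and has no way to tell where one message ended and the next began.

```python
import socket


def send_then_read(messages, recv_size=4):
    """Write each message with its own send call, then read the stream back in chunks."""
    # socketpair() returns two connected stream sockets; here it stands in
    # for a TCP connection (assumption: local stream sockets behave the same
    # way as TCP with respect to message boundaries).
    writer, reader = socket.socketpair()
    try:
        for message in messages:
            writer.sendall(message)
        writer.close()  # closing the write side lets the reader see end-of-stream

        chunks = []
        while True:
            chunk = reader.recv(recv_size)
            if not chunk:  # empty bytes means the peer closed the connection
                break
            chunks.append(chunk)
        return chunks
    finally:
        reader.close()
        writer.close()  # closing an already closed socket is harmless


if __name__ == "__main__":
    # The chunk sizes depend on the read size and on timing, not on how
    # the sender split its writes.
    print(send_then_read([b"ABC", b"EF"]))
```

How the bytes are grouped into chunks is up to the operating system and the read size; the only thing the receiver can rely on is the order of the bytes.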
Suppose the data TCP is asked to send at one time is larger than the send buffer. A complete message then has to be split into two or more smaller pieces, which can produce a half packet: the receiver gets incomplete data and cannot parse it successfully.
If the data sent at one time is smaller than the send buffer, it may be sent together with other messages. That is a sticky packet.
In that case the receiver cannot parse the data as it stands either; it first has to split the data into the correct individual messages.
On sticky packets and half packets, I have also seen people bring up the MTU (maximum transmission unit): if the data sent is larger than the MTU it gets split, which leads to half packets.
Personally I think that explanation is off. Put simply, UDP has to respect the MTU too, right? So why does UDP not get half packets?
Next, let's look at how to solve sticky packets and half packets.
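The UDP side of this argument can be checked with a small loopback experiment. This is only a sketch under stated assumptions: the helper name udp_roundtrip is made up, and it assumes that datagrams sent over the local loopback interface arrive, and arrive in order, which UDP does not guarantee in general. What it illustrates is that each receive call returns exactly one datagram as the sender wrote it, never half of one and never two merged together.

```python
import socket


def udp_roundtrip(messages, timeout=2.0):
    """Send each message as one UDP datagram over loopback and receive them one by one."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)     # do not block forever if a datagram is lost
        address = receiver.getsockname()

        for message in messages:
            sender.sendto(message, address)

        # One recv call per datagram; the buffer is large enough for any UDP payload.
        return [receiver.recv(65535) for _ in messages]
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_roundtrip([b"ABC", b"EF"]))
```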

How do we solve sticky packets and half packets?

  • Sticky packets: the idea is clear enough, just split them apart. The question is how. For example, we can use a fixed length: stipulate that every packet is 10 bytes and cut the stream every 10 bytes, and then each piece can be parsed.
  • Half packets: a half packet is simply incomplete information, and we need to wait until all of it has arrived. When we recognize that a packet is incomplete, we hold on to it without processing it and wait for the rest of the data. The key question is how we know the data is now complete. The fixed length mentioned above is one way; there are other and better solutions, so let's continue.
There are three common solutions to the sticky packet and half packet problem:
  • Fixed length
  • Separator
  • Fixed length field + Content
To keep the explanation simple, the descriptions below do not use bits or other exact units.

Fixed length

This one is very simple. Say we need to transmit two packets, ABC and EF. With no special handling, the receiver may well receive AB and CEF, or ABCE and F, and so on.
So we fix the length: we stipulate that every message has length 3, and if a message's actual data is shorter than 3, it is padded with empty characters.
So the messages we send are ABC and EF“” (EF followed by one empty padding character).
What the receiver gets might be ABCE followed by F“”.
But we process by length 3, so we only ever parse 3 characters at a time. The first time, although ABCE has been received, we parse only 3 characters, ABC, and keep the E. When we go to parse the next 3, we find fewer than 3 are available, so we leave it and wait.
Later F“” arrives. Now the buffered data reaches length 3, so we go on to parse EF“”.
This solves both the sticky packet and the half packet problem.
The corresponding implementation in Netty is FixedLengthFrameDecoder, a class that implements fixed-length decoding.
Its core logic is what I described above, and the source code is simple.
Advantage of fixed length: simple.
Disadvantages: a fixed length is rigid and hard to extend, and if it is set large enough to cover every business scenario it wastes space, because shorter messages have to be padded.
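The buffering logic described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not Netty's code: the class FixedLengthDecoder, the helper pad, and the choice of a zero byte as the padding character are all assumptions made for the example.

```python
class FixedLengthDecoder:
    """Cuts an incoming byte stream into frames of exactly frame_length bytes."""

    def __init__(self, frame_length):
        if frame_length <= 0:
            raise ValueError("frame_length must be positive")
        self.frame_length = frame_length
        self.buffer = bytearray()  # bytes received but not yet emitted as a frame

    def feed(self, data):
        """Add newly received bytes and return every complete frame now available."""
        self.buffer.extend(data)
        frames = []
        while len(self.buffer) >= self.frame_length:
            frames.append(bytes(self.buffer[:self.frame_length]))
            del self.buffer[:self.frame_length]
        # Fewer than frame_length bytes remain: a half packet, so keep waiting.
        return frames


def pad(message, frame_length, fill=b"\x00"):
    """Pad a message on the right so it fills exactly one frame (sender side)."""
    if len(message) > frame_length:
        raise ValueError("message is longer than the frame length")
    return message.ljust(frame_length, fill)


if __name__ == "__main__":
    decoder = FixedLengthDecoder(3)
    # The receiver sees ABCE first, then F plus the padding byte.
    print(decoder.feed(b"ABCE"))
    print(decoder.feed(b"F\x00"))
```

The receiver still has to strip the padding from each frame, which is only safe if the padding character can never appear at the end of real data.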

Separator

This should be easy to understand. Take the two packets ABC and EF again: after ABC I insert a semicolon to form ABC;, and likewise for EF, so what goes out is ABC;EF;.
This draws boundaries in the unbounded TCP stream and solves the sticky packet and half packet problem. Since TCP has no boundaries, the application draws the lines itself.
The corresponding implementation in Netty is DelimiterBasedFrameDecoder. I won't post the source code because it is a bit long, but the idea is simple.
Keep scanning until a separator is recognized. That means the data before it is complete, so parse that data, then continue scanning what follows.
Advantages of separators: simple, and no wasted space.
Disadvantages: the content itself has to be processed so that the separator never appears inside it, which would cause confusion. So the data to be sent has to be scanned once and escaped, or it can be Base64-encoded, in which case any character outside the 64-character Base64 alphabet (and its = padding character) can serve as the separator.
Separators are also widely used in the industry; Redis, for example, uses line breaks (\r\n) as separators.
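The scan-until-separator idea, combined with the Base64 trick for keeping the separator out of the content, might look like the sketch below. The names encode_frame and DelimiterDecoder and the choice of a semicolon are assumptions for illustration; this is not how Netty or Redis implement it.

```python
import base64

DELIMITER = b";"  # not in the Base64 alphabet, so it cannot appear in an encoded payload


def encode_frame(payload):
    """Base64-encode the payload and terminate it with the delimiter (sender side)."""
    return base64.b64encode(payload) + DELIMITER


class DelimiterDecoder:
    """Splits an incoming byte stream into frames at each delimiter."""

    def __init__(self, delimiter=DELIMITER):
        self.delimiter = delimiter
        self.buffer = bytearray()  # bytes received since the last delimiter

    def feed(self, data):
        """Add newly received bytes and return every complete frame now available."""
        self.buffer.extend(data)
        frames = []
        while True:
            index = self.buffer.find(self.delimiter)
            if index < 0:
                break  # no delimiter yet: the rest is a half packet, keep waiting
            frames.append(bytes(self.buffer[:index]))
            del self.buffer[:index + len(self.delimiter)]
        return frames


if __name__ == "__main__":
    # The second payload contains the delimiter itself; Base64 hides it on the wire.
    wire = encode_frame(b"ABC") + encode_frame(b"E;F")
    decoder = DelimiterDecoder()
    frames = decoder.feed(wire[:3]) + decoder.feed(wire[3:])
    print([base64.b64decode(frame) for frame in frames])
```

Base64 avoids a second scan for escaping, at the cost of making every payload roughly a third larger.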

Fixed length field + Content

This is also easy to understand. For example, the protocol stipulates that the first 4 positions store the length of the content, so the content that follows can be of variable length.
Take the two packets ABC and EF again: ABC is sent as a 4-position length field holding 3, followed by ABC, and EF as a length field holding 2, followed by EF.
The parsing process: first take 4 positions. If fewer than 4 have been received so far, wait. Once there are 4, parse them to get the length, 3, then take the next 3 positions. Likewise, if fewer than 3 are available, wait; once there are enough, parse them, and we have a complete packet.
Then take the next 4 positions, which parse to 2, and in the same way take the following 2 positions, which parse to EF.
In this approach the fixed-length field is parsed first to get the length of the content that follows, and the content is then read according to that length, which yields a complete message.
The corresponding implementation in Netty is LengthFieldBasedFrameDecoder. I won't post the source code because it is a bit long.
Advantages of fixed length field + content: messages can be located precisely from the fixed field, and there is no need to scan for or escape special characters.
Disadvantages: the fixed-length field is hard to size. Too large and it wastes space, since every message carries it; too small and it may not be enough.
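The two-step parse (length field first, then content) can be sketched as follows. The 4-byte big-endian unsigned integer used for the length field is an assumption for this example, since the description above deliberately avoids exact units, and the names encode_frame and LengthFieldDecoder are hypothetical rather than Netty's API.

```python
import struct

HEADER = struct.Struct(">I")  # assumed layout: 4-byte big-endian unsigned length


def encode_frame(payload):
    """Prefix the payload with its length (sender side)."""
    return HEADER.pack(len(payload)) + payload


class LengthFieldDecoder:
    """Reads a length field, then waits for that many content bytes."""

    def __init__(self):
        self.buffer = bytearray()  # bytes received but not yet emitted as a frame

    def feed(self, data):
        """Add newly received bytes and return every complete payload now available."""
        self.buffer.extend(data)
        frames = []
        while True:
            if len(self.buffer) < HEADER.size:
                break  # the length field itself is incomplete: wait
            (length,) = HEADER.unpack_from(self.buffer, 0)
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # the content is incomplete: wait
            frames.append(bytes(self.buffer[HEADER.size:end]))
            del self.buffer[:end]
        return frames


if __name__ == "__main__":
    wire = encode_frame(b"ABC") + encode_frame(b"EF")
    decoder = LengthFieldDecoder()
    # Deliver the stream in arbitrary pieces to mimic half and sticky packets.
    for piece in (wire[:5], wire[5:9], wire[9:]):
        print(decoder.feed(piece))
```

A real decoder would usually also reject lengths above some maximum, so that a corrupt or hostile length field cannot make the receiver buffer without limit.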

Summary

Okay, let's summarize.
Because TCP is a stream-oriented protocol and uses buffers to improve transmission efficiency, sticky packets and half packets can occur.
To deal with this we can shape the messages ourselves: agree on fixed-length messages, embed separators, or use a fixed length field + content. These are the three common ways to solve sticky packets and half packets.
All three have ready-made implementation classes in Netty that can be used directly:
  • FixedLengthFrameDecoder, Fixed length
  • DelimiterBasedFrameDecoder, Separator
  • LengthFieldBasedFrameDecoder, Fixed length field + Content
I suggest experimenting with them yourself; it will give you a clearer understanding.

Finally

Okay, that is all for sticky packets and half packets. I did not analyze the source code; the ideas matter more.


Original article:
Why does network I/O block?
Original site

Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207071607471224.html