Principles of several common IO models
2022-07-03 15:40:00 【51CTO】
1、 Server-Side I/O
In computing, I/O means Input/Output. IOPS (Input/Output Per Second) is the number of input/output (i.e. read/write) operations per second, one of the main indicators of disk performance. IOPS refers to the number of I/O requests the system can handle per unit of time, usually expressed as requests processed per second; an I/O request is typically a read or write of data.
A complete I/O operation is a full exchange of data between a process in user space and the kernel in kernel space. Because kernel space is strictly isolated from user space, a user-space process cannot directly access kernel-space memory during the exchange; instead, the data must be copied from kernel space into the process's memory in user space. In short, an I/O operation copies data from kernel-space memory into the memory of a user-space process.
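The kernel-to-user copy described above can be seen in a minimal sketch using an anonymous pipe (the pipe stands in for any kernel-buffered resource; this is an illustration, not part of the original article):

```python
import os

# Create an anonymous pipe: the kernel buffers whatever is written
# to the write end until some process reads from the read end.
r, w = os.pipe()

os.write(w, b"hello")   # the data now sits in a kernel buffer
data = os.read(r, 5)    # read(2): the kernel copies it into our user-space buffer
print(data)             # b'hello'

os.close(r)
os.close(w)
```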
Linux I/O types:
- Disk I/O
- Network I/O: "everything is a file"; network I/O is essentially reading from and writing to socket files
1.1、 Disk I/O
Disk I/O works as follows: a process initiates a system call to the kernel, requesting a resource on disk such as an HTML file or an image. The kernel loads the target file into kernel memory space through the corresponding driver, then copies the data from kernel memory into the process's memory. For relatively large data this also involves waiting time.
1.2、 Network I/O
The network I/O processing flow is similar. Whether the I/O is against a disk or the network, the data always passes through kernel space before reaching the user process.
2、I/O Models
2.1、 System I/O Models
Synchronous / asynchronous: concerns the message communication mechanism, i.e. while the caller is waiting for the result of an operation, whether the callee actively notifies it of the completion status.
- Synchronous (synchronous): the callee does not send any notification about the result of the event; the caller must actively ask whether processing has completed.
- Asynchronous (asynchronous): the callee actively notifies the caller of its running state through status changes, notifications, or callbacks.
Blocking / non-blocking: concerns the state of the caller while waiting for the result to return.
- Blocking (blocking): the I/O operation must complete fully before returning to user space; until the call result returns, the caller is suspended and can do nothing else.
- Non-blocking (nonblocking): the I/O call immediately returns a status value to the user without waiting for the operation to complete; the caller is not suspended before the final result returns and can do other work.
Combinations of the system I/O models:
2.2、 Network I/O Models
2.2.1、 Blocking I/O Model (blocking IO)
The blocking IO model is the simplest I/O model: the user thread is blocked while the kernel performs the I/O operation.
The user thread initiates an I/O read through the read system call, crossing from user space into kernel space. The kernel waits until the packet arrives, then copies the received data into user space, completing the read operation.
The user must wait for read to copy the data into its buffer before it can continue processing the received data. During the entire I/O request the user thread is blocked: after initiating the request it can do nothing else, so CPU utilization is poor.
Advantages: the program is simple; while blocked waiting for data the process/thread is suspended and consumes almost no CPU resources.
Disadvantages: each connection needs a separate process/thread to handle it, so under heavy concurrency the memory and thread-switching overhead of keeping the program running is large. Apache's prefork mode works this way.
Synchronous blocking: the program sends an I/O request to the kernel and waits for the kernel to respond. If the kernel cannot return the result of the I/O operation immediately, the process waits, accepts no new requests, and polls to check whether the I/O has completed; once it has, the process returns the result to the client. While the I/O is outstanding, the process cannot accept requests from other clients and must itself check whether the I/O has completed. This is simple, but slow, and is used little.
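Blocking I/O can be sketched with a local socket pair standing in for a client/server connection (an illustrative example, not from the original): `recv` does not return until the kernel has data to deliver, so the calling thread simply sleeps inside the call.

```python
import socket
import threading
import time

# A connected pair of sockets stands in for a real network connection.
server, client = socket.socketpair()

def delayed_sender():
    time.sleep(0.2)              # simulate a slow peer
    client.sendall(b"ping")

threading.Thread(target=delayed_sender).start()

start = time.monotonic()
data = server.recv(4)            # blocks here until the peer sends data
elapsed = time.monotonic() - start

print(data)                      # b'ping'
print(elapsed >= 0.2)            # the thread was suspended while waiting
```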
2.2.2、 Non-blocking I/O Model (nonblocking IO)
A user thread that initiates an I/O request returns immediately. But if no data has been read, the thread must keep re-issuing the I/O request until the data arrives before it can actually read it and continue. This "polling" mechanism has two problems. First, with a large number of file descriptors to wait on, reading them one by one causes many context switches (read is a system call, and every call switches between user mode and kernel mode). Second, the polling interval is hard to choose: it amounts to guessing how long the data will take to arrive. Set the wait too long and the program's response latency becomes too large; set it too short and there are too many retries, burning CPU for nothing. Because it wastes CPU, this model is rarely used directly, but other I/O models make use of the non-blocking property.
Non-blocking: the program sends an I/O request to the kernel and waits for the response; if the kernel cannot return the I/O result immediately, the process does not wait and goes on handling other requests, but it must still check back with the kernel at intervals to see whether the I/O has completed.
As the figure shows, when the connection is set to non-blocking and the application process calls recvfrom while no data has arrived, the kernel immediately returns an EWOULDBLOCK error instead of blocking until the data is ready. In the figure, a datagram is ready by the fourth call, so the data is copied into the application buffer and recvfrom returns successfully.
When an application process calls recvfrom in such a loop, this is called polling. It usually burns a lot of CPU time and is rarely used in practice.
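The polling loop can be sketched with a non-blocking socket (again a local socket pair as an illustrative stand-in): each failed `recv` raises `BlockingIOError` (errno `EWOULDBLOCK`/`EAGAIN`) immediately instead of suspending the caller.

```python
import socket
import threading
import time

server, client = socket.socketpair()
server.setblocking(False)        # recv now returns immediately

def delayed_sender():
    time.sleep(0.1)
    client.sendall(b"ready")

threading.Thread(target=delayed_sender).start()

attempts = 0
while True:
    try:
        data = server.recv(5)    # returns at once; raises if no data yet
        break
    except BlockingIOError:      # the kernel said EWOULDBLOCK/EAGAIN
        attempts += 1
        time.sleep(0.01)         # the polling interval is only a guess

print(data)                      # b'ready'
print(attempts)                  # several wasted calls before success
```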
2.2.3、 Multiplexed I/O Model (I/O multiplexing)
In the models above, the I/O on each file descriptor is monitored and handled by one thread. Multiplexed IO means that one thread can monitor and handle the I/O of multiple file descriptors (in practice by alternating among them, i.e. completing them concurrently), reusing the same thread. A single thread can handle multiple I/Os at once because it invokes system calls in the kernel such as select, poll, or epoll, which implement the multiplexing.
I/O multiplexing mainly covers three system calls: select, poll, and epoll. Their benefit is that a single process can handle the I/O of multiple network connections at the same time. The basic principle is that the select/poll/epoll function continuously polls all the sockets it is responsible for and notifies the user process as soon as data arrives on one of them.
When the user process calls select, the whole process blocks; meanwhile the kernel "monitors" all the sockets select is responsible for, and select returns as soon as the data in any one of them is ready. The user process then issues a read operation to copy the data from the kernel into the user process.
Apache's prefork mode is main process + multiple single-threaded processes + select; the worker mode is main process + multiple multi-threaded processes + poll.
Advantages and disadvantages:
- Advantage: from a single blocking point the process can wait for multiple descriptors to become ready at the same time, instead of using multiple threads (one per file descriptor), which greatly saves system resources.
- Disadvantage: when the number of connections is small, it is less efficient than the multi-thread + blocking I/O model and may have higher latency, because handling a single connection takes two system calls, increasing the time spent.
I/O multiplexing is used in the following situations:
- When a client handles multiple descriptors (typically interactive input plus a network socket), I/O multiplexing must be used
- When a client handles multiple sockets at the same time (possible, but rare)
- When a server must handle both a listening socket and connected sockets, I/O multiplexing is generally needed
- When a server must handle both TCP and UDP, I/O multiplexing is generally used
- When a server must handle multiple services or protocols, I/O multiplexing is generally used
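A minimal sketch of one thread multiplexing two connections with `select` (local socket pairs as illustrative stand-ins for two clients): the single `select` call blocks until any watched descriptor is readable.

```python
import select
import socket
import threading
import time

# Two connected pairs stand in for two client connections.
srv1, cli1 = socket.socketpair()
srv2, cli2 = socket.socketpair()

def senders():
    time.sleep(0.05)
    cli2.sendall(b"from-2")      # the second connection speaks first
    time.sleep(0.05)
    cli1.sendall(b"from-1")

threading.Thread(target=senders).start()

received = []
watched = [srv1, srv2]
while len(received) < 2:
    # One blocking point waits on both descriptors at once.
    readable, _, _ = select.select(watched, [], [])
    for sock in readable:
        received.append(sock.recv(6))

print(sorted(received))          # both messages, handled by one thread
```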
2.2.4、 Signal-Driven I/O Model (signal-driven IO)
With signal-driven I/O the process neither waits nor polls; instead, the kernel sends the process a signal when the data is ready.
The calling steps: the program registers a signal-handling callback via the sigaction system call, which returns immediately, and the main program continues. When the I/O operation is ready, i.e. when the kernel data is ready, the kernel raises a SIGIO signal and invokes the registered signal callback; inside that callback the program can call recvfrom to fetch the data, copying what the user process needs from kernel space to user space.
The advantage of this model is that the process is not blocked while waiting for the datagram to arrive; the user's main program keeps executing and merely waits for the notification delivered to the signal-handling function.
In the signal-driven I/O model, the application enables signal-driven I/O on the socket and installs a signal handler; the process keeps running without blocking. When the data is ready, the process receives a SIGIO signal and can call the I/O functions in the signal handler to process the data.
Advantage: the thread is not blocked while waiting for data; the kernel returns immediately from the call that arms the signal, so the process can keep handling other requests, improving resource utilization.
Disadvantage: under heavy I/O, signal-driven I/O may fail to deliver notifications because the signal queue overflows.
Asynchronous blocking: the program sends the I/O call to the kernel and, without waiting for the kernel's response, continues accepting other requests. After the kernel receives the request, if the I/O cannot return immediately, the kernel waits for the result and notifies the process when the I/O finishes. Apache's event mode is main process + multiple processes/threads + the signal-driven I/O model.
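On Linux the mechanism above can be sketched with `fcntl` and `SIGIO` (a socket pair as an illustrative stand-in; assumes a platform where `O_ASYNC` delivers `SIGIO` for Unix sockets): arming the descriptor returns at once, and the handler fetches the data only when the kernel signals readiness.

```python
import fcntl
import os
import signal
import socket
import time

received = []

def on_sigio(signum, frame):
    # The kernel raised SIGIO: the data is ready, fetch it now.
    received.append(reader.recv(4))

signal.signal(signal.SIGIO, on_sigio)

reader, writer = socket.socketpair()

# Direct SIGIO for this descriptor to our process and enable
# signal-driven mode (O_ASYNC); recv itself is made non-blocking.
fcntl.fcntl(reader.fileno(), fcntl.F_SETOWN, os.getpid())
flags = fcntl.fcntl(reader.fileno(), fcntl.F_GETFL)
fcntl.fcntl(reader.fileno(), fcntl.F_SETFL, flags | os.O_ASYNC | os.O_NONBLOCK)

writer.sendall(b"ping")          # the kernel now queues SIGIO for us
time.sleep(0.2)                  # stand-in for the main work loop; the handler runs meanwhile

print(received)                  # the data arrived via the signal callback
```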
2.2.5、 Asynchronous I/O Model (asynchronous IO)
The biggest difference between asynchronous I/O and signal-driven I/O is this: with signal-driven I/O the kernel tells the user process when it may start an I/O operation, whereas with asynchronous I/O the kernel tells the user process when the I/O operation has completed. The two are essentially different; it is like ordering takeout instead of eating at the restaurant, saving the time spent waiting for the dishes.
Unlike synchronous I/O, asynchronous I/O does not proceed in sequence. After the user process issues the aio_read system call, the call returns immediately whether or not the kernel data is ready, and the user-mode process can do other work. When the socket data is ready, the kernel copies the data directly into the process's buffer and then sends the process a notification. The process is non-blocking in both phases of the I/O.
With signal-driven IO, when the kernel notification triggers the handler, the handler must still block while copying data from the kernel-space buffer to the user-space buffer; with asynchronous IO, the kernel notifies the user thread only after the second phase is done, so the thread can proceed straight to subsequent work.
Advantage: asynchronous I/O can make full use of DMA, overlapping I/O operations with computation.
Disadvantage: truly asynchronous I/O requires a lot of work from the operating system. Currently Windows achieves true asynchronous I/O through IOCP; on Linux it was introduced in 2.6, but AIO is still imperfect, so for high-concurrency network programming on Linux the I/O multiplexing model + multi-threaded task architecture basically meets the requirements.
Linux provides the AIO library functions for asynchrony, but they are rarely used. There are many open-source asynchronous IO libraries today, for example libevent, libev, and libuv.
Asynchronous non-blocking: the program sends the I/O call to the kernel and, without waiting for the kernel's response, continues accepting other requests. If the I/O invoked by the kernel cannot return immediately, the kernel goes on handling other things; when the I/O completes, the kernel is notified of the result and returns the completed result to the process. Meanwhile the process can accept new requests and the kernel can handle new work, so neither blocks the other and a high degree of I/O reuse is achieved. Asynchronous non-blocking is therefore the most widely used communication method.
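The completion-notification style can be sketched with Python's asyncio (an event-loop library in the same spirit as libevent/libuv, not kernel AIO; the socket pair is an illustrative stand-in): the coroutine hands off the read, does other work, and is resumed only after the data has fully arrived.

```python
import asyncio
import socket

async def main():
    # A connected socket pair stands in for a network peer.
    server, client = socket.socketpair()
    reader, writer = await asyncio.open_connection(sock=server)

    async def peer():
        await asyncio.sleep(0.1)
        client.sendall(b"done")

    task = asyncio.create_task(peer())

    other_work = 1 + 1                   # free to compute while the read is pending
    data = await reader.readexactly(4)   # resumed only when the read has completed
    await task
    writer.close()
    client.close()
    return other_work, data

result = asyncio.run(main())
print(result)                            # (2, b'done')
```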
2.2.6、 Comparison of the Five I/O Models
Among these five network I/O models, the later ones block less, so in theory they are also the most efficient. The first four are synchronous I/O, because the actual I/O operation (recvfrom) blocks the process/thread; only the asynchronous I/O model matches the POSIX definition of asynchronous I/O.
2.2.7、 Implementations
1、select:
The select library is an event-driven model library supported on basically all Linux and Windows platforms, with essentially the same interface definition everywhere; only some parameters differ slightly in meaning. It has a maximum concurrency limit of 1024 and is the earliest event-driven model.
2、poll:
The basic event-driven model on Linux; Windows does not support it. It is an upgraded version of select that removes the maximum concurrency limit. When compiling Nginx, the flags --with-poll_module and --without-poll_module specify whether to compile the poll library.
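A minimal `poll` sketch (a socket pair as an illustrative stand-in): registration and the wait loop are separated, and there is no FD_SETSIZE-style cap on descriptor numbers.

```python
import select
import socket

server, client = socket.socketpair()

poller = select.poll()
poller.register(server.fileno(), select.POLLIN)  # interest set: readability

client.sendall(b"hi")

# poll() takes a timeout in milliseconds and returns (fd, eventmask) pairs.
events = poller.poll(1000)
ready_fd, mask = events[0]
readable = bool(mask & select.POLLIN)
data = server.recv(2) if ready_fd == server.fileno() else b""
print(readable, data)            # True b'hi'
```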
3、epoll:
The epoll library is one of the highest-performance event-driven libraries supported by the Nginx server and is widely recognized as an excellent event-driven model. It differs greatly from select and poll: epoll is an upgraded version of poll, yet very different from it.
epoll's approach is to create a list of events of interest and hand it to the kernel once; afterwards the kernel reports back only the descriptors on which events have actually occurred. The maximum number of event descriptors a process can open with epoll equals the system's maximum number of open files, and the I/O efficiency of the epoll library does not decrease linearly with the number of descriptors, because only the "active" descriptors are reported and acted upon.
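A Linux-only epoll sketch (a socket pair as an illustrative stand-in; `select.epoll` is Python's wrapper over `epoll_create`/`epoll_ctl`/`epoll_wait`): the interest list is registered once, and each wait returns only the ready descriptors.

```python
import select
import socket

server, client = socket.socketpair()

ep = select.epoll()                           # epoll_create(2)
ep.register(server.fileno(), select.EPOLLIN)  # epoll_ctl(ADD): one-time copy into the kernel

client.sendall(b"go")

# epoll_wait(2): returns only descriptors with pending events.
events = ep.poll(timeout=1.0)
ready_fd, mask = events[0]
data = server.recv(2) if ready_fd == server.fileno() else b""
print(data)                                   # b'go'

ep.close()
```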
4、kqueue:
An event-driven model supporting the BSD family of platforms, used mainly on FreeBSD 4.1 and above, OpenBSD 2.0 and above, NetBSD, and Mac OS X. The model is also a variant of the poll library, so it has no essential difference from epoll; both gain efficiency by avoiding polling.
5、Iocp:
The implementation on Windows, corresponding to the fifth (asynchronous I/O) model.
6、rtsig:
Not a common event-driven mechanism; maximum queue of 1024; not used very often.
7、/dev/poll:
An efficient event-driven model supporting Unix-derived platforms, mainly Solaris and HP/UX. It is the event-driven mechanism Sun completed while developing the Solaris series of platforms. It uses a virtual /dev/poll device: the developer adds file descriptors to the device and then obtains event notifications through ioctl() calls. When running on those platforms, use the /dev/poll event-driven mechanism.
8、eventport:
This scheme is also an event-driven library Sun proposed while developing Solaris, available only on Solaris 10 and above; the driver library is designed to prevent kernel crashes.
2.2.8、 Summary of Common Models

| | select | poll | epoll |
|---|---|---|---|
| Operating mode | Traversal | Traversal | Callback |
| Underlying structure | Array | Linked list | Red-black tree |
| I/O efficiency | Linear traversal on every call, time complexity O(n) | Linear traversal on every call, time complexity O(n) | Event notification: whenever an fd becomes ready, the callback registered with the kernel puts it on the ready list (rdllist), time complexity O(1) |
| Maximum connections | 1024 (x86) or 2048 (x64) | No upper limit | No upper limit |
| fd copying | Every select call copies the fd set from user space to kernel space | Every poll call copies the fd set from user space to kernel space | epoll_ctl copies the fd into the kernel once and keeps it there; subsequent epoll_wait calls do not copy |
Summary:
1、epoll is just a set of APIs. Compared with select, which scans all file descriptors, epoll reads only the ready file descriptors and adds an event-based readiness notification mechanism on top, so its performance is quite good.
2、Event multiplexing based on epoll reduces the number of process switches, making the operating system do less useless work relative to user tasks.
3、Compared with select-style multiplexing, epoll reduces the work of traversal loops and memory copies, because active connections account for only a small share of the total concurrent connections.