Principles of several common IO models
2022-07-03 15:40:00 【51CTO】
1、 Server-Side I/O
In computing, I/O means Input/Output. IOPS (Input/Output Per Second) is the number of input/output (i.e. read/write) operations per second, one of the main indicators of disk performance. IOPS refers to the number of I/O requests the system can handle per unit of time, usually expressed as requests processed per second; an I/O request is typically a read or write of data.
A complete I/O operation is a full exchange of data between a process in user space and the kernel in kernel space. Because kernel space is strictly isolated from user space, a user-space process cannot directly access kernel-space memory during the exchange; instead, the data must be copied from kernel space into the process's memory in user space. In short, an I/O operation copies data from kernel-space memory into the memory of a user-space process.
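The kernel-to-user copy described above can be seen in a minimal sketch using an anonymous pipe (the pipe stands in for any kernel-buffered resource; this is an illustration, not part of the original article):

```python
import os

# Create an anonymous pipe: the kernel buffers whatever is written
# to the write end until some process reads from the read end.
r, w = os.pipe()

os.write(w, b"hello")   # the data now sits in a kernel buffer
data = os.read(r, 5)    # read(2): the kernel copies it into our user-space buffer
print(data)             # b'hello'

os.close(r)
os.close(w)
```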
Linux I/O types:
- Disk I/O
- Network I/O: "everything is a file"; network I/O is essentially reading from and writing to socket files
1.1、 Disk I/O
Disk I/O works as follows: a process initiates a system call to the kernel, requesting a resource on disk such as an HTML file or an image. The kernel loads the target file into kernel memory space through the corresponding driver, then copies the data from kernel memory into the process's memory. For relatively large data this also involves waiting time.
1.2、 Network I/O
The network I/O processing flow is similar. Whether the I/O is against a disk or the network, the data always passes through kernel space before reaching the user process.
2、I/O Models
2.1、 System I/O Models
Synchronous / asynchronous: concerns the message communication mechanism, i.e. while the caller is waiting for the result of an operation, whether the callee actively notifies it of the completion status.
- Synchronous (synchronous): the callee does not send any notification about the result of the event; the caller must actively ask whether processing has completed.
- Asynchronous (asynchronous): the callee actively notifies the caller of its running state through status changes, notifications, or callbacks.
Blocking / non-blocking: concerns the state of the caller while waiting for the result to return.
- Blocking (blocking): the I/O operation must complete fully before returning to user space; until the call result returns, the caller is suspended and can do nothing else.
- Non-blocking (nonblocking): the I/O call immediately returns a status value to the user without waiting for the operation to complete; the caller is not suspended before the final result returns and can do other work.
Combinations of the system I/O models:
2.2、 Network I/O Models
2.2.1、 Blocking I/O Model (blocking IO)
The blocking IO model is the simplest I/O model: the user thread is blocked while the kernel performs the I/O operation.
The user thread initiates an I/O read through the read system call, crossing from user space into kernel space. The kernel waits until the packet arrives, then copies the received data into user space, completing the read operation.
The user must wait for read to copy the data into its buffer before it can continue processing the received data. During the entire I/O request the user thread is blocked: after initiating the request it can do nothing else, so CPU utilization is poor.
Advantages: the program is simple; while blocked waiting for data the process/thread is suspended and consumes almost no CPU resources.
Disadvantages: each connection needs a separate process/thread to handle it, so under heavy concurrency the memory and thread-switching overhead of keeping the program running is large. Apache's prefork mode works this way.
Synchronous blocking: the program sends an I/O request to the kernel and waits for the kernel to respond. If the kernel cannot return the result of the I/O operation immediately, the process waits, accepts no new requests, and polls to check whether the I/O has completed; once it has, the process returns the result to the client. While the I/O is outstanding, the process cannot accept requests from other clients and must itself check whether the I/O has completed. This is simple, but slow, and is used little.
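Blocking I/O can be sketched with a local socket pair standing in for a client/server connection (an illustrative example, not from the original): `recv` does not return until the kernel has data to deliver, so the calling thread simply sleeps inside the call.

```python
import socket
import threading
import time

# A connected pair of sockets stands in for a real network connection.
server, client = socket.socketpair()

def delayed_sender():
    time.sleep(0.2)              # simulate a slow peer
    client.sendall(b"ping")

threading.Thread(target=delayed_sender).start()

start = time.monotonic()
data = server.recv(4)            # blocks here until the peer sends data
elapsed = time.monotonic() - start

print(data)                      # b'ping'
print(elapsed >= 0.2)            # the thread was suspended while waiting
```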
2.2.2、 Non-blocking I/O Model (nonblocking IO)
A user thread that initiates an I/O request returns immediately. But if no data has been read, the thread must keep re-issuing the I/O request until the data arrives before it can actually read it and continue. This "polling" mechanism has two problems. First, with a large number of file descriptors to wait on, reading them one by one causes many context switches (read is a system call, and every call switches between user mode and kernel mode). Second, the polling interval is hard to choose: it amounts to guessing how long the data will take to arrive. Set the wait too long and the program's response latency becomes too large; set it too short and there are too many retries, burning CPU for nothing. Because it wastes CPU, this model is rarely used directly, but other I/O models make use of the non-blocking property.
Non-blocking: the program sends an I/O request to the kernel and waits for the response; if the kernel cannot return the I/O result immediately, the process does not wait and goes on handling other requests, but it must still check back with the kernel at intervals to see whether the I/O has completed.
As the figure shows, when the connection is set to non-blocking and the application process calls recvfrom while no data has arrived, the kernel immediately returns an EWOULDBLOCK error instead of blocking until the data is ready. In the figure, a datagram is ready by the fourth call, so the data is copied into the application buffer and recvfrom returns successfully.
When an application process calls recvfrom in such a loop, this is called polling. It usually burns a lot of CPU time and is rarely used in practice.
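The polling loop can be sketched with a non-blocking socket (again a local socket pair as an illustrative stand-in): each failed `recv` raises `BlockingIOError` (errno `EWOULDBLOCK`/`EAGAIN`) immediately instead of suspending the caller.

```python
import socket
import threading
import time

server, client = socket.socketpair()
server.setblocking(False)        # recv now returns immediately

def delayed_sender():
    time.sleep(0.1)
    client.sendall(b"ready")

threading.Thread(target=delayed_sender).start()

attempts = 0
while True:
    try:
        data = server.recv(5)    # returns at once; raises if no data yet
        break
    except BlockingIOError:      # the kernel said EWOULDBLOCK/EAGAIN
        attempts += 1
        time.sleep(0.01)         # the polling interval is only a guess

print(data)                      # b'ready'
print(attempts)                  # several wasted calls before success
```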
2.2.3、 Multiplexed I/O Model (I/O multiplexing)
In the models above, the I/O on each file descriptor is monitored and handled by one thread. Multiplexed IO means that one thread can monitor and handle the I/O of multiple file descriptors (in practice by alternating among them, i.e. completing them concurrently), reusing the same thread. A single thread can handle multiple I/Os at once because it invokes system calls in the kernel such as select, poll, or epoll, which implement the multiplexing.
I/O multiplexing mainly covers three system calls: select, poll, and epoll. Their benefit is that a single process can handle the I/O of multiple network connections at the same time. The basic principle is that the select/poll/epoll function continuously polls all the sockets it is responsible for and notifies the user process as soon as data arrives on one of them.
When the user process calls select, the whole process blocks; meanwhile the kernel "monitors" all the sockets select is responsible for, and select returns as soon as the data in any one of them is ready. The user process then issues a read operation to copy the data from the kernel into the user process.
Apache's prefork mode is main process + multiple single-threaded processes + select; the worker mode is main process + multiple multi-threaded processes + poll.
Advantages and disadvantages:
- Advantage: from a single blocking point the process can wait for multiple descriptors to become ready at the same time, instead of using multiple threads (one per file descriptor), which greatly saves system resources.
- Disadvantage: when the number of connections is small, it is less efficient than the multi-thread + blocking I/O model and may have higher latency, because handling a single connection takes two system calls, increasing the time spent.
I/O multiplexing is used in the following situations:
- When a client handles multiple descriptors (typically interactive input plus a network socket), I/O multiplexing must be used
- When a client handles multiple sockets at the same time (possible, but rare)
- When a server must handle both a listening socket and connected sockets, I/O multiplexing is generally needed
- When a server must handle both TCP and UDP, I/O multiplexing is generally used
- When a server must handle multiple services or protocols, I/O multiplexing is generally used
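A minimal sketch of one thread multiplexing two connections with `select` (local socket pairs as illustrative stand-ins for two clients): the single `select` call blocks until any watched descriptor is readable.

```python
import select
import socket
import threading
import time

# Two connected pairs stand in for two client connections.
srv1, cli1 = socket.socketpair()
srv2, cli2 = socket.socketpair()

def senders():
    time.sleep(0.05)
    cli2.sendall(b"from-2")      # the second connection speaks first
    time.sleep(0.05)
    cli1.sendall(b"from-1")

threading.Thread(target=senders).start()

received = []
watched = [srv1, srv2]
while len(received) < 2:
    # One blocking point waits on both descriptors at once.
    readable, _, _ = select.select(watched, [], [])
    for sock in readable:
        received.append(sock.recv(6))

print(sorted(received))          # both messages, handled by one thread
```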
2.2.4、 Signal-Driven I/O Model (signal-driven IO)
With signal-driven I/O the process neither waits nor polls; instead, the kernel sends the process a signal when the data is ready.
The calling steps: the program registers a signal-handling callback via the sigaction system call, which returns immediately, and the main program continues. When the I/O operation is ready, i.e. when the kernel data is ready, the kernel raises a SIGIO signal and invokes the registered signal callback; inside that callback the program can call recvfrom to fetch the data, copying what the user process needs from kernel space to user space.
The advantage of this model is that the process is not blocked while waiting for the datagram to arrive; the user's main program keeps executing and merely waits for the notification delivered to the signal-handling function.
In the signal-driven I/O model, the application enables signal-driven I/O on the socket and installs a signal handler; the process keeps running without blocking. When the data is ready, the process receives a SIGIO signal and can call the I/O functions in the signal handler to process the data.
Advantage: the thread is not blocked while waiting for data; the kernel returns immediately from the call that arms the signal, so the process can keep handling other requests, improving resource utilization.
Disadvantage: under heavy I/O, signal-driven I/O may fail to deliver notifications because the signal queue overflows.
Asynchronous blocking: the program sends the I/O call to the kernel and, without waiting for the kernel's response, continues accepting other requests. After the kernel receives the request, if the I/O cannot return immediately, the kernel waits for the result and notifies the process when the I/O finishes. Apache's event mode is main process + multiple processes/threads + the signal-driven I/O model.
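On Linux the mechanism above can be sketched with `fcntl` and `SIGIO` (a socket pair as an illustrative stand-in; assumes a platform where `O_ASYNC` delivers `SIGIO` for Unix sockets): arming the descriptor returns at once, and the handler fetches the data only when the kernel signals readiness.

```python
import fcntl
import os
import signal
import socket
import time

received = []

def on_sigio(signum, frame):
    # The kernel raised SIGIO: the data is ready, fetch it now.
    received.append(reader.recv(4))

signal.signal(signal.SIGIO, on_sigio)

reader, writer = socket.socketpair()

# Direct SIGIO for this descriptor to our process and enable
# signal-driven mode (O_ASYNC); recv itself is made non-blocking.
fcntl.fcntl(reader.fileno(), fcntl.F_SETOWN, os.getpid())
flags = fcntl.fcntl(reader.fileno(), fcntl.F_GETFL)
fcntl.fcntl(reader.fileno(), fcntl.F_SETFL, flags | os.O_ASYNC | os.O_NONBLOCK)

writer.sendall(b"ping")          # the kernel now queues SIGIO for us
time.sleep(0.2)                  # stand-in for the main work loop; the handler runs meanwhile

print(received)                  # the data arrived via the signal callback
```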
2.2.5、 Asynchronous I/O Model (asynchronous IO)
The biggest difference between asynchronous I/O and signal-driven I/O is this: with signal-driven I/O the kernel tells the user process when it may start an I/O operation, whereas with asynchronous I/O the kernel tells the user process when the I/O operation has completed. The two are essentially different; it is like ordering takeout instead of eating at the restaurant, saving the time spent waiting for the dishes.
Unlike synchronous I/O, asynchronous I/O does not proceed in sequence. After the user process issues the aio_read system call, the call returns immediately whether or not the kernel data is ready, and the user-mode process can do other work. When the socket data is ready, the kernel copies the data directly into the process's buffer and then sends the process a notification. The process is non-blocking in both phases of the I/O.
With signal-driven IO, when the kernel notification triggers the handler, the handler must still block while copying data from the kernel-space buffer to the user-space buffer; with asynchronous IO, the kernel notifies the user thread only after the second phase is done, so the thread can proceed straight to subsequent work.
Advantage: asynchronous I/O can make full use of DMA, overlapping I/O operations with computation.
Disadvantage: truly asynchronous I/O requires a lot of work from the operating system. Currently Windows achieves true asynchronous I/O through IOCP; on Linux it was introduced in 2.6, but AIO is still imperfect, so for high-concurrency network programming on Linux the I/O multiplexing model + multi-threaded task architecture basically meets the requirements.
Linux provides the AIO library functions for asynchrony, but they are rarely used. There are many open-source asynchronous IO libraries today, for example libevent, libev, and libuv.
Asynchronous non-blocking: the program sends the I/O call to the kernel and, without waiting for the kernel's response, continues accepting other requests. If the I/O invoked by the kernel cannot return immediately, the kernel goes on handling other things; when the I/O completes, the kernel is notified of the result and returns the completed result to the process. Meanwhile the process can accept new requests and the kernel can handle new work, so neither blocks the other and a high degree of I/O reuse is achieved. Asynchronous non-blocking is therefore the most widely used communication method.
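The completion-notification style can be sketched with Python's asyncio (an event-loop library in the same spirit as libevent/libuv, not kernel AIO; the socket pair is an illustrative stand-in): the coroutine hands off the read, does other work, and is resumed only after the data has fully arrived.

```python
import asyncio
import socket

async def main():
    # A connected socket pair stands in for a network peer.
    server, client = socket.socketpair()
    reader, writer = await asyncio.open_connection(sock=server)

    async def peer():
        await asyncio.sleep(0.1)
        client.sendall(b"done")

    task = asyncio.create_task(peer())

    other_work = 1 + 1                   # free to compute while the read is pending
    data = await reader.readexactly(4)   # resumed only when the read has completed
    await task
    writer.close()
    client.close()
    return other_work, data

result = asyncio.run(main())
print(result)                            # (2, b'done')
```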
2.2.6、 Comparison of the Five I/O Models
Among these five network I/O models, the later ones block less, so in theory they are also the most efficient. The first four are synchronous I/O, because the actual I/O operation (recvfrom) blocks the process/thread; only the asynchronous I/O model matches the POSIX definition of asynchronous I/O.
2.2.7、 Implementations
1、select:
The select library is an event-driven model library supported on basically all Linux and Windows platforms, with essentially the same interface definition everywhere; only some parameters differ slightly in meaning. It has a maximum concurrency limit of 1024 and is the earliest event-driven model.
2、poll:
The basic event-driven model on Linux; Windows does not support it. It is an upgraded version of select that removes the maximum concurrency limit. When compiling Nginx, the flags --with-poll_module and --without-poll_module specify whether to compile the poll library.
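A minimal `poll` sketch (a socket pair as an illustrative stand-in): registration and the wait loop are separated, and there is no FD_SETSIZE-style cap on descriptor numbers.

```python
import select
import socket

server, client = socket.socketpair()

poller = select.poll()
poller.register(server.fileno(), select.POLLIN)  # interest set: readability

client.sendall(b"hi")

# poll() takes a timeout in milliseconds and returns (fd, eventmask) pairs.
events = poller.poll(1000)
ready_fd, mask = events[0]
readable = bool(mask & select.POLLIN)
data = server.recv(2) if ready_fd == server.fileno() else b""
print(readable, data)            # True b'hi'
```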
3、epoll:
The epoll library is one of the highest-performance event-driven libraries supported by the Nginx server and is widely recognized as an excellent event-driven model. It differs greatly from select and poll: epoll is an upgraded version of poll, yet very different from it.
epoll's approach is to create a list of events of interest and hand it to the kernel once; afterwards the kernel reports back only the descriptors on which events have actually occurred. The maximum number of event descriptors a process can open with epoll equals the system's maximum number of open files, and the I/O efficiency of the epoll library does not decrease linearly with the number of descriptors, because only the "active" descriptors are reported and acted upon.
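A Linux-only epoll sketch (a socket pair as an illustrative stand-in; `select.epoll` is Python's wrapper over `epoll_create`/`epoll_ctl`/`epoll_wait`): the interest list is registered once, and each wait returns only the ready descriptors.

```python
import select
import socket

server, client = socket.socketpair()

ep = select.epoll()                           # epoll_create(2)
ep.register(server.fileno(), select.EPOLLIN)  # epoll_ctl(ADD): one-time copy into the kernel

client.sendall(b"go")

# epoll_wait(2): returns only descriptors with pending events.
events = ep.poll(timeout=1.0)
ready_fd, mask = events[0]
data = server.recv(2) if ready_fd == server.fileno() else b""
print(data)                                   # b'go'

ep.close()
```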
4、kqueue:
An event-driven model supporting the BSD family of platforms, used mainly on FreeBSD 4.1 and above, OpenBSD 2.0 and above, NetBSD, and Mac OS X. The model is also a variant of the poll library, so it has no essential difference from epoll; both gain efficiency by avoiding polling.
5、Iocp:
The implementation on Windows, corresponding to the fifth (asynchronous I/O) model.
6、rtsig:
Not a common event-driven mechanism; maximum queue of 1024; not used very often.
7、/dev/poll:
An efficient event-driven model supporting Unix-derived platforms, mainly Solaris and HP/UX. It is the event-driven mechanism Sun completed while developing the Solaris series of platforms. It uses a virtual /dev/poll device: the developer adds file descriptors to the device and then obtains event notifications through ioctl() calls. When running on those platforms, use the /dev/poll event-driven mechanism.
8、eventport:
This scheme is also an event-driven library Sun proposed while developing Solaris, available only on Solaris 10 and above; the driver library is designed to prevent kernel crashes.
2.2.8、 Summary of Common Models

| | select | poll | epoll |
|---|---|---|---|
| Operating mode | Traversal | Traversal | Callback |
| Underlying structure | Array | Linked list | Red-black tree |
| I/O efficiency | Linear traversal on every call, time complexity O(n) | Linear traversal on every call, time complexity O(n) | Event notification: whenever an fd becomes ready, the callback registered with the kernel puts it on the ready list (rdllist), time complexity O(1) |
| Maximum connections | 1024 (x86) or 2048 (x64) | No upper limit | No upper limit |
| fd copying | Every select call copies the fd set from user space to kernel space | Every poll call copies the fd set from user space to kernel space | epoll_ctl copies the fd into the kernel once and keeps it there; subsequent epoll_wait calls do not copy |
Summary:
1、epoll is just a set of APIs. Compared with select, which scans all file descriptors, epoll reads only the ready file descriptors and adds an event-based readiness notification mechanism on top, so its performance is quite good.
2、Event multiplexing based on epoll reduces the number of process switches, making the operating system do less useless work relative to user tasks.
3、Compared with select-style multiplexing, epoll reduces the work of traversal loops and memory copies, because active connections account for only a small share of the total concurrent connections.