
Thoroughly uncover how epoll realizes IO multiplexing

2022-07-26 16:59:00 InfoQ

On Linux, a process is an expensive thing. Leaving creation aside, a context switch alone costs a few microseconds. So to serve a large number of users efficiently, a single process has to handle many TCP connections at once. Now suppose one process holds 10,000 connections. How does it find out which connections have data to read and which can be written?
We could of course loop over all of them looking for IO events, but that is far too crude. What we want is a more efficient mechanism that finds the connection quickly and directly as soon as an IO event occurs on any one of the many. Linux already provides exactly this: the familiar IO multiplexing mechanism. What is being multiplexed (reused) here is the process.
Linux offers three multiplexing schemes: select, poll, and epoll. Of the three, epoll performs best and supports the highest concurrency. So today we take epoll apart to see how the kernel manages multiplexed IO.
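To make the cost of the loop-traversal approach concrete, here is a minimal sketch in C. It assumes the connections are ordinary descriptors and uses the Linux FIONREAD ioctl to ask each one how many bytes are waiting. The function name scan_for_readable is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Naive approach: ask every descriptor, one by one, whether it has
 * unread bytes. This costs one system call per descriptor on every
 * pass, even when only a single connection is active.
 * Returns how many descriptors are readable and stores the index of
 * the first readable one in *first (-1 if there is none).
 */
int scan_for_readable(const int *fds, size_t n, int *first)
{
    int ready = 0;

    assert(fds != NULL && first != NULL);
    *first = -1;
    for (size_t i = 0; i < n; i++) {
        int pending = 0;

        /* FIONREAD reports the number of bytes waiting to be read. */
        if (ioctl(fds[i], FIONREAD, &pending) == 0 && pending > 0) {
            if (*first < 0)
                *first = (int)i;
            ready++;
        }
    }
    return ready;
}
```

With 10,000 mostly idle connections, nearly all of those system calls find nothing. That wasted work is what the multiplexing mechanisms are meant to remove.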
To make the discussion concrete, here is a simple example of using epoll (illustration only, not production code):
int main(){
 listen(lfd, ...);

 cfd1 = accept(...);
 cfd2 = accept(...);
 efd = epoll_create(...);

 epoll_ctl(efd, EPOLL_CTL_ADD, cfd1, ...);
 epoll_ctl(efd, EPOLL_CTL_ADD, cfd2, ...);
 epoll_wait(efd, ...);
}
The epoll-related functions are these three:
  • epoll_create: creates an epoll object
  • epoll_ctl: adds the connections to be managed to the epoll object
  • epoll_wait: waits for IO events on the connections it manages
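The demo above elides its arguments. As a hedged, runnable counterpart, the sketch below goes through the same three calls. It assumes a socketpair() stands in for an accepted TCP connection, and wait_for_readable is a hypothetical helper name, not part of any API.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Registers one end of a connected socket pair with epoll, writes to
 * the other end, and waits for the readiness event.
 * Returns the fd reported by epoll_wait, or -1 on any failure.
 * sv[] must come from socketpair(); it stands in for an accepted
 * TCP connection.
 */
int wait_for_readable(int sv[2])
{
    struct epoll_event ev = { 0 }, out = { 0 };
    int efd, n, fd = -1;

    assert(sv != NULL);
    efd = epoll_create1(0);               /* create the epoll object */
    if (efd < 0)
        return -1;

    ev.events = EPOLLIN;                  /* interested in "readable" */
    ev.data.fd = sv[0];
    if (epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        write(sv[1], "ping", 4) == 4) {
        /* Data is already queued, so this returns without sleeping. */
        n = epoll_wait(efd, &out, 1, 1000 /* ms */);
        if (n == 1 && (out.events & EPOLLIN))
            fd = out.data.fd;
    }
    close(efd);
    return fd;
}
```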
With this demo in hand, let's take the principles behind epoll apart. I believe that once you have worked through this article, your command of epoll will be far more solid!
Friendly warning: this is a long read of some ten thousand words, so enter with caution!

1. accept creates a new socket

Let's start with accept on the server side. After accept returns, the process has created a new socket dedicated to communicating with that client and has placed it in the current process's open file list.
A more detailed view of the kernel objects behind one connected socket is shown below.
[Figure: kernel object structure of a connected socket]
Next, let's look at the source code that creates the kernel objects for the accepted socket. The accept system call lives in net/socket.c.
//file: net/socket.c
SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
 int __user *, upeer_addrlen, int, flags)
{
 struct socket *sock, *newsock;

 // Look up the listening socket by fd
 sock = sockfd_lookup_light(fd, &err, &fput_needed);

 //1.1 Allocate and initialize a new socket
 newsock = sock_alloc();
 newsock->type = sock->type;
 newsock->ops = sock->ops;

 //1.2 Allocate a new file object and attach it to the new socket
 newfile = sock_alloc_file(newsock, flags, sock->sk->sk_prot_creator->name);
 ......

 //1.3 Accept the connection
 err = sock->ops->accept(sock, newsock, sock->file->f_flags);

 //1.4 Add the new file to the current process's open file list
 fd_install(newfd, newfile);
 ......
}

1.1 Initializing the struct socket object

In the source above, sock_alloc is called first to allocate a struct socket object. Then ops, the table of protocol operation functions of the listening socket, is assigned to the new socket. (All sockets in the AF_INET protocol family share the same ops methods, so the pointer can simply be copied here.)
inet_stream_ops is defined as follows:
//file: net/ipv4/af_inet.c
const struct proto_ops inet_stream_ops = {
 ...
 .accept = inet_accept,
 .listen = inet_listen,
 .sendmsg = inet_sendmsg,
 .recvmsg = inet_recvmsg,
 ...
};

1.2 Allocating a file for the new socket object

The struct socket object has an important member: a pointer to the file kernel object. This pointer is NULL at initialization. In accept, sock_alloc_file is called to allocate and initialize the file, and the new file object is then stored in sock->file.
Here is how sock_alloc_file is implemented:
struct file *sock_alloc_file(struct socket *sock, int flags,
 const char *dname)
{
 struct file *file;
 file = alloc_file(&path, FMODE_READ | FMODE_WRITE,
 &socket_file_ops);
 ......
 sock->file = file;
}
sock_alloc_file in turn calls alloc_file. Note that alloc_file assigns the socket_file_ops function table to the new file->f_op.
//file: fs/file_table.c
struct file *alloc_file(struct path *path, fmode_t mode,
 const struct file_operations *fop)
{
 struct file *file;
 file->f_op = fop;
 ......
}
socket_file_ops is defined as follows:
//file: net/socket.c
static const struct file_operations socket_file_ops = {
 ...
 .aio_read = sock_aio_read,
 .aio_write = sock_aio_write,
 .poll = sock_poll,
 .release = sock_close,
 ...
};
As we can see, in the socket newly created by accept, file->f_op->poll points to sock_poll. It will be called later, and we will come back to it then.
The file object in turn holds a pointer back to the socket object.
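The f_op indirection is easier to follow in a small user-space model. The sketch below is not kernel code: every toy_ name is invented here, and only the shape (a file carrying an operations table plus a pointer back to its socket) mirrors what the text describes.

```c
#include <assert.h>
#include <stddef.h>

struct toy_file;

/* Table of operations, in the spirit of struct file_operations. */
struct toy_file_ops {
    unsigned int (*poll)(struct toy_file *file);
};

/* A file object carries a pointer to its operations table. */
struct toy_file {
    const struct toy_file_ops *f_op;
    void *private_data;          /* points back at the owning object */
};

struct toy_socket {
    struct toy_file *file;       /* socket -> file */
    unsigned int ready_mask;     /* pretend readiness bits */
};

/* Socket flavour of poll: recover the socket from the file. */
static unsigned int toy_sock_poll(struct toy_file *file)
{
    struct toy_socket *sock = file->private_data;

    return sock->ready_mask;
}

static const struct toy_file_ops toy_socket_file_ops = {
    .poll = toy_sock_poll,
};

/* Wire the two objects together, as accept does for a new connection. */
void toy_attach(struct toy_socket *sock, struct toy_file *file)
{
    assert(sock != NULL && file != NULL);
    file->f_op = &toy_socket_file_ops;
    file->private_data = sock;
    sock->file = file;
}

/* Generic caller: knows nothing about sockets, only the table. */
unsigned int toy_poll(struct toy_file *file)
{
    return file->f_op->poll(file);
}
```

The generic caller never names the socket-specific function. This is why epoll can later call file->f_op->poll on any registered file and still end up in the socket's own poll routine.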

1.3 Accepting the connection

Besides the file pointer, the socket kernel object has another core member: sk, a pointer to a struct sock.
//file: include/linux/net.h
struct socket {
 struct file *file;
 struct sock *sk;
};
struct sock is a very large data structure and the core kernel object of a socket. The send queue, the receive queue, the wait queue, and other core data structures all live here. It is defined in include/net/sock.h and is too long to show.
In the accept source code:
//file: net/socket.c
SYSCALL_DEFINE4(accept4, ...)
{
 ...
 //1.3 Accept the connection
 err = sock->ops->accept(sock, newsock, sock->file->f_flags);
}
sock->ops->accept corresponds to inet_accept. When it runs, it takes the already-created sock directly from the handshake queue. The full creation of the sock object involves the three-way handshake and is fairly complex, so we won't expand on it here. Let's look at just one function used while initializing struct sock:
void sock_init_data(struct socket *sock, struct sock *sk)
{
 sk->sk_wq = NULL;
 sk->sk_data_ready = sock_def_readable;
}
Here the sk_data_ready function pointer of the sock object is set to sock_def_readable. Just remember this for now; we will need it later.

1.4 Adding the new file to the current process's open file list

Once the key kernel objects file, socket, and sock have been created, all that remains is to hang the file on the current process's open file list.
//file: fs/file.c
void fd_install(unsigned int fd, struct file *file)
{
 __fd_install(current->files, fd, file);
}

void __fd_install(struct files_struct *files, unsigned int fd,
 struct file *file)
{
 ...
 fdt = files_fdtable(files);
 BUG_ON(fdt->fd[fd] != NULL);
 rcu_assign_pointer(fdt->fd[fd], file);
}

2. epoll_create implementation

When a user process calls epoll_create, the kernel creates a struct eventpoll kernel object and likewise associates it with the current process's open file list.
A more detailed view of the struct eventpoll object is shown below (again, only the members relevant to today's topic are listed).
[Figure: structure of the struct eventpoll object]
The source code of epoll_create is fairly simple. It is in fs/eventpoll.c.
// file:fs/eventpoll.c
SYSCALL_DEFINE1(epoll_create1, int, flags)
{
 struct eventpoll *ep = NULL;

 // Create an eventpoll object
 error = ep_alloc(&ep);
}
struct eventpoll is defined in the same source file.
// file:fs/eventpoll.c
struct eventpoll {

 // Wait queue used by sys_epoll_wait
 wait_queue_head_t wq;

 // Ready descriptors are placed here
 struct list_head rdllist;

 // Every epoll object has a red-black tree
 struct rb_root rbr;

 ......
};
The members of this structure mean the following:
  • wq:
      Wait queue list. When the soft interrupt has data ready, it uses wq to find the user processes blocked on the epoll object.
  • rbr:
      A red-black tree. To support efficient lookup, insertion, and deletion across a huge number of connections, eventpoll uses a red-black tree internally. The tree manages all the socket connections that the user process has added.
  • rdllist:
      List of ready descriptors. When connections become ready, the kernel puts them on the rdllist list. The application process then only needs to check this list to find the ready connections, instead of traversing the whole tree.
After the structure is allocated, a little initialization is still needed, all of which is done in ep_alloc.
//file: fs/eventpoll.c
static int ep_alloc(struct eventpoll **pep)
{
 struct eventpoll *ep;

 // Allocate memory for the eventpoll object
 ep = kzalloc(sizeof(*ep), GFP_KERNEL);

 // Initialize the wait queue head
 init_waitqueue_head(&ep->wq);

 // Initialize the ready list
 INIT_LIST_HEAD(&ep->rdllist);

 // Initialize the red-black tree root
 ep->rbr = RB_ROOT;

 ......
}
So far, these members have only been defined and initialized; none has been used yet. They come into play below.

3. epoll_ctl adds a socket

Understanding this step is the key to understanding epoll as a whole.
For simplicity, we consider only adding a socket with EPOLL_CTL_ADD and ignore deletion and modification for now.
Suppose the sockets for several client connections have been created and so has the epoll kernel object. When each socket is registered with epoll_ctl, the kernel does three things:
  • 1. Allocates a red-black tree node object, epitem
  • 2. Adds a wait item to the socket's wait queue, with ep_poll_callback as its callback function
  • 3. Inserts the epitem into the epoll object's red-black tree
After two sockets have been added with epoll_ctl, the kernel data structures in the process are related roughly as follows:
[Figure: kernel data structures after adding two sockets with epoll_ctl]
Let's look in detail at how a socket is added to the epoll object, starting from the epoll_ctl source code.
// file:fs/eventpoll.c
SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 struct epoll_event __user *, event)
{
 struct eventpoll *ep;
 struct file *file, *tfile;

 // Find the eventpoll kernel object by epfd
 file = fget(epfd);
 ep = file->private_data;

 // Find the socket's file kernel object by its fd
 tfile = fget(fd);
 ......

 // epi: the item already registered for (tfile, fd), if any
 switch (op) {
 case EPOLL_CTL_ADD:
 if (!epi) {
 epds.events |= POLLERR | POLLHUP;
 error = ep_insert(ep, &epds, tfile, fd);
 } else
 error = -EEXIST;
 clear_tfile_check_list();
 break;
 ......
 }
 ......
}
epoll_ctl first uses the fds passed in to find the eventpoll and socket kernel objects. For the EPOLL_CTL_ADD operation it then goes on to call ep_insert. All of the registration work is done in that function.
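The -EEXIST branch can be observed from user space. A small sketch, assuming any pollable descriptor such as the read end of a pipe; add_twice is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Adds `fd` to a fresh epoll instance twice.
 * Returns the errno of the second EPOLL_CTL_ADD (0 if it unexpectedly
 * succeeded), or -1 if the setup itself failed.
 */
int add_twice(int fd)
{
    struct epoll_event ev = { 0 };
    int efd, err = -1;

    assert(fd >= 0);
    efd = epoll_create1(0);
    if (efd < 0)
        return -1;

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) == 0) {
        /* Second registration of the same fd: the item already exists. */
        if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) == 0)
            err = 0;
        else
            err = errno;
    }
    close(efd);
    return err;
}
```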
//file: fs/eventpoll.c
static int ep_insert(struct eventpoll *ep,
 struct epoll_event *event,
 struct file *tfile, int fd)
{
 //3.1 Allocate and initialize the epitem
 // Allocate an epi object
 struct epitem *epi;
 if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
 return -ENOMEM;

 // Initialize the allocated epi
 // epi->ffd stores the fd number and the struct file address
 INIT_LIST_HEAD(&epi->pwqlist);
 epi->ep = ep;
 ep_set_ffd(&epi->ffd, tfile, fd);

 //3.2 Set up the socket's wait queue
 // Define and initialize an ep_pqueue object
 struct ep_pqueue epq;
 epq.epi = epi;
 init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);

 // Call ep_ptable_queue_proc to register the callback;
 // the function actually installed is ep_poll_callback
 revents = ep_item_poll(epi, &epq.pt);

 ......
 //3.3 Insert the epi into the red-black tree of the eventpoll object
 ep_rbtree_insert(ep, epi);
 ......
}

3.1 Allocating and initializing the epitem

Each time epoll_ctl is called for a socket, an epitem is allocated for it. The main members of this structure are:
//file: fs/eventpoll.c
struct epitem {

 // Red-black tree node
 struct rb_node rbn;

 // File descriptor information of the socket
 struct epoll_filefd ffd;

 // The eventpoll object this item belongs to
 struct eventpoll *ep;

 // Wait queue
 struct list_head pwqlist;
};
The epitem then gets some initialization. First, the line epi->ep = ep points its ep pointer at the eventpoll object. In addition, the file and fd of the socket being added are stored in epitem->ffd.
The ep_set_ffd function used there is as follows.

static inline void ep_set_ffd(struct epoll_filefd *ffd,
 struct file *file, int fd)
{
 ffd->file = file;
 ffd->fd = fd;
}

3.2 Setting up the socket's wait queue

After the epitem is created and initialized, the second thing ep_insert does is set up the wait queue on the socket object and install ep_poll_callback (in fs/eventpoll.c) as the callback to run when data is ready.
This part of the source is a bit roundabout. If you are short on patience, skip ahead to the emphasized passage below. First, look at ep_item_poll.
static inline unsigned int ep_item_poll(struct epitem *epi, poll_table *pt)
{
 pt->_key = epi->event.events;

 return epi->ffd.file->f_op->poll(epi->ffd.file, pt) & epi->event.events;
}
See: this calls file->f_op->poll of the socket. From the socket structure diagram in section 1, we know this function is actually sock_poll.
/* No kernel lock held - perfect */
static unsigned int sock_poll(struct file *file, poll_table *wait)
{
 ...
 return sock->ops->poll(file, sock, wait);
}
Looking back at the socket structure diagram in section 1 again, sock->ops->poll actually points to tcp_poll.
//file: net/ipv4/tcp.c
unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
{
 struct sock *sk = sock->sk;

 sock_poll_wait(file, sk_sleep(sk), wait);
}
Before the call, the second argument of sock_poll_wait is computed by calling sk_sleep. That function returns the wait queue head (wait_queue_head_t) of the sock object, which is where the wait queue item will be inserted later. Note carefully that this is the socket's wait queue, not the epoll object's. Here is the source of sk_sleep:
//file: include/net/sock.h
static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
 BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0);
 return &rcu_dereference_raw(sk->sk_wq)->wait;
}
Then we actually enter sock_poll_wait.
static inline void sock_poll_wait(struct file *filp,
 wait_queue_head_t *wait_address, poll_table *p)
{
 poll_wait(filp, wait_address, p);
}

static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
{
 if (p && p->_qproc && wait_address)
 p->_qproc(filp, wait_address, p);
}
Here _qproc is a function pointer. It was set to ep_ptable_queue_proc by the earlier call to init_poll_funcptr.
static int ep_insert(...)
{
 ...
 init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);
 ...
}

//file: include/linux/poll.h
static inline void init_poll_funcptr(poll_table *pt,
 poll_queue_proc qproc)
{
 pt->_qproc = qproc;
 pt->_key = ~0UL; /* all events enabled */
}
Pay attention! After that long detour, we have finally reached the key point. The ep_ptable_queue_proc function creates a new wait queue item, registers ep_poll_callback as its callback function, and then adds the item to the socket's wait queue.
//file: fs/eventpoll.c
static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 poll_table *pt)
{
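 // Recover the epitem from the poll_table embedded in the ep_pqueue
 struct epitem *epi = ep_item_from_epqueue(pt);
 struct eppoll_entry *pwq;

 if (epi->nwait >= 0 && (pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL))) {
 // Initialize the callback
 init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
 // base lets ep_poll_callback find the epitem again later
 pwq->base = epi;

 // Put ep_poll_callback on the socket's wait queue whead (note: not the epoll wait queue)
 add_wait_queue(whead, &pwq->wait);
 ......
 }
}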
In the earlier article "Illustrated | An in-depth look at a stumbling block on the road to high-performance network development: synchronous blocking network IO", the blocking system call recvfrom had to wake the user process when data became ready, so the private field of the wait queue item (an odd name for this variable) was set to the current process descriptor, current. Today our socket is managed by epoll, and no process needs to be woken when a single socket becomes ready, so q->private serves no purpose here and is set to NULL.
//file:include/linux/wait.h
static inline void init_waitqueue_func_entry(
 wait_queue_t *q, wait_queue_func_t func)
{
 q->flags = 0;
 q->private = NULL;

 // Register ep_poll_callback on the wait_queue_t object;
 // q->func is called when data arrives
 q->func = func;
}
As shown above, only the callback q->func of the wait queue item is set, to ep_poll_callback. In section 5, "Data arrives", we will see that after the soft interrupt has placed data on the socket's receive queue, it calls back through this registered ep_poll_callback, which in turn notifies the epoll object.
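Because the registration path hops through several function pointers, a stripped-down user-space model may help. Everything below is an assumption made for illustration (the toy_ types and functions do not exist in the kernel). It keeps only the idea that the socket's poll method hands its wait queue to a callback carried in the poll table, and that this callback enqueues an entry.

```c
#include <assert.h>
#include <stddef.h>

struct toy_wait_entry;
typedef int (*toy_wait_func)(struct toy_wait_entry *entry);

/* One item on a wait queue: a callback plus a link. */
struct toy_wait_entry {
    toy_wait_func func;
    struct toy_wait_entry *next;
};

/* Head of a socket's wait queue. */
struct toy_wait_head {
    struct toy_wait_entry *first;
};

/* Plays the role of poll_table: carries "what to do with a queue". */
struct toy_poll_table;
typedef void (*toy_queue_proc)(struct toy_wait_head *head,
                               struct toy_poll_table *pt);
struct toy_poll_table {
    toy_queue_proc qproc;
    struct toy_wait_entry entry;   /* storage for the entry to enqueue */
};

static int toy_ready_callback(struct toy_wait_entry *entry)
{
    (void)entry;
    return 1;   /* a real callback would queue the item as ready */
}

/* Counterpart of ep_ptable_queue_proc: build the entry and enqueue it. */
static void toy_queue_proc_impl(struct toy_wait_head *head,
                                struct toy_poll_table *pt)
{
    pt->entry.func = toy_ready_callback;
    pt->entry.next = head->first;
    head->first = &pt->entry;
}

/* Counterpart of init_poll_funcptr: choose the registration callback. */
void toy_init_poll_table(struct toy_poll_table *pt)
{
    assert(pt != NULL);
    pt->qproc = toy_queue_proc_impl;
}

/* Counterpart of the socket's poll method: it only hands over its queue. */
void toy_socket_poll(struct toy_wait_head *sock_queue,
                     struct toy_poll_table *pt)
{
    if (pt && pt->qproc && sock_queue)
        pt->qproc(sock_queue, pt);
}
```

The socket side never learns which callback ends up on its queue. That separation is what lets the same poll method serve both a plain blocking reader and epoll.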

3.3 Inserting into the red-black tree

After the epitem object has been allocated, it is inserted into the red-black tree. An epoll red-black tree with a few socket descriptors inserted looks like this:
[Figure: red-black tree in an epoll object holding several socket descriptors]
A word on why a red-black tree is used. Many people say it is because it is efficient. I don't think that explanation is complete: for pure lookup speed, a tree cannot match a hash table. A more reasonable explanation, in my view, is that epoll needs a good balance among lookup efficiency, insertion efficiency, memory overhead, and other factors, and the red-black tree turned out to fit that requirement best.
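As a user-space illustration of keeping registered descriptors in an ordered tree, the sketch below uses POSIX tsearch/tfind. POSIX does not say how the tree is balanced (glibc's implementation happens to be a red-black tree), and the (file address, fd) key with its toy_ names is an assumption modelled on the ffd member shown earlier, not the kernel's code.

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>
#include <stdint.h>

/* Key used to order items: (file object address, fd number). */
struct toy_filefd {
    const void *file;
    int fd;
};

/* Order by file address first, then by fd. */
static int toy_cmp_ffd(const void *a, const void *b)
{
    const struct toy_filefd *p1 = a, *p2 = b;
    uintptr_t f1 = (uintptr_t)p1->file, f2 = (uintptr_t)p2->file;

    if (f1 != f2)
        return f1 > f2 ? 1 : -1;
    return (p1->fd > p2->fd) - (p1->fd < p2->fd);
}

/*
 * Inserts `key` into the tree rooted at *root. Each call must pass a
 * distinct key object, because the tree stores the pointer itself.
 * Returns 1 if inserted, 0 if an equal key was already present,
 * -1 if memory ran out.
 */
int toy_tree_insert(void **root, struct toy_filefd *key)
{
    void *node;

    assert(root != NULL && key != NULL);
    node = tsearch(key, root, toy_cmp_ffd);
    if (node == NULL)
        return -1;
    /* The node's first field is the stored key pointer. */
    return *(struct toy_filefd **)node == key ? 1 : 0;
}

/* Returns 1 if an equal key is in the tree, 0 otherwise. */
int toy_tree_contains(void *const *root, const struct toy_filefd *key)
{
    return tfind(key, root, toy_cmp_ffd) != NULL;
}
```

The duplicate check in toy_tree_insert plays the same role as the lookup that makes EPOLL_CTL_ADD fail with -EEXIST for a descriptor that is already registered.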

4. epoll_wait waits for events

What epoll_wait does is not complicated. When called, it checks whether the eventpoll->rdllist list holds any data. If it does, it returns it. If not, it creates a wait queue item, adds it to the eventpoll wait queue, and then blocks itself.
Note: epoll_ctl also created a wait queue item when it added the socket. The difference is that the wait queue item here hangs on the epoll object, whereas the earlier one hangs on the socket object.
The source code is as follows:
//file: fs/eventpoll.c
SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
 int, maxevents, int, timeout)
{
 ...
 error = ep_poll(ep, events, maxevents, timeout);
}

static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 int maxevents, long timeout)
{
 wait_queue_t wait;
 ......

fetch_events:
 //4.1 Check whether the ready queue has any events
 if (!ep_events_available(ep)) {

 //4.2 Define a wait item and associate it with the current process
 init_waitqueue_entry(&wait, current);

 //4.3 Add the new wait item to the epoll->wq list
 __add_wait_queue_exclusive(&ep->wq, &wait);

 for (;;) {
 ...
 //4.4 Give up the CPU and go to sleep voluntarily
 if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
 timed_out = 1;
 ...
 }
 ......
 }
 ......
}

4.1 Checking whether the ready queue has events

First, ep_events_available is called to check whether the ready list holds any events to process.
//file: fs/eventpoll.c
static inline int ep_events_available(struct eventpoll *ep)
{
 return !list_empty(&ep->rdllist) || ep->ovflist != EP_UNACTIVE_PTR;
}

4.2 Defining a wait item and associating it with the current process

Suppose no connection is ready. Execution then reaches init_waitqueue_entry, which defines the wait item and attaches current (the current process) to it.
Yes: when there are no IO events, epoll blocks the current process too. That is reasonable, because occupying the CPU with nothing to do makes no sense. Many articles online have the bad habit of discussing blocking and non-blocking without naming the subject, which leaves readers lost in the fog. Take epoll: epoll_wait itself blocks, while the sockets it manages are usually set to non-blocking. These terms only mean something once the subject is stated.
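On the socket side, the usual pattern is a non-blocking descriptor that is read until it reports EAGAIN. A minimal sketch, with set_nonblocking and drain as hypothetical helper names:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch a descriptor to non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Reads until the descriptor has nothing more to give right now.
 * Returns the number of bytes consumed, or -1 on a real error.
 * EAGAIN is not an error here: it means "come back after the next
 * readiness notification".
 */
long drain(int fd)
{
    char buf[256];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            total += n;
        } else if (n == 0) {
            return total;                 /* peer closed */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;                 /* drained for now */
        } else if (errno != EINTR) {
            return -1;                    /* genuine failure */
        }
    }
}
```

With this split, the only place the process ever sleeps is epoll_wait; an individual read never blocks.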
//file: include/linux/wait.h
static inline void init_waitqueue_entry(wait_queue_t *q, struct task_struct *p)
{
 q->flags = 0;
 q->private = p;
 q->func = default_wake_function;
}
Note that the callback function here is default_wake_function. It will be called in section 5, when data arrives.

4.3 Adding to the wait queue

static inline void __add_wait_queue_exclusive(wait_queue_head_t *q,
 wait_queue_t *wait)
{
 wait->flags |= WQ_FLAG_EXCLUSIVE;
 __add_wait_queue(q, wait);
}
Here, the wait item defined in the previous subsection is added to the wait queue of the epoll object.

4.4 Giving up the CPU and going to sleep

set_current_state marks the current process as interruptible (TASK_INTERRUPTIBLE). Then schedule_hrtimeout_range is called to give up the CPU and go to sleep voluntarily.
//file: kernel/hrtimer.c
int __sched schedule_hrtimeout_range(ktime_t *expires,
 unsigned long delta, const enum hrtimer_mode mode)
{
 return schedule_hrtimeout_range_clock(
 expires, delta, mode, CLOCK_MONOTONIC);
}

int __sched schedule_hrtimeout_range_clock(...)
{
 schedule();
 ...
}
schedule then picks the next process to run:
//file: kernel/sched/core.c
static void __sched __schedule(void)
{
 next = pick_next_task(rq);
 ...
 context_switch(rq, prev, next);
}

5. Data arrives

Earlier, when epoll_ctl ran, the kernel added a wait queue item to each socket. At the end of epoll_wait, a wait queue item was added to the eventpoll object as well. Before discussing data reception, let's briefly summarize what these queue items contain.
  • socket->sock->sk_data_ready is set to the ready handler sock_def_readable.
  • In the socket's wait queue item, the callback function is ep_poll_callback. Its private field is unused and is a NULL pointer.
  • In the eventpoll's wait queue item, the callback function is default_wake_function. Its private field points to the user process waiting for the event.
In this section we will see how, after processing the data, the soft interrupt enters each of these callbacks in turn and finally notifies the user process.

5.1 Receiving data into the receive queue

To avoid bloat, I won't describe how soft interrupts process network frames. If you are interested, see the article "Illustrated | Linux network packet receiving process". Today we start directly from tcp_v4_rcv, the entry function of the TCP protocol stack.
// file: net/ipv4/tcp_ipv4.c
int tcp_v4_rcv(struct sk_buff *skb)
{
 ......
 th = tcp_hdr(skb); // Get the TCP header
 iph = ip_hdr(skb); // Get the IP header

 // Find the corresponding socket from the IP and port information in the packet headers
 sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
 ......

 // The socket is not locked by the user
 if (!sock_owned_by_user(sk)) {
 {
 if (!tcp_prequeue(sk, skb))
 ret = tcp_v4_do_rcv(sk, skb);
 }
 }
}
tcp_v4_rcv first looks up the corresponding socket using the source and dest information in the header of the received packet. Once it is found, we go straight into the main receive function, tcp_v4_do_rcv.
//file: net/ipv4/tcp_ipv4.c
int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
{
 if (sk->sk_state == TCP_ESTABLISHED) {

 // Process data in the established state
 if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len)) {
 rsk = sk;
 goto reset;
 }
 return 0;
 }

 // Packet handling for states other than ESTABLISHED
 ......
}
Let's assume we are handling a packet in the ESTABLISHED state, so we go on into tcp_rcv_established.
//file: net/ipv4/tcp_input.c
int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 const struct tcphdr *th, unsigned int len)
{
 ......

 // Put the received data on the receive queue
 eaten = tcp_queue_rcv(sk, skb, tcp_header_len,
 &fragstolen);

 // Data is ready: wake up processes blocked on the socket
 sk->sk_data_ready(sk, 0);
 ......
}
In tcp_rcv_established, tcp_queue_rcv is called to put the received data on the socket's receive queue.
The source code is as follows:
//file: net/ipv4/tcp_input.c
static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int hdrlen,
 bool *fragstolen)
{
 // Put the received data at the tail of the socket's receive queue
 if (!eaten) {
 __skb_queue_tail(&sk->sk_receive_queue, skb);
 skb_set_owner_r(skb, sk);
 }
 return eaten;
}

5.2 Finding the ready callback function

After tcp_queue_rcv has queued the data, sk_data_ready is called to wake up the user processes waiting on the socket. This is yet another function pointer. Recall sock_init_data, mentioned in section 1 when accept created the socket: that function set sk_data_ready to sock_def_readable, the default data-ready handler.
When data on a socket is ready, the kernel enters at sock_def_readable and from there finds ep_poll_callback, the callback that was installed on the socket when epoll_ctl added it.
Let's look at the details:
//file: net/core/sock.c
static void sock_def_readable(struct sock *sk, int len)
{
 struct socket_wq *wq;

 rcu_read_lock();
 wq = rcu_dereference(sk->sk_wq);

 // The name is misleading: this does not check for a blocked process,
 // it checks that the wait queue is not empty
 if (wq_has_sleeper(wq))
 // Run the callback functions of the wait queue items
 wake_up_interruptible_sync_poll(&wq->wait, POLLIN | POLLPRI |
 POLLRDNORM | POLLRDBAND);
 sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
 rcu_read_unlock();
}
The function names here are rather confusing.
  • wq_has_sleeper: for a plain recvfrom system call, it really does check whether a process is blocked. For a socket under epoll, it only checks that the wait queue is not empty; no process is necessarily blocked.
  • wake_up_interruptible_sync_poll: it only enters the callback functions set on the socket's wait queue items; it does not necessarily wake any process.
Let's focus on wake_up_interruptible_sync_poll and see how the kernel finds the callback function registered on the wait queue item.
//file: include/linux/wait.h
#define wake_up_interruptible_sync_poll(x, m) \
 __wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *) (m))


//file: kernel/sched/core.c
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
 int nr_exclusive, void *key)
{
 ...
 __wake_up_common(q, mode, nr_exclusive, wake_flags, key);
}
Next, we enter __wake_up_common:
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
 int nr_exclusive, int wake_flags, void *key)
{
 wait_queue_t *curr, *next;

 list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
 unsigned flags = curr->flags;

 if (curr->func(curr, mode, wake_flags, key) &&
 (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
 break;
 }
}
__wake_up_common walks the wait queue, takes each registered item curr, and calls its curr->func. Recall that when ep_insert ran, this func was set to ep_poll_callback.
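The loop condition in __wake_up_common is compact, so here is a user-space model of just that logic. The toy_ names and the singly linked list are assumptions; only the stop condition (a successful callback on an exclusive waiter decrements nr_exclusive, and the walk stops at zero) follows the kernel excerpt.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FLAG_EXCLUSIVE 0x01

struct toy_waiter {
    unsigned int flags;
    int (*func)(struct toy_waiter *w);   /* non-zero: it woke someone */
    int calls;                           /* how often func has run */
    struct toy_waiter *next;
};

/* Default callback: record the call and report success. */
int toy_wake_ok(struct toy_waiter *w)
{
    w->calls++;
    return 1;
}

/*
 * Walks the queue and runs every callback, stopping once `nr_exclusive`
 * exclusive waiters have been woken successfully.
 * Returns the number of callbacks that ran.
 */
int toy_wake_up(struct toy_waiter *head, int nr_exclusive)
{
    int ran = 0;

    for (struct toy_waiter *cur = head, *next; cur != NULL; cur = next) {
        unsigned int flags = cur->flags;

        next = cur->next;   /* fetched first: the callback may unlink cur */
        assert(cur->func != NULL);
        ran++;
        if (cur->func(cur) && (flags & TOY_FLAG_EXCLUSIVE) &&
            !--nr_exclusive)
            break;
    }
    return ran;
}
```

Non-exclusive waiters are always called, while the walk ends at the first exclusive waiter that reports success when nr_exclusive is 1. The wait item that epoll_wait adds with __add_wait_queue_exclusive falls into the second group.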

5.3 Running the socket ready callback

The previous subsection found ep_poll_callback, the function registered on the socket's wait queue item. The soft interrupt now calls it.
//file: fs/eventpoll.c
static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
 // Get the epitem that this wait item belongs to
 struct epitem *epi = ep_item_from_wait(wait);

 // Get the eventpoll structure that the epitem belongs to
 struct eventpoll *ep = epi->ep;

 //1. Add the current epitem to the eventpoll's ready queue
 list_add_tail(&epi->rdllink, &ep->rdllist);

 //2. Check whether anything is waiting on the eventpoll's wait queue
 if (waitqueue_active(&ep->wq))
 wake_up_locked(&ep->wq);
 ......
}
In ep_poll_callback, the base pointer stored with the wait queue item leads to the epitem, and from there to the eventpoll object.
The first thing it does is add its own epitem to the epoll ready queue.
It then checks whether the wait queue of the eventpoll object holds any wait items (these are set up when epoll_wait runs).
If there are none, the soft interrupt's work is done. If there are wait items, it finds the callback function set in the wait item.
The call path is wake_up_locked() => __wake_up_locked() => __wake_up_common.
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
 int nr_exclusive, int wake_flags, void *key)
{
 wait_queue_t *curr, *next;

 list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
 unsigned flags = curr->flags;

 if (curr->func(curr, mode, wake_flags, key) &&
 (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
 break;
 }
}
In __wake_up_common, curr->func is called. This time func is default_wake_function, which was set during epoll_wait.

5.4 Running the epoll ready notification

default_wake_function finds the process descriptor stored in the wait queue item and wakes that process up.
The source code is as follows:
//file:kernel/sched/core.c
int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
 void *key)
{
 return try_to_wake_up(curr->private, mode, wake_flags);
}
The curr->private pointer of the wait queue item is the process that blocked while waiting on the epoll object.
The process that called epoll_wait is put on the run queue and waits for the kernel to schedule it again. Once it runs again, it resumes from schedule.
After waking up, the process continues from where epoll_wait was suspended and returns the ready events in rdllist to the user process:
//file: fs/eventpoll.c
static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 int maxevents, long timeout)
{

 ......
 __remove_wait_queue(&ep->wq, &wait);

 set_current_state(TASK_RUNNING);
 }
check_events:
 // Return the ready events to the user process
 ep_send_events(ep, events, maxevents);
}
From the user's point of view, epoll_wait simply took a little longer to return, but the flow of execution is still sequential.

Summary

Let's sum up the whole working path of epoll in one picture.
[Figure: overview of the epoll workflow]
The callback functions invoked from the soft interrupt, in order: sock_def_readable (set when the sock object is initialized) => ep_poll_callback (added to the socket by epoll_ctl) => default_wake_function (set on the epoll object by epoll_wait).
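That chain can be condensed into a user-space model. This is a sketch under heavy simplification: the toy_ structures are invented, locking is ignored, and the wait queues are reduced to a flag. The check that keeps an item from being queued twice is added here for safety; the excerpts in this article do not show one.

```c
#include <assert.h>
#include <stddef.h>

struct toy_epoll;

/* One registered socket: what epoll keeps per connection. */
struct toy_item {
    struct toy_epoll *ep;
    int fd;
    int on_ready_list;
    struct toy_item *next_ready;
};

struct toy_epoll {
    struct toy_item *ready_head;   /* ready list */
    struct toy_item *ready_tail;
    int waiter_blocked;            /* 1 while a task sleeps in "epoll_wait" */
    int wakeups;                   /* how many times the task was woken */
};

/* Step 3: counterpart of default_wake_function. */
static void toy_wake_waiter(struct toy_epoll *ep)
{
    ep->waiter_blocked = 0;
    ep->wakeups++;
}

/* Step 2: counterpart of ep_poll_callback. */
static void toy_poll_callback(struct toy_item *item)
{
    struct toy_epoll *ep = item->ep;

    if (!item->on_ready_list) {            /* queue each item only once */
        item->on_ready_list = 1;
        item->next_ready = NULL;
        if (ep->ready_tail)
            ep->ready_tail->next_ready = item;
        else
            ep->ready_head = item;
        ep->ready_tail = item;
    }
    if (ep->waiter_blocked)                /* wake only if someone sleeps */
        toy_wake_waiter(ep);
}

/* Step 1: counterpart of sock_def_readable, run when data arrives. */
void toy_data_ready(struct toy_item *item)
{
    assert(item != NULL && item->ep != NULL);
    toy_poll_callback(item);
}

/* What the woken task does: pop one ready fd, or -1 if none. */
int toy_take_ready(struct toy_epoll *ep)
{
    struct toy_item *item = ep->ready_head;

    if (item == NULL)
        return -1;
    ep->ready_head = item->next_ready;
    if (ep->ready_head == NULL)
        ep->ready_tail = NULL;
    item->on_ready_list = 0;
    return item->fd;
}
```

In the model, only the first arrival wakes the sleeping task. Later arrivals just extend the ready list, which the task then empties without sleeping again.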
To sum up, the kernel code behind the epoll functions runs in two kinds of context:
  • Kernel mode of the user process. When a function such as epoll_wait is called, the process traps into kernel mode to execute it. This part of the code checks the ready list, and is responsible for blocking the current process and giving up the CPU.
  • Hard and soft interrupt context. Here the packet is received from the network card, processed, and placed on the socket's receive queue. For epoll, the epitem associated with the socket is then found and added to the ready list of the epoll object. At this point the kernel also checks whether any process is blocked on the epoll object and, if so, wakes it.
To cover every detail, this article walked through many code paths, including all of the blocking.
In practice, however, as long as there is enough work to do, epoll_wait never lets the process block at all. The user process keeps working and working, and only gives up the CPU voluntarily when epoll_wait really has nothing left for it. That is where epoll's efficiency comes from!
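A small experiment shows this from user space. The sketch assumes pipes stand in for connections, and ready_in_one_call is a hypothetical helper. With a timeout of 0 the call never sleeps, yet it still returns every descriptor that is already ready.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Registers the read ends of `n` pipes (n <= 16), makes all of them
 * readable, then calls epoll_wait once.
 * Returns how many events that single call delivered, or -1 on failure.
 */
int ready_in_one_call(int n)
{
    struct epoll_event ev = { 0 }, out[16];
    int pipes[16][2];
    int efd, opened = 0, ok = 1, got = -1;

    assert(n > 0 && n <= 16);
    efd = epoll_create1(0);
    if (efd < 0)
        return -1;

    while (ok && opened < n) {
        if (pipe(pipes[opened]) != 0) {
            ok = 0;
            break;
        }
        ev.events = EPOLLIN;
        ev.data.fd = pipes[opened][0];
        ok = epoll_ctl(efd, EPOLL_CTL_ADD, pipes[opened][0], &ev) == 0 &&
             write(pipes[opened][1], "x", 1) == 1;
        opened++;
    }
    if (ok)
        got = epoll_wait(efd, out, 16, 0);   /* timeout 0: never sleeps */

    for (int i = 0; i < opened; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    close(efd);
    return got;
}
```

A busy server behaves the same way with a blocking timeout: while the ready list is non-empty, each epoll_wait call returns a batch at once.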
Including this article, Feige has used three articles in total to analyze one thing: how a network packet travels from the network card to your user process. The other two are:
  • Illustrated | Linux network packet receiving process
  • Illustrated | An in-depth look at a stumbling block on the road to high-performance network development: synchronous blocking network IO
Congratulations on not being scared off by the kernel source and sticking with it to the end. Give yourself a round of applause, and add a chicken leg to dinner!
Of course, some concepts in network programming remain that we have not covered, such as Reactor and Proactor. Compared with the kernel, though, these user-space techniques are relatively simple. They only concern who watches for IO events, who does the computation, and who handles sending and receiving when several processes cooperate; they are just different ways of dividing the work among user processes.


Original article: In-depth disclosure of how epoll achieves IO multiplexing

Copyright notice
This article was created by [InfoQ]. Please include the original link when reprinting. Thank you.
https://yzsam.com/2022/201/202207181623466759.html