Learning memory barrier

2022-06-26 15:23:00 【make-n】

These are my notes on memory barriers, written after reading the following blog post and its comments:
https://blog.csdn.net/world_hello_100/article/details/50131497
【1】 Compiler barrier: at optimization levels O2 and O3 the compiler may reorder instructions, so the order in which memory accesses actually happen can differ from the order implied by the source code.
Solution 1: add a compiler barrier:

    #define barrier() __asm__ __volatile__("" ::: "memory")
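
As a minimal sketch of how the macro is typically placed (the variables `data` and `ready` and the function `publish` are hypothetical names, not from the original post), the barrier sits between two stores whose order must not be swapped by the compiler. It only constrains the compiler and emits no CPU instruction.

```c
#include <assert.h>

/* Compiler barrier from the text (GCC/Clang inline-asm syntax). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical shared variables, for illustration only. */
static int data;
static int ready;

/* Store the payload first, then raise the flag. */
static void publish(int value)
{
    assert(!ready);   /* precondition: nothing has been published yet */
    data = value;
    barrier();        /* the compiler may not move memory accesses across this point */
    ready = 1;
}
```

Without the barrier, an optimizing compiler would be free to emit the store to `ready` before the store to `data`, since the two are independent from its point of view.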

Solution 2:
The volatile keyword can also be used to prevent memory accesses from being reordered at compile time (it cannot prevent reordering at run time, which is discussed later).
The Linux kernel provides the macro ACCESS_ONCE, which prevents the compiler from reordering consecutive ACCESS_ONCE accesses. ACCESS_ONCE(x) can be used as an lvalue.

    #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
    /*******  Separator  ******/
    ACCESS_ONCE(x) = r;
    ACCESS_ONCE(y) = x;
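
A typical use is a polling loop, sketched below under a few assumptions: `stop_requested` and `poll_until_stop` are hypothetical names, and `__typeof__` is used instead of `typeof` so that the macro also compiles in strict ISO C mode with GCC or Clang.

```c
#include <assert.h>

/* Same idea as the kernel macro; __typeof__ is the GCC/Clang spelling
 * that is also accepted in strict ISO C mode. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical flag, set elsewhere (for example by a signal handler). */
static int stop_requested;

/* Poll the flag at most max_polls times. Returns the number of polls
 * that saw the flag clear, so max_polls means it was never seen set. */
static unsigned int poll_until_stop(unsigned int max_polls)
{
    unsigned int polls = 0;

    assert(max_polls > 0);
    /* The volatile access forces a fresh load on every iteration, so the
     * compiler cannot hoist the read out of the loop. */
    while (polls < max_polls && !ACCESS_ONCE(stop_requested))
        polls++;
    return polls;
}
```

With a plain `stop_requested` read, the compiler would be allowed to load the flag once and reuse the value for every iteration.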

【2】 Run-time reordering
With out-of-order execution, the order in which a processor actually executes instructions is determined by when their input data becomes available, not by the order the programmer wrote them in.

 An out-of-order processor typically handles instructions in the following steps:
    1. Fetch the instruction.
    2. Dispatch the instruction to the instruction queue.
    3. The instruction waits in the queue until its input operands are available (once they are, it may leave the queue even if earlier instructions have not yet executed).
    4. The instruction is issued to a suitable functional unit and executed.
    5. The result is placed in a queue (rather than being written to the register file immediately).
    6. The result is written to the register file only after the results of all earlier instructions have been written (results are reordered so that execution appears to be in order).
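
The steps above can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions: the readiness cycles are made up, execution latency is ignored, and the names `issue_order` and `retire_times` are hypothetical. It shows that instructions issue in operand-readiness order but retire in program order.

```c
#include <assert.h>
#include <stddef.h>

#define N_INSNS 4

/* Toy input: the cycle at which each instruction's operands become
 * available, indexed by program order. The values are made up. */
static const unsigned int operands_ready_at[N_INSNS] = { 3, 1, 2, 5 };

/* Steps 3-4: an instruction issues as soon as its operands are ready,
 * so the issue order is program order sorted by readiness. */
static void issue_order(size_t order[N_INSNS])
{
    assert(order != NULL);
    for (size_t i = 0; i < N_INSNS; i++)
        order[i] = i;
    /* Insertion sort by ready time; stable, so ties keep program order. */
    for (size_t i = 1; i < N_INSNS; i++) {
        size_t insn = order[i];
        size_t j = i;
        while (j > 0 && operands_ready_at[order[j - 1]] > operands_ready_at[insn]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = insn;
    }
}

/* Steps 5-6: a result is written back only after all earlier
 * instructions have written back, so the retirement time is a
 * running maximum over program order. */
static void retire_times(unsigned int retire_at[N_INSNS])
{
    unsigned int latest = 0;

    assert(retire_at != NULL);
    for (size_t i = 0; i < N_INSNS; i++) {
        if (operands_ready_at[i] > latest)
            latest = operands_ready_at[i];
        retire_at[i] = latest;
    }
}
```

For the sample input, the issue order is instructions 1, 2, 0, 3, while the retirement times are 3, 3, 3, 5: instructions 1 and 2 finish early but wait for instruction 0 before their results become architecturally visible.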

On a single CPU, instruction fetch and result write-back happen in order, so the CPU never appears to execute instructions out of order. On a multiprocessor, however, each CPU has its own cache. A write goes to that CPU's cache first, and nothing by itself guarantees that the caches agree, so a cache-coherence protocol is needed to avoid inconsistent data. The communication this protocol involves can make memory accesses appear out of order to other CPUs. In other words, run-time memory reordering is a consequence of keeping the caches of multiple cores consistent.

In everyday application development, a developer can write correct multithreaded programs without knowing anything about memory barriers. This is mainly because the various synchronization mechanisms already imply memory barriers (with subtle differences from explicit ones), so never using a memory barrier directly causes no problems. If you want to write something like a lock-free data structure, however, memory barriers become useful.

Memory barriers are commonly used when:
     Implementing synchronization primitives
     Implementing lock-free data structures
     Writing device drivers
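
To make the first case concrete, here is a minimal sketch of a spinlock in user-space C11. It is not the kernel's spinlock_t: the type `toy_spinlock` and its functions are hypothetical, and C11 acquire/release orderings stand in for the kernel's barriers. It illustrates why code that uses a lock gets the ordering it needs without explicit barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal spinlock, for illustration only. */
struct toy_spinlock {
    atomic_flag locked;
};

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock. Acquire ordering keeps the critical
 * section's accesses from moving above the lock operation. */
static bool toy_trylock(struct toy_spinlock *l)
{
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until the lock is taken. */
static void toy_lock(struct toy_spinlock *l)
{
    while (!toy_trylock(l))
        ;   /* busy-wait */
}

/* Release ordering keeps the critical section's accesses from moving
 * below the unlock. */
static void toy_unlock(struct toy_spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The barriers are hidden inside the lock and unlock operations, which is the sense in which a synchronization primitive "implies" them.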

Memory barrier interface

 General barrier, mb(): orders both reads and writes (reads and writes before the barrier are ordered before reads and writes after it).
 Write barrier, wmb(): orders writes only (writes before the barrier are ordered before writes after it).
 Read barrier, rmb(): orders reads only (reads before the barrier are ordered before reads after it).
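
Outside the kernel, C11's `atomic_thread_fence` offers roughly comparable fences. The mapping below is an assumption of this sketch, not the kernel's definition: a release fence is at least as strong as a write barrier for ordering earlier stores before later ones, and an acquire fence orders earlier loads before later ones. The names `toy_mb`, `toy_wmb`, `toy_rmb`, `producer` and `consumer` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Approximate user-space counterparts (an assumption of this sketch). */
static inline void toy_mb(void)  { atomic_thread_fence(memory_order_seq_cst); }
static inline void toy_wmb(void) { atomic_thread_fence(memory_order_release); }
static inline void toy_rmb(void) { atomic_thread_fence(memory_order_acquire); }

static int payload;          /* plain data */
static atomic_bool ready;    /* flag that publishes the data */

/* Writer: store the data, then the flag, with a write barrier between. */
static void producer(int value)
{
    payload = value;
    toy_wmb();
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: load the flag, then the data, with a read barrier between.
 * Returns true and fills *out only if the data has been published. */
static bool consumer(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    toy_rmb();
    *out = payload;
    return true;
}
```

The write barrier in the producer pairs with the read barrier in the consumer; either one alone would not be enough to guarantee that a reader who sees the flag also sees the data.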

Analysis of a lock-free structure, the kernel's kfifo:

/**
 * __kfifo_put - puts some data into the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: the data to be added.
 * @len: the length of the data to be added.
 *
 * This function copies at most @len bytes from the @buffer into
 * the FIFO depending on the free space, and returns the number of
 * bytes copied.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_put(struct kfifo *fifo,
                         const unsigned char *buffer, unsigned int len)
{
    
    unsigned int l;
    len = min(len, fifo->size - fifo->in + fifo->out);
    
    /*
     * Ensure that we sample the fifo->out index -before- we
     * start putting bytes into the kfifo.
     */
    /* This guarantees that fifo->out is read and the correct len is computed before any data is written into the kfifo. With a wrong fifo->out, the writable space would be miscalculated (too small). */
    smp_mb();
    
    /* first put the data starting from fifo->in to buffer end */
    l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
    memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);
    /* then put the rest (if any) at the beginning of the buffer */
    memcpy(fifo->buffer, buffer + l, len - l);
    /*
     * Ensure that we add the bytes to the kfifo -before-
     * we update the fifo->in index.
     */
    /* This guarantees the order of the two writes: write the data first, then update the in index. */
    smp_wmb();
    fifo->in += len;
    
    return len;
}
EXPORT_SYMBOL(__kfifo_put);
 
/**
 * __kfifo_get - gets some data from the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: where the data must be copied.
 * @len: the size of the destination buffer.
 *
 * This function copies at most @len bytes from the FIFO into the
 * @buffer and returns the number of copied bytes.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_get(struct kfifo *fifo,
                         unsigned char *buffer, unsigned int len)
{
    
    unsigned int l;
    len = min(len, fifo->in - fifo->out);
    /*
     * Ensure that we sample the fifo->in index -before- we
     * start removing bytes from the kfifo.
     */
    /* Read the correct fifo->in and compute the correct data length first, then read the data from the kfifo. This guarantees the order of the two reads. */
    smp_rmb();
    /* first get the data from fifo->out until the end of the buffer */
    l = min(len, fifo->size - (fifo->out & (fifo->size - 1)));
    memcpy(buffer, fifo->buffer + (fifo->out & (fifo->size - 1)), l);
 
    /* then get the rest (if any) from the beginning of the buffer */
    memcpy(buffer + l, fifo->buffer, len - l);
    /*
     * Ensure that we remove the bytes from the kfifo -before-
     * we update the fifo->out index.
     */
    /* Read the data from the kfifo first, then write the fifo->out index: a read followed by a write, hence the full barrier. */
    smp_mb();
    fifo->out += len;
    return len;
}
EXPORT_SYMBOL(__kfifo_get);
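
For experimenting outside the kernel, the same single-producer, single-consumer scheme can be sketched in user-space C11. This is not the kernel's implementation: `toy_fifo`, `toy_fifo_put` and `toy_fifo_get` are hypothetical names, the size is fixed at compile time, and C11 acquire/release operations on the indices are swapped in for smp_mb(), smp_wmb() and smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define TOY_FIFO_SIZE 8u
static_assert((TOY_FIFO_SIZE & (TOY_FIFO_SIZE - 1)) == 0, "size must be a power of two");

/* Hypothetical user-space analogue of struct kfifo. */
struct toy_fifo {
    unsigned char buffer[TOY_FIFO_SIZE];
    atomic_uint in;    /* written only by the producer */
    atomic_uint out;   /* written only by the consumer */
};

static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }

static unsigned int toy_fifo_put(struct toy_fifo *f, const unsigned char *src, unsigned int len)
{
    unsigned int in = atomic_load_explicit(&f->in, memory_order_relaxed);
    /* acquire: sample 'out' before writing into the buffer */
    unsigned int out = atomic_load_explicit(&f->out, memory_order_acquire);
    unsigned int off = in & (TOY_FIFO_SIZE - 1);
    unsigned int l;

    len = min_u(len, TOY_FIFO_SIZE - (in - out));   /* clamp to free space */
    l = min_u(len, TOY_FIFO_SIZE - off);            /* part up to buffer end */
    memcpy(f->buffer + off, src, l);
    memcpy(f->buffer, src + l, len - l);            /* wrapped remainder */
    /* release: the data must be visible before the new 'in' */
    atomic_store_explicit(&f->in, in + len, memory_order_release);
    return len;
}

static unsigned int toy_fifo_get(struct toy_fifo *f, unsigned char *dst, unsigned int len)
{
    unsigned int out = atomic_load_explicit(&f->out, memory_order_relaxed);
    /* acquire: sample 'in' before reading from the buffer */
    unsigned int in = atomic_load_explicit(&f->in, memory_order_acquire);
    unsigned int off = out & (TOY_FIFO_SIZE - 1);
    unsigned int l;

    len = min_u(len, in - out);                     /* clamp to stored bytes */
    l = min_u(len, TOY_FIFO_SIZE - off);
    memcpy(dst, f->buffer + off, l);
    memcpy(dst + l, f->buffer, len - l);
    /* release: finish reading before the slots are handed back */
    atomic_store_explicit(&f->out, out + len, memory_order_release);
    return len;
}
```

As in the kernel version, this is only safe with exactly one thread calling `toy_fifo_put` and one calling `toy_fifo_get`; each index has a single writer, which is what makes the lock-free scheme work.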

Finally, in passing, a few techniques used in this implementation that are unrelated to the topic of this article:

1. The ring-buffer subscript is computed with a bitwise AND (&), which is much more efficient than a remainder operation. This only works if the ring-buffer size is a power of two, that is, a binary number with exactly one 1 bit. In that case index & (size - 1) is the subscript.
2. Two indexes, in and out, are used, and both only ever increase (a rather ingenious approach). This avoids some complicated condition checks (in some implementations, in == out cannot distinguish an empty buffer from a full one).
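
Both techniques can be checked with a few small helpers. This is a sketch with hypothetical names (`is_power_of_two`, `ring_offset`, `ring_empty`, `ring_full`), assuming 32-bit free-running indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if size has exactly one bit set. */
static bool is_power_of_two(uint32_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}

/* Map a free-running index to a buffer offset. */
static uint32_t ring_offset(uint32_t index, uint32_t size)
{
    assert(is_power_of_two(size));
    return index & (size - 1);    /* same result as index % size */
}

/* With free-running indices, empty and full are distinct states. */
static bool ring_empty(uint32_t in, uint32_t out)
{
    return in == out;
}

static bool ring_full(uint32_t in, uint32_t out, uint32_t size)
{
    return (uint32_t)(in - out) == size;
}
```

Because the indices are never reduced modulo the size, a full buffer has in - out == size rather than in == out, so no slot has to be sacrificed to tell the two states apart.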

【Question】
in and out only ever increase. After in overflows and wraps around to 0 while out has not yet overflowed, is the amount of valid data computed by
len = min(len, fifo->in - fifo->out); still correct?
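
As far as I can tell, the result stays correct: the indices are unsigned, and unsigned subtraction in C is performed modulo 2^32 for a 32-bit type, so the difference is right as long as the buffer never holds more than 2^32 - 1 bytes. The sketch below checks this with a hypothetical helper `fifo_used`, assuming 32-bit indices.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes stored, given free-running 32-bit indices.
 * The subtraction wraps modulo 2^32, so it remains correct after
 * 'in' has overflowed and 'out' has not. */
static uint32_t fifo_used(uint32_t in, uint32_t out, uint32_t size)
{
    uint32_t used = in - out;

    assert(used <= size);   /* invariant: never more data than capacity */
    return used;
}
```

For example, with out = 0xFFFFFFFD and in = 5 (in has wrapped after 8 more bytes were written), in - out evaluates to 8.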

Copyright notice
This article was written by [make-n]. Please include the link to the original when reposting. Thanks.
https://yzsam.com/2022/177/202206261508562010.html
