Learning memory barrier
2022-06-26 15:23:00 【make-n】
These notes record my understanding of memory barriers after reading the following blog post and its comments:
https://blog.csdn.net/world_hello_100/article/details/50131497
【1】 Compiler barrier: at optimization levels -O2 and -O3 the compiler may reorder memory accesses relative to the order written in the source, so the generated instruction sequence no longer matches the apparent logic of the code.
Solution 1: add a compiler barrier:
#define barrier() __asm__ __volatile__("" ::: "memory")
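To make the macro concrete, here is a minimal sketch of how barrier() might be placed between two stores. The names payload, ready, publish and consume are hypothetical and exist only for this illustration. The barrier constrains the compiler only; it emits no CPU instruction.

```c
#include <assert.h>

/* Compiler-only barrier (GCC/Clang syntax): emits no instruction, but the
 * "memory" clobber forbids the compiler from moving memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int payload;   /* hypothetical shared data */
static int ready;     /* hypothetical "payload is valid" flag */

/* Store the payload first, then set the flag, in that order in the emitted code. */
static void publish(int value)
{
    payload = value;
    barrier();        /* the compiler may not sink the payload store below this point */
    ready = 1;
}

/* Read the payload; the caller is expected to have observed ready != 0. */
static int consume(void)
{
    assert(ready);
    barrier();        /* the compiler may not hoist the payload load above this point */
    return payload;
}
```

On a single thread this behaves like ordinary code; the point is only that the compiler keeps the two stores in source order.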
Solution 2:
The volatile keyword can also be used to prevent memory accesses from being reordered at compile time (it cannot prevent reordering at run time, which is discussed later).
The Linux kernel provides the ACCESS_ONCE macro, which prevents the compiler from reordering consecutive ACCESS_ONCE accesses relative to each other. ACCESS_ONCE(x) can be used as an lvalue.
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
/******* Separator ******/
ACCESS_ONCE(x) = r;
ACCESS_ONCE(y) = x;
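As a rough illustration of why this matters, the sketch below polls a flag through ACCESS_ONCE. Without the volatile access the compiler would be free to load the flag once and hoist it out of the loop. The names stop_requested, spin_until_stopped and request_stop are hypothetical, and __typeof__ is used as the GCC/Clang spelling that also works in strict ISO modes. Newer kernels have replaced ACCESS_ONCE with READ_ONCE() and WRITE_ONCE(), but the idea is the same.

```c
#include <assert.h>

/* Same idea as the kernel macro shown above. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int stop_requested;   /* hypothetical flag set by another context */

/* Spin until the flag is set or the budget is used up; returns the number
 * of iterations spent. Every iteration performs a fresh load of the flag. */
static unsigned int spin_until_stopped(unsigned int budget)
{
    unsigned int spins = 0;

    assert(budget > 0);
    while (!ACCESS_ONCE(stop_requested) && spins < budget)
        spins++;
    return spins;
}

/* ACCESS_ONCE used as an lvalue: the store is emitted exactly once. */
static void request_stop(void)
{
    ACCESS_ONCE(stop_requested) = 1;
}
```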
【2】 Run-time reordering
With out-of-order execution, the order in which a processor actually executes instructions is determined by the availability of their input data, not by the order the programmer wrote them.
An out-of-order processor typically handles instructions in the following steps:
1. The instruction is fetched.
2. The instruction is dispatched to an instruction queue.
3. The instruction waits in the queue until its input operands are available (once they are, it can leave the queue even if earlier instructions have not yet executed).
4. The instruction is issued to the appropriate functional unit and executed.
5. The result is placed in a queue (rather than being written to the register file immediately).
6. The result is written to the register file only after the results of all earlier instructions have been written back (results are reordered so that execution appears to be in order).
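Steps 5 and 6 can be sketched with a toy model: each instruction finishes executing at some cycle, but its result is written back no earlier than the instruction before it. This is only an illustration with made-up latencies, not a model of any real microarchitecture; struct rob, done_at and retired_at are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define N_INSN 4

/* Toy reorder queue: instruction i finishes executing at cycle done_at[i],
 * and its result reaches the register file at cycle retired_at[i]. */
struct rob {
    unsigned int done_at[N_INSN];
    unsigned int retired_at[N_INSN];
};

/* An instruction retires at the later of its own completion time and the
 * retirement time of the previous instruction, so write-back stays in
 * program order even when execution finished out of order. */
static void retire_in_order(struct rob *r)
{
    unsigned int prev = 0;

    assert(r != NULL);
    for (size_t i = 0; i < N_INSN; i++) {
        unsigned int t = r->done_at[i] > prev ? r->done_at[i] : prev;
        r->retired_at[i] = t;
        prev = t;
    }
}
```

With completion cycles 5, 2, 7, 3 the second instruction finishes first, but it is held until cycle 5, when the first instruction retires.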
On a single CPU, instruction fetch and result write-back are both in order, so the out-of-order execution of instructions is not visible as a problem. On a multiprocessor, however, each CPU has its own cache. When a CPU writes, it writes to its cache, and nothing guarantees that the caches agree with each other, so a cache coherence protocol is needed to avoid data inconsistency. The communication this protocol requires can cause memory accesses to become visible out of order. In other words, run-time memory reordering is caused by inconsistency between the caches of multiple cores.
In practice, developers can write correct multithreaded programs without knowing anything about memory barriers, mainly because the various synchronization mechanisms already imply memory barriers (though with subtle differences from explicit ones), so never using a memory barrier directly causes no problems. But if you want to write something like a lock-free data structure, memory barriers are still useful.
Memory barriers are commonly used when:
Implementing synchronization primitives
Implementing lock-free data structures
Writing device drivers
Memory barrier interfaces
General barrier, mb(): orders both reads and writes (with reads and writes on both sides of the barrier, those before it are ordered ahead of those after it).
Write barrier, wmb(): orders writes only (with a write before the barrier and a write after it, the order of the two writes is preserved).
Read barrier, rmb(): orders reads only (with a read before the barrier and a read after it, the order of the two reads is preserved).
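The kernel macros are not available in user space, but C11 fences can serve as an approximate analogue for experimenting. The mapping below is an assumption made for illustration, not the kernel's definition: a release fence is at least as strong as a write barrier, and an acquire fence is at least as strong as a read barrier. The names my_mb, my_wmb, my_rmb, writer and reader are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for mb(), wmb() and rmb(). */
#define my_mb()  atomic_thread_fence(memory_order_seq_cst)  /* reads and writes */
#define my_wmb() atomic_thread_fence(memory_order_release)  /* at least write-write */
#define my_rmb() atomic_thread_fence(memory_order_acquire)  /* at least read-read */

static int data;          /* plain payload */
static atomic_int flag;   /* set once the payload is valid */

/* Write the data first, then the flag; my_wmb() keeps the two writes ordered. */
static void writer(int v)
{
    data = v;
    my_wmb();
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Read the flag first, then the data; my_rmb() keeps the two reads ordered.
 * Returns 1 and fills *out if the data was published, otherwise 0. */
static int reader(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&flag, memory_order_relaxed))
        return 0;
    my_rmb();
    *out = data;
    return 1;
}
```

The write barrier in writer() pairs with the read barrier in reader(); using only one of the two would not be enough.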
Analysis of a lock-free structure:
/**
 * __kfifo_put - puts some data into the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: the data to be added.
 * @len: the length of the data to be added.
 *
 * This function copies at most @len bytes from the @buffer into
 * the FIFO depending on the free space, and returns the number of
 * bytes copied.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_put(struct kfifo *fifo,
                         const unsigned char *buffer, unsigned int len)
{
    unsigned int l;

    len = min(len, fifo->size - fifo->in + fifo->out);

    /*
     * Ensure that we sample the fifo->out index -before- we
     * start putting bytes into the kfifo.
     */
    /* This guarantees that the correct fifo->out is read and the correct len
       computed before any data is written into the kfifo. If a stale
       fifo->out were read, the computed writable space would be too small. */
    smp_mb();

    /* first put the data starting from fifo->in to buffer end */
    l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
    memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);

    /* then put the rest (if any) at the beginning of the buffer */
    memcpy(fifo->buffer, buffer + l, len - l);

    /*
     * Ensure that we add the bytes to the kfifo -before-
     * we update the fifo->in index.
     */
    /* This guarantees write ordering: write the data first, then update
       the in index. */
    smp_wmb();

    fifo->in += len;

    return len;
}
EXPORT_SYMBOL(__kfifo_put);
/**
 * __kfifo_get - gets some data from the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: where the data must be copied.
 * @len: the size of the destination buffer.
 *
 * This function copies at most @len bytes from the FIFO into the
 * @buffer and returns the number of copied bytes.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_get(struct kfifo *fifo,
                         unsigned char *buffer, unsigned int len)
{
    unsigned int l;

    len = min(len, fifo->in - fifo->out);

    /*
     * Ensure that we sample the fifo->in index -before- we
     * start removing bytes from the kfifo.
     */
    /* Read the correct fifo->in first and compute the correct data length,
       then read the kfifo data. This orders the two reads. */
    smp_rmb();

    /* first get the data from fifo->out until the end of the buffer */
    l = min(len, fifo->size - (fifo->out & (fifo->size - 1)));
    memcpy(buffer, fifo->buffer + (fifo->out & (fifo->size - 1)), l);

    /* then get the rest (if any) from the beginning of the buffer */
    memcpy(buffer + l, fifo->buffer, len - l);

    /*
     * Ensure that we remove the bytes from the kfifo -before-
     * we update the fifo->out index.
     */
    /* Read the kfifo data first, then write the fifo->out index: a read
       followed by a write, so the general barrier is used. */
    smp_mb();

    fifo->out += len;

    return len;
}
EXPORT_SYMBOL(__kfifo_get);
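For experimenting outside the kernel, the same single-producer, single-consumer scheme can be sketched with C11 atomics. This is not the kernel implementation: struct ring, ring_put, ring_get, min_u and RING_SIZE are hypothetical names, and acquire loads and release stores on the indices stand in for the explicit smp_*() barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define RING_SIZE 8u   /* hypothetical capacity */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

struct ring {
    unsigned char buf[RING_SIZE];
    atomic_uint in;    /* written only by the producer */
    atomic_uint out;   /* written only by the consumer */
};

static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }

/* Producer side: copy up to len bytes in, return the number copied. */
static unsigned int ring_put(struct ring *r, const unsigned char *src, unsigned int len)
{
    unsigned int in  = atomic_load_explicit(&r->in, memory_order_relaxed);
    /* acquire: sample out before writing, so freed space is really free */
    unsigned int out = atomic_load_explicit(&r->out, memory_order_acquire);
    unsigned int off = in & (RING_SIZE - 1);
    unsigned int l;

    len = min_u(len, RING_SIZE - (in - out));
    l = min_u(len, RING_SIZE - off);
    memcpy(r->buf + off, src, l);        /* up to the end of the buffer */
    memcpy(r->buf, src + l, len - l);    /* the rest, from the beginning */
    /* release: the data becomes visible before the new in index */
    atomic_store_explicit(&r->in, in + len, memory_order_release);
    return len;
}

/* Consumer side: copy up to len bytes out, return the number copied. */
static unsigned int ring_get(struct ring *r, unsigned char *dst, unsigned int len)
{
    unsigned int out = atomic_load_explicit(&r->out, memory_order_relaxed);
    /* acquire: sample in before reading the data it covers */
    unsigned int in  = atomic_load_explicit(&r->in, memory_order_acquire);
    unsigned int off = out & (RING_SIZE - 1);
    unsigned int l;

    len = min_u(len, in - out);
    l = min_u(len, RING_SIZE - off);
    memcpy(dst, r->buf + off, l);
    memcpy(dst + l, r->buf, len - l);
    /* release: the reads finish before the space is handed back */
    atomic_store_explicit(&r->out, out + len, memory_order_release);
    return len;
}
```

As with kfifo, this is safe without locks only when there is exactly one thread calling ring_put and one thread calling ring_get.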
Finally, a few techniques used in this implementation that are unrelated to the topic of this article:
1. The ring buffer index is computed with a bitwise AND (&), which is much cheaper than a modulo operation. This only works if the ring buffer size is a power of two, that is, a binary number with exactly one bit set. In that case index & (size - 1) is the position inside the buffer.
2. Two indices, in and out, are used, and both only ever increase (a rather ingenious approach). This avoids some awkward conditional checks (in some implementations, in == out cannot distinguish an empty buffer from a full one).
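Both tricks can be checked in a few lines. The sketch below assumes a hypothetical ring of SIZE 8; slot, used, is_empty and is_full are illustrative helpers, not kernel functions.

```c
#include <assert.h>

#define SIZE 8u   /* hypothetical ring size */
static_assert((SIZE & (SIZE - 1)) == 0, "SIZE must be a power of two");

/* Position inside the buffer: same result as index % SIZE when SIZE is a
 * power of two, but computed with a single AND. */
static unsigned int slot(unsigned int index)
{
    return index & (SIZE - 1);
}

/* With free-running indices the fill level is simply in - out, so an empty
 * ring (0) and a full ring (SIZE) are distinct even when both indices map
 * to the same slot. */
static unsigned int used(unsigned int in, unsigned int out) { return in - out; }
static int is_empty(unsigned int in, unsigned int out) { return used(in, out) == 0; }
static int is_full(unsigned int in, unsigned int out)  { return used(in, out) == SIZE; }
```

For example, in = 13 and out = 5 land on the same slot (5), yet the ring is full, while in = 5 and out = 5 means it is empty.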
【Question】:
in and out keep increasing. Once in has overflowed and wrapped around to 0 while out has not yet overflowed, does
len = min(len, fifo->in - fifo->out);
compute the amount of valid data incorrectly?
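One way to look at this question: in and out are unsigned int, and unsigned arithmetic in C is defined modulo UINT_MAX + 1. The difference in - out therefore stays correct across a wrap of in, as long as the FIFO never holds more than UINT_MAX bytes. A small sketch, with fifo_len as a hypothetical helper:

```c
#include <assert.h>
#include <limits.h>

/* Unsigned arithmetic wraps instead of overflowing. */
static_assert((unsigned int)(0u - 1u) == UINT_MAX,
              "unsigned arithmetic is modulo UINT_MAX + 1");

/* Number of bytes stored, given free-running unsigned indices. */
static unsigned int fifo_len(unsigned int in, unsigned int out)
{
    return in - out;
}
```

If out is UINT_MAX - 2 and three more bytes are written, in wraps to 0, and fifo_len(0, UINT_MAX - 2) still yields 3.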