Eventual consistency of the MESI cache in the CPU -- why the CPU needs a cache
2022-07-04 16:22:00 【zxhtom】
「 This is day 4 of my participation in the 2022 First Post Challenge. For event details, see: 2022 First Post Challenge 」
Preface
- Earlier posts in this lock series covered 【How Java objects are laid out in memory】, 【What locks Java has】 and 【synchronized and volatile】. While analyzing how volatile deals with instruction reordering, I came across an article about CPU cache coherence.
- volatile prevents instruction reordering by means of memory barriers. Its other property, memory visibility, relies on the CPU's MESI protocol.
- When thread A writes modified data back to main memory, the CPU also notifies the other cores that their cached copies of that data are invalid and must be fetched again.
What is MESI
- MESI is an abbreviation of four words: Modified, Exclusive, Shared and Invalid. Each is a state describing the copy of a piece of data held in a CPU core's cache. With MESI in hand, we can look again at how volatile achieves the memory visibility discussed earlier.
- The CPU does not cache only the bytes it needs; it caches whole blocks (cache lines), and the smallest unit is typically 64 bytes. So when we modify one value, the other values near it in the same block are invalidated too, and the other cores have to resynchronize them. This is false sharing. MySQL is designed the same way: to make queries efficient, data is split into pages as the smallest unit, with a page size of 16KB.
Why the CPU needs a cache
- Data inside a computer is also stored in blocks. Because of that, we cannot access it in whatever fine-grained way we like: fetching items one at a time only increases the number of round trips. The CPU is fast, but memory cannot keep up with it, so fetching data in packaged blocks is the best approach.
- The same is true of reading bytes in network programming. We normally read 1024 bytes at a time, which reduces the number of network interactions.
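The effect of reading in blocks can be sketched in Java. This is only an illustration: the class `ChunkedRead` and the method `countReads` are hypothetical names, and an in-memory `ByteArrayInputStream` stands in for a network stream (a real socket may return fewer bytes per call than the buffer size).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ChunkedRead {
    // Counts how many read calls are needed to drain the stream
    // when each call may fill a buffer of the given size.
    static int countReads(InputStream in, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        int calls = 0;
        try {
            while (in.read(buffer) != -1) {
                calls++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[4096]; // stand-in for data arriving over the network
        int oneByOne = countReads(new ByteArrayInputStream(payload), 1);
        int chunked = countReads(new ByteArrayInputStream(payload), 1024);
        System.out.println(oneByOne + " reads vs " + chunked + " reads");
    }
}
```

The same 4096 bytes take 4096 calls one byte at a time but only 4 calls with a 1024-byte buffer, which is the same trade-off the CPU makes by fetching whole blocks from memory.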
The CPU already has MESI, so why does Java still need volatile
- First, the Java virtual machine may reorder instructions to improve efficiency; preventing that reordering is one of volatile's properties.
- The CPU's MESI guarantees visibility only for a single location (one cache line) on a single CPU at a time. volatile, by contrast, constrains the operations as seen by all CPUs, including their order. So volatile is still necessary.
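A minimal sketch of what volatile adds on top of cache coherence, assuming the usual publish pattern: a plain field is written first, then a volatile flag. The names `VolatilePublish`, `data`, `ready` and `readWhenReady` are made up for this illustration.

```java
class VolatilePublish {
    static int data = 0;                   // plain, non-volatile field
    static volatile boolean ready = false; // volatile flag that publishes data

    // Starts a reader that spins on the flag, then publishes from the caller's thread.
    static int readWhenReady() {
        data = 0;
        ready = false;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // Busy-wait: each iteration performs a volatile read of ready.
            }
            seen[0] = data; // ordered after the volatile read, so it sees 42
        });
        reader.start();
        data = 42;    // plain write, cannot be reordered after the volatile write below
        ready = true; // volatile write makes the earlier write visible to the reader
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(readWhenReady());
    }
}
```

Under the Java memory model the write to `data` happens-before the volatile write to `ready`, which happens-before the reader's volatile read, so the reader is guaranteed to see 42. If `ready` were not volatile, the JIT could hoist the read out of the loop or reorder the two writes, and MESI alone would not prevent that.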
In a typical system, several caches (in a multicore system, each core has its own cache) may share the main memory bus. Each CPU issues read and write requests, and the purpose of the caches is to reduce the number of times the CPU reads and writes shared main memory.
- A cache line in any state other than Invalid can satisfy a CPU read request. An Invalid line must be read from main memory (becoming S or E) before it can satisfy a CPU read request.
- A write request can be executed only if the cache line is in the M or E state. If the line is in the S state, the copies of it in other caches must first be changed to Invalid (different CPUs are not allowed to modify the same cache line at the same time, not even data at different positions within the line). This is often done by broadcast, for example Request For Ownership (RFO).
- A cache may change a line that is not in the M state to Invalid at any time, whereas a line in the M state must first be written back to main memory.
- A cache line in the M state must always snoop every attempt to read that line from main memory. Such a read must be delayed until the cache has written the line back to main memory and changed its state to S.
- A cache line in the S state must also snoop requests from other caches to invalidate the line or to take ownership of it, and then make the line Invalid.
- A cache line in the E state must also snoop other caches reading that line from main memory. Once such a read occurs, the line must change to the S state.
- The M and E states are always precise: they agree with the true state of the cache line. The S state may be imprecise. If another cache discards a line that was in the S state, this cache may in fact now hold the line exclusively, but it does not promote the line to E. Other caches do not broadcast a notification when they discard a line, and because the cache does not keep count of how many copies of the line exist, it could not tell whether it had exclusive ownership even if such a notification were sent.
- In this sense the E state is a speculative optimization: if a CPU wants to modify a cache line in the S state, a bus transaction is needed to change all copies of the line to Invalid, whereas modifying a line in the E state needs no bus transaction.
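The rules above can be sketched as a toy state machine in Java. This is a simplified model under stated assumptions, not how real hardware is built: it tracks one cache line across a few cores, ignores the actual data and write-backs, and the names `MesiSketch`, `Line` and `busTransactions` are invented for the illustration.

```java
import java.util.Arrays;

class MesiSketch {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    // One cache line as seen by each core's private cache.
    static final class Line {
        final State[] states;
        int busTransactions = 0; // hypothetical counter of bus operations

        Line(int cores) {
            states = new State[cores];
            Arrays.fill(states, State.INVALID);
        }

        void read(int core) {
            if (states[core] != State.INVALID) {
                return; // M, E and S can all satisfy a read locally
            }
            busTransactions++; // miss: the line is fetched over the bus
            boolean othersHold = false;
            for (int i = 0; i < states.length; i++) {
                if (i != core && states[i] != State.INVALID) {
                    // An M holder writes back first; M and E holders drop to S.
                    states[i] = State.SHARED;
                    othersHold = true;
                }
            }
            states[core] = othersHold ? State.SHARED : State.EXCLUSIVE;
        }

        void write(int core) {
            if (states[core] == State.MODIFIED) {
                return; // already owned and dirty
            }
            if (states[core] == State.EXCLUSIVE) {
                states[core] = State.MODIFIED; // silent upgrade, no bus transaction
                return;
            }
            busTransactions++; // S or I: broadcast a Request For Ownership
            for (int i = 0; i < states.length; i++) {
                if (i != core) {
                    states[i] = State.INVALID;
                }
            }
            states[core] = State.MODIFIED;
        }
    }

    public static void main(String[] args) {
        Line line = new Line(2);
        line.read(0);  // core 0: I -> E
        line.write(0); // core 0: E -> M without touching the bus
        line.read(1);  // core 0: M -> S, core 1: I -> S
        line.write(1); // core 1: S -> M via RFO, core 0: S -> I
        System.out.println(Arrays.toString(line.states) + " bus=" + line.busTransactions);
        // prints "[INVALID, MODIFIED] bus=3"
    }
}
```

Note how the write from the E state costs no bus transaction, while the write from the S state does: that is the optimization the last bullet describes.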
Example
- As noted above, the CPU's cache unit is 64 bytes. Suppose two variables manipulated by multiple Java threads sit in the same block: when one thread modifies variable a, the other thread working on variable b is also dragged into data synchronization. Here is a piece of code provided by Ma Shibing (马士兵); I ran it locally, and it is quite fun.
import lombok.Data;

// Lombok's @Data generates the getters and setters, including setP(long).
@Data
class Store {
    // Seven longs (56 bytes) of padding in front of p.
    private volatile long p1, p2, p3, p4, p5, p6, p7;
    // The only field the benchmark writes.
    private volatile long p;
    // Seven longs (56 bytes) of padding behind p.
    private volatile long p8, p9, p10, p11, p12, p13, p14;
}

public class StoreRW {
    public static Store[] arr = new Store[2];
    // 100 million writes per thread.
    public static long COUNT = 1_0000_0000L;

    static {
        arr[0] = new Store();
        arr[1] = new Store();
    }

    public static void main(String[] args) throws InterruptedException {
        // Thread 1 only ever writes arr[0].p.
        final Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (long i = 0; i < COUNT; i++) {
                    arr[0].setP(i);
                }
            }
        });
        // Thread 2 only ever writes arr[1].p.
        final Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (long i = 0; i < COUNT; i++) {
                    arr[1].setP(i);
                }
            }
        });
        final long start = System.currentTimeMillis();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        final long end = System.currentTimeMillis();
        // Elapsed wall-clock time in milliseconds.
        System.out.println(end - start);
    }
}
- The code is simple: two threads each keep writing to their own variable. Now remove the redundant fields from the object, so that Store keeps only the single attribute p:
@Data
class Store {
    private volatile long p;
}
- Running the program with only p, the time settles at roughly 100 milliseconds. If I add the 14 unrelated fields of type long, it settles at roughly 70 milliseconds. The absolute running time depends on the machine's configuration, but whatever the configuration, you should clearly see a difference between having and not having the 14 variables.
- This comes down to the CPU's cache unit. With only one field, the two objects in the arr array are very likely to sit in the same cache block, so whenever thread A writes to its object, thread B has to resynchronize. Adding the 14 variables guarantees that the p fields of the two objects in arr are never in the same block.
- With the 14 variables added, one Store takes 15*8=120 bytes of field data, which is more than a 64-byte block. However the two Store objects are placed, their p fields cannot share a block, because p still sits in the middle with 56 bytes of padding on each side (assuming the JVM lays the fields out in declaration order). That is why we see this effect.
- Some people will find this kind of code unattractive, but it does improve performance. The JDK also provides an annotation for this, @sun.misc.Contended. In my test, though, the improvement felt smaller than with the 14 variables. Note that on JDK 8 the annotation is ignored for application classes unless the JVM is started with -XX:-RestrictContended, which may explain such a result.
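The 15*8=120 arithmetic above can be checked with a few lines of Java. The addresses here are hypothetical (Java does not expose real object addresses), and the 64-byte line size, the 16-byte object header and the back-to-back placement of the two objects are assumptions made only for this sketch; `CacheLineMath` and `sameLine` are invented names.

```java
class CacheLineMath {
    static final int LINE_BYTES = 64; // assumed cache line size

    // Two addresses share a cache line when they fall in the same 64-byte block.
    static boolean sameLine(long addrA, long addrB) {
        return addrA / LINE_BYTES == addrB / LINE_BYTES;
    }

    public static void main(String[] args) {
        // Unpadded Store: assumed 16-byte header, p at offset 16, object size 24.
        // Two objects placed back to back put their p fields at 16 and 24 + 16.
        long unpaddedA = 16;
        long unpaddedB = 24 + 16;
        System.out.println(sameLine(unpaddedA, unpaddedB)); // true: false sharing is possible

        // Padded Store: 7 longs behind the first p and 7 longs in front of the second,
        // so the two p fields are at least 15 * 8 = 120 bytes apart.
        boolean everShared = false;
        for (long base = 0; base < LINE_BYTES; base += 8) {
            everShared |= sameLine(base, base + 120);
        }
        System.out.println(everShared); // false: no alignment puts both in one line
    }
}
```

Because 120 is larger than the line size, no starting alignment can place both p fields in the same 64-byte block, which matches the reasoning in the bullets above.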
Summary
- That's all for today. This post was mainly about understanding MESI.