Eventual consistency of the MESI cache in the CPU -- why does the CPU need a cache

2022-07-04 16:22:00 zxhtom

Preface

  • The lock series so far covers 【How Java objects are laid out in memory】, 【What locks Java has】 and 【synchronized and volatile】. While analyzing how volatile prevents instruction reordering in the last of these, I came across an article about CPU cache coherence.
  • volatile prohibits instruction reordering by means of memory barriers. Its other feature, memory visibility, is achieved through the CPU's MESI protocol.
  • When thread A flushes modified data back to main memory, the CPU at the same time notifies the other threads that their copies of that data are invalid and must be fetched again.

What is MESI

image-20211209115724253.png

  • MESI is an abbreviation of four words: Modified, Exclusive, Shared and Invalid. Each is a state describing the copy of a piece of data held in a CPU's cache on behalf of a thread. Let's use MESI to look again at how volatile achieves memory visibility.

image-20211209133215512-16390279362811.png

  • The CPU does not cache only the data it needs; it caches in blocks, and the smallest unit (the cache line) is typically 64 bytes. So when we modify a value in one place, the other values near it are invalidated as well, which in turn forces other threads to synchronize their data. This is false sharing. MySQL is designed in a similar way: to make queries convenient, data is divided into pages as the smallest unit, with a page size of 16KB.

Why the CPU needs a cache

  • Data inside the computer is also stored in blocks. With this kind of storage, accessing data arbitrarily, one item at a time, would increase the number of interactions with memory. The CPU is very fast, but memory cannot keep up with it, so fetching data in packed blocks is the best approach.
  • The same is true of reading bytes in network development. We normally read 1024 bytes at a time, which reduces the number of network interactions.
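The effect of reading in blocks can be sketched with a small counting experiment. This is only an illustration, not part of the original article: `BlockReadDemo`, `CountingStream` and `countReadCalls` are hypothetical names, and an in-memory stream stands in for a real network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BlockReadDemo {
    /** Hypothetical wrapper that counts how many read calls reach the underlying stream. */
    static class CountingStream extends InputStream {
        private final InputStream in;
        int calls = 0;

        CountingStream(InputStream in) { this.in = in; }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    /** Drains the data using a buffer of chunkSize bytes and returns the number of read calls. */
    static int countReadCalls(byte[] data, int chunkSize) {
        CountingStream s = new CountingStream(new ByteArrayInputStream(data));
        byte[] buf = new byte[chunkSize];
        try {
            while (s.read(buf, 0, buf.length) != -1) {
                // the data itself is ignored; only the number of calls matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s.calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[4096];
        System.out.println("1-byte reads:    " + countReadCalls(data, 1) + " calls");
        System.out.println("1024-byte reads: " + countReadCalls(data, 1024) + " calls");
    }
}
```

For 4096 bytes the one-byte version needs 4097 calls and the 1024-byte version needs 5 (in both cases the last call only detects the end of the stream). The same reasoning is why a CPU fetches a whole cache line rather than a single byte.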

The CPU has MESI, so why does Java still need volatile

  • First, to improve efficiency the Java virtual machine reorders instructions, and preventing that reordering is one of the features of volatile.
  • What the CPU's MESI protocol guarantees is coherence for a single location (one cache line) at a time. volatile, however, has to provide visibility and ordering across all CPUs, so volatile is still necessary.
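As a minimal sketch of the visibility guarantee described above (the class and method names are made up for illustration), a worker thread spins on a volatile flag and stops once another thread clears it. Without `volatile`, the JIT compiler would be allowed to read the flag once and keep the value in a register, so the loop might never end even though the hardware caches themselves stay coherent.

```java
class VolatileFlagDemo {
    // volatile: the write below is guaranteed to become visible to the worker's reads
    static volatile boolean running = true;

    /** Starts a spinning worker, clears the flag, and reports whether the worker stopped. */
    static boolean workerStopsAfterFlagCleared() {
        running = true;
        Thread worker = new Thread(() -> {
            while (running) {
                // busy-wait until another thread clears the flag
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);     // give the worker time to start spinning
            running = false;      // volatile write
            worker.join(5_000);   // wait up to five seconds for the worker to notice
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsAfterFlagCleared());
    }
}
```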

In a typical system, several caches (in a multicore system each core has its own cache) share the main memory bus. Each corresponding CPU issues read and write requests, and the purpose of the caches is to reduce the number of times the CPUs read and write the shared main memory.

  • A cache line in any state other than Invalid can satisfy a CPU read request. A line in the Invalid state must first be read from main memory (becoming S or E) before the read request can be satisfied.
  • A write request can be executed only if the cache line is in the M or E state. If the line is in the S state, the copies of that line in all other caches must first be changed to Invalid (different CPUs are not allowed to modify the same cache line at the same time, not even data at different positions within the line). This is usually done with a broadcast, for example a Request For Ownership (RFO).
  • A cache may discard a line that is not in the M state at any time, changing it to Invalid. A line in the M state must be written back to main memory first.
  • A cache holding a line in the M state must snoop every attempt by other caches to read the corresponding main memory location. Such a read has to be delayed until the cache has written the line back to main memory and changed its state to S.
  • A cache holding a line in the S state must listen for requests from other caches to invalidate the line or to take ownership of it, and invalidate its own copy (Invalid) when one arrives.
  • A cache holding a line in the E state must also snoop reads of that main memory location by other caches. Once such a read occurs, the line has to change to the S state.
  • The M and E states are always precise: they match the true ownership of the cache line. The S state may be imprecise. If another cache discards a line that is in the S state, this cache may in fact be the only one still holding the line, but it does not promote the line to the E state. This is because other caches do not broadcast a notice when they discard a line, and because the cache does not keep a count of how many copies of the line exist, it could not tell whether it has exclusive ownership even if such a notice were sent.
  • In this sense the E state is an opportunistic optimization: if a CPU wants to modify a cache line in the S state, a bus transaction is needed to change all other copies of the line to Invalid, whereas modifying a line in the E state requires no bus transaction.
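The transition rules above can be sketched as a tiny simulation of one cache line shared by several CPUs. This is a simplified model written for illustration, not how any real CPU is implemented: `MesiSim` and its fields are hypothetical names, and every miss or invalidation is counted as exactly one bus transaction.

```java
import java.util.Arrays;

class MesiSim {
    enum State { M, E, S, I }

    final State[] line;        // state of the one tracked cache line in each CPU's cache
    int busTransactions = 0;   // read misses plus invalidate/RFO broadcasts
    int writeBacks = 0;        // times a Modified copy was written back to main memory

    MesiSim(int cpus) {
        line = new State[cpus];
        Arrays.fill(line, State.I);
    }

    /** CPU reads the line: M, E and S hit locally, I has to go out on the bus. */
    void read(int cpu) {
        if (line[cpu] != State.I) return;
        busTransactions++;
        boolean heldElsewhere = false;
        for (int other = 0; other < line.length; other++) {
            if (other == cpu || line[other] == State.I) continue;
            if (line[other] == State.M) writeBacks++;  // dirty copy is written back first
            line[other] = State.S;                     // M, E and S holders all end up Shared
            heldElsewhere = true;
        }
        line[cpu] = heldElsewhere ? State.S : State.E;
    }

    /** CPU writes the line: only allowed in M or E, so S and I must invalidate the others first. */
    void write(int cpu) {
        if (line[cpu] == State.S || line[cpu] == State.I) {
            busTransactions++;                         // invalidate or RFO broadcast
            for (int other = 0; other < line.length; other++) {
                if (other == cpu) continue;
                if (line[other] == State.M) writeBacks++;
                line[other] = State.I;
            }
        }
        // From E (or M) no bus transaction is needed: no other cache holds a copy.
        line[cpu] = State.M;
    }

    public static void main(String[] args) {
        MesiSim sim = new MesiSim(2);
        sim.read(0);   System.out.println(Arrays.toString(sim.line)); // [E, I]
        sim.write(0);  System.out.println(Arrays.toString(sim.line)); // [M, I]
        sim.read(1);   System.out.println(Arrays.toString(sim.line)); // [S, S]
        sim.write(1);  System.out.println(Arrays.toString(sim.line)); // [I, M]
        System.out.println("bus transactions: " + sim.busTransactions); // 3
    }
}
```

Note that the write from E to M in the second step adds no bus transaction, which is exactly the optimization the last bullet describes, while the write from S in the fourth step costs one.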

Case study

  • As introduced above, the CPU caches data in units of 64 bytes. Suppose two variables manipulated by multiple Java threads sit in the same block: when one thread modifies variable a, another thread operating on variable b is also drawn into data synchronization. Here is a piece of code provided by Ma Shibing. I ran it locally, and it is quite fun.

 import lombok.Data; // @Data comes from Lombok and generates getP()/setP()

 @Data
 class Store {
     // Seven longs on each side of p act as padding around it.
     private volatile long p1, p2, p3, p4, p5, p6, p7;
     private volatile long p;
     private volatile long p8, p9, p10, p11, p12, p13, p14;
 }

 public class StoreRW {
     public static Store[] arr = new Store[2];
     public static long COUNT = 1_0000_0000L; // 100 million writes per thread
     static {
         arr[0] = new Store();
         arr[1] = new Store();
     }

     public static void main(String[] args) throws InterruptedException {
         // Each thread writes only to the p field of its own Store object.
         final Thread t1 = new Thread(() -> {
             for (long i = 0; i < COUNT; i++) {
                 arr[0].setP(i);
             }
         });
         final Thread t2 = new Thread(() -> {
             for (long i = 0; i < COUNT; i++) {
                 arr[1].setP(i);
             }
         });
         final long start = System.currentTimeMillis();
         t1.start();
         t2.start();
         t1.join();
         t2.join();
         final long end = System.currentTimeMillis();
         // Elapsed wall-clock time in milliseconds.
         System.out.println(end - start);
     }
 }
  • The code is simple: two threads each keep writing to their own variable. Now remove the redundant fields from the object so that Store keeps only the p field:

 @Data
 class Store {
     private volatile long p;
 }
  • Running the program in this form, the time was basically stable at about 100 milliseconds. After adding the 14 unrelated fields of type long, the time was stable at about 70 milliseconds. The exact running time depends on the configuration of the computer, but whatever the configuration, you can clearly see the difference between having and not having the 14 variables.
  • This comes down to the CPU cache unit. With only one field, the two objects in the arr array are likely to sit in the same cache block, so when thread A writes to its object, thread B has to synchronize. With the 14 extra variables, the p fields of the two objects in arr cannot be in the same block.
  • With the 14 variables, the fields of one Store take up 15*8=120 bytes, and p sits in the middle with 56 bytes of padding on each side. However the two Store objects are placed, their p fields therefore cannot land in the same 64-byte block. That is why this effect appears.
  • Some people will find this kind of code ugly, but it does improve performance. The JDK also provides an annotation for this purpose, @sun.misc.Contended (for application classes it only takes effect when the JVM is started with -XX:-RestrictContended). In my test, though, the improvement felt smaller than with the 14 variables. Teacher Ma …
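The layout argument above can be checked with a little arithmetic. The sketch below is a back-of-the-envelope model, not a measurement of the real JVM. It assumes a 64-byte cache line, a 16-byte object header, fields laid out in declaration order, and two Store objects allocated back to back. All names in it are hypothetical.

```java
class PaddingLayoutDemo {
    static final int CACHE_LINE = 64; // bytes; assumed cache line size
    static final int HEADER = 16;     // bytes; assumed object header size

    /** Index of the cache line that contains the given byte address. */
    static long lineOf(long address) {
        return address / CACHE_LINE;
    }

    /**
     * Returns true if, for some 8-byte-aligned start address, the p fields of two
     * adjacent objects fall into the same cache line. paddingLongs is the number
     * of long fields declared before p (and the same number after it).
     */
    static boolean canShareLine(int paddingLongs) {
        int pOffset = HEADER + 8 * paddingLongs;              // offset of p within one object
        int objectSize = HEADER + 8 * (2 * paddingLongs + 1); // header plus all long fields
        for (long base = 0; base < CACHE_LINE; base += 8) {
            long firstP = base + pOffset;
            long secondP = base + objectSize + pOffset;
            if (lineOf(firstP) == lineOf(secondP)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("no padding, p fields can share a line: " + canShareLine(0));
        System.out.println("7+7 longs,  p fields can share a line: " + canShareLine(7));
    }
}
```

Under these assumptions the unpadded objects are only 24 bytes apart, so their p fields often share a line, while the padded ones are 136 bytes apart and never can.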

Summary

  • That's all for today's introduction. The main goal was to build an understanding of MESI.

Reference article

A detailed case analysis of CPU cache consistency

Copyright notice
This article was created by [zxhtom]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/185/202202141142001864.html