Multithreading tutorial (XXVII): CPU cache and false sharing
2022-06-11 05:30:00 【Have you become a great God today】
1. CPU cache structure

View the CPU cache information with lscpu:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Stepping: 11
CPU MHz: 1992.002
BogoMIPS: 3984.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0
Speed comparison:
| From the CPU to | Approximate clock cycles required |
|---|---|
| Register | 1 cycle |
| L1 | 3~4 cycles |
| L2 | 10~20 cycles |
| L3 | 40~45 cycles |
| Memory | 120~240 cycles |
Registers can be thought of as sitting inside the CPU itself, so register access is the fastest.
The length of a clock cycle depends on the CPU's clock frequency; for example, at 4 GHz one cycle takes about 0.25 ns.
To improve CPU utilization, data from memory is read into the cache.
Because the CPU is much faster than memory, efficiency is improved by prefetching data into the cache.
The cache is organized in units of cache lines. Each cache line corresponds to a block of memory, typically 64 bytes (8 longs).
Adding a cache creates copies of data: the same data may be cached in the cache lines of different cores.
The CPU must keep this data consistent: if one core modifies the data, the entire corresponding cache line on the other cores must be invalidated.
For more detail on the CPU caching mechanism, see the blog post on the topic, which is very good.
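To make the cycle counts in the table more concrete, the following minimal sketch converts them into approximate latencies. It assumes the 4 GHz example frequency from the text; the class and method names (`CycleTime`, `nanosPerCycle`, `latencyNanos`) are made up for illustration, and real latencies vary by CPU model.

```java
// Sketch: convert the cycle counts from the table into approximate latencies.
// The 4 GHz frequency is the example value from the text; all names here are
// illustrative only.
class CycleTime {
    // Duration of one clock cycle in nanoseconds: 1 / frequency in GHz.
    static double nanosPerCycle(double frequencyGHz) {
        return 1.0 / frequencyGHz;
    }

    // Approximate latency in nanoseconds for a given number of cycles.
    static double latencyNanos(int cycles, double frequencyGHz) {
        return cycles * nanosPerCycle(frequencyGHz);
    }

    public static void main(String[] args) {
        double ghz = 4.0;
        String[] level = {"register", "L1", "L2", "L3", "memory"};
        int[] minCycles = {1, 3, 10, 40, 120};
        int[] maxCycles = {1, 4, 20, 45, 240};
        for (int i = 0; i < level.length; i++) {
            System.out.printf("%-8s %6.2f ns to %6.2f ns%n", level[i],
                    latencyNanos(minCycles[i], ghz),
                    latencyNanos(maxCycles[i], ghz));
        }
    }
}
```

At 4 GHz, a main-memory access of 120 to 240 cycles works out to roughly 30 to 60 ns, compared with 0.25 ns for a register access.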
2. False sharing
As mentioned earlier, the cache is organized in units of cache lines; each cache line corresponds to a block of memory, typically 64 bytes (8 longs).
If a piece of data takes up no more than half a cache line (32 bytes), one cache line can hold two such pieces of data. When either of them changes, the whole cache line is invalidated, even for a core that only uses the other one. This is false sharing.
Take LongAdder, introduced in the previous installment, as an example: a Cell is an accumulation unit, and LongAdder keeps several Cells to improve efficiency.

Because the Cells are held in an array, they are stored contiguously in memory. One Cell takes 24 bytes (a 16-byte object header plus an 8-byte value), so a single cache line can hold 2 Cell objects. This is where the problem arises:
Core-0 wants to modify Cell[0]
Core-1 wants to modify Cell[1]
Whichever core's modification succeeds, it invalidates the other core's cache line. For example, if Core-0 holds Cell[0]=6000, Cell[1]=8000 and increments it to Cell[0]=6001, Cell[1]=8000, Core-1's cache line is invalidated.
@sun.misc.Contended addresses this problem. It works by adding 128 bytes of padding before and after the annotated object or field, so that the objects land in different cache lines when the CPU prefetches them into the cache and a write by one core no longer invalidates the other core's cache line.
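The layout arithmetic behind this can be sketched as follows. This is only an illustration under simplifying assumptions: a 64-byte cache line, 24-byte Cells laid out back to back, and a first Cell that starts exactly on a cache-line boundary (a real JVM guarantees none of this). The class and method names are hypothetical.

```java
// Sketch: which cache line does each Cell start in?
// Assumes a 64-byte cache line, 24-byte Cells laid out back to back, and
// that Cell[0] starts on a cache-line boundary. All names are illustrative.
class CacheLineMath {
    static final int CACHE_LINE_BYTES = 64;
    static final int CELL_BYTES = 24; // 16-byte header + 8-byte value, per the text

    // Index of the cache line that contains the given byte offset.
    static long lineOf(long byteOffset) {
        return byteOffset / CACHE_LINE_BYTES;
    }

    // True if the Cells at the two indices start in the same cache line.
    static boolean sameLine(int cellA, int cellB) {
        return lineOf((long) cellA * CELL_BYTES) == lineOf((long) cellB * CELL_BYTES);
    }

    public static void main(String[] args) {
        System.out.println("Cell[0] and Cell[1] share a line: " + sameLine(0, 1));
        System.out.println("Cell[0] and Cell[3] share a line: " + sameLine(0, 3));
    }
}
```

Under these assumptions Cell[0] (offset 0) and Cell[1] (offset 24) both start in line 0, so a write to either one invalidates the line that also holds the other, while Cell[3] (offset 72) starts in line 1.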
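The padding idea can be illustrated without the annotation itself. `@sun.misc.Contended` is a JDK-internal annotation: on JDK 8 it only takes effect for application classes when the JVM is started with `-XX:-RestrictContended`, and from JDK 9 it lives in `jdk.internal.vm.annotation`. The sketch below therefore swaps in manual padding, a different technique with the same goal. The class names are hypothetical, the padding is 56 bytes per side rather than 128, and the JVM is free to lay out fields differently, so treat it as an illustration rather than a guarantee.

```java
// Sketch: manual padding as a portable stand-in for @sun.misc.Contended.
// The superclass/subclass split is a common trick to keep the padding fields
// on both sides of `value`; the actual layout is still up to the JVM.
class PaddingSketch {
    static class LeftPad { long p1, p2, p3, p4, p5, p6, p7; }        // 56 bytes before
    static class Value extends LeftPad { volatile long value; }       // the hot field
    static class PaddedCell extends Value { long q1, q2, q3, q4, q5, q6, q7; } // 56 bytes after

    // Starts two threads, each incrementing only its own cell, and returns
    // the sum of both cells once the threads have finished.
    static long run(long iterations) {
        PaddedCell[] cells = { new PaddedCell(), new PaddedCell() };
        Thread[] workers = new Thread[cells.length];
        for (int i = 0; i < workers.length; i++) {
            PaddedCell cell = cells[i];
            workers[i] = new Thread(() -> {
                for (long n = 0; n < iterations; n++) {
                    cell.value++; // single writer per cell, so no lost updates
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return cells[0].value + cells[1].value;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

Each thread writes only to its own cell, so the result is always twice the iteration count; the padding affects only how often the two cores invalidate each other's cache lines, which would have to be measured with a proper benchmark.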
