Multithreading tutorial (XXVII): CPU cache and false sharing
2022-06-11 05:30:00 【Have you become a great God today】
1. CPU cache structure

View the CPU cache information:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Stepping: 11
CPU MHz: 1992.002
BogoMIPS: 3984.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0
Speed comparison
| From CPU to | Approximate clock cycles required |
|---|---|
| Register | 1 cycle |
| L1 | 3~4 cycles |
| L2 | 10~20 cycles |
| L3 | 40~45 cycles |
| Memory | 120~240 cycles |
Registers can be thought of as sitting inside the CPU itself, so register access is the fastest.
The length of a clock cycle depends on the CPU's clock frequency. At 4 GHz, for example, one cycle takes about 0.25 ns.
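The arithmetic behind that figure can be sketched in a few lines of Java. This is only an illustration: the class and method names (`CycleTime`, `cycleNanos`, `latencyNanos`) are made up for this example, and the cycle counts are the rough figures from the table above, not measurements.

```java
class CycleTime {
    /** Length of one clock cycle in nanoseconds, for a clock frequency given in GHz. */
    static double cycleNanos(double frequencyGhz) {
        // 1 GHz means one cycle per nanosecond, so the period is 1 / f.
        return 1.0 / frequencyGhz;
    }

    /** Approximate latency in nanoseconds for an access that costs the given number of cycles. */
    static double latencyNanos(int cycles, double frequencyGhz) {
        return cycles * cycleNanos(frequencyGhz);
    }

    public static void main(String[] args) {
        double ghz = 4.0; // the 4 GHz example from the text
        System.out.println("one cycle:          " + cycleNanos(ghz) + " ns");
        System.out.println("memory, 120 cycles: " + latencyNanos(120, ghz) + " ns");
        System.out.println("memory, 240 cycles: " + latencyNanos(240, ghz) + " ns");
    }
}
```

At 4 GHz, the 120~240 cycles quoted for a memory access work out to roughly 30~60 ns, against 0.25 ns for a register.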
To improve CPU utilization, data is read from memory into the cache.
Because the CPU is so much faster than memory, efficiency is improved by prefetching data into the cache.
The cache works in units of cache lines. Each cache line corresponds to one block of memory, usually 64 bytes (8 longs).
Adding a cache creates copies of data: the same data can be cached in the cache lines of different cores.
The CPU has to keep the data consistent. If one CPU core changes the data, the entire corresponding cache line on the other CPU cores must be invalidated.
For a more detailed explanation of the CPU caching mechanism, see this blog post, which is very good.
2. False sharing
As mentioned earlier, the cache works in units of cache lines. Each cache line corresponds to one block of memory, usually 64 bytes (8 longs).
But if a piece of data is smaller than half of 64 bytes, that is, under 32 bytes, one cache line holds two pieces of data. If either of them changes, the entire cache line is invalidated. This is false sharing.
Take LongAdder, introduced in the previous installment, as an example. A Cell is an accumulation unit, and LongAdder maintains several Cells to improve efficiency.

Because the Cells are kept in an array, they are stored contiguously in memory. One Cell is 24 bytes (a 16-byte object header plus an 8-byte value), so one cache line can hold 2 Cell objects. Here is the problem:
Core-0 wants to modify Cell[0]
Core-1 wants to modify Cell[1]
Whichever core succeeds, the other core's cache line is invalidated. For example, Core-0 holds Cell[0]=6000, Cell[1]=8000 and increments to Cell[0]=6001, Cell[1]=8000. This invalidates the cache line on Core-1.
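The layout argument above can be checked with simple offset arithmetic. The sketch below is illustrative only: it assumes the Cells are packed back to back starting at offset 0 of a 64-byte-aligned block, and it uses the 24-byte Cell size quoted in the text. The class and method names are hypothetical, and real JVM object placement is not something user code controls.

```java
class CacheLineLayout {
    static final int CACHE_LINE_BYTES = 64; // typical cache line size, as stated in the text
    static final int CELL_BYTES = 24;       // 16-byte header + 8-byte value, per the text

    /** Index of the cache line that contains the byte at the given offset. */
    static long lineOf(long byteOffset) {
        return byteOffset / CACHE_LINE_BYTES;
    }

    /** Offset of the first byte of Cell[index], assuming cells are packed from offset 0. */
    static long cellStart(int index) {
        return (long) index * CELL_BYTES;
    }

    /** True if Cell[i] and Cell[j] have at least one cache line in common. */
    static boolean shareLine(int i, int j) {
        long iFirst = lineOf(cellStart(i));
        long iLast = lineOf(cellStart(i) + CELL_BYTES - 1);
        long jFirst = lineOf(cellStart(j));
        long jLast = lineOf(cellStart(j) + CELL_BYTES - 1);
        // Two ranges of line indices overlap when each starts before the other ends.
        return iFirst <= jLast && jFirst <= iLast;
    }

    public static void main(String[] args) {
        System.out.println("Cell[0] starts on line " + lineOf(cellStart(0)));
        System.out.println("Cell[1] starts on line " + lineOf(cellStart(1)));
        System.out.println("Cell[0] and Cell[1] share a line: " + shareLine(0, 1));
    }
}
```

Under these assumptions Cell[0] occupies bytes 0 to 23 and Cell[1] occupies bytes 24 to 47, so both sit on line 0, and a write to either one invalidates the line that holds the other.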
@sun.misc.Contended solves this problem. It works by adding 128 bytes of padding before and after the annotated object or field, so that the objects land on different cache lines when the CPU prefetches them into the cache. That way, a write by one core does not invalidate the other core's cache line.
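The padding idea can be approximated by hand, which is handy for experiments because @sun.misc.Contended is a JDK-internal annotation and user code normally needs the -XX:-RestrictContended JVM flag for it to take effect. The sketch below is not what the annotation does internally. It swaps in the older manual trick of surrounding the hot field with unused long fields, and it assumes 64-byte cache lines and that the JVM keeps the fields in declared order, which is not guaranteed. `PaddedCounterDemo` and `PaddedCounter` are hypothetical names.

```java
class PaddedCounterDemo {
    /** Counter whose hot field is surrounded by 7 longs (56 bytes) on each side. */
    static class PaddedCounter {
        long p1, p2, p3, p4, p5, p6, p7; // padding before (assumed to stay in place)
        volatile long value;             // the field each thread actually updates
        long q1, q2, q3, q4, q5, q6, q7; // padding after
    }

    /** Two threads, each incrementing only its own counter; returns both final values. */
    static long[] run(int iterations) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        // Each counter has a single writer, so value++ on a volatile is safe here.
        Thread t0 = new Thread(() -> { for (int i = 0; i < iterations; i++) a.value++; });
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) b.value++; });
        t0.start();
        t1.start();
        try {
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long[] result = run(10_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("a=" + result[0] + ", b=" + result[1] + ", took " + elapsedMs + " ms");
    }
}
```

To see whether padding matters on a given machine, one could time this against a variant of `PaddedCounter` with the `p` and `q` fields removed. Any difference depends on the hardware, the JVM version, and where the JVM happens to place the two objects, so a proper comparison would need a benchmark harness.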
