Superscalar Processor Design (Yao Yongbin), Chapter 2: Cache — excerpt from Section 2.3
2022-06-11 21:55:00 【Qi Qi】
2.3 Multi-port Cache
To improve performance, a superscalar processor needs to execute multiple load/store instructions per cycle, and this requires a multi-port D-Cache that several load/store instructions can access simultaneously.
In fact, many important components of a superscalar processor are multi-ported, such as the register file, the issue queue, and the reorder buffer (ROB). Because these structures are not very large, a multi-port design does not significantly hurt chip area or speed. The D-Cache is different: its capacity is large, and a naive multi-port design would have a severe negative impact, so other measures are needed. This section focuses on three approaches: True Multi-port, Multiple Cache Copies, and Multi-banking.
2.3.1 True Multi-port
Although in practice it is infeasible to build the cache with a truly multi-ported design, this section examines the shortcomings of this most straightforward approach, which implements a multi-port cache with multi-port SRAM. Taking a dual-port cache as an example, both the control path and the data path must be duplicated: two address decoders, so that both ports can address the Tag SRAM and the Data SRAM at the same time; two way multiplexers (way mux) to read out data for the two ports; twice as many comparators to determine hits on both ports; and two aligners. The Tag SRAM and Data SRAM themselves are not copied, but every cell must support two parallel reads. Two write ports are not needed, because there is no way to write the same SRAM cell twice in the same cycle.
This approach duplicates a large amount of circuitry, increasing area, and each SRAM cell must drive multiple read ports, which lengthens the access time and raises power consumption. For these reasons it is generally not used directly to build a multi-port cache.
2.3.2 Multiple Cache Copies

This approach makes copies of the Tag SRAM and Data SRAM. It is similar to the method in Section 2.3.1, but because the whole cache is replicated, the SRAM no longer needs a multi-port structure, which essentially eliminates the impact on the processor's cycle time. However, it wastes a great deal of area and requires keeping the two cache copies synchronized. For example, a store instruction must write to both caches at the same time, and when a cache line is replaced in one copy, the same must be done in the other. This design is clearly cumbersome and far from optimal, and it is rarely used in modern processors.
2.3.3 Multi-banking
This structure is widely used in real processors. It divides the cache into many small banks, each with only one port. If, in a given cycle, the addresses on the cache's ports fall into different banks, there is no problem; only when two or more ports access the same bank does a bank conflict occur.
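As a rough illustration (not from the book), here is a minimal C sketch of how a two-port controller might derive bank indices from the two access addresses and detect a bank conflict. The bank count and bit positions are assumptions chosen for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical parameters for illustration: 8 banks, each 8 bytes wide,
 * so the bank index comes from address bits [5:3]. */
#define BANK_BITS  3
#define BANK_SHIFT 3

static inline unsigned bank_index(uint64_t addr)
{
    return (unsigned)((addr >> BANK_SHIFT) & ((1u << BANK_BITS) - 1u));
}

/* Two accesses issued in the same cycle conflict only when they target
 * the same bank; accesses to different banks proceed in parallel. */
static inline bool bank_conflict(uint64_t addr0, uint64_t addr1)
{
    return bank_index(addr0) == bank_index(addr1);
}
```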

With this method, a dual-port cache still needs two address decoders, two way multiplexers, two sets of comparators, and two aligners, but the Data SRAM no longer needs a multi-port structure, which improves speed and reduces area to some extent. However, because a hit must be determined for each port, the Tag SRAM still has to support simultaneous reads from multiple ports, i.e. it must be implemented with multi-port SRAM.
A key factor affecting the performance of this multi-port cache is bank conflicts. Using more banks eases the problem, keeping the conflict probability as low as possible (for example, with two independent, uniformly distributed accesses spread over n banks, the chance they collide is roughly 1/n) and also improving bank utilization by avoiding the situation where all the useful data sits in a single bank. At the same time, because every port must be able to reach every bank, more wiring resources are required, which may complicate the physical layout.
2.3.4 A Real Example: The Multi-port Cache of the AMD Opteron
AMD's Opteron series are 64-bit processors, but for practical reasons they do not use full 64-bit addresses: the virtual address is 48 bits and the physical address is 40 bits. Simplifying the addresses in this way reduces the silicon area.
The Opteron's D-Cache is dual-ported, each port 64 bits wide. The two ports mean that the cache can service two load/store instructions in the same cycle, and this multi-port capability is implemented with multi-banking.
In the AMD Opteron's cache, each data block is 64 bytes, requiring 6 address bits, and each data block is divided into 8 independent banks; each bank is a 64-bit-wide single-port SRAM.
The whole cache is 64 KB and 2-way set-associative, so each way is 32 KB. It is implemented as virtually-indexed, physically-tagged (VIPT), so the virtual address (VA) can index the cache directly. Since each way is 32 KB, 15 address bits are needed to address it; and because each data block is 64 bytes, VA[5:0] addresses a byte within the block, while the remaining bits, VA[14:6], select the cache set.
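A small sketch of this address split, following the layout described above (VA[5:0] byte-in-block, VA[14:6] set index for a 32 KB way with 64-byte blocks); the function names are illustrative, not from the book.

```c
#include <stdint.h>

/* 64-byte block: VA[5:0] selects a byte within the block.
 * 32 KB per way / 64 B per block = 512 sets: VA[14:6] selects the set. */
static inline unsigned block_offset(uint64_t va) { return (unsigned)( va       & 0x3Fu);  } /* VA[5:0]  */
static inline unsigned set_index(uint64_t va)    { return (unsigned)((va >> 6) & 0x1FFu); } /* VA[14:6] */
```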
Because the data block in each cache line is divided into 8 banks, each an 8-byte-wide SRAM, VA[5:3] naturally selects a bank and VA[2:0] selects a byte within the 8-byte bank. In this way, two consecutive 8-byte data items are placed in two adjacent banks, so that, exploiting spatial locality, accesses to these two 8-byte items fall into different banks.
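Continuing the same illustrative decode: with eight 8-byte banks per block, VA[5:3] picks the bank and VA[2:0] picks the byte within it, so two consecutive 8-byte accesses land in adjacent banks.

```c
#include <stdint.h>

/* Eight 8-byte banks per block: VA[5:3] picks the bank, VA[2:0] the byte. */
static inline unsigned bank_select(uint64_t va)  { return (unsigned)((va >> 3) & 0x7u); } /* VA[5:3] */
static inline unsigned byte_in_bank(uint64_t va) { return (unsigned)( va       & 0x7u); } /* VA[2:0] */

/* Two consecutive 8-byte accesses, at va and va + 8, map to adjacent banks
 * ((bank + 1) % 8), so they never conflict with each other. */
```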
Because each port of the cache reads the data in both ways at the same time and then selects the hitting way according to the Tag comparison, a single port access touches two banks simultaneously, one in each way.
In processors that support virtual memory, the most common page size is 4 KB, which uses VA[11:0] to address within the page. For a 48-bit virtual address, the remaining bits VA[47:12] serve as the VPN (Virtual Page Number) used to look up the TLB, producing the physical address bits PFN (Physical Frame Number)[39:12], which are compared against the Tag to determine whether the access hits.
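A hedged sketch of the index/translate/compare flow just described: VA[47:12] is looked up in the TLB to obtain PFN[39:12], which is then compared with the stored physical tag. The `tlb_lookup` function here is a placeholder assumed to exist elsewhere, not a real API.

```c
#include <stdint.h>
#include <stdbool.h>

/* VA[47:12] is the 36-bit virtual page number for a 4 KB page. */
static inline uint64_t vpn(uint64_t va) { return (va >> 12) & 0xFFFFFFFFFull; }

/* Placeholder: maps a VPN to PFN[39:12] (28 bits); assumed, not a real API. */
uint64_t tlb_lookup(uint64_t virtual_page_number);

/* The access hits when the PFN returned by the TLB matches the physical
 * tag stored with the selected cache line. */
static inline bool tag_hit(uint64_t va, uint64_t stored_tag)
{
    return tlb_lookup(vpn(va)) == stored_tag;
}
```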
For a 2-way set-associative cache, the control logic needed for two ports is roughly double that of a single-port implementation: two TLBs, two sets of Tag comparators, and twice the Tag storage. The Opteron implements the dual-port Tag by duplicating the Tag SRAM; a true dual-port SRAM could also provide this function, but the area would not shrink much and the access would be slower.
Apart from the Data SRAM that holds the cache data, which is not copied, essentially all of the other circuits are duplicated. Implementing a dual-port cache with multi-banking therefore adds considerable area, but its advantage is speed: it has a relatively small negative impact on the processor's cycle time.