
Superscalar Processor Design (Yao Yongbin), Chapter 2: Cache -- Excerpt from Section 2.3

2022-06-11 21:55:00 Qi Qi

2.3 Multi-port Cache

In a superscalar processor, to improve performance the processor must be able to execute multiple load/store instructions in each cycle, which requires a multi-port D-Cache so that several load/store instructions can access it at the same time.

In fact, many important components in a superscalar processor are multi-ported, such as the register file, the issue queue, and the reorder buffer (ROB). Because the capacity of these components is not very large, adopting a multi-port structure does not have much impact on chip area or speed. The D-Cache is different: its capacity is very large, and a straightforward multi-port design would have a serious negative impact, so other measures are needed. This section focuses on three of them: True Multi-port, Multiple Cache Copies, and Multi-banking.

2.3.1 True Multi-port

Although in reality it is impractical to build the Cache directly as a true multi-port design, this section examines the shortcomings of this most straightforward approach. It uses multi-port SRAM to implement a multi-port Cache. Taking a dual-port Cache as an example, both the control path and the data path inside the Cache have to be duplicated: there are two address decoders, so that both ports can address the Tag SRAM and Data SRAM at the same time; two way multiplexers (way mux), used to read out data for the two ports; twice as many comparators, used to decide whether each port hits; and likewise two aligners, and so on. The Tag SRAM and Data SRAM themselves are not copied, but every cell in them must support two read operations in parallel. Two write ports are not needed, however, because there is no way to write an SRAM cell twice at the same time.
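
To make the duplicated paths easier to picture, here is a minimal behavioral sketch in C (my own illustration; the size parameters are made up, not from the book). The Tag and Data arrays exist only once, while each port runs its own address decode, tag comparison and way selection:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical parameters: 2-way set-associative, 64 sets, 64-byte lines. */
#define WAYS 2
#define SETS 64
#define LINE 64

typedef struct {
    uint64_t tag[SETS][WAYS];        /* single Tag SRAM, multi-ported cells  */
    uint8_t  data[SETS][WAYS][LINE]; /* single Data SRAM, multi-ported cells */
    bool     valid[SETS][WAYS];
} cache_t;

/* One port's read path: address decode -> parallel tag compare -> way mux.
 * In hardware, both ports execute this logic in the same cycle. */
static bool port_read(const cache_t *c, uint64_t addr, uint8_t *byte)
{
    uint32_t offset = addr % LINE;            /* per-port address decoder */
    uint32_t set    = (addr / LINE) % SETS;
    uint64_t tag    =  addr / LINE / SETS;

    for (int w = 0; w < WAYS; w++)            /* per-port comparators */
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            *byte = c->data[set][w][offset];  /* per-port way mux + aligner */
            return true;                      /* hit */
        }
    return false;                             /* miss */
}
```

A dual-port access in one cycle corresponds to running port_read twice against the same arrays; the cost described in the text comes from making every SRAM cell readable by both ports at once, not just from the duplicated control logic.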

This method requires many circuits to be duplicated, so the area increases; moreover, each SRAM cell has to drive multiple read ports, so the access time becomes longer and the power consumption also rises. For these reasons, this method is generally not used directly to build a multi-port Cache.

2.3.2 Multiple Cache Copies

This method makes copies of the Tag SRAM and Data SRAM. It is similar to the approach in Section 2.3.1, but because the whole Cache is replicated, the SRAM no longer needs a multi-port structure, which essentially eliminates the impact on the processor's cycle time. However, this method wastes a lot of area, and the two Cache copies have to be kept synchronized. For example, a store instruction must write into both Caches at the same time, and when a Cache line is replaced in one Cache, the same operation has to be performed on the other. This design is obviously cumbersome and far from optimal, so it is rarely used in modern processors.
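
As a rough sketch (my own, under simplified assumptions such as a direct-mapped organization), the synchronization burden looks like this: every store, and every line refill caused by a replacement, has to be applied to both copies:

```c
#include <stdint.h>
#include <string.h>

#define SETS 64
#define LINE 64

/* Two identical cache copies, one serving each read port. */
typedef struct {
    uint64_t tag[SETS];
    uint8_t  data[SETS][LINE];
} cache_copy_t;

typedef struct { cache_copy_t copy[2]; } dual_copy_cache_t;

/* A line fill on replacement (and likewise every store) must update
 * both copies so that they never diverge. */
static void fill_line(dual_copy_cache_t *c, uint32_t set,
                      uint64_t tag, const uint8_t line[LINE])
{
    for (int i = 0; i < 2; i++) {
        c->copy[i].tag[set] = tag;
        memcpy(c->copy[i].data[set], line, LINE);
    }
}
```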

2.3.3 Multi-banking

This structure is the method widely used in real processors. It divides the Cache into many small banks, each of which has only one port. If, in a given cycle, the addresses on the Cache's multiple ports fall into different banks, no problem arises; only when the addresses of two or more ports fall into the same bank does a bank conflict occur.
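
As a minimal sketch (the bank count and width here are illustrative assumptions, not something the text specifies at this point), detecting whether two same-cycle accesses collide just means comparing their bank indexes:

```c
#include <stdint.h>
#include <stdbool.h>

#define NBANKS     8   /* assumed number of banks      */
#define BANK_WIDTH 8   /* assumed bytes per bank entry */

/* Bank selection: low-order address bits spread consecutive
 * accesses across different banks. */
static inline unsigned bank_of(uint64_t addr)
{
    return (addr / BANK_WIDTH) % NBANKS;
}

/* True when the two ports hit the same single-port bank and
 * therefore cannot both be served in this cycle. */
static inline bool bank_conflict(uint64_t addr_port0, uint64_t addr_port1)
{
    return bank_of(addr_port0) == bank_of(addr_port1);
}
```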

With this method, a dual-port Cache still needs two address decoders, two way multiplexers, two sets of comparators, and two aligners, but the Data SRAM no longer needs a multi-port structure, which improves speed and reduces area to some extent. However, because each port must be checked for a Cache hit, the Tag SRAM still has to be readable from multiple ports at the same time, that is, it still has to be implemented with multi-port SRAM.

A key factor affecting the performance of this kind of multi-port Cache is bank conflicts. Using more banks alleviates the problem, making the probability of a bank conflict as low as possible; it also improves bank utilization and avoids the situation where all the useful data is concentrated in one bank. At the same time, because every port must be able to reach every bank, more wiring resources are needed, which may affect the physical layout.
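
For a feel of why more banks help, assume (a deliberate simplification of mine) that each port issues an independent, uniformly random access every cycle; the conflict probability is then 1 minus the probability that all ports pick distinct banks:

```c
#include <stdio.h>

/* Probability that at least two of k independent, uniformly random
 * accesses land in the same one of n banks: 1 - n!/((n-k)! * n^k). */
static double conflict_prob(int n_banks, int k_ports)
{
    double p_all_distinct = 1.0;
    for (int i = 0; i < k_ports; i++)
        p_all_distinct *= (double)(n_banks - i) / n_banks;
    return 1.0 - p_all_distinct;
}

int main(void)
{
    /* Two ports: 8 banks -> 0.1250, 16 banks -> 0.0625 */
    printf("8 banks: %.4f, 16 banks: %.4f\n",
           conflict_prob(8, 2), conflict_prob(16, 2));
    return 0;
}
```

Real access streams are not uniform, so this only shows the trend: doubling the number of banks roughly halves the two-port conflict rate.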

2.3.4 A Real Example: The Multi-port Cache of the AMD Opteron

AMD's Opteron series processors are 64-bit processors, but for practical reasons the processor does not use the full 64-bit address: its virtual address is 48 bits and its physical address is 40 bits. Simplifying the address in this way reduces the silicon area.

The D-Cache of the Opteron processor is dual-ported, and each port is 64 bits wide. The two ports allow this Cache to serve two load/store instructions simultaneously in one cycle, and multi-banking is used to implement this multi-port capability.

In this Cache of the AMD Opteron processor, each data block is 64 bytes, which requires 6 address bits, and each data block is divided into 8 independent banks; each bank is a 64-bit-wide single-port SRAM.

The whole Cache is 64KB and 2-way set-associative, so each way is 32KB. It is implemented as virtually-indexed, physically-tagged, so the virtual address (VA) is used directly to index the Cache. Since each way is 32KB, 15 address bits are needed to address it; and since each data block is 64 bytes, VA[5:0] addresses the bytes within a block, while the remaining bits VA[14:6] select the Cache set.
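
A small sketch of the index split just described (the helper names are mine): each 32KB way holds 512 sets of 64-byte blocks, so VA[5:0] picks the byte within a block and VA[14:6] picks the set:

```c
#include <stdint.h>

static inline uint32_t byte_in_block(uint64_t va) { return  va       & 0x3F;  } /* VA[5:0]  */
static inline uint32_t set_index(uint64_t va)     { return (va >> 6) & 0x1FF; } /* VA[14:6] */
```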

Because the data block in each Cache line is divided into 8 banks, and each bank is an 8-byte-wide SRAM, VA[5:3] is naturally used to select a bank, and the remaining bits VA[2:0] select one byte out of the 8 bytes of data. In this way, two consecutive 8-byte data items are placed in two adjacent, different banks, so that, thanks to spatial locality, accesses to these two 8-byte items fall into different banks.
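
Continuing the same sketch (helper names are mine), the bank split and the adjacent-bank property look like this:

```c
#include <stdint.h>

static inline uint32_t bank_index(uint64_t va)   { return (va >> 3) & 0x7; } /* VA[5:3] */
static inline uint32_t byte_in_bank(uint64_t va) { return  va       & 0x7; } /* VA[2:0] */

/* e.g. bank_index(0x1000) == 0 while bank_index(0x1008) == 1, so two
 * sequential 8-byte accesses on the two ports do not conflict with each other. */
```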

When a port of the Cache is accessed, the data of both ways is read at the same time, and the hitting way is then selected from the two according to the Tag comparison result. Therefore, accessing one port of the Cache touches two banks at the same time, one in each way.

In processors that support virtual memory, the most common page size is 4KB, which requires VA[11:0] to address the bytes within a page. For a 48-bit virtual address, the remaining bits VA[47:12] therefore serve as the VPN (Virtual Page Number) used to look up the TLB and obtain the PFN (Physical Frame Number) of the physical address, i.e. PA[39:12], which is then compared with the Tag to decide whether the access hits.
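
As a behavioral sketch (the data structures and names are my assumptions, not Opteron's actual implementation), the hit check amounts to looking up the TLB with VA[47:12] and comparing the returned PFN with the Tag stored in the indexed line:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint64_t vpn; uint64_t pfn; bool valid; } tlb_entry_t;

/* VA[47:12] -> TLB -> PFN (= PA[39:12]), then PFN vs. stored Tag. */
static bool cache_hit(const tlb_entry_t *tlb, int tlb_entries,
                      uint64_t va, uint64_t stored_tag)
{
    uint64_t vpn = (va >> 12) & 0xFFFFFFFFFULL;  /* VA[47:12], 36 bits */
    for (int i = 0; i < tlb_entries; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return tlb[i].pfn == stored_tag;     /* PA[39:12] compared with Tag */
    return false;                                /* TLB miss handled elsewhere */
}
```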

For a 2-way set-associative Cache, compared with a single-port implementation, the control logic needed for the dual-port implementation is essentially doubled: two TLBs, two sets of Tag comparators, and twice as much Tag memory are required. The Opteron processor implements the dual-port Tag by making a copy of the Tag SRAM; a true dual-port SRAM could of course also provide this function, but it would not save much area and would be slower.

Apart from the Data SRAM that stores the Cache's data, which is not copied, the other circuits are essentially duplicated. So even when multi-banking is used to implement a dual-port Cache, the area still grows considerably; its advantage is speed, since it has a relatively small negative impact on the processor's cycle time.

This article was written by Qi Qi; please include a link to the original when reposting.
https://yzsam.com/2022/162/202206112141491494.html