Superscalar Processor Design (Yao Yongbin), Chapter 4: Branch Prediction -- Excerpt from Section 4.1
2022-07-01 09:53:00 【Qi Qi】
4.1 Overview
A highly accurate branch predictor is a key component for improving processor performance. In practice, however, higher accuracy means more complex algorithms, which take up more silicon area, consume more power, and can also affect the processor's cycle time. Worse still, different programs have different characteristics, so it is hard to find a single branch prediction algorithm that works well everywhere.
Suppose that, in the instruction fetch stage, the processor could know in advance whether the instructions fetched in the current cycle include a branch instruction, and also know its direction (taken or not taken) and its target address. It could then fetch from the branch's target address in the next cycle. The pipeline would not be disturbed, and no useless work would be done.
Predicting the outcome ahead of time in this way, without waiting for the result of the branch instruction to actually be computed, is called branch prediction.
Branch prediction is feasible because of the characteristics of branch instructions themselves. A branch instruction has two elements:
(1) Direction. A branch instruction has only two possible directions: taken or not taken. Some branch instructions are unconditional and always taken, for example jump instructions.
(2) Target address. If the branch is taken, the processor needs to know where it jumps to. The target address is also carried by the instruction. In a RISC instruction set, the target address can appear in the instruction in two forms:
a: PC-relative, also called a direct jump (direct). The instruction carries an offset relative to the PC as an immediate. Adding this offset to the PC of the current branch instruction (or to the PC of the instruction that follows it) gives the target address. Because an instruction is only 32 bits long, the range of the immediate is limited, so the jump range of this type of branch is generally small. The immediate can be extracted from the instruction as early as the decode stage of the pipeline, and the target address can be computed there. Because the immediate carried by the instruction generally does not change, this type of branch is easy to predict. Many processor manuals recommend using this type of branch wherever possible to improve prediction accuracy and, with it, processor performance.
b: absolute, also called an indirect jump (indirect). The target address comes from the value of a general-purpose register whose number is given in the instruction. The target address is a full 32-bit value, so the branch can jump anywhere in the program's address space. However, the value of this register usually comes from the result of another instruction, so it may take some time before the target address is available; for example, the processor may have to wait until the execute stage of the pipeline. Instructions that enter the pipeline during this time may be incorrect, which increases the misprediction penalty. Most indirect branch instructions in a program are used for subroutine CALL/Return, which are highly regular and easy to predict.
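The two target-address forms above can be sketched in a few lines of Python. This is only an illustration under assumptions the text does not state: the 16-bit immediate field, the scaling of the offset by the instruction size, and the use of the next instruction's PC as the base are MIPS-like choices, and the function names are made up for this example.

```python
from typing import Sequence

MASK32 = 0xFFFFFFFF  # the text assumes 32-bit instructions and 32-bit addresses


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def direct_target(pc: int, imm: int, imm_bits: int = 16, instr_bytes: int = 4) -> int:
    """PC-relative target: next-instruction PC plus the scaled, sign-extended offset.

    The field width and the scaling are assumptions (MIPS-like), not from the text.
    """
    offset = sign_extend(imm, imm_bits) * instr_bytes
    return (pc + instr_bytes + offset) & MASK32


def indirect_target(regs: Sequence[int], rs: int) -> int:
    """Register-indirect target: the full 32-bit value held in register `rs`."""
    return regs[rs] & MASK32


regs = [0] * 32
regs[31] = 0x80001234                      # e.g. an address written by an earlier instruction
print(hex(direct_target(0x1000, 3)))       # forward branch: 3 instructions past PC + 4
print(hex(direct_target(0x1000, 0xFFFF)))  # offset -1: lands back on the branch itself
print(hex(indirect_target(regs, 31)))      # can reach any 32-bit address
```

The direct form needs only the instruction bits and the PC, which is why it can be resolved at decode. The indirect form needs a register value that may not be ready until a later stage.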
Because a branch instruction has both a direction and a target address, both have to be predicted. Direction prediction predicts whether the instruction will be taken; target prediction predicts the address it jumps to when it is taken. Ordinary processors, whose pipelines are not deep, generally use static branch prediction: every branch is predicted not taken, and the processor always fetches sequentially. In a later pipeline stage, for example the execute stage, the actual direction and target address of the branch are obtained and checked. If the branch is actually taken, all instructions that entered the pipeline after the branch are discarded. If it is not taken, the processor simply continues fetching and executing sequentially, as if the branch instruction had never occurred.
For a simple processor with a short pipeline, a branch misprediction does not cause many instructions in the pipeline to be discarded.
Moreover, to avoid wasting even one cycle, MIPS processors place an instruction that does not depend on the branch immediately after it. This instruction is always executed, regardless of whether the branch is taken. MIPS calls the position after the branch instruction the branch delay slot.
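As a rough illustration of why the always-not-taken policy is acceptable on a short pipeline but costly on a deep one, the sketch below counts discarded instructions for a branch trace. It assumes one instruction is fetched per cycle and that a branch resolves a fixed number of cycles after it is fetched; `not_taken_cost` and `resolve_delay` are hypothetical names for this example, and the trace is invented.

```python
def not_taken_cost(outcomes, resolve_delay):
    """Return (mispredictions, discarded instructions) for always-not-taken.

    outcomes      -- iterable of bools, True meaning the branch was actually taken
    resolve_delay -- cycles between fetching a branch and learning its outcome;
                     with one fetch per cycle (an assumption), that many
                     wrong-path instructions are in the pipeline by then
    """
    mispredicted = sum(1 for taken in outcomes if taken)
    return mispredicted, mispredicted * resolve_delay


trace = [True, False, True, True]   # invented outcomes for four branches
print(not_taken_cost(trace, 2))     # short pipeline
print(not_taken_cost(trace, 12))    # deep pipeline, same mispredictions, larger loss
```

The number of mispredictions is the same in both cases; only the cost per misprediction grows with pipeline depth.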
Predicting branch instructions dynamically, based on how they have actually behaved during execution, is called dynamic branch prediction. Dynamic branch prediction requires considerable hardware resources.
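To make "predicting from actual execution" concrete, here is a sketch of a 2-bit saturating counter, one widely used dynamic scheme. This excerpt does not describe any particular scheme, so the counter, its initial state, and the helper `count_correct` are assumptions chosen for illustration.

```python
class TwoBitCounter:
    """States 0 and 1 predict not taken; states 2 and 3 predict taken."""

    def __init__(self, state: int = 1):
        self.state = state  # start weakly not taken (an arbitrary choice)

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed direction, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes) -> int:
    """Run one counter over a sequence of outcomes and count correct predictions."""
    counter = TwoBitCounter()
    correct = 0
    for taken in outcomes:
        if counter.predict() == taken:
            correct += 1
        counter.update(taken)
    return correct


# A loop branch that is taken four times and then falls through, executed twice.
loop = [True, True, True, True, False] * 2
print(count_correct(loop), "of", len(loop))
```

The counter mispredicts once while warming up and once at each loop exit. The always-not-taken policy would mispredict on every taken iteration of the same trace.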
To perform branch prediction, the processor first needs to know which of the instructions fetched from the I-Cache are branch instructions. For a superscalar processor that fetches multiple instructions per cycle, this is not easy.
The most obvious approach is to fast-decode the instructions in the fetch group as soon as they come out of the I-Cache. It is called "fast" because it only has to determine whether each instruction is a branch instruction. The PC value of each branch instruction found is then sent to the branch predictor.
Performing fast decode and branch prediction in the same cycle seriously affects the processor's cycle time. To address this, the fast decode can be done before instructions are written from the L2 Cache into the I-Cache; this is called pre-decode. The information about whether each instruction is a branch is then written into the I-Cache together with the instruction. This makes the I-Cache larger, but the fast-decode circuit can be omitted, which mitigates the impact on cycle time to some extent. Even so, pre-decode does not solve the remaining problem: the interval between fetching an instruction and obtaining its prediction is still too long.
In the pipeline, the earlier branch prediction happens, the better. If prediction is performed only after the instruction has been fetched from the I-Cache, then by the time the prediction result is available, many subsequent instructions have already entered the pipeline. When the prediction is "taken", these instructions must be flushed from the pipeline, which reduces the processor's efficiency. The best time to predict is therefore the cycle in which the fetch address becomes available: the prediction is made from the fetch address while the instruction is being fetched, so that the next cycle can already fetch according to the predicted result.
For a given instruction, the physical address can change (it depends on where the operating system places it in physical memory), whereas its virtual address, that is, its PC value, does not change. This is why branch prediction is based on the PC value.
After a process switch, the contents of the branch predictor need to be cleared so that the predictions of different processes do not interfere with each other. If an ASID is used, it can be combined with the PC value for branch prediction, and the branch predictor then does not need to be cleared on a process switch.
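The difference between the two options can be sketched with two toy predictors that simply remember the last direction of each branch. The class names, the dictionary-based table, and the default of "not taken" for unknown branches are assumptions made for this example; real predictors use fixed-size hardware tables.

```python
class UntaggedPredictor:
    """Indexed by PC only, so it must be cleared when the process changes."""

    def __init__(self):
        self.table = {}  # pc -> last observed direction

    def predict(self, pc: int) -> bool:
        return self.table.get(pc, False)  # unknown branch: predict not taken

    def update(self, pc: int, taken: bool) -> None:
        self.table[pc] = taken

    def on_process_switch(self) -> None:
        self.table.clear()  # all learned history is lost


class AsidTaggedPredictor:
    """Indexed by (ASID, PC), so entries of different processes never collide."""

    def __init__(self):
        self.table = {}  # (asid, pc) -> last observed direction

    def predict(self, asid: int, pc: int) -> bool:
        return self.table.get((asid, pc), False)

    def update(self, asid: int, pc: int, taken: bool) -> None:
        self.table[(asid, pc)] = taken


tagged = AsidTaggedPredictor()
tagged.update(asid=1, pc=0x400, taken=True)    # process 1: branch at 0x400 is taken
tagged.update(asid=2, pc=0x400, taken=False)   # process 2: same virtual PC, not taken
print(tagged.predict(1, 0x400), tagged.predict(2, 0x400))
```

Two processes can use the same virtual PC for unrelated branches. The tagged table keeps both histories across switches, while the untagged one has to start over each time.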
The PC value is used to predict whether the fetch group of the current cycle contains a branch instruction, and also the direction and target address of that branch.
Predicting from the instruction address is well founded. Once a program starts executing, the fetch address of each instruction is fixed, so the PC value alone is enough to tell whether an instruction is a branch. As long as the branch has been executed once, the next time the same PC value is encountered the processor knows that the instruction about to be fetched is a branch instruction.
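The "learn on first execution, recognise by PC afterwards" behaviour can be sketched with a small PC-indexed table. The class `BranchTable`, its dictionary storage, and the single-instruction fetch of 4 bytes are simplifications assumed for this example; the excerpt does not specify how the table is organised.

```python
class BranchTable:
    """Remembers, per PC, the direction and target seen the last time it executed."""

    def __init__(self):
        self.entries = {}  # pc -> (taken, target)

    def lookup(self, pc: int):
        return self.entries.get(pc)  # None: this PC is not known to be a branch

    def record(self, pc: int, taken: bool, target: int) -> None:
        self.entries[pc] = (taken, target)


def next_fetch_pc(table: BranchTable, pc: int, fetch_bytes: int = 4) -> int:
    """Choose the next fetch address using only the current PC."""
    entry = table.lookup(pc)
    if entry is None:
        return pc + fetch_bytes          # not a known branch: fetch sequentially
    taken, target = entry
    return target if taken else pc + fetch_bytes


table = BranchTable()
print(hex(next_fetch_pc(table, 0x2000)))   # first encounter: nothing known yet
table.record(0x2000, taken=True, target=0x3000)   # filled in once the branch executes
print(hex(next_fetch_pc(table, 0x2000)))   # same PC again: redirected at fetch time
```

The lookup needs nothing but the PC, so it can run in the same cycle as the I-Cache access instead of after it.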
Branch prediction itself is complex, and different processors implement it in different ways. It is one of the key factors affecting processor performance, and a design has to strike a balance among hardware cost, prediction accuracy, and latency.