Superscalar Processor Design (Yao Yongbin), Chapter 4: Branch Prediction -- Excerpt from Section 4.1
2022-07-01 09:53:00 【Qi Qi】
4.1 Overview
A high-accuracy branch predictor is a key component for improving processor performance. In practice, however, higher accuracy means a more complex algorithm, which takes more silicon area, consumes more power, and also affects the processor's cycle time. Worse still, different programs behave differently, so it is hard to find one branch prediction algorithm that works well for all of them.
If, at the instruction-fetch stage, the processor could know in advance whether the instructions fetched in this cycle include a branch instruction, along with its direction (taken or not taken) and its target address, it could fetch from the branch's target address in the next cycle. The pipeline would not be disturbed and no useless work would be done.
Predicting the outcome in advance like this, without waiting for the result of the branch instruction to actually be computed, is branch prediction.
Branch prediction is possible because of the characteristics of branch instructions themselves. A branch instruction has two elements:
(1) Direction. A branch instruction has only two possible directions: taken or not taken. Some branch instructions are unconditional, for example the jump instruction, and are always taken.
(2) Target address. If the branch is taken, the processor needs to know where it goes. The target address is also specified by the instruction; in a RISC instruction set it can take two forms:
a: PC-relative, also called a direct jump. The instruction carries an immediate offset relative to the PC. Adding this offset to the PC of the current branch instruction (or to the PC of the next instruction) gives the target address. Because the instruction is only 32 bits long, the size of the immediate is limited, so the jump range of this type of branch is generally small. The immediate can be extracted from the instruction as early as the decode stage of the pipeline, and the target address computed there. Because the immediate carried by the instruction does not change, this type of branch is easy to predict; many processor manuals recommend using it wherever possible to improve prediction accuracy and therefore performance.
b: Absolute, also called an indirect jump. The target address comes from a general-purpose register whose number is given in the instruction. The target is a full 32-bit value, so the branch can reach anywhere in the program's address space. However, the value of this register usually comes from the result of another instruction, so the target address may not be available for some time; for example, it may not be known until the execute stage of the pipeline. The instructions that enter the pipeline in the meantime may be wrong, which increases the misprediction penalty. Most indirect branches in a program are used for subroutine CALL/Return; these are highly regular and easy to predict.
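The two target-address forms can be sketched as follows. This is only an illustration under stated assumptions: the 16-bit signed word offset added to the PC of the next instruction follows the MIPS convention, and the function names `sign_extend`, `direct_target`, and `indirect_target` are made up for this example.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def direct_target(pc: int, imm16: int) -> int:
    """PC-relative (direct) target.

    Assumption: MIPS-like encoding, i.e. a 16-bit signed offset counted in
    4-byte words and added to the PC of the instruction after the branch.
    """
    return (pc + 4 + (sign_extend(imm16, 16) << 2)) & 0xFFFFFFFF


def indirect_target(regs: list, rs: int) -> int:
    """Register (indirect) target: the full 32-bit value held in register rs."""
    return regs[rs] & 0xFFFFFFFF


# The direct form is known from the instruction bits alone ...
forward = direct_target(0x1000, 0x0003)    # three words past PC + 4
backward = direct_target(0x1000, 0xFFFF)   # offset -1 word: back to the branch

# ... while the indirect form needs the register value, which may only be
# produced late in the pipeline by an earlier instruction.
regs = [0] * 32
regs[31] = 0x80001234
far = indirect_target(regs, 31)
```

With a 16-bit word offset the direct form can only reach roughly 128 KB on either side of the branch, whereas the register form can name any 32-bit address.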
Because a branch instruction has both a direction and a target address, branch prediction has two parts. Direction prediction predicts whether the instruction will be taken; target prediction predicts the address it jumps to when it is taken. A simple processor, whose pipeline is not deep, generally uses static branch prediction: it predicts that branches are always not taken, so the processor always fetches instructions sequentially. The actual direction and target address of the branch are obtained in a later pipeline stage, for example the execute stage, and checked there. If the branch turns out to be taken, all instructions that entered the pipeline after it are discarded; if it is not taken, the processor simply continues fetching and executing sequentially, as if the branch had never occurred.
For a simple processor with a short pipeline, a failed branch prediction does not cause many instructions in the pipeline to be discarded.
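As a rough back-of-the-envelope sketch of why always-not-taken prediction is tolerable in a short pipeline but costly in a deep, wide one, the function below counts the fetch slots that are thrown away. The name `wasted_slots` and its parameters are hypothetical, and the model is deliberately crude: it assumes every taken branch discards everything fetched in the cycles between its fetch and its resolution.

```python
def wasted_slots(outcomes, resolve_delay, fetch_width=1):
    """Count fetch slots discarded under always-not-taken prediction.

    outcomes      -- iterable of booleans, True where a branch was actually taken
    resolve_delay -- cycles between fetching a branch and resolving it
                     (hypothetical parameter standing in for pipeline depth)
    fetch_width   -- instructions fetched per cycle
    """
    taken_count = sum(1 for taken in outcomes if taken)
    # Only taken branches are mispredicted; each one discards everything
    # fetched sequentially while it was still unresolved.
    return taken_count * resolve_delay * fetch_width


history = [True, False, True]  # two of three branches are taken

short_scalar = wasted_slots(history, resolve_delay=2)                    # simple pipeline
deep_superscalar = wasted_slots(history, resolve_delay=10, fetch_width=4)  # deep, 4-wide
```

For the same branch history, the short scalar pipeline discards 4 slots while the deep 4-wide pipeline discards 80, which is the motivation for more accurate prediction in superscalar designs.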
Moreover, to avoid wasting even that one cycle, the MIPS processor places an unrelated instruction right after the branch instruction. This instruction is always executed, regardless of whether the branch is taken. MIPS calls the position after the branch instruction the branch delay slot.
Predicting branches dynamically, based on how the processor has actually executed them, is dynamic branch prediction. Dynamic branch prediction requires more expensive hardware resources.
To do branch prediction, the processor first needs to know which of the instructions fetched from the I-Cache is a branch instruction. For a superscalar processor that fetches multiple instructions per cycle, this is not easy.
The most obvious approach is to fast-decode the instructions in the instruction group after they are fetched from the I-Cache. It is called "fast" because it only has to tell whether an instruction is a branch. The PC value of each branch instruction found is then sent to the branch predictor.
Putting fast decode in the same cycle as branch prediction, however, seriously affects the processor's cycle time. One remedy is to fast-decode instructions before they are written from the L2 Cache into the I-Cache, which is called pre-decode. The information about whether each instruction is a branch is then written into the I-Cache together with the instruction. This makes the I-Cache larger, but the fast-decode circuit can be omitted, which relieves the pressure on cycle time to some extent. Even so, the interval between the two stages, instruction fetch and branch prediction, is still too long, and pre-decode cannot solve that.
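Pre-decode can be sketched as a step that runs when a line is filled into the I-Cache and stores one extra bit per instruction. The sketch below assumes MIPS-style 32-bit instructions with the opcode in the top six bits, and recognizes only a deliberately incomplete subset of branches (`j`, `jal`, `beq`, `bne`); the function names are made up for this example.

```python
# Assumption: a simplified subset of MIPS major opcodes (j, jal, beq, bne).
# Register jumps such as jr are encoded differently and are ignored here.
BRANCH_OPCODES = {0x02, 0x03, 0x04, 0x05}


def opcode(instr: int) -> int:
    """Major opcode: the top 6 bits of a 32-bit MIPS-style instruction."""
    return (instr >> 26) & 0x3F


def predecode_line(words):
    """Run once, when a line is written from the L2 Cache into the I-Cache.

    Returns (instruction, is_branch) pairs; the extra bit per instruction is
    the additional I-Cache storage that pre-decode costs.
    """
    return [(w, opcode(w) in BRANCH_OPCODES) for w in words]


def branch_pcs(line_base_pc: int, line):
    """At fetch time the stored bits are read directly; nothing is decoded."""
    return [line_base_pc + 4 * i for i, (_, is_br) in enumerate(line) if is_br]


# One four-instruction line: addu, beq, nop, j
line = predecode_line([0x00851021, 0x10000003, 0x00000000, 0x08000010])
flags = [is_br for _, is_br in line]
pcs = branch_pcs(0x400, line)
```

Here the second and fourth instructions are marked as branches, so the PCs 0x404 and 0x40C would be the ones handed to the branch predictor.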
In the pipeline, the earlier branch prediction happens, the better. If prediction starts only after the instruction has been fetched from the I-Cache, many subsequent instructions will already have entered the pipeline by the time the prediction is available. When the prediction is "taken", these instructions have to be flushed from the pipeline, which lowers the processor's execution efficiency. The best time for branch prediction is therefore during fetch itself: once the fetch address for the current cycle is known, prediction runs in parallel with fetching the instruction, so that the next cycle can already fetch according to the predicted result.
The physical address of an instruction can change (it depends on where the operating system places it in physical memory), but its virtual address, that is, its PC value, does not change.
After a process switch, the contents of the branch predictor need to be cleared so that branch predictions from different processes do not interfere with each other. If an ASID is used, it can be combined with the PC value for branch prediction, and the branch predictor does not need to be cleared on a process switch.
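The two policies can be contrasted with a small model. This is a sketch only: the class `TaggedPredictor` and its methods are hypothetical, the table is an unbounded dictionary rather than a fixed-size hardware structure, and the stored prediction is simply the last observed direction.

```python
class TaggedPredictor:
    """Toy direction predictor keyed by PC, or by (ASID, PC) when use_asid is set."""

    def __init__(self, use_asid: bool):
        self.use_asid = use_asid
        self.asid = 0
        self.table = {}

    def _key(self, pc: int):
        # With ASIDs, the same virtual PC in two processes maps to two entries.
        return (self.asid, pc) if self.use_asid else pc

    def context_switch(self, new_asid: int):
        self.asid = new_asid
        if not self.use_asid:
            # Without ASIDs the only safe option is to discard everything.
            self.table.clear()

    def update(self, pc: int, taken: bool):
        self.table[self._key(pc)] = taken

    def predict(self, pc: int) -> bool:
        # Unknown branches default to not taken.
        return self.table.get(self._key(pc), False)


# Without ASID: history is lost on every process switch.
flushed = TaggedPredictor(use_asid=False)
flushed.update(0x400, True)
flushed.context_switch(2)
lost = flushed.predict(0x400)

# With ASID: each process keeps its own history for the same virtual PC.
tagged = TaggedPredictor(use_asid=True)
tagged.context_switch(1)
tagged.update(0x400, True)
tagged.context_switch(2)
other_process = tagged.predict(0x400)
tagged.context_switch(1)
kept = tagged.predict(0x400)
```

Tagging with the ASID lets a process find its earlier predictions intact when it is scheduled again, at the cost of storing the ASID alongside each entry.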
The branch predictor therefore uses the PC value to predict whether the instruction group fetched in this cycle contains a branch instruction, together with that branch's direction and target address.
Predicting branches from the instruction address is well founded: once a program starts executing, the fetch address of each instruction is fixed, so the PC value alone is enough to decide whether an instruction is a branch. As soon as a branch instruction has been executed once, the processor knows, the next time it sees the same PC value, that the instruction about to be fetched is a branch.
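A minimal sketch of this idea is a table indexed by PC that is filled in the first time a branch resolves and consulted at fetch time. The class `FetchPredictor` and its methods are hypothetical; the sketch assumes 4-byte instructions, one instruction fetched per cycle, an unbounded table, and the simplest possible direction rule (repeat the last outcome), which is a stand-in rather than the scheme of any particular processor.

```python
class FetchPredictor:
    """PC-indexed table: learns branches after their first execution."""

    def __init__(self):
        self.known = {}  # pc -> (last_taken, target)

    def next_fetch_pc(self, pc: int) -> int:
        """Consulted in parallel with instruction fetch, using only the PC."""
        entry = self.known.get(pc)
        if entry is None:
            return pc + 4          # never seen as a branch: fetch sequentially
        taken, target = entry
        return target if taken else pc + 4

    def resolve(self, pc: int, taken: bool, target: int):
        """Called once the branch's real direction and target are known."""
        self.known[pc] = (taken, target)


# A loop whose closing branch at 0x40C jumps back to 0x400.
predictor = FetchPredictor()
first_time = predictor.next_fetch_pc(0x40C)   # unknown PC: sequential fetch
predictor.resolve(0x40C, True, 0x400)         # first execution teaches the table
second_time = predictor.next_fetch_pc(0x40C)  # same PC again: redirect at fetch
non_branch = predictor.next_fetch_pc(0x404)   # other PCs are unaffected
```

The first encounter is mispredicted, but every later fetch of the same PC can be redirected to the loop start without decoding the instruction first.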
Branch prediction itself is complicated, and different processors implement it differently. It is one of the key factors that determine a processor's performance, and a design has to strike a balance among hardware cost, prediction accuracy, and latency.