Superscalar Processor Design (Yao Yongbin), Chapter 4: Branch Prediction, excerpt from Section 4.1

2022-07-01 09:53:00 【Qi Qi】

4.1 Overview

A highly accurate branch predictor is a key component for improving processor performance. In practice, however, higher accuracy means a more complex algorithm, which takes more silicon area, consumes more power, and can also lengthen the processor's cycle time. Worse still, different programs behave differently, so it is hard to find one branch prediction algorithm that works well for all of them.

If, during the fetch stage, the processor could know in advance whether the instructions fetched in this cycle include a branch, and also know that branch's direction (taken or not taken) and its target address, it could fetch from the branch target in the next cycle. The pipeline would not be disturbed, and no useless work would be done.

Predicting the result in advance like this, rather than waiting until the branch outcome has actually been computed, is branch prediction.

Branch prediction is possible because of the characteristics of branch instructions themselves.

A branch instruction has two elements:

(1) Direction. A branch instruction has only two possible directions: taken or not taken. Some branch instructions are unconditional, for example jump instructions.

(2) Target address. If the branch is taken, the processor needs to know where it goes. The target address is also carried by the instruction. In a RISC instruction set, it can take two forms:

a: PC-relative, also known as a direct jump. The instruction carries an immediate that gives an offset relative to the PC; adding this offset to the PC of the branch instruction (or to the PC of the next instruction) gives the target address. Because an instruction is only 32 bits long, the size of the immediate is limited, so the jump range of this type of branch is generally small. The immediate can be extracted in the decode stage of the pipeline, for example, and the target address computed there. Because the immediate carried by an instruction generally does not change, this type of branch is easy to predict. Many processor manuals recommend using it where possible to improve prediction accuracy and, with it, processor performance.
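The PC-relative calculation can be sketched in a few lines. This is a minimal illustration that assumes a MIPS-style conditional branch: a 16-bit immediate counted in 4-byte words and added to the address of the instruction after the branch. The function names and the specific field width are assumptions for the example, not something this excerpt specifies.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def pc_relative_target(pc: int, imm16: int) -> int:
    """Target of a PC-relative branch under the assumed MIPS-style encoding.

    Assumptions: 16-bit immediate, offset counted in 4-byte words,
    relative to the next instruction (pc + 4), 32-bit address space.
    """
    offset_bytes = sign_extend(imm16, 16) << 2
    return (pc + 4 + offset_bytes) & 0xFFFFFFFF


forward = pc_relative_target(0x1000, 0x0003)    # 3 words past pc + 4
backward = pc_relative_target(0x1000, 0xFFFF)   # 1 word before pc + 4
max_forward_reach = sign_extend(0x7FFF, 16) << 2
```

With these assumptions the branch can reach only about 128 KB in either direction, which is the "small jump range" described above.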

b: Absolute, also known as an indirect jump. The target address comes from a general-purpose register whose number is given in the instruction. The target is a full 32-bit value, so the branch can jump anywhere in the program's address space. However, the register value usually comes from the result of another instruction, so the target address may not be available for some time; for example, it may not be known until the execute stage of the pipeline. The instructions that enter the pipeline in the meantime may be wrong, which increases the misprediction penalty. Most indirect jumps in a program are used for subroutine calls and returns (CALL/Return), which are highly regular and easy to predict.
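One way to see why call/return pairs are so regular: a return almost always goes back to the instruction after the most recent unmatched call, so a small stack is enough to predict it. The sketch below models such a return address stack. This excerpt does not name the mechanism; the class name, the depth of 8, and the 4-byte instruction size with no delay slot are all assumptions for illustration.

```python
class ReturnAddressStack:
    """Toy return-address predictor (hypothetical model, fixed depth)."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc: int, instr_bytes: int = 4) -> None:
        # Assumption: the return address is the instruction after the call.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # stack full: drop the oldest entry
        self.stack.append(call_pc + instr_bytes)

    def predict_return(self):
        # Predicted target of a return, or None if nothing is recorded.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x1000)                 # outer call
ras.on_call(0x2000)                 # nested call
inner_return = ras.predict_return()
outer_return = ras.predict_return()
```

Nested calls unwind in reverse order, so the inner return is predicted first and the outer return second.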

Because a branch instruction has both a direction and a target address, both must be predicted: direction prediction decides whether the branch will be taken, and target prediction supplies the address to fetch from if it is. Simple processors with shallow pipelines generally use static branch prediction and predict every branch as not taken. The processor always fetches sequentially; a later pipeline stage, such as the execute stage, produces the actual direction and target address and checks the prediction. If the branch turns out to be taken, all instructions that entered the pipeline after it are discarded; if not, sequential fetch simply continues as if the branch had never occurred.

For a simple processor with a short pipeline, a misprediction does not cause many instructions to be discarded.
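A rough model makes the cost of always predicting not taken concrete. It assumes every taken branch is a misprediction, and that when the branch resolves in a given stage, each earlier stage holds one fetch group of wrong-path instructions. The function name, the stage numbering (fetch is stage 1), and the `fetch_width` parameter are assumptions for this sketch.

```python
def not_taken_penalty(outcomes, resolve_stage: int, fetch_width: int = 1):
    """Estimate mispredictions and discarded instructions for predict-not-taken.

    outcomes      -- iterable of booleans, True if the branch was actually taken
    resolve_stage -- pipeline stage (fetch = 1) where the branch outcome is known
    fetch_width   -- instructions fetched per cycle (assumed constant)
    """
    mispredictions = sum(1 for taken in outcomes if taken)
    discarded_per_misprediction = (resolve_stage - 1) * fetch_width
    return mispredictions, mispredictions * discarded_per_misprediction


outcomes = [True, False, True, True]
short_pipeline = not_taken_penalty(outcomes, resolve_stage=3)
deep_wide_pipeline = not_taken_penalty(outcomes, resolve_stage=10, fetch_width=4)
```

Under this model the same four branches discard 6 instructions in the short scalar pipeline and 108 in the deep, 4-wide one, which is why static not-taken prediction is acceptable only for simple designs.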

In addition, to avoid wasting a cycle, MIPS processors place an instruction that does not depend on the branch immediately after it. This instruction is always executed, whether or not the branch is taken. MIPS calls the position after the branch instruction the branch delay slot.

Predicting branches dynamically, based on how they actually behaved during execution, is dynamic branch prediction. Dynamic branch prediction requires costly hardware resources.
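As an illustration of predicting from observed behavior, the sketch below uses a 2-bit saturating counter, a classic textbook scheme. This excerpt does not say which dynamic scheme is used, so the choice of scheme, the initial state, and the names are assumptions.

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 1):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, counter=None) -> float:
    """Fraction of outcomes predicted correctly, training after each branch."""
    if counter is None:
        counter = TwoBitCounter()
    correct = 0
    for taken in outcomes:
        if counter.predict() == taken:
            correct += 1
        counter.update(taken)
    return correct / len(outcomes)


# A loop branch: taken nine times, then falls through once.
loop_accuracy = accuracy([True] * 9 + [False])
```

For this loop pattern the counter is wrong only on the first iteration and on the loop exit, giving 8 correct predictions out of 10.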

To predict branches, the processor first has to know which of the instructions fetched from the I-Cache are branch instructions. For a superscalar processor that fetches several instructions per cycle, this is not easy.

The most obvious approach is to fast-decode the instructions in the fetch group once they come out of the I-Cache. The decode is "fast" because it only has to determine whether each instruction is a branch. The PC value of each branch instruction is then sent to the branch predictor.
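A fast decode only has to look at a few opcode bits. The sketch below checks a small, assumed subset of MIPS32 encodings (J, JAL, BEQ, BNE, BLEZ, BGTZ by primary opcode; JR and JALR by function field) and returns the PCs that would be sent to the predictor. A real decoder covers more cases; the function names and the subset are illustrative assumptions.

```python
# Assumed subset of MIPS32 primary opcodes: J, JAL, BEQ, BNE, BLEZ, BGTZ.
BRANCH_OPCODES = {0x02, 0x03, 0x04, 0x05, 0x06, 0x07}
# Function codes under primary opcode 0 (SPECIAL): JR, JALR.
SPECIAL_JUMP_FUNCTS = {0x08, 0x09}


def is_branch(instr: int) -> bool:
    """Fast decode: inspect only the opcode (and function) fields."""
    opcode = (instr >> 26) & 0x3F
    if opcode == 0x00:
        return (instr & 0x3F) in SPECIAL_JUMP_FUNCTS
    return opcode in BRANCH_OPCODES


def branch_pcs(group_pc: int, group) -> list:
    """PCs of the branch instructions in a fetch group of 4-byte instructions."""
    return [group_pc + 4 * i for i, instr in enumerate(group) if is_branch(instr)]


fetch_group = [
    0x00430821,  # addu $1, $2, $3
    0x10000003,  # beq  $0, $0, +3
    0x8FA80000,  # lw   $8, 0($29)
    0x03E00008,  # jr   $31
]
pcs = branch_pcs(0x00400000, fetch_group)
```

Here the second and fourth instructions are identified as branches, so `pcs` holds 0x00400004 and 0x0040000C.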

Doing fast decode in the same cycle as branch prediction seriously hurts the processor's cycle time. To address this, instructions can be fast-decoded before they are written from the L2 cache into the I-Cache, which is called pre-decode, and the information about whether each instruction is a branch is written into the I-Cache along with the instruction. This makes the I-Cache larger, but the fast-decode circuit can be omitted, which eases the impact on cycle time to some extent. However, the interval between fetching an instruction and obtaining its prediction is still too long, and pre-decode does not solve that.

In the pipeline, the earlier branch prediction is done, the better. If prediction starts only after the instruction has been fetched from the I-Cache, many subsequent instructions will already have entered the pipeline by the time the prediction is available. If the prediction is taken, those instructions must be flushed from the pipeline, which lowers the processor's efficiency. The best time to predict is therefore the cycle in which the fetch address becomes available: predict while the instruction is being fetched, so that the next cycle can fetch according to the prediction.

The physical address of an instruction can change (it depends on where the operating system places the program in physical memory), but its virtual address, that is, its PC value, does not.

Because different processes can use the same virtual addresses, the contents of the branch predictor must be cleared after a process switch so that predictions for different processes do not interfere with each other. If the processor uses ASIDs, the ASID can be used together with the PC value for branch prediction, and the predictor does not need to be cleared on a process switch.
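The effect of using the ASID can be sketched with a table keyed by (ASID, PC) instead of PC alone. This is a simplified software model, not a hardware design; the class name, the dictionary-based table, and the method names are assumptions for illustration.

```python
class AsidTaggedPredictor:
    """Toy target table whose entries are keyed by (ASID, PC)."""

    def __init__(self):
        self.table = {}

    def train(self, asid: int, pc: int, target: int) -> None:
        self.table[(asid, pc)] = target

    def lookup(self, asid: int, pc: int):
        # Returns the recorded target, or None if this process has no entry.
        return self.table.get((asid, pc))

    def flush(self) -> None:
        # What an untagged predictor would need on every process switch.
        self.table.clear()


predictor = AsidTaggedPredictor()
# Two processes happen to have a branch at the same virtual address.
predictor.train(asid=1, pc=0x00400100, target=0x00400200)
predictor.train(asid=2, pc=0x00400100, target=0x00400800)
target_for_process_1 = predictor.lookup(1, 0x00400100)
target_for_process_2 = predictor.lookup(2, 0x00400100)
```

Each process sees only its own entry, so switching between them needs no `flush()`; without the ASID in the key, the second `train` call would have overwritten the first.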

Using the PC value, the predictor determines whether the instruction group fetched in this cycle contains a branch instruction, and predicts that branch's direction and target address. The process is as follows:

Predicting from the instruction address is well founded: once a program starts executing, the fetch address of each instruction is fixed, so the PC value alone is enough to tell whether an instruction is a branch. Once a branch instruction has been executed for the first time, the next time the same PC value appears the processor knows that the instruction about to be fetched is a branch.
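The idea of learning a branch on its first execution and recognizing it by PC afterwards can be sketched as a PC-indexed table. In this simplified model the table stores only taken branches and their targets, and an unknown PC falls back to sequential fetch; the class name and that policy are assumptions, not a scheme this excerpt prescribes.

```python
class PcIndexedBranchTable:
    """Toy PC-indexed table: remembers branches once they have executed."""

    def __init__(self, instr_bytes: int = 4):
        self.instr_bytes = instr_bytes
        self.targets = {}  # pc -> last taken target

    def predict(self, pc: int):
        """Return (known_branch, next_fetch_pc) using only the PC."""
        if pc in self.targets:
            return True, self.targets[pc]
        return False, pc + self.instr_bytes

    def train(self, pc: int, taken: bool, target: int) -> None:
        # Called when the branch actually executes later in the pipeline.
        if taken:
            self.targets[pc] = target
        else:
            self.targets.pop(pc, None)


table = PcIndexedBranchTable()
first_encounter = table.predict(0x100)   # nothing known yet: fetch sequentially
table.train(0x100, taken=True, target=0x200)
second_encounter = table.predict(0x100)  # recognized from the PC alone
```

The first lookup misses and predicts the next sequential address; after one execution, the same PC is recognized as a branch while it is being fetched.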

Branch prediction is inherently complex, and different processors implement it differently. It is one of the key factors in a processor's performance, and a design has to balance hardware cost, prediction accuracy, and latency.

Copyright notice
This article was written by [Qi Qi]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/182/202207010938280424.html
