当前位置:网站首页>We should make clear the branch prediction
We should make clear the branch prediction
2022-07-02 01:13:00 【Basketball fans who write code】
The English name of branch prediction is 「Branch Prediction」
You can Google Search for this keyword on , You can see a lot about branch prediction , But find out how branch prediction works , That's the point .

The impact of branch prediction on the program
Let's take a look at the following two pieces of code
Code 1
#include <algorithm>
#include <ctime>
#include <iostream>
int main()
{
// Generate data
const unsigned arraySize = 32768;
int data[arraySize];
for (unsigned c = 0; c < arraySize; ++c)
data[c] = std::rand() % 256;
// !!! With this, the next loop runs faster.
//std::sort(data, data + arraySize);
// Test
clock_t start = clock();
long long sum = 0;
for (unsigned i = 0; i < 100000; ++i) {
for (unsigned c = 0; c < arraySize; ++c) { // Primary loop
if (data[c] >= 128) sum += data[c];
}
}
double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;
std::cout << elapsedTime << '\n';
std::cout << "sum = " << sum << '\n';
}Execution results
@ubuntu:/data/study$ g++ fenzhi.cpp && ./a.out
21.6046
sum = 314931600000 Code 2
#include <algorithm>
#include <ctime>
#include <iostream>
int main()
{
// Generate data
const unsigned arraySize = 32768;
int data[arraySize];
for (unsigned c = 0; c < arraySize; ++c)
data[c] = std::rand() % 256;
// !!! With this, the next loop runs faster.
std::sort(data, data + arraySize);
// Test
clock_t start = clock();
long long sum = 0;
for (unsigned i = 0; i < 100000; ++i) {
for (unsigned c = 0; c < arraySize; ++c) { // Primary loop
if (data[c] >= 128) sum += data[c];
}
}
double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;
std::cout << elapsedTime << '\n';
std::cout << "sum = " << sum << '\n';
}Execution results :
@ubuntu:/data/study$ g++ fenzhi.cpp && ./a.out
8.52157
sum = 314931600000After the first code generates a random array , No sorting , The second code sorts random arrays , There is a great difference in the execution time .

therefore , What happened to them ?
The reasons for their different results , Namely Branch prediction , Branch prediction is CPU A prediction of a program by a processor , and CPU Architecture related , Many processors now have branch prediction function .
CPU While executing this code
if (data[c] >= 128) sum += data[c];CPU There will be a prediction mechanism in advance , For example, the previous execution results are true, Then next time, judge if When , It will default to be true To deal with it , Let the following instructions enter the pre installation in advance .
Of course , This judgment will not affect the actual result output , This judgment is only for CPU Execute code in parallel .
CPU The execution of an instruction is divided into several stages

Since it is implemented in stages , That is what we normally say pipeline( Pipeline execution ).
The assembly line workers only need to complete the content they are responsible for , There is no need to care about other people to deal with .
If I have a piece of code , as follows :
int a = 0;
a += 1;
a += 2;
a += 3;

From this picture we can see , We think it's implementation a = 0 After the end , Will execute a+=1.
But the actual CPU It's execution a=0 After the implementation of the first article of , Go ahead immediately a+=1 The first instruction of .
That's why , The execution speed has been greatly improved .
But for if() Language , When there is no branch prediction , We need to wait if() The next code can only be executed after the execution results appear .

If there is branch prediction

By comparison, we can find that , If there is branch prediction , It makes the execution speed faster .

So if you predict Failure , Will it affect the execution time , The answer is yes .
In the previous example , Without sorting the array , Most branch predictions will fail , At this time, the instruction will be retrieved and executed again after the execution , It will seriously affect the execution efficiency .
And in the example after sorting , Branch prediction has been in a successful state ,CPU The execution speed of has been greatly improved .

If the performance degradation caused by branch prediction is solved
There must be a certain decrease in branch prediction , The way to improve performance is not to use this damn if sentence .
such as , The above code , We can change it to this
#include <algorithm>
#include <ctime>
#include <iostream>
int main()
{
// Generate data
const unsigned arraySize = 32768;
int data[arraySize];
for (unsigned c = 0; c < arraySize; ++c)
data[c] = std::rand() % 256;
// !!! With this, the next loop runs faster.
//std::sort(data, data + arraySize);
// Test
clock_t start = clock();
long long sum = 0;
for (unsigned i = 0; i < 100000; ++i) {
for (unsigned c = 0; c < arraySize; ++c) { // Primary loop
int t = (data[c] - 128) >> 31;
sum += ~t & data[c];
}
}
double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;
std::cout << elapsedTime << '\n';
std::cout << "sum = " << sum << '\n';
} such as , The absolute value code we see , It also uses such ideas
/**
* abs - return absolute value of an argument
* @x: the value. If it is unsigned type, it is converted to signed type first.
* char is treated as if it was signed (regardless of whether it really is)
* but the macro's return type is preserved as char.
*
* Return: an absolute value of x.
*/
#define abs(x) __abs_choose_expr(x, long long, \
__abs_choose_expr(x, long, \
__abs_choose_expr(x, int, \
__abs_choose_expr(x, short, \
__abs_choose_expr(x, char, \
__builtin_choose_expr( \
__builtin_types_compatible_p(typeof(x), char), \
(char)({ signed char __x = (x); __x<0?-__x:__x; }), \
((void)0)))))))
#define __abs_choose_expr(x, type, other) __builtin_choose_expr( \
__builtin_types_compatible_p(typeof(x), signed type) || \
__builtin_types_compatible_p(typeof(x), unsigned type), \
({ signed type __x = (x); __x < 0 ? -__x : __x; }), other)Of course , You can write like this
int abs(int i){
if(i<0)
return ~(--i);
return i;
} So , At the end of the computer is mathematics
Reference resources :
https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array/11227902#11227902
https://blog.csdn.net/loongshawn/article/details/118339009
https://blog.csdn.net/DBC_121/article/details/105360658
边栏推荐
- 2022 low voltage electrician examination questions and answers
- [eight sorting ③] quick sorting (dynamic graph deduction Hoare method, digging method, front and back pointer method)
- [eight sorts ④] merge sort, sort not based on comparison (count sort, cardinal sort, bucket sort)
- Global and Chinese markets of beverage seasoning systems 2022-2028: Research Report on technology, participants, trends, market size and share
- A problem about function template specialization
- Mitsubishi PLC FX3U pulse axis jog function block (mc_jog)
- 2022 high altitude installation, maintenance and removal of test question simulation test platform operation
- [wechat authorized login] the small program developed by uniapp realizes the function of obtaining wechat authorized login
- What are the differences between software testers with a monthly salary of 7K and 25K? Leaders look up to you when they master it
- Datawhale 社区黑板报(第1期)
猜你喜欢

【八大排序④】归并排序、不基于比较的排序(计数排序、基数排序、桶排序)
![[wechat authorized login] the small program developed by uniapp realizes the function of obtaining wechat authorized login](/img/c1/23be4399119f42d85a7b86fc8a59fc.png)
[wechat authorized login] the small program developed by uniapp realizes the function of obtaining wechat authorized login

You probably haven't noticed the very important testing strategy in your work

Schrodinger's Japanese learning applet source code

【八大排序②】选择排序(选择排序,堆排序)

Slf4j print abnormal stack information

学习笔记2--高精度地图定义及价值

What skills does an excellent software tester need to master?

To meet the needs of consumers in technological upgrading, Angel water purifier's competitive way of "value war"

ACM教程 - 快速排序(常规 + 尾递归 + 随机基准数)
随机推荐
关于ASP.NET CORE使用DateTime日期类型参数的一个小细节
cookie、session、tooken
2022 safety officer-b certificate examination practice questions simulated examination platform operation
Xinniuniu blind box wechat applet source code_ Support flow realization, with complete material pictures
Synthetic watermelon game wechat applet source code / wechat game applet source code
学习笔记3--高精度地图关键技术(上)
Zak's latest "neural information transmission", with slides and videos
Cookie, session, tooken
JMeter做接口测试,如何提取登录Cookie
Datawhale community blackboard newspaper (issue 1)
[conference resources] the Third International Conference on Automation Science and Engineering in 2022 (jcase 2022)
Global and Chinese markets for context and location-based services 2022-2028: Research Report on technology, participants, trends, market size and share
Promise and modular programming
How does schedulerx help users solve the problem of distributed task scheduling?
What skills does an excellent software tester need to master?
Han Zhichao: real time risk control practice of eBay based on graph neural network
【疾病检测】基于BP神经网络实现肺癌检测系统含GUI界面
cookie、session、tooken
Datawhale 社区黑板报(第1期)
首场“移动云杯”空宣会,期待与开发者一起共创算网新世界!