当前位置：网站首页>We should make clear the branch prediction

We should make clear the branch prediction

2022-07-02 01:13:00 【Basketball fans who write code】

The English name of branch prediction is 「Branch Prediction」

You can Google Search for this keyword on , You can see a lot about branch prediction , But find out how branch prediction works , That's the point .

The impact of branch prediction on the program

Let's take a look at the following two pieces of code

Code 1

#include <algorithm>
#include <ctime>
#include <iostream>
int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];
    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;
    // !!! With this, the next loop runs faster.
    //std::sort(data, data + arraySize);
    // Test
    clock_t start = clock();
    long long sum = 0;
    for (unsigned i = 0; i < 100000; ++i) {
        for (unsigned c = 0; c < arraySize; ++c) { // Primary loop
            if (data[c] >= 128) sum += data[c];
        }
    }
    double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;
    std::cout << elapsedTime << '\n';
    std::cout << "sum = " << sum << '\n';
}

Execution results

@ubuntu:/data/study$ g++ fenzhi.cpp && ./a.out
21.6046
sum = 314931600000

Code 2

#include <algorithm>
#include <ctime>
#include <iostream>
int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];
    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;
    // !!! With this, the next loop runs faster.
    std::sort(data, data + arraySize);
    // Test
    clock_t start = clock();
    long long sum = 0;
    for (unsigned i = 0; i < 100000; ++i) {
        for (unsigned c = 0; c < arraySize; ++c) { // Primary loop
            if (data[c] >= 128) sum += data[c];
        }
    }
    double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;
    std::cout << elapsedTime << '\n';
    std::cout << "sum = " << sum << '\n';
}

Execution results ：

@ubuntu:/data/study$ g++ fenzhi.cpp && ./a.out
8.52157
sum = 314931600000

After the first code generates a random array , No sorting , The second code sorts random arrays , There is a great difference in the execution time .

therefore , What happened to them ？

The reasons for their different results , Namely Branch prediction , Branch prediction is CPU A prediction of a program by a processor , and CPU Architecture related , Many processors now have branch prediction function .

CPU While executing this code

if (data[c] >= 128) sum += data[c];

CPU There will be a prediction mechanism in advance , For example, the previous execution results are true, Then next time, judge if When , It will default to be true To deal with it , Let the following instructions enter the pre installation in advance .

Of course , This judgment will not affect the actual result output , This judgment is only for CPU Execute code in parallel .

CPU The execution of an instruction is divided into several stages

Since it is implemented in stages , That is what we normally say pipeline( Pipeline execution ).

The assembly line workers only need to complete the content they are responsible for , There is no need to care about other people to deal with .

If I have a piece of code , as follows ：

int a = 0;
a += 1;
a += 2;
a += 3;

From this picture we can see , We think it's implementation a = 0 After the end , Will execute a+=1.

But the actual CPU It's execution a=0 After the implementation of the first article of , Go ahead immediately a+=1 The first instruction of .

That's why , The execution speed has been greatly improved .

But for if() Language , When there is no branch prediction , We need to wait if() The next code can only be executed after the execution results appear .

If there is branch prediction

By comparison, we can find that , If there is branch prediction , It makes the execution speed faster .

So if you predict Failure , Will it affect the execution time , The answer is yes .

In the previous example , Without sorting the array , Most branch predictions will fail , At this time, the instruction will be retrieved and executed again after the execution , It will seriously affect the execution efficiency .

And in the example after sorting , Branch prediction has been in a successful state ,CPU The execution speed of has been greatly improved .

If the performance degradation caused by branch prediction is solved

There must be a certain decrease in branch prediction , The way to improve performance is not to use this damn if sentence .

such as , The above code , We can change it to this

#include <algorithm>
#include <ctime>
#include <iostream>
int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];
    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;
    // !!! With this, the next loop runs faster.
    //std::sort(data, data + arraySize);
    // Test
    clock_t start = clock();
    long long sum = 0;
    for (unsigned i = 0; i < 100000; ++i) {
        for (unsigned c = 0; c < arraySize; ++c) { // Primary loop
            int t = (data[c] - 128) >> 31;
            sum += ~t & data[c];
        }
    }
    double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;
    std::cout << elapsedTime << '\n';
    std::cout << "sum = " << sum << '\n';
}

such as , The absolute value code we see , It also uses such ideas

/**
 * abs - return absolute value of an argument
 * @x: the value. If it is unsigned type, it is converted to signed type first.
 * char is treated as if it was signed (regardless of whether it really is)
 * but the macro's return type is preserved as char.
 *
 * Return: an absolute value of x.
 */
#define abs(x) __abs_choose_expr(x, long long, \
    __abs_choose_expr(x, long, \
    __abs_choose_expr(x, int, \
    __abs_choose_expr(x, short, \
    __abs_choose_expr(x, char, \
    __builtin_choose_expr( \
      __builtin_types_compatible_p(typeof(x), char), \
      (char)({ signed char __x = (x); __x<0?-__x:__x; }), \
      ((void)0)))))))

#define __abs_choose_expr(x, type, other) __builtin_choose_expr( \
  __builtin_types_compatible_p(typeof(x), signed type) || \
  __builtin_types_compatible_p(typeof(x), unsigned type), \
  ({ signed type __x = (x); __x < 0 ? -__x : __x; }), other)

Of course , You can write like this

int abs(int i){
   if(i<0)
    return ~(--i);
  return i;
}

So , At the end of the computer is mathematics

Reference resources ：

https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array/11227902#11227902

https://blog.csdn.net/loongshawn/article/details/118339009

https://blog.csdn.net/DBC_121/article/details/105360658

原网站

版权声明
本文为[Basketball fans who write code]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/02/202202152155074397.html

当前位置：网站首页>We should make clear the branch prediction

We should make clear the branch prediction

边栏推荐

猜你喜欢

随机推荐