Performance optimization: compile-time optimization
2022-06-12 17:07:00 【Tianya road Linux】
This article shares and summarizes performance optimization methods that are applied at compile time.
1. Feedback-directed compilation: PGO
This article uses bubble sort as an example to show how PGO is used. The bubble sort code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <utility>  // std::swap
#include <vector>

// Simple O(n^2) exchange sort, used here as the workload to be profiled.
void bubble_sort(std::vector<int> &nums)
{
    int n = nums.size();
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            if (nums[j] < nums[i]) {
                std::swap(nums[i], nums[j]);
            }
        }
    }
}

int main()
{
    srand(time(nullptr));
    int n = 30000;
    std::vector<int> nums(n);
    // Fill the vector with random values so the sort has real work to do.
    for (int i = 0; i < n; ++i) {
        nums[i] = rand();
    }
    bubble_sort(nums);
    return 0;
}

Compile with -fprofile-generate, then run the program as a training run to generate the profile (.gcda files).

Note: Training is usually carried out in an environment close to real production. For a long-running process, (void)__gcov_flush() has to be called manually once training has run long enough; otherwise no .gcda file is generated (normally the file is only written when the program terminates).
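For a service that never exits, one way to trigger the dump is a signal. The following is only a sketch under stated assumptions: the macro PGO_TRAINING (to be passed as -DPGO_TRAINING together with -fprofile-generate), the choice of SIGUSR1, and the function names are hypothetical and not from the original article. It also assumes GCC 11 or later, where __gcov_flush is no longer available and __gcov_dump followed by __gcov_reset is the documented replacement; on older GCC versions the __gcov_flush call mentioned above would take their place.

```cpp
#include <cassert>
#include <csignal>

#ifdef PGO_TRAINING  // hypothetical macro, defined only for the instrumented build
// Provided by libgcov when linking with -fprofile-generate (GCC 11+ assumed).
extern "C" void __gcov_dump(void);
extern "C" void __gcov_reset(void);
#endif

// Set by the signal handler, consumed by the main loop.
static volatile std::sig_atomic_t g_dump_requested = 0;

// The handler only sets a flag; the dump itself happens outside signal context.
extern "C" void handle_dump_signal(int)
{
    g_dump_requested = 1;
}

// Registers the handler; SIGUSR1 is an arbitrary choice for this sketch.
void install_profile_dump_handler()
{
    auto previous = std::signal(SIGUSR1, handle_dump_signal);
    assert(previous != SIG_ERR);
    (void)previous;
}

// Call periodically from the main loop.
// Returns true only if a profile dump was actually performed.
bool dump_profile_if_requested()
{
    if (!g_dump_requested) {
        return false;
    }
    g_dump_requested = 0;
#ifdef PGO_TRAINING
    __gcov_dump();   // write the counters collected so far to the .gcda files
    __gcov_reset();  // zero the counters so that a later dump starts fresh
    return true;
#else
    return false;    // uninstrumented build: nothing to dump
#endif
}
```

With this in place, the training process would call install_profile_dump_handler() at startup and dump_profile_if_requested() in its main loop, and the profile could then be requested with kill -USR1 on the process ID.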
Recompile using the profile (*.gcda files) with -fprofile-use.
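To check whether the profile-guided rebuild actually helps, both builds can sort the same input and report the elapsed time. The following is a minimal sketch, not the author's benchmark: the file name pgo_demo.cpp, the helpers make_input and time_sort_ms, and the fixed seed are hypothetical; bubble_sort is the function from the listing above. The build commands in the comment use only the two flags named in this article.

```cpp
// Hypothetical build sequence (file and binary names are placeholders):
//   g++ -O2 -fprofile-generate pgo_demo.cpp -o pgo_demo   (instrumented build)
//   ./pgo_demo                                            (training run, writes *.gcda)
//   g++ -O2 -fprofile-use pgo_demo.cpp -o pgo_demo        (rebuild using the profile)
#include <algorithm>
#include <cassert>
#include <chrono>
#include <random>
#include <utility>
#include <vector>

// Same exchange sort as in the article.
void bubble_sort(std::vector<int> &nums)
{
    int n = nums.size();
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            if (nums[j] < nums[i]) {
                std::swap(nums[i], nums[j]);
            }
        }
    }
}

// A fixed seed makes the training run and the measured run sort identical data.
std::vector<int> make_input(int n, unsigned seed = 12345)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> nums(n);
    for (int &x : nums) {
        x = dist(gen);
    }
    return nums;
}

// Sorts nums in place and returns the elapsed wall-clock time in milliseconds.
double time_sort_ms(std::vector<int> &nums)
{
    auto start = std::chrono::steady_clock::now();
    bubble_sort(nums);
    auto stop = std::chrono::steady_clock::now();
    assert(std::is_sorted(nums.begin(), nums.end()));  // sanity check on the result
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A main function would call make_input(30000), pass the result to time_sort_ms, and print the returned value; comparing that number between a plain -O2 build and the -fprofile-use build gives a rough measure of the gain.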

Disassembly:

Disassembly of the original function:
0000000000400740 <_Z11bubble_sortRSt6vectorIiSaIiEE>:
400740: 4c 8b 07 mov (%rdi),%r8
400743: 48 8b 47 08 mov 0x8(%rdi),%rax
400747: 4c 29 c0 sub %r8,%rax
40074a: 48 c1 f8 02 sar $0x2,%rax
40074e: 89 c7 mov %eax,%edi
400750: 85 c0 test %eax,%eax
400752: 7e 44 jle 400798 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x58>
400754: 4c 89 c6 mov %r8,%rsi
400757: 44 8d 50 ff lea -0x1(%rax),%r10d
40075b: 45 31 c9 xor %r9d,%r9d
40075e: 66 90 xchg %ax,%ax
400760: 4c 89 c8 mov %r9,%rax
400763: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400768: 41 8b 0c 80 mov (%r8,%rax,4),%ecx
40076c: 8b 16 mov (%rsi),%edx
40076e: 39 d1 cmp %edx,%ecx
400770: 7d 06 jge 400778 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x38>
400772: 89 0e mov %ecx,(%rsi)
400774: 41 89 14 80 mov %edx,(%r8,%rax,4)
400778: 48 83 c0 01 add $0x1,%rax
40077c: 39 c7 cmp %eax,%edi
40077e: 7f e8 jg 400768 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x28>
400780: 49 8d 41 01 lea 0x1(%r9),%rax
400784: 48 83 c6 04 add $0x4,%rsi
400788: 4d 39 d1 cmp %r10,%r9
40078b: 74 0b je 400798 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x58>
40078d: 49 89 c1 mov %rax,%r9
400790: eb ce jmp 400760 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x20>
400792: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
400798: c3 retq
400799: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Disassembly after PGO:
0000000000400650 <_Z11bubble_sortRSt6vectorIiSaIiEE>:
400650: 48 8b 37 mov (%rdi),%rsi
400653: 48 8b 47 08 mov 0x8(%rdi),%rax
400657: 45 31 c9 xor %r9d,%r9d
40065a: 48 29 f0 sub %rsi,%rax
40065d: 4c 8d 56 04 lea 0x4(%rsi),%r10
400661: 48 c1 f8 02 sar $0x2,%rax
400665: 41 89 c3 mov %eax,%r11d
400668: 44 8d 40 ff lea -0x1(%rax),%r8d
40066c: 45 39 cb cmp %r9d,%r11d
40066f: 7e 42 jle 4006b3 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x63>
400671: 44 89 c2 mov %r8d,%edx
400674: 48 89 f0 mov %rsi,%rax
400677: 44 29 ca sub %r9d,%edx
40067a: 4c 01 ca add %r9,%rdx
40067d: 49 8d 3c 92 lea (%r10,%rdx,4),%rdi
400681: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
400688: 8b 0e mov (%rsi),%ecx
40068a: 8b 10 mov (%rax),%edx
40068c: 39 ca cmp %ecx,%edx
40068e: 7d 18 jge 4006a8 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x58>
400690: 89 16 mov %edx,(%rsi)
400692: 48 83 c0 04 add $0x4,%rax
400696: 89 48 fc mov %ecx,-0x4(%rax)
400699: 48 39 f8 cmp %rdi,%rax
40069c: 75 ea jne 400688 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x38>
40069e: 49 83 c1 01 add $0x1,%r9
4006a2: 48 83 c6 04 add $0x4,%rsi
4006a6: eb c4 jmp 40066c <_Z11bubble_sortRSt6vectorIiSaIiEE+0x1c>
4006a8: 48 83 c0 04 add $0x4,%rax
4006ac: 48 39 f8 cmp %rdi,%rax
4006af: 75 d9 jne 40068a <_Z11bubble_sortRSt6vectorIiSaIiEE+0x3a>
4006b1: eb eb jmp 40069e <_Z11bubble_sortRSt6vectorIiSaIiEE+0x4e>
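4006b3: c3 retq
Compared with the original listing, the inner loop of the PGO build walks the array with pointers and is split into a swap path and a no-swap path; the no-swap path jumps back to 40068a and therefore skips reloading nums[i].
Disadvantages of this approach: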
1. Training data is required, and it is hard to guarantee that the training data matches the actual production workload.
2. Generating the profile is expensive.
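The second method: AutoFDO, which builds the profile from perf samples of an ordinary, uninstrumented binary
1. Collect samples with perf record -b (branch sampling, which needs hardware support for branch records).
2. Convert perf.data with the autofdo tool (available on GitHub):
create_gcov --binary=xxx --profile=perf.data --gcov=xxx.gcov --gcov-version=1
3. Recompile with the converted profile:
g++ xxx.cpp -g -O2 -fauto-profile=xxx.gcov -o xxx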
2. Optimizing with compiler options
Use compiler options that improve performance. This article mainly covers inlining, which brings considerable gains in real production environments.
1. Add the static keyword to functions that are only called within the current file, and enable the -finline-functions-called-once option
2. -finline-functions
3. -finline-small-functions
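The first point can be illustrated with a small sketch. The function names sum_of_squares and mean_square are hypothetical examples, not code from the original article. The idea is that static gives the helper internal linkage, so the compiler can see that it has exactly one caller, which is the situation -finline-functions-called-once targets in GCC.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// `static` restricts this helper to the current translation unit.
// It has a single call site below, so GCC may inline it into that caller
// under -finline-functions-called-once even though it is not marked inline.
static std::int64_t sum_of_squares(const std::vector<int> &values)
{
    std::int64_t total = 0;
    for (int v : values) {
        total += static_cast<std::int64_t>(v) * v;
    }
    return total;
}

// The only caller of sum_of_squares in this file.
double mean_square(const std::vector<int> &values)
{
    assert(!values.empty());
    return static_cast<double>(sum_of_squares(values)) / values.size();
}
```

Whether the helper was actually inlined can be checked the same way as in the PGO section, by disassembling the binary and looking for a separate sum_of_squares symbol and a call to it.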
This article was written by lxx and is licensed under Creative Commons Attribution 3.0. It may be freely reposted and quoted, provided that the author is credited and the source of the article is indicated.
Compilation optimization of performance optimization - coderocku