Compilation optimization of performance optimization
2022-06-12 17:07:00 【Tianya road Linux】
This article summarizes performance optimization methods that are applied at compile time.
1. Profile-guided optimization (PGO)
This article uses bubble sort as an example to show how PGO is used. The bubble sort code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <utility>  // std::swap
#include <vector>

// Sorts nums in ascending order with a simple O(n^2) exchange loop.
void bubble_sort(std::vector<int> &nums)
{
    int n = nums.size();
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            if (nums[j] < nums[i]) {
                std::swap(nums[i], nums[j]);
            }
        }
    }
}

int main()
{
    srand(time(nullptr));
    int n = 30000;
    std::vector<int> nums(n);
    // Fill the vector with random values, then sort it.
    for (int i = 0; i < n; ++i) {
        nums[i] = rand();
    }
    bubble_sort(nums);
    return 0;
}

Compile with -fprofile-generate, then run the program on a training workload to generate the profile (.gcda file). For example, with xxx as a placeholder name:
g++ -O2 -fprofile-generate xxx.cpp -o xxx
./xxx

Note: training is usually carried out in an environment close to real production. The profile is normally written only when the program exits, so for a program that keeps running, __gcov_flush() must be called manually after enough training time has passed; otherwise no .gcda file is generated.
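The manual dump mentioned in the note can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's code: PROFILE_TRAINING is a hypothetical macro that the build defines only for the instrumented binary, and dump_profile_now is an illustrative helper name. The sketch calls __gcov_dump(), which recent GCC releases provide in libgcov; on older releases the __gcov_flush() call named in the note plays this role.

```cpp
#include <cassert>

// PROFILE_TRAINING is a hypothetical macro: define it only for the
// instrumented build, e.g. g++ -O2 -fprofile-generate -DPROFILE_TRAINING ...
#ifdef PROFILE_TRAINING
// Provided by libgcov, which GCC links in under -fprofile-generate.
extern "C" void __gcov_dump(void);
#endif

// Asks the runtime to write the counters collected so far to the .gcda files.
// Returns true if a dump was requested, false in a non-instrumented build.
bool dump_profile_now()
{
#ifdef PROFILE_TRAINING
    __gcov_dump();
    return true;
#else
    return false;
#endif
}
```

A long-running service could call dump_profile_now() once, from whatever hook marks the end of training (an admin command, for instance), so the profile is available without stopping the process.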
Recompile with the profile (*.gcda files) by passing -fprofile-use, for example:
g++ -O2 -fprofile-use xxx.cpp -o xxx
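To check whether the recompiled binary is actually faster, the sort can be timed on identical input in both builds. The sketch below is one possible harness, not part of the original article: make_input and time_sort_ms are illustrative names, and the fixed seed is an assumption made so that the baseline and PGO builds sort the same data.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Same sort as in the article.
void bubble_sort(std::vector<int> &nums)
{
    int n = nums.size();
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            if (nums[j] < nums[i]) {
                std::swap(nums[i], nums[j]);
            }
        }
    }
}

// Builds n pseudo-random non-negative values from a fixed seed,
// so every build of the program sorts exactly the same input.
std::vector<int> make_input(int n, unsigned seed)
{
    std::mt19937 gen(seed);
    std::vector<int> nums(n);
    for (int &x : nums) {
        x = static_cast<int>(gen() & 0x7fffffff);
    }
    return nums;
}

// Sorts nums in place and returns the elapsed wall-clock time in milliseconds.
double time_sort_ms(std::vector<int> &nums)
{
    auto start = std::chrono::steady_clock::now();
    bubble_sort(nums);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_sort_ms on make_input(30000, 42) in both the plain -O2 build and the -fprofile-use build, and repeating the run a few times, gives a like-for-like comparison.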

Disassembly:

Disassembly of the original function:
0000000000400740 <_Z11bubble_sortRSt6vectorIiSaIiEE>:
400740: 4c 8b 07 mov (%rdi),%r8
400743: 48 8b 47 08 mov 0x8(%rdi),%rax
400747: 4c 29 c0 sub %r8,%rax
40074a: 48 c1 f8 02 sar $0x2,%rax
40074e: 89 c7 mov %eax,%edi
400750: 85 c0 test %eax,%eax
400752: 7e 44 jle 400798 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x58>
400754: 4c 89 c6 mov %r8,%rsi
400757: 44 8d 50 ff lea -0x1(%rax),%r10d
40075b: 45 31 c9 xor %r9d,%r9d
40075e: 66 90 xchg %ax,%ax
400760: 4c 89 c8 mov %r9,%rax
400763: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400768: 41 8b 0c 80 mov (%r8,%rax,4),%ecx
40076c: 8b 16 mov (%rsi),%edx
40076e: 39 d1 cmp %edx,%ecx
400770: 7d 06 jge 400778 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x38>
400772: 89 0e mov %ecx,(%rsi)
400774: 41 89 14 80 mov %edx,(%r8,%rax,4)
400778: 48 83 c0 01 add $0x1,%rax
40077c: 39 c7 cmp %eax,%edi
40077e: 7f e8 jg 400768 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x28>
400780: 49 8d 41 01 lea 0x1(%r9),%rax
400784: 48 83 c6 04 add $0x4,%rsi
400788: 4d 39 d1 cmp %r10,%r9
40078b: 74 0b je 400798 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x58>
40078d: 49 89 c1 mov %rax,%r9
400790: eb ce jmp 400760 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x20>
400792: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
400798: c3 retq
400799: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Disassembly after PGO:
0000000000400650 <_Z11bubble_sortRSt6vectorIiSaIiEE>:
400650: 48 8b 37 mov (%rdi),%rsi
400653: 48 8b 47 08 mov 0x8(%rdi),%rax
400657: 45 31 c9 xor %r9d,%r9d
40065a: 48 29 f0 sub %rsi,%rax
40065d: 4c 8d 56 04 lea 0x4(%rsi),%r10
400661: 48 c1 f8 02 sar $0x2,%rax
400665: 41 89 c3 mov %eax,%r11d
400668: 44 8d 40 ff lea -0x1(%rax),%r8d
40066c: 45 39 cb cmp %r9d,%r11d
40066f: 7e 42 jle 4006b3 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x63>
400671: 44 89 c2 mov %r8d,%edx
400674: 48 89 f0 mov %rsi,%rax
400677: 44 29 ca sub %r9d,%edx
40067a: 4c 01 ca add %r9,%rdx
40067d: 49 8d 3c 92 lea (%r10,%rdx,4),%rdi
400681: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
400688: 8b 0e mov (%rsi),%ecx
40068a: 8b 10 mov (%rax),%edx
40068c: 39 ca cmp %ecx,%edx
40068e: 7d 18 jge 4006a8 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x58>
400690: 89 16 mov %edx,(%rsi)
400692: 48 83 c0 04 add $0x4,%rax
400696: 89 48 fc mov %ecx,-0x4(%rax)
400699: 48 39 f8 cmp %rdi,%rax
40069c: 75 ea jne 400688 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x38>
40069e: 49 83 c1 01 add $0x1,%r9
4006a2: 48 83 c6 04 add $0x4,%rsi
4006a6: eb c4 jmp 40066c <_Z11bubble_sortRSt6vectorIiSaIiEE+0x1c>
4006a8: 48 83 c0 04 add $0x4,%rax
4006ac: 48 39 f8 cmp %rdi,%rax
4006af: 75 d9 jne 40068a <_Z11bubble_sortRSt6vectorIiSaIiEE+0x3a>
4006b1: eb eb jmp 40069e <_Z11bubble_sortRSt6vectorIiSaIiEE+0x4e>
4006b3: c3 retq
Disadvantages of this approach:
1. It requires training data, and it is hard to ensure that the training data matches the actual production workload.
2. Generating the profile is expensive, because it needs an instrumented build and a separate training run.
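The second method: AutoFDO
AutoFDO avoids the instrumented build by deriving the profile from perf samples of an ordinary binary.
1. Run the program under perf record -b to collect branch samples into perf.data.
2. Convert perf.data into a gcov-format profile with the autofdo tool (available on GitHub):
create_gcov --binary=xxx --profile=perf.data --gcov=xxx.gcov --gcov-version=1
3. Recompile with the converted profile:
g++ xxx.cpp -g -O2 -fauto-profile=xxx.gcov -o xxx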
2. Optimizing compiler options
Use compiler options that improve performance. This article focuses on inlining, which brings considerable gains in real production environments.
1. Add the static qualifier to functions that are called only within their own file, and enable the -finline-functions-called-once option.
2. -finline-functions
3. -finline-small-functions
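The first item above can be illustrated with a small sketch. The function names are hypothetical and chosen only for illustration. Because clamp_to_byte is static, the compiler knows that no other file can call it. It has a single call site, so -finline-functions-called-once lets GCC consider inlining it into its caller even though it is not marked inline, and no separate copy of the function needs to be emitted once the call is inlined.

```cpp
#include <cassert>
#include <vector>

// Internal linkage: every caller of clamp_to_byte is visible
// to the compiler within this translation unit.
static int clamp_to_byte(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

// The only call site of clamp_to_byte in this file.
int sum_clamped(const std::vector<int> &values)
{
    int total = 0;
    for (int v : values) {
        total += clamp_to_byte(v);
    }
    return total;
}
```

Whether the call was actually inlined can be checked by disassembling the object file, in the same way as the bubble_sort listings earlier in the article.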
This article was written by lxx and is released under the Creative Commons Attribution 3.0 license. It may be freely reposted and quoted, provided the author is credited and the source is indicated.
Compilation optimization of performance optimization - coderocku