Performance Optimization: Compiler Optimizations
2022-06-12 16:43:00 【天涯路linux】
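This article summarizes the compiler-level performance optimization techniques I have used at work.
1. Profile-guided optimization (PGO)
A bubble sort serves as the example for PGO. The code is as follows: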
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <utility>
#include <vector>

void bubble_sort(std::vector<int> &nums)
{
    int n = nums.size();
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            // Move the smallest remaining element into position i.
            if (nums[j] < nums[i]) {
                std::swap(nums[i], nums[j]);
            }
        }
    }
}

int main()
{
    srand(time(nullptr));
    int n = 30000;
    std::vector<int> nums(n);
    // Fill the vector with random values to sort.
    for (int i = 0; i < n; ++i) {
        nums[i] = rand();
    }
    bubble_sort(nums);
    return 0;
}
Compile with -fprofile-generate, then run the program as a training run to produce the profile (.gcda files).

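Note: training is usually done in an environment close to real production. For a long-running process, once it has trained for long enough you need to trigger the dump manually, for example from gdb with call (void)__gcov_flush(). Otherwise no .gcda file is produced, because the profile is normally written only when the program exits. (GCC 11 and later replace __gcov_flush() with __gcov_dump().)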
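For a long-running service, the manual dump can also be requested from inside the program instead of from a debugger. The following is a minimal sketch, assuming GCC 11 or later (where libgcov exposes __gcov_dump() in place of __gcov_flush()) and a POSIX system. The names on_sigusr1, g_dump_requested and dump_profile_if_requested are hypothetical and chosen only for illustration. The symbol is declared weak so the same source still links when built without -fprofile-generate.

```cpp
#include <csignal>

// Provided by libgcov when compiling with -fprofile-generate (GCC 11+).
// Declared weak (a GCC/Clang extension) so that an uninstrumented build
// still links; in that case the address is null.
extern "C" void __gcov_dump(void) __attribute__((weak));

// Set by the signal handler, consumed by the main loop.
static volatile std::sig_atomic_t g_dump_requested = 0;

// Hypothetical handler: it only records the request, because writing
// files from inside a signal handler is not async-signal-safe.
extern "C" void on_sigusr1(int) { g_dump_requested = 1; }

// Call this periodically from the program's main loop.
// Returns true if a profile dump was actually performed.
bool dump_profile_if_requested()
{
    if (!g_dump_requested) {
        return false;
    }
    g_dump_requested = 0;
    if (__gcov_dump == nullptr) {
        return false;  // binary was not built with -fprofile-generate
    }
    __gcov_dump();     // writes the .gcda files without exiting
    return true;
}
```

Registering the handler with std::signal(SIGUSR1, on_sigusr1) and sending the process a USR1 signal would then request a dump at the next loop iteration.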
Recompile using the profile (*.gcda files):
-fprofile-use

Disassemble the binaries to compare:

Disassembly of the original function:
0000000000400740 <_Z11bubble_sortRSt6vectorIiSaIiEE>:
400740: 4c 8b 07 mov (%rdi),%r8
400743: 48 8b 47 08 mov 0x8(%rdi),%rax
400747: 4c 29 c0 sub %r8,%rax
40074a: 48 c1 f8 02 sar $0x2,%rax
40074e: 89 c7 mov %eax,%edi
400750: 85 c0 test %eax,%eax
400752: 7e 44 jle 400798 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x58>
400754: 4c 89 c6 mov %r8,%rsi
400757: 44 8d 50 ff lea -0x1(%rax),%r10d
40075b: 45 31 c9 xor %r9d,%r9d
40075e: 66 90 xchg %ax,%ax
400760: 4c 89 c8 mov %r9,%rax
400763: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400768: 41 8b 0c 80 mov (%r8,%rax,4),%ecx
40076c: 8b 16 mov (%rsi),%edx
40076e: 39 d1 cmp %edx,%ecx
400770: 7d 06 jge 400778 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x38>
400772: 89 0e mov %ecx,(%rsi)
400774: 41 89 14 80 mov %edx,(%r8,%rax,4)
400778: 48 83 c0 01 add $0x1,%rax
40077c: 39 c7 cmp %eax,%edi
40077e: 7f e8 jg 400768 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x28>
400780: 49 8d 41 01 lea 0x1(%r9),%rax
400784: 48 83 c6 04 add $0x4,%rsi
400788: 4d 39 d1 cmp %r10,%r9
40078b: 74 0b je 400798 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x58>
40078d: 49 89 c1 mov %rax,%r9
400790: eb ce jmp 400760 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x20>
400792: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
400798: c3 retq
400799: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Disassembly after PGO:
0000000000400650 <_Z11bubble_sortRSt6vectorIiSaIiEE>:
400650: 48 8b 37 mov (%rdi),%rsi
400653: 48 8b 47 08 mov 0x8(%rdi),%rax
400657: 45 31 c9 xor %r9d,%r9d
40065a: 48 29 f0 sub %rsi,%rax
40065d: 4c 8d 56 04 lea 0x4(%rsi),%r10
400661: 48 c1 f8 02 sar $0x2,%rax
400665: 41 89 c3 mov %eax,%r11d
400668: 44 8d 40 ff lea -0x1(%rax),%r8d
40066c: 45 39 cb cmp %r9d,%r11d
40066f: 7e 42 jle 4006b3 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x63>
400671: 44 89 c2 mov %r8d,%edx
400674: 48 89 f0 mov %rsi,%rax
400677: 44 29 ca sub %r9d,%edx
40067a: 4c 01 ca add %r9,%rdx
40067d: 49 8d 3c 92 lea (%r10,%rdx,4),%rdi
400681: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
400688: 8b 0e mov (%rsi),%ecx
40068a: 8b 10 mov (%rax),%edx
40068c: 39 ca cmp %ecx,%edx
40068e: 7d 18 jge 4006a8 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x58>
400690: 89 16 mov %edx,(%rsi)
400692: 48 83 c0 04 add $0x4,%rax
400696: 89 48 fc mov %ecx,-0x4(%rax)
400699: 48 39 f8 cmp %rdi,%rax
40069c: 75 ea jne 400688 <_Z11bubble_sortRSt6vectorIiSaIiEE+0x38>
40069e: 49 83 c1 01 add $0x1,%r9
4006a2: 48 83 c6 04 add $0x4,%rsi
4006a6: eb c4 jmp 40066c <_Z11bubble_sortRSt6vectorIiSaIiEE+0x1c>
4006a8: 48 83 c0 04 add $0x4,%rax
4006ac: 48 39 f8 cmp %rdi,%rax
4006af: 75 d9 jne 40068a <_Z11bubble_sortRSt6vectorIiSaIiEE+0x3a>
4006b1: eb eb jmp 40069e <_Z11bubble_sortRSt6vectorIiSaIiEE+0x4e>
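4006b3: c3 retq
Drawbacks of this approach:
1. It needs training data, and it is hard to guarantee that the training data matches the real production workload.
2. Generating the profile is expensive, because the instrumented binary runs slower than a normal build.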
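The first drawback can be made concrete: how often the swap branch is taken depends entirely on the input. As a rough sketch (sort_and_count and BranchStats are hypothetical helpers, not part of the original program), the function below runs the same loops as bubble_sort and counts how often the comparison succeeds, which is the kind of information a profile records.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical counters for the single branch inside the inner loop.
struct BranchStats {
    std::size_t taken = 0;  // times the swap branch was executed
    std::size_t total = 0;  // times the comparison was evaluated
};

// Same loop structure as bubble_sort in the article, working on a copy
// of the input and recording how often the swap branch is taken.
BranchStats sort_and_count(std::vector<int> nums)
{
    BranchStats stats;
    const std::size_t n = nums.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            ++stats.total;
            if (nums[j] < nums[i]) {
                ++stats.taken;
                std::swap(nums[i], nums[j]);
            }
        }
    }
    return stats;
}
```

For already-sorted input the swap branch is never taken, while reversed or random input takes it often, so a profile trained on one kind of data may steer the compiler in the wrong direction for the other.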
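Second approach: AutoFDO
1. Collect a profile from the running program with perf record -b.
2. Convert it with the autofdo tool (available on GitHub):
create_gcov --binary=xxx --profile=perf.data --gcov=xxx.gcov --gcov-version=1
3. Recompile with the converted profile:
gcc xxx.cpp -g -O2 -fauto-profile=xxx.gcov -o xxx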
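2. Compiler option tuning
Some compiler options improve performance directly. This article covers the inlining options used in our project, which yielded considerable gains in the real production environment.
1. Mark functions that are called only within their own file as static, and enable -finline-functions-called-once.
2. -finline-functions
3. -finline-small-functions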
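A minimal sketch of the first item, assuming GCC; sum_of_squares and norm_squared are hypothetical names used only for illustration. Because the helper is static and has a single call site, -finline-functions-called-once allows the compiler to merge it into its caller and omit the standalone copy.

```cpp
#include <vector>

// Internal linkage: every caller is visible in this translation unit,
// so the compiler knows this function is called exactly once.
static long sum_of_squares(const std::vector<int> &v)
{
    long total = 0;
    for (int x : v) {
        total += static_cast<long>(x) * x;
    }
    return total;
}

// The only call site of sum_of_squares. With
// -finline-functions-called-once the helper body may be inlined here.
long norm_squared(const std::vector<int> &v)
{
    return sum_of_squares(v);
}
```

Inspecting the disassembly, as done for the PGO comparison, is one way to confirm whether the helper was actually inlined in a given build.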
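This article was written by lxx and is licensed under Creative Commons Attribution 3.0. It may be freely reproduced and quoted, provided the author is credited and the source is cited.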