A simple perf workflow for profiling multithreaded programs
2022-07-04 04:04:00 【BanFS】
Background knowledge
Perf is a tool for analyzing software performance. Through perf, applications can use the PMU, tracepoints, and special counters in the kernel to gather performance statistics. Perf can analyze performance problems in applications (per thread) as well as in the kernel, and it handles all performance-related events: hardware events during program execution, such as instructions retired and processor clock cycles, and software events, such as page faults and process switches.
Perf works by sampling the monitored target. The simplest case is sampling on the tick interrupt: each tick interrupt triggers a sample point, and the program's current context is determined at that point. If a program spends 90% of its time in the function func1(), then about 90% of the sample points should fall in the context of func1(). The longer the sampling runs, the more reliable this inference becomes. Running perf generally requires administrator privileges.
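The sampling idea can be sketched with a small simulation. This is not how perf is implemented; it is a toy model under stated assumptions: in_func1() is a hypothetical workload that spends 9 out of every 10 time units in func1(), and count_func1_samples() is a hypothetical helper that takes one sample every `period` time units.

```c
/* Hypothetical workload model (an assumption for illustration):
 * out of every 10 time units, the first 9 are spent in func1()
 * and the last one is spent elsewhere. */
static int in_func1(long t)
{
    return (t % 10) < 9;
}

/* Take one sample every `period` time units over `total` time units.
 * Returns the number of samples that landed in func1() and stores
 * the total number of samples taken in *nsamples. */
long count_func1_samples(long total, long period, long *nsamples)
{
    long hits = 0;
    long n = 0;
    for (long t = 0; t < total; t += period) {
        hits += in_func1(t);
        n++;
    }
    *nsamples = n;
    return hits;
}
```

Calling count_func1_samples(7000, 7, &n) from main() takes 1000 samples, 900 of which land in func1(), matching the 90% the model spends there. With a period of 10, which is in step with the workload's own cycle, every one of the 700 samples lands in func1(), so the estimate is skewed. That is the kind of lockstep effect that choosing a frequency such as 99 Hz instead of 100 Hz is meant to avoid.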
Profiling a multithreaded program with perf
Prepare a multithreaded program
This program creates two threads that call func1() a different number of times. Compile it with: gcc main.c -lpthread
#include <pthread.h>
#include <stdio.h>
#include <string.h>
pthread_t thread[2];
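/* Busy loop: count up to 10000. */
void func1() {
    int i = 0;
    while (i<10000)
        ++i;
}
/* Short doubling loop, then a call to func1(). */
void func2() {
    /* Start at 1: with i = 0, i*2 stays 0 and the loop never ends. */
    int i = 1;
    while (i<10000)
        i = i*2;
    func1();
}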
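/* Thread 1: call func1() forever. */
void *thread1(void *arg)
{
    (void)arg;  /* unused */
    for (;;)
    {
        func1();
    }
    pthread_exit(NULL);
}
/* Thread 2: call func2() forever. */
void *thread2(void *arg)
{
    (void)arg;  /* unused */
    for (;;)
    {
        func2();
    }
    pthread_exit(NULL);
}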
/* Start both threads and report whether each one was created. */
void thread_create(void)
{
    int temp;
    memset(&thread, 0, sizeof(thread));
    if((temp = pthread_create(&thread[0], NULL, thread1, NULL)) != 0)
        printf("Failed to create thread 1\n");
    else
        printf("Thread 1 created\n");
    if((temp = pthread_create(&thread[1], NULL, thread2, NULL)) != 0)
        printf("Failed to create thread 2\n");
    else
        printf("Thread 2 created\n");
}
int main()
{
    thread_create();
    /* Both threads loop forever, so these joins block until the process is killed. */
    pthread_join(thread[0],NULL);
    printf("Thread 1 joined\n");
    pthread_join(thread[1],NULL);
    printf("Thread 2 joined\n");
    return 0;
}
Sampling the program with perf
For a program that has not been started yet
# perf record -h
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
...
-F, --freq <n> profile at this frequency
-g enables call-graph recording
-p, --pid <pid> record events on existing process id
-t, --tid <tid> record events on existing thread id
...
perf record -g -F 99 ./a.out
This samples the multithreaded program at a frequency of 99 Hz. (-F 99: sample at 99 Hertz (samples per second). I’ll sometimes sample faster than this (up to 999 Hertz), but that also costs overhead. 99 Hertz should be negligible. Also, the value ‘99’ and not ‘100’ is to avoid lockstep sampling, which can produce skewed results.) Because the sample program never exits on its own, stop the recording with Ctrl+C. When the run finishes, you get a perf.data file. Turning it into a flame graph requires additional tools.
For a program that is already running
For a program that is already running, find its pid and attach to it: perf record -g -F 99 -p <pid>
Download the FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph.git
$ pwd
/home/banfushen/perf_cpu/FlameGraph
Generate the flame graph
Generate a flame graph from perf.data (recorded as above, it covers the whole process)
perf script | /home/banfushen/perf_cpu/FlameGraph/stackcollapse-perf.pl | /home/banfushen/perf_cpu/FlameGraph/flamegraph.pl > output.svg
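To make the middle stage of this pipeline less opaque: perf script prints each sampled stack with the innermost frame first, and stackcollapse-perf.pl folds every stack into a single line of semicolon-separated frames, outermost first, followed by a count, which is what flamegraph.pl reads. The sketch below only illustrates that folded format. fold_stack() is a hypothetical helper written for this article, not part of perf or FlameGraph, and real stacks also contain frames such as the thread start routine.

```c
#include <stdio.h>

/* Build one folded line, "comm;outermost;...;innermost count", from
 * frames listed innermost-first (the order perf script prints them).
 * Returns 0 on success, -1 if the output buffer is too small. */
int fold_stack(char *out, size_t outsz, const char *comm,
               const char **frames, int nframes, long count)
{
    int n = snprintf(out, outsz, "%s", comm);

    /* Walk the frames backwards so the outermost caller comes first. */
    for (int i = nframes - 1; i >= 0 && n >= 0 && (size_t)n < outsz; i--)
        n += snprintf(out + n, outsz - (size_t)n, ";%s", frames[i]);

    /* Append the number of samples that shared this exact stack. */
    if (n >= 0 && (size_t)n < outsz)
        n += snprintf(out + n, outsz - (size_t)n, " %ld", count);

    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

For the sample program, passing the frames {"func1", "func2", "thread2"} with comm "a.out" and a count of 42 yields the line a.out;thread2;func2;func1 42. In the resulting SVG, each such line becomes a tower of boxes whose width is proportional to the count.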
Generate a flame graph for a single thread
You need to know the thread ID.
# perf script -h
Usage: perf script [<options>]
or: perf script [<options>] record <script> [<record-options>] <command>
or: perf script [<options>] report <script> [script-args]
or: perf script [<options>] <script> [<record-options>] <command>
or: perf script [<options>] <top-script> [script-args]
...
-v, --verbose be more verbose (show symbol address, etc)
--pid <pid[,pid...]>
--tid <tid[,tid...]>
only consider symbols in these tids
perf script -v --tid <tid> selects the specified thread
perf script -v --tid 2283471 | /home/banfushen/perf_cpu/FlameGraph/stackcollapse-perf.pl | /home/banfushen/perf_cpu/FlameGraph/flamegraph.pl > output1.svg
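One way to find the thread IDs to pass to --tid is to have the program print them itself. The sketch below is Linux-specific and is not part of the original program: current_tid() and print_tid() are hypothetical helper names, and the kernel thread ID is obtained through the raw gettid system call. Listing the threads of a running process with ps -T -p <pid> is another option.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the kernel thread ID of the calling thread (Linux-specific).
 * This is the number that perf's --tid option expects. */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Hypothetical helper: print the process ID and the calling thread's ID
 * under a label, so the threads can be told apart in the output. */
void print_tid(const char *name)
{
    printf("%s: pid=%d tid=%d\n", name, (int)getpid(), (int)current_tid());
}
```

Calling print_tid("thread1") and print_tid("thread2") at the top of the two thread functions prints one line per thread before the loops start. In the main thread the thread ID equals the process ID.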
Generate a flame graph for multiple threads
perf script -v --tid <tid[,tid...]> selects several threads
perf script -v --tid 2283472,2283471 | /home/banfushen/perf_cpu/FlameGraph/stackcollapse-perf.pl | /home/banfushen/perf_cpu/FlameGraph/flamegraph.pl > output3.svg
References:
perf Examples
perf performance analysis
A brief analysis of the performance analysis tool perf
Analyzing Linux applications with perf
A brief introduction to Perf, the Linux performance analysis tool