Neon optimization 1: how to optimize software performance and reduce power consumption?
2022-06-27 05:26:00 【To know】
NEON Optimization 1: How to Optimize Software Performance and Reduce Hardware Power Consumption
Background
In mobile and embedded scenarios, products increasingly ship cutting-edge technology in the form of complex algorithm models. Because these algorithms are computationally expensive, they cause poor real-time performance, high power consumption and similar problems, so on-device performance optimization is required.
Reducing the computational cost (time complexity) of the algorithm code without changing the algorithm's effect has become a problem that many engineers have to face.
Basis of performance optimization: MCPS and MIPS
Before optimizing, first settle on a concrete performance metric, namely MIPS or MCPS.
- MIPS: million instructions per second, the number of instructions (in millions) the program executes per second of run time
- MCPS: million cycles per second, the number of processor cycles (in millions) the program consumes per second of run time
Differences between MIPS and MCPS
- MIPS counts instructions, so it differs little between software simulation and hardware measurement, or from platform to platform.
- MCPS counts cycles. Because of hardware optimizations, MCPS can differ from platform to platform and can even be smaller than MIPS.
In software-simulation results, MIPS is no larger than (and usually smaller than) MCPS, because the minimum CPI (cycles per instruction) of the simulation tool RVDS is 1. Hardware measurement yields the MCPS count directly, and on real hardware a capable CPU can achieve a CPI below 1, i.e. complete more than one instruction per cycle. For details, see the following (link):
With a single-execution-unit processor, the best CPI attainable is 1. However, with a multiple-execution-unit processor, one may achieve even better CPI values (CPI < 1).
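The relationship between the two metrics can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes instruction and cycle counts collected over one frame of `frame_ms` milliseconds (the demo below uses 10 ms frames). Because both metrics are divided by the same time, CPI = MCPS / MIPS = cycles / instructions.

```c
#include <assert.h>

/* Hypothetical helpers: convert raw counts measured over one frame
 * of frame_ms milliseconds into MIPS, MCPS and CPI. */

/* Millions of instructions per second. */
double mips_from_instructions(double instructions, double frame_ms) {
    assert(frame_ms > 0.0);
    return (instructions / 1e6) / (frame_ms / 1000.0);
}

/* Millions of cycles per second over the same frame. */
double mcps_from_cycles(double cycles, double frame_ms) {
    assert(frame_ms > 0.0);
    return (cycles / 1e6) / (frame_ms / 1000.0);
}

/* Cycles per instruction: CPI = MCPS / MIPS = cycles / instructions.
 * A simulator with minimum CPI 1 never reports a value below 1.0;
 * a multi-issue core can. */
double cpi(double cycles, double instructions) {
    assert(instructions > 0.0);
    return cycles / instructions;
}
```

For example, a 10 ms frame that executes 1,000,000 instructions costs 100 MIPS. If the simulator reports 1,500,000 cycles for it (CPI 1.5), the cost is 150 MCPS; if a multi-issue core finishes the same work in 800,000 cycles (CPI 0.8), it is 80 MCPS, lower than the MIPS figure.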
MIPS calculation demo
Directory structure:
- src
  - main.c
    - void test(int* arr, int len);
  - vpu.h
  - vpu.s
- main.c
Measurement code:
#include <stdio.h>

#define MIPS_COUNT_ARM_CORTEX
#define COUNT_NUM 1000  // number of measurement runs (frames), set manually

#ifdef MIPS_COUNT_ARM_CORTEX
#include "v7_pmu.h"  // PMU helpers: enable_pmu(), read_ccnt(), ...

#define MILLION_UNIT (1000000.f)
#define KILO_UNIT (1000.f)
#define FRAME_LEN_MS (10.f)  // one frame = 10 ms

unsigned int counter0;
unsigned int cycle_count1;
unsigned int cycle_count2;
unsigned int cur_time = 0;   // cycles spent in the current frame
double cur_time_tmp = 0.0;   // current frame's cost, in millions per second
double avg_time = 0.0;
double avg_time_tmp = 0.0;   // running sum of per-frame costs (double, so nothing is truncated)
unsigned int peak_time = 0;  // worst-case cycles in any frame
// cycles per frame -> millions per second: (1 / 10^6) / (10 ms / 1000)
float cycle2mips_coef = (1 / MILLION_UNIT) / (FRAME_LEN_MS / KILO_UNIT);
#endif

int main(void) {
    int cnt = COUNT_NUM;
    while (cnt--) {
#ifdef MIPS_COUNT_ARM_CORTEX
        enable_pmu();                // Enable the PMU
        reset_ccnt();                // Reset the CCNT (cycle counter)
        reset_pmn();                 // Reset the configurable counters
        pmn_config(0, 0x03);         // Counter 0 counts event 0x03 (L1 data cache refill on ARMv7)
        enable_ccnt();               // Enable CCNT
        enable_pmn(0);               // Enable counter 0
        counter0 = read_pmn(0);      // Read counter 0
        cycle_count1 = read_ccnt();  // Core cycles before the module under test
#endif
        // test(arr, len);           // module under test goes here
#ifdef MIPS_COUNT_ARM_CORTEX
        cycle_count2 = read_ccnt();  // Core cycles after the module under test
        cur_time = cycle_count2 - cycle_count1;
        // CCNT counts cycles, so strictly speaking this figure is MCPS
        cur_time_tmp = (double)cur_time * cycle2mips_coef;
        avg_time_tmp += cur_time_tmp;
        if (cur_time > peak_time) {
            peak_time = cur_time;
        }
        printf("%.2f mips\n", cur_time_tmp);
#endif
    }
#ifdef MIPS_COUNT_ARM_CORTEX
    avg_time = avg_time_tmp / COUNT_NUM;
    printf("max %.2f mips\n", (double)peak_time * cycle2mips_coef);
    printf("avg %.2f mips\n", avg_time);
#endif
    return 0;
}
The measurement code is usually placed immediately before and after the function under test, such as test(), which gives that module's MIPS cost on its own. Alternatively, the cost can be estimated by multiplying the whole program's overhead by the share of overhead attributed to the function, but that calculation is inconvenient and is not recommended here.
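To test the bookkeeping part of the demo on a host machine, the per-frame statistics can be separated from the PMU reads. The following is a minimal sketch under that assumption: `mcps_stats`, `mcps_record`, `mcps_average` and `mcps_by_share` are hypothetical names, and the cycle counts passed in would come from something like `read_ccnt()` before and after `test()`. The last function shows the proportional estimate mentioned above, for comparison only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-frame statistics, mirroring avg_time/peak_time in the demo. */
typedef struct {
    double sum_mcps;   /* running sum of per-frame MCPS */
    double peak_mcps;  /* worst frame seen so far */
    unsigned frames;   /* number of frames recorded */
} mcps_stats;

/* Record one frame from the cycle counter values read before and after
 * the module under test. Unsigned 32-bit subtraction stays correct even
 * if the counter wraps once between the two reads. */
void mcps_record(mcps_stats *s, uint32_t before, uint32_t after, double frame_ms) {
    uint32_t cycles = after - before;
    double mcps = ((double)cycles / 1e6) / (frame_ms / 1000.0);
    s->sum_mcps += mcps;
    if (mcps > s->peak_mcps) {
        s->peak_mcps = mcps;
    }
    s->frames++;
}

/* Average MCPS over all recorded frames. */
double mcps_average(const mcps_stats *s) {
    assert(s->frames > 0);
    return s->sum_mcps / s->frames;
}

/* The alternative (not recommended above): scale the whole program's
 * cost by the share of overhead a profiler attributes to the function. */
double mcps_by_share(double total_mcps, double share) {
    assert(share >= 0.0 && share <= 1.0);
    return total_mcps * share;
}
```

Recording one frame of 1,000,000 cycles and one of 2,000,000 cycles (the second spanning a counter wrap) at 10 ms per frame gives a peak of 200 MCPS and an average of 150 MCPS.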
Test tools and process
Required tools
For software simulation, the usual tool is ARM's RVDS (RealView Development Suite), which simulates various processor cores to obtain overhead data.
For hardware measurement, the usual tool is simpleperf, which is built into the Android platform: push the executable to the phone, run it there, and sample CPU data in real time to obtain the actual overhead, which can then be drawn as a diagram commonly known as a flame graph.
Optimization process with software simulation and hardware measurement
- Software simulation process
  - Install RVDS
  - Configure the code project environment
  - Get the code running end to end
  - Write the overhead measurement code
  - Profile in simulation
  - Identify the hotspot functions and the overhead baseline
  - Optimize the code
  - Measure the hotspot functions' overhead again
- Hardware measurement process
  - Similar to the software simulation process
  - It is recommended to simulate in software first, then measure on hardware
  - Where I/O reads/writes and similar overhead are involved, software simulation cannot reproduce the actual run, so the hardware results take precedence
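Because the goal is to reduce overhead without changing the algorithm's effect, it helps to check the optimized hotspot against the original before comparing overhead numbers. A minimal sketch, assuming the hotspot writes a float output buffer and that a small absolute tolerance is acceptable (both are assumptions; the function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: does the optimized hotspot reproduce the reference
 * output within an absolute tolerance? Returns 1 if all n values match. */
int outputs_match(const float *ref, const float *opt, size_t n, float tol) {
    for (size_t i = 0; i < n; i++) {
        float d = ref[i] - opt[i];
        if (d < 0) {
            d = -d;
        }
        if (d > tol) {
            return 0;  /* mismatch: the optimization changed the result */
        }
    }
    return 1;
}

/* Fractional overhead reduction relative to the baseline (0.25 = 25 %). */
double overhead_reduction(double baseline_mcps, double optimized_mcps) {
    assert(baseline_mcps > 0.0);
    return (baseline_mcps - optimized_mcps) / baseline_mcps;
}
```

For example, a hotspot that drops from a 200 MCPS baseline to 150 MCPS has its overhead reduced by 25%, but that saving only counts if `outputs_match` also holds for the test inputs.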
Once the hotspot functions and their overhead are known, you can optimize that code, including with instruction-set optimizations.
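As an illustration of an instruction-set optimization, the sketch below vectorizes a hypothetical hotspot that sums an int array (the body of `test()` in the demo is not given, so this function is an assumption, not the author's code). It uses ARM NEON intrinsics from `<arm_neon.h>` when the compiler targets NEON and falls back to plain C otherwise, so the result can be checked against the scalar reference on any machine.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar reference: sum of len 32-bit ints, accumulated in 64 bits. */
int64_t sum_scalar(const int32_t *arr, int len) {
    int64_t acc = 0;
    for (int i = 0; i < len; i++) {
        acc += arr[i];
    }
    return acc;
}

/* Hypothetical NEON version of the same hotspot. */
int64_t sum_neon(const int32_t *arr, int len) {
    int i = 0;
    int64_t acc = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int64x2_t vacc = vdupq_n_s64(0);
    for (; i + 4 <= len; i += 4) {
        int32x4_t v = vld1q_s32(arr + i);  /* load 4 lanes */
        vacc = vpadalq_s32(vacc, v);       /* pairwise add, widen, accumulate */
    }
    acc = vgetq_lane_s64(vacc, 0) + vgetq_lane_s64(vacc, 1);
#endif
    for (; i < len; i++) {
        acc += arr[i];  /* tail elements, or the whole array without NEON */
    }
    return acc;
}
```

`vpadalq_s32` adds adjacent pairs of 32-bit lanes into two 64-bit accumulators, so four elements are consumed per iteration without overflow, and a scalar loop handles the remaining `len % 4` elements. Whether this is actually faster on a given core should be measured with the PMU or simpleperf workflow described above.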
Summary
This article covered the background and basic concepts of performance optimization, the overhead metrics MCPS and MIPS, and the test tools and the software-simulation and hardware-measurement process. The next article will share NEON optimization cases and experience.