Loop Unrolling in Embedded C
2022-07-27 19:38:00 【WangLanguager】
1. Analysis:
On every pass through a loop, two instructions are executed in addition to the useful work of the loop body: a subtraction (1 machine cycle) and a conditional branch (3 machine cycles). Together they cost 4 machine cycles per pass, and this is the loop overhead.
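For reference, here is a minimal sketch of the kind of loop this analysis describes, before any unrolling. The names `checksum_baseline` and `checksum_baseline_demo` are hypothetical and do not come from the original listing, and the sketch assumes N is at least 1.

```c
#include <assert.h>

/* Hypothetical baseline, not from the original article: one element is
   added per pass, so every pass also pays for one counter decrement and
   one conditional branch. Assumes N >= 1. */
int checksum_baseline(const int *data, unsigned int N)
{
    int sum = 0;
    do
    {
        sum += *(data++);   /* useful work: one load and one add */
        N--;                /* loop overhead: subtraction */
    } while (N != 0);       /* loop overhead: conditional branch */
    return sum;
}

/* Small self-check with made-up data. */
void checksum_baseline_demo(void)
{
    const int values[4] = {1, 2, 3, 4};
    assert(checksum_baseline(values, 4) == 10);
}
```

With only a load and an add as useful work, the decrement and branch make up a large part of each pass, which is why this loop is a good candidate for the improvement that follows.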
2. Improvement:
(1) Repeat the loop body several times within each pass, so that the overhead becomes a smaller share of the work.
(2) Each pass then contains more code, and fewer passes are needed.
3. Example of loop unrolling:
int checksum_v9(int *data, unsigned int N)
{
    int sum = 0;
    do
    {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4; // assumes the number of items to sum is a nonzero multiple of 4
    } while (N != 0);
    return sum;
}
The assembly code for the program above is:
checksum_v9_s
    MOV r2,#0 ;sum = 0
checksum_v9_loop
    LDR r3,[r0],#4 ;r3 = *(data++)
    SUBS r1,r1,#4 ;N -= 4 and set flags
    ADD r2,r3,r2 ;sum += r3
    LDR r3,[r0],#4 ;r3 = *(data++)
    ADD r2,r3,r2 ;sum += r3
    LDR r3,[r0],#4 ;r3 = *(data++)
    ADD r2,r3,r2 ;sum += r3
    LDR r3,[r0],#4 ;r3 = *(data++)
    ADD r2,r3,r2 ;sum += r3
    BNE checksum_v9_loop ;if (N != 0) goto loop
    MOV r0,r2 ;r0 = sum
    MOV pc,r14 ;return r0
4. Discussion
After the improvement, the total loop overhead drops from 4N machine cycles to N machine cycles: each pass still costs 4 cycles of overhead, but only a quarter as many passes are executed. The smaller the loop body, the more noticeable the gain, and for very small bodies the speed can nearly double.
For N = 20, 83 instructions are executed before the optimization and 53 after it (5 passes of the 10-instruction loop above, plus the 3 instructions outside the loop).
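The two instruction counts can be reproduced with a short calculation. This is only a sketch: `instructions_executed` and `instruction_count_demo` are hypothetical names, the 10-instruction body and the 3 instructions outside the loop are read off the assembly listing above, and the 4-instruction body for the version before unrolling (load, subtract, add, branch) is an assumption, since that listing is not shown.

```c
#include <assert.h>

/* Instructions executed by a routine with `outside` instructions outside
   the loop and `body` instructions per pass, run for `passes` passes. */
unsigned int instructions_executed(unsigned int outside,
                                   unsigned int body,
                                   unsigned int passes)
{
    return outside + body * passes;
}

void instruction_count_demo(void)
{
    const unsigned int N = 20;

    /* Before unrolling (assumed 4-instruction body): N passes. */
    assert(instructions_executed(3, 4, N) == 83);

    /* After unrolling by 4 (10-instruction body): N / 4 passes. */
    assert(instructions_executed(3, 10, N / 4) == 53);
}
```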
5. Questions
(1) How many times should the loop be unrolled?
(2) What if the number of items is not a multiple of 4?
6. Thoughts on loop unrolling
(1) Drawbacks of loop unrolling
① The code grows and takes up more memory.
② The larger code takes up more cache space.
(2) Each case therefore has to be analyzed on its own merits to find the right balance.
7. Example:
Suppose the loop body takes 128 machine cycles to execute and the loop overhead is the usual 4 machine cycles, that is, about 3%. If the loop accounts for 30% of the program's total running time, the loop overhead is only about 1% of the whole program, so unrolling the loop at this point brings little performance gain.
Unrolling a loop can also evict useful contents from the cache and cause thrashing, which makes program performance drop sharply.
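The percentages in this example follow from a simple product, sketched below. The function names are hypothetical, and the sketch assumes the overhead is measured relative to the 128-cycle loop body, as the text does.

```c
#include <assert.h>

/* Share of total program time spent on loop overhead, given the overhead
   and body cost per pass (in cycles) and the share of the program's time
   spent in the loop (0.0 to 1.0). */
double overhead_share_of_program(double overhead_cycles,
                                 double body_cycles,
                                 double loop_share)
{
    return loop_share * (overhead_cycles / body_cycles);
}

void overhead_share_demo(void)
{
    /* 4 / 128 = 0.03125, about 3% of the loop. */
    double within_loop = overhead_share_of_program(4.0, 128.0, 1.0);
    /* 30% of the program is the loop, so about 1% of the program. */
    double within_program = overhead_share_of_program(4.0, 128.0, 0.30);

    assert(within_loop > 0.03 && within_loop < 0.032);
    assert(within_program > 0.009 && within_program < 0.010);
}
```

Even if unrolling removed the overhead entirely, the program as a whole could gain only about 1% in this case.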
8. When the number of items is not an integer multiple of 4
int checksum_v10(int *data, unsigned int N)
{
    unsigned int i;
    int sum = 0;
    // unrolled loop: handles the items in groups of 4
    for (i = N / 4; i != 0; i--)
    {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    // cleanup loop: handles the 0 to 3 items left over when N is not a multiple of 4
    for (i = N & 3; i != 0; i--)
    {
        sum += *(data++);
    }
    return sum;
}
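One way to gain confidence in the cleanup loop is to compare the unrolled version against a plain loop for every length, including lengths that are not multiples of 4. The sketch below is an illustration only: `checksum_unrolled4` repeats the structure of checksum_v10 so that the block compiles on its own, and `checksum_simple` and `check_unrolled_against_simple` are hypothetical helpers with made-up test data.

```c
#include <assert.h>

/* Same structure as checksum_v10, repeated here so the sketch is
   self-contained. */
int checksum_unrolled4(const int *data, unsigned int N)
{
    unsigned int i;
    int sum = 0;
    for (i = N / 4; i != 0; i--)
    {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    for (i = N & 3; i != 0; i--)
    {
        sum += *(data++);
    }
    return sum;
}

/* Plain reference loop, one item per pass. */
int checksum_simple(const int *data, unsigned int N)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < N; i++)
    {
        sum += data[i];
    }
    return sum;
}

/* Checks that both versions agree for every length from 0 to 16. */
void check_unrolled_against_simple(void)
{
    int values[16];
    unsigned int n;
    for (n = 0; n < 16; n++)
    {
        values[n] = (int)(n * 3u) - 7;   /* arbitrary test data */
    }
    for (n = 0; n <= 16; n++)
    {
        assert(checksum_unrolled4(values, n) == checksum_simple(values, n));
    }
}
```

Note that N & 3 equals N % 4 for unsigned N, so the second loop runs between 0 and 3 times.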
9. Conclusion
(1) Count the loop counter down. When the counter is unsigned, use (i != 0) as the termination condition, not (i >= 0).
(2) If the loop is known to run at least once, use a do{}while loop.
(3) Small loops can be unrolled to reduce loop overhead.
(4) Try to make the array size a multiple of the unrolling factor.
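Point (1) can be illustrated with a short sketch. The names `count_passes` and `count_passes_demo` are hypothetical. The key fact is that an unsigned value is never negative, so a condition written as (i >= 0) is always true and the loop would never end.

```c
#include <assert.h>
#include <limits.h>

/* Counts down with an unsigned counter and stops on i != 0.
   Returns the number of passes executed, which equals n. */
unsigned int count_passes(unsigned int n)
{
    unsigned int passes = 0;
    unsigned int i;
    for (i = n; i != 0; i--)
    {
        passes++;
    }
    return passes;
}

void count_passes_demo(void)
{
    unsigned int zero = 0;

    assert(count_passes(0) == 0);
    assert(count_passes(5) == 5);

    /* Decrementing an unsigned 0 wraps around to UINT_MAX instead of
       becoming negative, which is why (i >= 0) can never be false. */
    zero--;
    assert(zero == UINT_MAX);
}
```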