[W806 tinkering notes] A simple FPU performance test - May 23, 2022
2022-06-09 04:19:00 【ZZZ_ XXJ】
The W806 is a secure IoT MCU chip. It integrates a 32-bit CPU and provides UART, GPIO, SPI, SDIO, I2C, I2S, PSRAM, 7816, ADC, LCD, TouchSensor, and other digital interfaces. It supports a TEE security engine and a variety of hardware encryption and decryption algorithms, includes a DSP, a floating-point unit, and a security engine, and supports code security permission settings. It has 1 MB of built-in flash and supports firmware encryption, firmware signing, secure debugging, secure upgrade, and other security measures to protect the product. It suits a wide range of IoT applications such as small household appliances, smart home devices, smart toys, industrial control, and medical care.
A brief introduction to the FPU
The following is excerpted from the "XuanTie E804 User Manual v04".
The floating-point unit is a configurable hardware unit of the E804, designed to improve the E804's performance in floating-point applications. It provides a low-cost, high-performance hardware floating-point implementation.
The floating-point unit supports the single-precision operations of the IEEE-754 floating-point standard and implements 16 single-precision floating-point registers. With support from system software, the E804 can also perform double-precision floating-point operations.
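Since only single precision runs in hardware, it helps to keep expressions in `float`. In standard C an unsuffixed constant such as `0.5` has type `double`, so `x * 0.5` is computed in double precision, which on this core would presumably fall back to the software routines. A minimal sketch of the difference (the function names are illustrative and come from neither the manual nor the SDK):

```c
/* Multiplies by a double constant: x is converted to double,
 * the product is a double, and the result is narrowed back to float. */
float half_via_double(float x)
{
    return (float)(x * 0.5);
}

/* Multiplies by a float constant: the expression has type float throughout. */
float half_via_float(float x)
{
    return x * 0.5f;
}
```

Both functions return the same value here; only the type used for the multiplication differs. The same reasoning applies to the math library: `sinf()`, `sqrtf()`, and `powf()` take and return `float`, while `sin()`, `sqrt()`, and `pow()` work in `double`.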
The main features of the floating-point unit's architecture and programming model are:
- Fully compatible with the ANSI/IEEE Std 754 floating-point standard (with system software support);
- Only single-precision floating-point operations are supported;
- Supports round toward zero, round toward positive infinity, round toward negative infinity, and round to nearest;
- Supports both trapping and non-trapping handling of floating-point exceptions;
- Supports precise floating-point exceptions;
- Supports hardware floating-point division and square root.
The main features of the floating-point unit's microarchitecture are:
- 16 independent single-precision floating-point registers;
- Single-issue design, issuing one floating-point arithmetic instruction per cycle;
- In-order issue, in-order execution, and in-order write-back of floating-point instructions;
- Three independent execution pipelines: floating-point ALU, floating-point multiply, and floating-point divide;
- Optimized execution latency: except for floating-point division and square root, instructions complete in 1-2 clock cycles;
- Cost optimization based on reuse of arithmetic components;
- Power optimization based on clock gating and data-path isolation.
Test items
Basic algorithms
- Float-with-float addition, subtraction, multiplication, and division
- Trigonometric and inverse trigonometric functions
- Square root
- e raised to the power x
- x raised to the power y
Compound algorithms
- 100-point first-order low-pass filter
- 100-point sine curve evaluation
- 32*32-pixel RGB to grayscale conversion
Test method
With the FPU disabled and then with the FPU enabled, record how many times each algorithm executes within a 10 ms timing window. Higher is better.
All tests use single-precision floating point and are compiled with -O3. To minimize loop overhead, the innermost loop is manually unrolled.
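The article does not show its timing code, so the following is only a portable sketch of the counting idea, not the author's setup. It uses the standard `clock()` function on a hosted system instead of a hardware timer on the W806, and the names `count_runs`, `add_four`, `op_a`, `op_b`, and `op_c` are made up for illustration:

```c
#include <stdint.h>
#include <time.h>

/* volatile operands force a real load/compute/store on every statement */
volatile float op_a = 1.1f;
volatile float op_b = 0.123456f;
volatile float op_c;

/* One manually unrolled group of four additions, as in the test loops. */
void add_four(void)
{
    op_c = op_a + op_b;
    op_c = op_a + op_b;
    op_c = op_a + op_b;
    op_c = op_a + op_b;
}

/* Run work() repeatedly for window_ms milliseconds of CPU time and return
 * how many operations were counted (ops_per_call per invocation). */
uint32_t count_runs(void (*work)(void), uint32_t ops_per_call, double window_ms)
{
    uint32_t cnt = 0;
    clock_t window = (clock_t)(window_ms * CLOCKS_PER_SEC / 1000.0);
    clock_t start = clock();

    while (clock() - start < window) {
        work();
        cnt += ops_per_call;
    }
    return cnt;
}
```

Calling `count_runs(add_four, 4, 10.0)` gives a "runs per 10 ms" figure for the host machine. Polling `clock()` on every pass adds overhead of its own; on the MCU a timer interrupt that samples the counter every 10 ms would avoid that, which is presumably closer to what the measurements here used.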
Test results
Basic algorithms
Float-with-float addition, subtraction, multiplication, and division
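__IO uint32_t cnt = 0;        /* operations completed so far */
__IO float a = 1.1f;          /* volatile operands, so each statement is really executed */
__IO float b = 0.123456f;
__IO float c;
while (1)
{
    /* innermost loop manually unrolled four times */
    c = a + b;
    c = a + b;
    c = a + b;
    c = a + b;
    cnt += 4;
}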
The results are as follows. Subtraction, multiplication, and division were tested the same way as addition, so their test code is omitted.
| Algorithm | FPU off (runs per 10 ms) | FPU on (runs per 10 ms) | Speedup (on/off) |
|---|---|---|---|
| c = a + b | 12708 | 319268 | 25.12 |
| c = a - b | 11116 | 319268 | 28.72 |
| c = a * b | 18640 | 319268 | 17.13 |
| c = a / b | 5592 | 69412 | 12.41 |
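
To put the counts in perspective, they can be converted to average core cycles per counted operation. The sketch below assumes the basic tests also ran at 240 MHz, as the compound tests do via `SystemClock_Config(CPU_CLK_240M)`. The helper names are illustrative only:

```c
#include <stdint.h>

#define CPU_HZ    240000000u   /* assumed core clock: 240 MHz */
#define WINDOW_MS 10u          /* measurement window used in the article */

/* Average core cycles spent per counted operation. */
double cycles_per_op(uint32_t runs_per_window)
{
    double cycles_in_window = (double)CPU_HZ * WINDOW_MS / 1000.0; /* 2.4e6 */
    return cycles_in_window / (double)runs_per_window;
}

/* Speedup column of the tables: runs with the FPU on divided by runs with it off. */
double speedup(uint32_t runs_fpu_off, uint32_t runs_fpu_on)
{
    return (double)runs_fpu_on / (double)runs_fpu_off;
}
```

Under that assumption, `cycles_per_op(319268)` is about 7.5 cycles per addition with the FPU on, against about 189 cycles for `cycles_per_op(12708)` with it off, and `speedup(12708, 319268)` reproduces the 25.12 in the table. These figures include the volatile loads and stores and the counter update, not just the arithmetic instruction.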
Trigonometric and inverse trigonometric functions
| Algorithm | FPU off (runs per 10 ms) | FPU on (runs per 10 ms) | Speedup (on/off) |
|---|---|---|---|
| c = sinf(a) | 440 | 14068 | 31.97 |
| c = cosf(a) | 480 | 15628 | 32.56 |
| c = tanf(a) | 236 | 9268 | 39.27 |
| c = asinf(a) | 6612 | 41112 | 6.22 |
| c = acosf(a) | 6668 | 37860 | 5.68 |
| c = atanf(a) | 336 | 3816 | 11.36 |
Square root
while (1)
{
    c = sqrtf(a);
    c = sqrtf(a);
    c = sqrtf(a);
    c = sqrtf(a);
    cnt += 4;
}
| Algorithm | FPU off (runs per 10 ms) | FPU on (runs per 10 ms) | Speedup (on/off) |
|---|---|---|---|
| sqrtf() | 7364 | 57704 | 7.84 |
e raised to the power x
while (1)
{
    c = expf(a);
    c = expf(a);
    c = expf(a);
    c = expf(a);
    cnt += 4;
}
| Algorithm | FPU off (runs per 10 ms) | FPU on (runs per 10 ms) | Speedup (on/off) |
|---|---|---|---|
| expf() | 444 | 14404 | 32.44 |
x raised to the power y
while (1)
{
    c = powf(a, b);
    c = powf(a, b);
    c = powf(a, b);
    c = powf(a, b);
    cnt += 4;
}
| Algorithm | FPU off (runs per 10 ms) | FPU on (runs per 10 ms) | Speedup (on/off) |
|---|---|---|---|
| powf() | 124 | 5308 | 42.8 |
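Compound algorithms
100-point first-order low-pass filter
#include <stdlib.h>               /* rand(), RAND_MAX */

__IO uint32_t cnt = 0;            /* completed passes over the 100 samples */
__IO float t1[100];               /* input samples */
__IO float t2;                    /* filter output; volatile so the stores are kept */

/* First-order low-pass filter: y[n] = a * x[n] + (1 - a) * y[n-1] */
static inline float first_order_filter(float new_data)
{
    static float old_data;        /* previous output y[n-1] */
    float a = 0.05f;              /* smoothing factor */
    old_data = a * new_data + (1.0f - a) * old_data;
    return old_data;
}

int main(void)
{
    SystemClock_Config(CPU_CLK_240M);
    /* fill the input buffer with pseudo-random samples */
    for (uint32_t i = 0; i < 100; i++)
    {
        t1[i] = rand() / (float)(RAND_MAX / 0xffff);
    }
    while (1)
    {
        /* innermost loop manually unrolled four times */
        for (uint32_t i = 0; i < 100; i += 4)
        {
            t2 = first_order_filter(t1[i]);
            t2 = first_order_filter(t1[i + 1]);
            t2 = first_order_filter(t1[i + 2]);
            t2 = first_order_filter(t1[i + 3]);
        }
        cnt++;
    }
    return 0;
}
| FPU off (runs per 10 ms) | FPU on (runs per 10 ms) | Speedup (on/off) |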
|---|---|---|
| 37 | 1940 | 52.43 |
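
To see what the filter in this test computes, here is a small sketch of the same recurrence with the state passed explicitly instead of kept in a `static` variable, which makes it easy to check on a PC. The function names are illustrative and not part of the test code:

```c
/* One filter update: y[n] = alpha * x[n] + (1 - alpha) * y[n-1] */
float first_order_step(float prev, float input, float alpha)
{
    return alpha * input + (1.0f - alpha) * prev;
}

/* Output after feeding n identical input samples into a filter that starts at 0. */
float step_response(float input, float alpha, int n)
{
    float y = 0.0f;
    for (int i = 0; i < n; i++)
    {
        y = first_order_step(y, input, alpha);
    }
    return y;
}
```

With the article's coefficient of 0.05, a constant input of 1.0 gives 0.05 after the first sample and 1 - 0.95^100, about 0.994, after 100 samples. The output approaches the input with a time constant of roughly 1/0.05 = 20 samples. Each update costs two multiplications, one subtraction, and one addition, so one pass over 100 samples is a few hundred floating-point operations.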
100-point sine curve
int main(void)
{
    SystemClock_Config(CPU_CLK_240M);
    /* Sine formula: y = A*sin(ωx + ψ) + k */
    __IO uint32_t cnt = 0;              /* completed passes over the 100 points */
    __IO float W = 3.1415926f / 50.f;   /* angular frequency ω */
    __IO float A = 100.0f;              /* amplitude */
    __IO float k = 50.0f;               /* vertical offset */
    __IO float offset = 6.0f;           /* phase ψ */
    __IO float wave;
    __IO float x[100] = { 0.0f };
    float temp = 0.0f;
    for (uint32_t i = 0; i < 100; i++)
    {
        /* note: temp is never advanced, so every x[i] ends up as 0.05f */
        x[i] = temp + 0.05f;
    }
    while (1)
    {
        /* innermost loop manually unrolled four times */
        for (uint32_t i = 0; i < 100; i += 4)
        {
            wave = A * sinf(W * x[i] + offset) + k;
            wave = A * sinf(W * x[i + 1] + offset) + k;
            wave = A * sinf(W * x[i + 2] + offset) + k;
            wave = A * sinf(W * x[i + 3] + offset) + k;
        }
        cnt++;
    }
    return 0;
}
| FPU off (runs per 10 ms) | FPU on (runs per 10 ms) | Speedup (on/off) |
|---|---|---|
| 4 | 116 | 29.0 |
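
For reference, the constants in the test code fix the period and range of the curve. A short worked calculation using the values from the listing (ω = `W`, ψ = `offset`):

```latex
y = A\sin(\omega x + \psi) + k,
\qquad \omega = \frac{\pi}{50},\quad A = 100,\quad k = 50,\quad \psi = 6

T = \frac{2\pi}{\omega} = \frac{2\pi}{\pi/50} = 100,
\qquad k - A \le y \le k + A \;\Longrightarrow\; -50 \le y \le 150
```

So one full period spans 100 units of x. Each point costs one `sinf()` call plus two multiplications and two additions, which fits the measured speedup of 29.0 sitting close to the 31.97 measured for `sinf()` alone.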
32*32-pixel RGB to grayscale conversion
int main(void)
{
    SystemClock_Config(CPU_CLK_240M);
    __IO uint32_t cnt = 0;              /* completed passes over the 32*32 image */
    __IO uint8_t color_r[32][32];
    __IO uint8_t color_g[32][32];
    __IO uint8_t color_b[32][32];
    __IO uint8_t color_gray[32][32];
    /* fill the three channels with pseudo-random values (0..254) */
    for (uint32_t i = 0; i < 32; i++)
    {
        for (uint32_t j = 0; j < 32; j++)
        {
            color_r[i][j] = rand() % 0xff;
            color_g[i][j] = rand() % 0xff;
            color_b[i][j] = rand() % 0xff;
        }
    }
    while (1)
    {
        for (uint32_t i = 0; i < 32; i++)
        {
            /* innermost loop manually unrolled four times */
            for (uint32_t j = 0; j < 32; j += 4)
            {
                color_gray[i][j] = color_r[i][j] * 0.299f + color_g[i][j] * 0.587f + color_b[i][j] * 0.114f;
                color_gray[i][j + 1] = color_r[i][j + 1] * 0.299f + color_g[i][j + 1] * 0.587f + color_b[i][j + 1] * 0.114f;
                color_gray[i][j + 2] = color_r[i][j + 2] * 0.299f + color_g[i][j + 2] * 0.587f + color_b[i][j + 2] * 0.114f;
                color_gray[i][j + 3] = color_r[i][j + 3] * 0.299f + color_g[i][j + 3] * 0.587f + color_b[i][j + 3] * 0.114f;
            }
        }
        cnt++;
    }
    return 0;
}
| FPU off (runs per 10 ms) | FPU on (runs per 10 ms) | Speedup (on/off) |
|---|---|---|
| 2 | 61 | 30.5 |
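
The per-pixel work in this test can be isolated as a small function. This is just the weighted sum from the listing pulled out for illustration (the name `rgb_to_gray` is not from the article). The weights are the familiar BT.601 luma coefficients:

```c
#include <stdint.h>

/* Weighted sum of the three channels; the float result is truncated
 * toward zero when it is converted back to uint8_t. */
uint8_t rgb_to_gray(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(r * 0.299f + g * 0.587f + b * 0.114f);
}
```

For example, pure red (255, 0, 0) maps to 76, pure green to 149, and pure blue to 29. Each pixel needs three integer-to-float conversions, three multiplications, two additions, and one float-to-integer conversion, so a 32*32 image is 1024 such evaluations per pass.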
Summary
The results above show that the XT804 core of the W806 is very weak at floating-point computation without the FPU, and that enabling the FPU greatly improves its single-precision performance. Across the basic and compound algorithms, the speedup ranges from about 5.7x (acosf) to about 52x (first-order low-pass filter), and most tests gained more than 10x. W806 projects enable the FPU by default, so it can be used directly.
References
- "W806 MCU Chip Specification V2.0"
- "XuanTie E804 User Manual v04"