[W806 Drummer's Notes] A Simple FPU Performance Test - May 23, 2022

2022-06-09 04:19:00 ZZZ_ XXJ

The W806 is a secure IoT MCU chip. It integrates a 32-bit CPU with built-in UART, GPIO, SPI, SDIO, I2C, I2S, PSRAM, 7816, ADC, LCD, TouchSensor and other digital interfaces. It supports a TEE security engine and a variety of hardware encryption and decryption algorithms, includes a DSP, a floating-point unit and a security engine, supports code security permission settings, and has 1 MB of built-in Flash memory. Security measures such as encrypted firmware storage, firmware signing, secure debugging and secure upgrade protect the product. The chip is suited to a wide range of IoT applications, including small household appliances, smart home, smart toys, industrial control and medical monitoring.


FPU overview

The following is excerpted from the XuanTie E804 User Manual_v04.

The floating-point unit is a configurable hardware unit of the E804, designed to improve the E804's processing power in floating-point applications. It provides a low-cost, high-performance hardware floating-point implementation.
The floating-point unit supports the single-precision operations of the IEEE-754 floating-point standard and implements 16 single-precision floating-point registers. With system software support, the E804 can also handle double-precision floating-point operations.

The main features of the floating-point unit's architecture and programming model are as follows:

  • Fully compatible with the ANSI/IEEE Std 754 floating-point standard (with system software support);
  • Only single-precision floating-point operations are supported in hardware;
  • Four rounding modes are supported: round toward zero, round toward positive infinity, round toward negative infinity, and round to nearest;
  • Floating-point exceptions can be handled in either trapping or non-trapping mode;
  • Floating-point exceptions are handled precisely;
  • Hardware floating-point division and square root are supported.
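
Because the hardware handles only single precision, it matters how constants are written in C. An unsuffixed literal such as 0.299 has type double, so an expression like x * 0.299 is evaluated in double precision, which on a single-precision-only FPU would presumably fall back to software routines. The following is a minimal sketch (the function names are made up for illustration) that uses sizeof to show which type the compiler picks for each expression:

```c
#include <stddef.h>

/* sizeof does not evaluate its operand; it only reports the size of the
 * type the expression would have. Function names are illustrative. */

/* float * float literal: the expression has type float. */
static size_t size_with_float_literal(float x)
{
    return sizeof(x * 0.299f);
}

/* float * double literal: x is converted and the expression has type double. */
static size_t size_with_double_literal(float x)
{
    return sizeof(x * 0.299);
}
```

The same reasoning applies to the math library: sinf(), sqrtf() and powf() take and return float, while sin(), sqrt() and pow() work in double.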

The main features of the floating-point unit's microarchitecture are as follows:

  • 16 independent single-precision floating-point registers;
  • Single-issue structure: one floating-point arithmetic instruction issued per cycle;
  • In-order issue, in-order execution and in-order write-back of floating-point instructions;
  • Three independent execution pipelines: floating-point ALU, floating-point multiplication and floating-point division;
  • Optimized execution latency: apart from floating-point division and square root, instructions complete in 1-2 clock cycles;
  • Cost optimization based on reuse of arithmetic components;
  • Power optimization based on clock gating and data-path isolation.

Test items

Basic algorithms

  • Floating-point addition, subtraction, multiplication and division
  • Trigonometric and inverse trigonometric functions
  • Square root
  • e to the power of x
  • x to the power of y

Composite algorithms

  • 100-point first-order low-pass filter
  • 100-point sine curve
  • 32*32-pixel RGB to grayscale conversion

Test method

With the FPU disabled and then enabled, record how many times each algorithm executes within a 10 ms timing window. Higher is better.
All tests use single-precision floating point and are compiled with -O3. To minimize loop overhead, the innermost loop is unrolled by hand.
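
The timing code itself is not shown in the article. As a rough host-side sketch of the same idea, the following counts how many operations complete in a fixed window. It uses clock() from the standard library rather than a hardware timer on the W806, and count_runs() and add4() are hypothetical names, not part of the W806 SDK:

```c
#include <stdint.h>
#include <time.h>

/* volatile keeps the compiler from removing or merging the repeated work */
static volatile float a = 1.1f;
static volatile float b = 0.123456f;
static volatile float c;

/* Four unrolled additions, mirroring the structure of the article's loops. */
static void add4(void)
{
    c = a + b;
    c = a + b;
    c = a + b;
    c = a + b;
}

/* Call op() repeatedly for window_s seconds of CPU time and return the
 * number of operations completed (each op() call counts as four). */
static uint32_t count_runs(void (*op)(void), double window_s)
{
    uint32_t cnt = 0;
    const clock_t window = (clock_t)(window_s * CLOCKS_PER_SEC);
    const clock_t start = clock();

    while ((clock() - start) < window) {
        op();
        cnt += 4;
    }
    return cnt;
}
```

Calling count_runs(add4, 0.01) gives the count for a 10 ms window. Polling clock() on every pass adds overhead of its own, so the absolute numbers from this sketch are not comparable with the results reported here; only the method is the same.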

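Test results

Basic algorithms

Floating-point addition, subtraction, multiplication and division

/* __IO is assumed to be the SDK's volatile qualifier (as in CMSIS), which
   keeps the compiler from removing the repeated operations */
__IO uint32_t cnt = 0;	/* number of operations completed */
__IO float a = 1.1f;
__IO float b = 0.123456f;
__IO float c;

while(1)
{
	/* innermost loop unrolled by hand: four operations per pass */
	c = a + b;
	c = a + b;
	c = a + b;
	c = a + b;
	cnt += 4;
}

The results are as follows. Subtraction, multiplication and division are tested in the same way as addition, so their test code is omitted.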

Algorithm     FPU off (runs/10 ms)   FPU on (runs/10 ms)   Ratio (on/off)
c = a + b     12708                  319268                25.12
c = a - b     11116                  319268                28.72
c = a * b     18640                  319268                17.13
c = a / b     5592                   69412                 12.41
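
The ratio column is the FPU-on count divided by the FPU-off count. Assuming these tests also ran at the 240 MHz clock that the composite tests configure with SystemClock_Config(CPU_CLK_240M), a 10 ms window holds 2,400,000 cycles, so the counts can also be turned into approximate cycles per operation. A small sketch of both calculations (the function names are illustrative):

```c
#include <stdint.h>

/* Ratio = operations per window with the FPU / operations without it. */
static double fpu_speedup(uint32_t runs_off, uint32_t runs_on)
{
    return (double)runs_on / (double)runs_off;
}

/* Approximate CPU cycles spent per operation, given the core clock in Hz,
 * the window length in milliseconds and the number of operations counted.
 * The figure includes loop and memory-access overhead, not just the
 * arithmetic instruction itself. */
static double cycles_per_run(uint32_t clock_hz, uint32_t window_ms, uint32_t runs)
{
    double cycles_in_window = (double)clock_hz / 1000.0 * (double)window_ms;
    return cycles_in_window / (double)runs;
}
```

For the addition row, fpu_speedup(12708, 319268) gives about 25.12, and cycles_per_run(240000000, 10, 319268) gives about 7.5 cycles per addition with the FPU, against about 189 cycles without it.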

Trigonometric and inverse trigonometric functions

Algorithm      FPU off (runs/10 ms)   FPU on (runs/10 ms)   Ratio (on/off)
c = sinf(a)    440                    14068                 31.97
c = cosf(a)    480                    15628                 32.56
c = tanf(a)    236                    9268                  39.27
c = asinf(a)   6612                   41111                 6.22
c = acosf(a)   6668                   37860                 5.68
c = atanf(a)   336                    3816                  11.36

Square root

while(1)
{
	c = sqrtf(a);
	c = sqrtf(a);
	c = sqrtf(a);
	c = sqrtf(a);
	cnt += 4;
}

Algorithm   FPU off (runs/10 ms)   FPU on (runs/10 ms)   Ratio (on/off)
sqrtf()     7364                   57704                 7.84

e to the power of x

while(1)
{
	c = expf(a);
	c = expf(a);
	c = expf(a);
	c = expf(a);
	cnt += 4;
}

Algorithm   FPU off (runs/10 ms)   FPU on (runs/10 ms)   Ratio (on/off)
expf()      444                    14404                 32.44

x to the power of y

while(1)
{
	c = powf(a, b);
	c = powf(a, b);
	c = powf(a, b);
	c = powf(a, b);
	cnt += 4;
}

Algorithm   FPU off (runs/10 ms)   FPU on (runs/10 ms)   Ratio (on/off)
powf()      124                    5308                  42.8

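Composite algorithms

100-point first-order low-pass filter

#include <stdlib.h>	/* rand(), RAND_MAX */

__IO float t1[100];	/* input samples */
__IO float t2;		/* latest filter output */

/* First-order low-pass filter: y[n] = a * x[n] + (1 - a) * y[n-1] */
static inline float first_order_filter(float new_data)
{
  static float old_data;	/* previous output, starts at 0 */
  float a = 0.05f;		/* smoothing coefficient */
  old_data = a * new_data + (1.0f - a) * old_data;
  return old_data;
}

int main(void)
{
	SystemClock_Config(CPU_CLK_240M);

	__IO uint32_t cnt = 0;	/* completed 100-point passes */

	/* Fill the input with random samples. The integer divisor is
	   non-zero only if RAND_MAX >= 0xffff. */
	for (uint32_t i = 0; i < 100; i++)
	{
		t1[i] = rand() / (float)(RAND_MAX / 0xffff);
	}
	while (1)
	{
		/* one pass filters all 100 points, unrolled four at a time */
		for (uint32_t i = 0; i < 100; i += 4)
		{
			t2 = first_order_filter(t1[i]);
			t2 = first_order_filter(t1[i + 1]);
			t2 = first_order_filter(t1[i + 2]);
			t2 = first_order_filter(t1[i + 3]);
		}
		cnt++;
	}

	return 0;
}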

FPU off (runs/10 ms)   FPU on (runs/10 ms)   Ratio (on/off)
37                     1940                  52.43
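
The filter in this test computes y[n] = a * x[n] + (1 - a) * y[n-1] with a = 0.05. To see how it behaves outside the benchmark, here is a minimal sketch that keeps the state in a struct instead of a static variable, so it can be reset and reused. The names lpf_t and lpf_step() are made up for this example:

```c
/* Filter state: smoothing coefficient and previous output. */
typedef struct {
    float a;   /* 0 < a <= 1; smaller values smooth more heavily */
    float y;   /* previous output y[n-1] */
} lpf_t;

/* One filter step: y[n] = a * x[n] + (1 - a) * y[n-1]. */
static float lpf_step(lpf_t *f, float x)
{
    f->y = f->a * x + (1.0f - f->a) * f->y;
    return f->y;
}
```

Starting from y = 0 with a = 0.05 and feeding a constant input of 1.0, the first output is 0.05, and the output then climbs toward 1.0 as more samples arrive. Each step costs two multiplications, one addition and one subtraction in single precision, and a 100-point pass repeats that 100 times.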

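100-point sine curve

#include <math.h>	/* sinf() */

int main(void)
{
	SystemClock_Config(CPU_CLK_240M);

	/* Sine formula: y = A * sin(ω * x + ψ) + k, with ψ = offset */
	__IO uint32_t cnt = 0;	/* completed 100-point passes */
	__IO float W = 3.1415926f / 50.f;
	__IO float A = 100.0f;
	__IO float k = 50.0f;
	__IO float offset = 6.0f;
	__IO float wave;
	__IO float x[100] = { 0.0f };
	
	/* NOTE: temp is never advanced, so every x[i] is 0.05f as written.
	   The loop below still performs 100 sinf() calls per pass. */
	float temp = 0.0f;
	for (uint32_t i = 0; i < 100; i++)
	{
		x[i] = temp + 0.05f;
	}

	while (1)
	{
		/* one pass evaluates all 100 points, unrolled four at a time */
		for (uint32_t i = 0; i < 100; i += 4)
		{
			wave = A * sinf(W * x[i] + offset) + k;
			wave = A * sinf(W * x[i + 1] + offset) + k;
			wave = A * sinf(W * x[i + 2] + offset) + k;
			wave = A * sinf(W * x[i + 3] + offset) + k;
		}
		cnt++;
	}

	return 0;
}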

FPU off (runs/10 ms)   FPU on (runs/10 ms)   Ratio (on/off)
4                      116                   29.0

32*32-pixel RGB to grayscale conversion

int main(void)
{
	SystemClock_Config(CPU_CLK_240M);

	__IO uint32_t cnt = 0;	/* completed 32*32 frames */
	__IO uint8_t color_r[32][32];
	__IO uint8_t color_g[32][32];
	__IO uint8_t color_b[32][32];
	__IO uint8_t color_gray[32][32];

	/* fill the three color planes with random values */
	for (uint32_t i = 0; i < 32; i++)
	{
		for (uint32_t j = 0; j < 32; j++)
		{
			color_r[i][j] = rand() % 0xff;
			color_g[i][j] = rand() % 0xff;
			color_b[i][j] = rand() % 0xff;
		}
	}

	while (1)
	{
		for (uint32_t i = 0; i < 32; i++)
		{
			/* gray = 0.299 * R + 0.587 * G + 0.114 * B, unrolled four pixels at a time */
			for (uint32_t j = 0; j < 32; j += 4)
			{
				color_gray[i][j] = color_r[i][j] * 0.299f + color_g[i][j] * 0.587f + color_b[i][j] * 0.114f;
				color_gray[i][j + 1] = color_r[i][j + 1] * 0.299f + color_g[i][j + 1] * 0.587f + color_b[i][j + 1] * 0.114f;
				color_gray[i][j + 2] = color_r[i][j + 2] * 0.299f + color_g[i][j + 2] * 0.587f + color_b[i][j + 2] * 0.114f;
				color_gray[i][j + 3] = color_r[i][j + 3] * 0.299f + color_g[i][j + 3] * 0.587f + color_b[i][j + 3] * 0.114f;
			}
		}
		cnt++;
	}

	return 0;
}

FPU off (runs/10 ms)   FPU on (runs/10 ms)   Ratio (on/off)
2                      61                    30.5
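
The per-pixel formula in this test is the weighted sum gray = 0.299 * R + 0.587 * G + 0.114 * B. As a small standalone sketch (rgb_to_gray() is an illustrative name, not taken from the test code), the conversion for a single pixel looks like this:

```c
#include <stdint.h>

/* Convert one RGB pixel to a gray level using single-precision weights.
 * The float result is truncated (not rounded) when stored as uint8_t. */
static uint8_t rgb_to_gray(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(r * 0.299f + g * 0.587f + b * 0.114f);
}
```

For example, pure red (255, 0, 0) maps to 76 and pure green (0, 255, 0) to 149. Because the conversion truncates, white (255, 255, 255) may come out as 254 or 255 depending on floating-point rounding. Each pixel costs three multiplications and two additions, so one 32*32 frame needs 3072 multiplications and 2048 additions.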

Summary

The results above show that without the FPU, the XT804 core of the W806 is very weak at floating-point computation, and that enabling the FPU greatly improves its single-precision performance. Most of the basic and composite algorithms run 10 to 50 times faster with the FPU. The smallest gains are for asinf(), acosf() and sqrtf(), at roughly 6 to 8 times. W806 projects enable the FPU by default, so it can be used directly.

References

  1. W806 MCU Chip Specification V2.0
  2. XuanTie E804 User Manual_v04

Copyright notice
This article was written by [ZZZ_ XXJ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/160/202206090411186265.html