Converting SSE instructions to ARM NEON
2022-08-02 15:26:00 【Hongyao】
Related Information
● SSE instruction set: an explanation of the SSE instructions
● sse2neon repository: sse2neon.h shows how each SSE intrinsic is converted to NEON
Notes
● Code converted from SSE to ARM NEON is often hard to optimize and may even end up slower than before, so the optimizations described here are for reference only.
_mm_shuffle_ps conversion
m3 = _mm_shuffle_ps(m1, m2, _MM_SHUFFLE(i3,i2,i1,i0)) builds m3 from two elements of each input. The two low elements of m3 are taken from m1, selected by the last two numbers of _MM_SHUFFLE (i1 and i0). The two high elements of m3 are taken from m2, selected by the first two numbers (i3 and i2). Listed from the lowest element to the highest, m3 = {m1[i0], m1[i1], m2[i2], m2[i3]}.
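The selection rule can be checked with a small scalar model in plain C. This is only a sketch for illustration: vec4, SHUFFLE_IMM, shuffle_ps_ref and shuffle_ps_ref_selfcheck are hypothetical names that do not come from SSE or sse2neon, and SHUFFLE_IMM is assumed to use the same bit layout as _MM_SHUFFLE (two bits per index, i0 in the lowest bits).

```c
#include <assert.h>

/* Hypothetical stand-in for __m128: four floats, f[0] is the lowest element. */
typedef struct { float f[4]; } vec4;

/* Assumed to match the _MM_SHUFFLE layout: two bits per index, i0 lowest. */
#define SHUFFLE_IMM(i3, i2, i1, i0) \
    (((i3) << 6) | ((i2) << 4) | ((i1) << 2) | (i0))

/* Scalar model of m3 = _mm_shuffle_ps(m1, m2, imm). */
vec4 shuffle_ps_ref(vec4 m1, vec4 m2, unsigned imm)
{
    vec4 m3;
    m3.f[0] = m1.f[imm & 3];        /* i0 picks the lowest element from m1 */
    m3.f[1] = m1.f[(imm >> 2) & 3]; /* i1 picks the next element from m1   */
    m3.f[2] = m2.f[(imm >> 4) & 3]; /* i2 picks the third element from m2  */
    m3.f[3] = m2.f[(imm >> 6) & 3]; /* i3 picks the highest element from m2 */
    return m3;
}

/* Checks the (2,2,0,0) case: expect {a[0], a[0], b[2], b[2]}. */
void shuffle_ps_ref_selfcheck(void)
{
    vec4 a = {{10.0f, 11.0f, 12.0f, 13.0f}};
    vec4 b = {{20.0f, 21.0f, 22.0f, 23.0f}};
    vec4 r = shuffle_ps_ref(a, b, SHUFFLE_IMM(2, 2, 0, 0));
    assert(r.f[0] == 10.0f && r.f[1] == 10.0f);
    assert(r.f[2] == 22.0f && r.f[3] == 22.0f);
}
```

Calling shuffle_ps_ref_selfcheck() from a main function exercises the (2,2,0,0) case; a model like this can also serve as a reference when validating a hand-written NEON replacement.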

For the conversion of _mm_shuffle_ps, sse2neon mostly combines lane get, duplicate and combine operations with type reinterpretation. The following code, for example, corresponds to _mm_shuffle_ps(a,b,_MM_SHUFFLE(2,2,0,0)).
FORCE_INLINE __m128 _mm_shuffle_ps_2200(__m128 a, __m128 b)
{
    float32x2_t a00 = vdup_lane_f32(vget_low_f32(vreinterpretq_f32_m128(a)), 0);
    float32x2_t b22 = vdup_lane_f32(vget_high_f32(vreinterpretq_f32_m128(b)), 0);
    return vreinterpretq_m128_f32(vcombine_f32(a00, b22));
}
Using a conversion like this directly usually lowers performance instead of raising it. A better approach is to look for a NEON operation that does similar work natively. The relevant operations are mostly the permutation instructions, such as vtrn, vrev, vzip and vuzp.
For example, building on the code above: if you need both _mm_shuffle_ps(a,a,_MM_SHUFFLE(2,2,0,0)) and _mm_shuffle_ps(a,a,_MM_SHUFFLE(3,3,1,1)) at the same time, a single vtrnq_f32(a,a) produces both. The result has type float32x4x2_t: val[0] corresponds to 2200 and val[1] corresponds to 3311.
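The transpose trick can be sketched as follows. This is an illustration under stated assumptions, not code from sse2neon: shuffle_2200_3311 and shuffle_2200_3311_selfcheck are hypothetical helpers, and the #else branch is a scalar stand-in (not real NEON) that only exists so the sketch also builds on non-ARM machines. On ARM the real vld1q_f32, vtrnq_f32 and vst1q_f32 from arm_neon.h are used.

```c
#include <assert.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Scalar stand-ins for the NEON types and intrinsics used below. */
typedef struct { float lane[4]; } float32x4_t;
typedef struct { float32x4_t val[2]; } float32x4x2_t;

float32x4_t vld1q_f32(const float *p)
{
    float32x4_t v = {{p[0], p[1], p[2], p[3]}};
    return v;
}

void vst1q_f32(float *p, float32x4_t v)
{
    for (int i = 0; i < 4; ++i) p[i] = v.lane[i];
}

/* vtrn: val[0] = {a0, b0, a2, b2}, val[1] = {a1, b1, a3, b3}. */
float32x4x2_t vtrnq_f32(float32x4_t a, float32x4_t b)
{
    float32x4x2_t r = {{
        {{a.lane[0], b.lane[0], a.lane[2], b.lane[2]}},
        {{a.lane[1], b.lane[1], a.lane[3], b.lane[3]}}
    }};
    return r;
}
#endif

/* One transpose of a with itself yields both shuffles:
 * out2200 = {a0, a0, a2, a2}, out3311 = {a1, a1, a3, a3}. */
void shuffle_2200_3311(const float in[4], float out2200[4], float out3311[4])
{
    float32x4_t a = vld1q_f32(in);
    float32x4x2_t t = vtrnq_f32(a, a);
    vst1q_f32(out2200, t.val[0]);
    vst1q_f32(out3311, t.val[1]);
}

void shuffle_2200_3311_selfcheck(void)
{
    const float in[4] = {10.0f, 11.0f, 12.0f, 13.0f};
    float lo[4], hi[4];
    shuffle_2200_3311(in, lo, hi);
    assert(lo[0] == 10.0f && lo[1] == 10.0f && lo[2] == 12.0f && lo[3] == 12.0f);
    assert(hi[0] == 11.0f && hi[1] == 11.0f && hi[2] == 13.0f && hi[3] == 13.0f);
}
```

The point of the design is that one permutation instruction replaces two separately emulated shuffles. Whether this actually pays off depends on the surrounding code and the target core, so it is worth measuring.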