Converting SSE Instructions to ARM NEON
2022-08-02 15:26:00 【Hongyao】
Related Information
● SSE instruction set: an explanation of the SSE instructions
● sse2neon repository: the NEON equivalent of each SSE instruction can be found in sse2neon.h
Notes
● SSE code converted to ARM NEON is often hard to optimize, and the converted code may even run slower than before, so treat the optimizations in this section as reference only.
_mm_shuffle_ps conversion
_mm_shuffle_ps(m1, m2, _MM_SHUFFLE(i3,i2,i1,i0)) builds a result m3 from two inputs. It takes two elements from m1, selected by the last two indices of _MM_SHUFFLE (i1 and i0), and puts them in the low half of m3. It takes two elements from m2, selected by the first two indices (i3 and i2), and puts them in the high half of m3.
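The selection rule can be written out as a scalar model. This is only a sketch for illustration: the vec4f type, the SHUFFLE_IMM macro, and the shuffle_ps_model function are hypothetical names, not part of SSE or sse2neon. SHUFFLE_IMM is assumed to use the same bit layout as _MM_SHUFFLE (two bits per index, i0 in the lowest bits).

```c
/* Hypothetical 4-float vector type, used only for this illustration. */
typedef struct { float f[4]; } vec4f;

/* Assumed to match the _MM_SHUFFLE encoding: two bits per index, i0 lowest. */
#define SHUFFLE_IMM(i3, i2, i1, i0) \
    (((i3) << 6) | ((i2) << 4) | ((i1) << 2) | (i0))

/* Scalar model of the shuffle: the low half of the result comes from m1,
 * the high half comes from m2. */
static vec4f shuffle_ps_model(vec4f m1, vec4f m2, unsigned imm)
{
    vec4f m3;
    m3.f[0] = m1.f[imm & 3];         /* i0 selects from m1 */
    m3.f[1] = m1.f[(imm >> 2) & 3];  /* i1 selects from m1 */
    m3.f[2] = m2.f[(imm >> 4) & 3];  /* i2 selects from m2 */
    m3.f[3] = m2.f[(imm >> 6) & 3];  /* i3 selects from m2 */
    return m3;
}
```

With m1 = {a0,a1,a2,a3} and m2 = {b0,b1,b2,b3}, calling shuffle_ps_model(m1, m2, SHUFFLE_IMM(2,2,0,0)) gives {a0, a0, b2, b2}.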

For the conversion of _mm_shuffle_ps, sse2neon mostly uses a combination of load/store instructions and type conversion operations. The following code, for example, corresponds to _mm_shuffle_ps(a, b, _MM_SHUFFLE(2,2,0,0)).
FORCE_INLINE __m128 _mm_shuffle_ps_2200(__m128 a, __m128 b)
{
    /* {a0, a0}: duplicate lane 0 of the low half of a */
    float32x2_t a00 = vdup_lane_f32(vget_low_f32(vreinterpretq_f32_m128(a)), 0);
    /* {b2, b2}: duplicate lane 0 of the high half of b */
    float32x2_t b22 = vdup_lane_f32(vget_high_f32(vreinterpretq_f32_m128(b)), 0);
    return vreinterpretq_m128_f32(vcombine_f32(a00, b22));
}
Using a direct conversion like this is likely to make performance worse rather than better. A better approach is to look for a NEON operation that does similar work. The relevant operations are mostly the permutation instructions, such as vtrn, vrev, vzip, and vuzp.
For example, if you need both _mm_shuffle_ps(a, a, _MM_SHUFFLE(2,2,0,0)) and _mm_shuffle_ps(a, a, _MM_SHUFFLE(3,3,1,1)) at the same time, a single vtrnq_f32(a, a) produces both. The result has type float32x4x2_t: val[0] corresponds to 2200 and val[1] corresponds to 3311.
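The vtrn replacement can be sketched as below. The helper name shuffle_2200_3311 is hypothetical, and the helper takes plain float arrays rather than __m128 to stay independent of sse2neon. The scalar branch is only a model of what vtrnq_f32(a, a) returns, included so the sketch also compiles on machines without NEON.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: writes {a0,a0,a2,a2} to out2200 and
 * {a1,a1,a3,a3} to out3311. */
static void shuffle_2200_3311(const float a[4],
                              float out2200[4], float out3311[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t v = vld1q_f32(a);
    /* One transpose of v with itself yields both shuffles. */
    float32x4x2_t t = vtrnq_f32(v, v);
    vst1q_f32(out2200, t.val[0]);  /* even lanes duplicated: 2200 */
    vst1q_f32(out3311, t.val[1]);  /* odd lanes duplicated: 3311 */
#else
    /* Scalar model of vtrnq_f32(a, a) for non-NEON builds. */
    for (int i = 0; i < 4; i += 2) {
        out2200[i]     = a[i];
        out2200[i + 1] = a[i];
        out3311[i]     = a[i + 1];
        out3311[i + 1] = a[i + 1];
    }
#endif
}
```

Whether this is actually faster than two separate shuffles depends on the target core and the surrounding code, so it is worth measuring rather than assuming.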