Converting SSE Instructions to ARM NEON
2022-08-02 15:26:00 【Hongyao】
Related Information
● SSE instruction set: explanation of the SSE instructions
● sse2neon repository: the NEON conversion for each SSE intrinsic can be found in sse2neon.h
Notes
● Code converted from SSE to ARM NEON instructions is often hard to optimize, and the conversion may even make it slower, so treat the optimizations in this part as a reference only.
_mm_shuffle_ps conversion
For m3 = _mm_shuffle_ps(m1, m2, _MM_SHUFFLE(i3,i2,i1,i0)), two elements are taken from m1 and placed in the low half of m3, selected by the last two numbers of _MM_SHUFFLE (i1 and i0), and two elements are taken from m2 and placed in the high half of m3, selected by the first two numbers (i3 and i2). In other words, m3 = {m1[i0], m1[i1], m2[i2], m2[i3]}, listed from the lowest element to the highest.
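The selection rule above can be written down as a small scalar model in C. This is only an illustrative sketch, not code from sse2neon: the names vec4, SHUFFLE_IMM, and shuffle_ps_model are hypothetical, and SHUFFLE_IMM is assumed to pack its four 2-bit indices the same way the SSE _MM_SHUFFLE macro does.

```c
#include <assert.h>

/* Hypothetical 4-float vector type, used only for this illustration. */
typedef struct { float v[4]; } vec4;

/* Packs four 2-bit indices into one byte, mirroring _MM_SHUFFLE(i3,i2,i1,i0). */
#define SHUFFLE_IMM(i3, i2, i1, i0) \
    (((i3) << 6) | ((i2) << 4) | ((i1) << 2) | (i0))

/* Scalar model of _mm_shuffle_ps(a, b, imm). */
static vec4 shuffle_ps_model(vec4 a, vec4 b, unsigned imm)
{
    assert(imm < 256u);               /* the immediate is an 8-bit value */
    vec4 r;
    r.v[0] = a.v[imm & 3u];           /* i0 selects from a -> lowest lane  */
    r.v[1] = a.v[(imm >> 2) & 3u];    /* i1 selects from a                 */
    r.v[2] = b.v[(imm >> 4) & 3u];    /* i2 selects from b                 */
    r.v[3] = b.v[(imm >> 6) & 3u];    /* i3 selects from b -> highest lane */
    return r;
}
```

With a = {10, 11, 12, 13} and b = {20, 21, 22, 23}, calling shuffle_ps_model(a, b, SHUFFLE_IMM(2,2,0,0)) selects a[0], a[0], b[2], b[2].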

To convert _mm_shuffle_ps, sse2neon mostly combines load and store instructions with type conversion operations. The following code, for example, corresponds to _mm_shuffle_ps(a, b, _MM_SHUFFLE(2,2,0,0)).
    FORCE_INLINE __m128 _mm_shuffle_ps_2200(__m128 a, __m128 b)
    {
        /* low half of the result: element 0 of a, duplicated */
        float32x2_t a00 = vdup_lane_f32(vget_low_f32(vreinterpretq_f32_m128(a)), 0);
        /* high half of the result: element 2 of b, duplicated */
        float32x2_t b22 = vdup_lane_f32(vget_high_f32(vreinterpretq_f32_m128(b)), 0);
        return vreinterpretq_m128_f32(vcombine_f32(a00, b22));
    }

Using a conversion like the one above directly is likely to lower performance rather than raise it. The best approach is to look for a similar operation in NEON itself. The relevant operations are mainly the permutation instructions, such as vtrn, vrev, vzip, and vuzp.
Continuing the example above: if you need both _mm_shuffle_ps(a, a, _MM_SHUFFLE(2,2,0,0)) and _mm_shuffle_ps(a, a, _MM_SHUFFLE(3,3,1,1)) at the same time, a single vtrnq_f32(a, a) produces both results. It returns a float32x4x2_t, where val[0] corresponds to 2200, that is {a0, a0, a2, a2}, and val[1] corresponds to 3311, that is {a1, a1, a3, a3}.
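A minimal sketch of this replacement is shown here. The helper name dup_even_odd is hypothetical, and the plain-array interface is an assumption made so that the sketch also builds on machines without NEON, where a scalar fallback computes the same two results.

```c
#include <assert.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: from in = {a0, a1, a2, a3}, writes
 *   out2200 = {a0, a0, a2, a2}   (the 2200 shuffle of a with itself)
 *   out3311 = {a1, a1, a3, a3}   (the 3311 shuffle of a with itself) */
static void dup_even_odd(const float in[4], float out2200[4], float out3311[4])
{
    assert(in && out2200 && out3311);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t a = vld1q_f32(in);
    /* One transpose of a with itself yields both shuffles at once. */
    float32x4x2_t t = vtrnq_f32(a, a);
    vst1q_f32(out2200, t.val[0]);
    vst1q_f32(out3311, t.val[1]);
#else
    /* Scalar fallback with the same result, for non-NEON builds. */
    for (int i = 0; i < 4; i += 2) {
        out2200[i] = out2200[i + 1] = in[i];
        out3311[i] = out3311[i + 1] = in[i + 1];
    }
#endif
}
```

In real code the values would normally stay in NEON registers between operations; the loads and stores here exist only to keep the example easy to check.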