Intel intrinsics: AVX and AVX2 learning notes
2022-07-02 12:20:00 【Virgo programmer's friend】
AVX programming basics
Data types
| Data type | Description |
|---|---|
| __m128 | Vector of 4 float values |
| __m128d | Vector of 2 double values |
| __m128i | Vector of integers (the count depends on the integer width) |
| __m256 | Vector of 8 float values |
| __m256d | Vector of 4 double values |
| __m256i | Vector of integers (the count depends on the integer width) |
Each type name starts with two underscores, followed by an m, and then the bit width of the vector.
If the type name ends in d, the vector holds double values; if it ends in i, it holds integers. With no suffix, the vector holds float values.
An integer vector can hold integers of any width, for example char, short, or unsigned long long. In other words, a __m256i can hold 32 char, 16 short, 8 int, or 4 64-bit (long long) values. These integers can be signed or unsigned.
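The element counts above follow directly from the vector width, which a small sketch can confirm with sizeof. The helper name lanes_256 is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: number of elements of the given byte size that
   fit in one 256-bit integer vector. */
size_t lanes_256(size_t elem_size) {
    assert(elem_size > 0);
    return sizeof(__m256i) / elem_size;
}
```

For example, lanes_256(sizeof(short)) evaluates to 16, matching the count given in the text.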
Function naming convention
Intrinsic names follow the pattern _mm<bit_width>_<name>_<data_type>.
<bit_width> is the bit width of the vector: it is empty for 128-bit vectors and 256 for 256-bit vectors.
<name> describes the operation the intrinsic performs.
<data_type> identifies the data type of the function's main arguments:
| Suffix | Meaning |
|---|---|
| ps | Vector of float values |
| pd | Vector of double values |
| epi8/epi16/epi32/epi64 | Vector of signed 8-bit/16-bit/32-bit/64-bit integers |
| epu8/epu16/epu32/epu64 | Vector of unsigned 8-bit/16-bit/32-bit/64-bit integers |
| si128/si256 | Unspecified 128-bit or 256-bit integer vector |
| m128/m128i/m128d/m256/m256i/m256d | Identifies the input vector type when it differs from the return vector type |
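To see how the naming pieces combine, here is a minimal sketch with one 128-bit float addition and one 256-bit double addition. The function names are hypothetical, the store intrinsics (_mm_storeu_ps, _mm256_storeu_pd) are real but not listed in these notes, and `__attribute__((target("avx")))` is a GCC/Clang extension assumed here so the code builds without extra compiler flags.

```c
#include <assert.h>
#include <immintrin.h>

/* _mm + (empty bit width) + add + ps: 128-bit vectors, 4 float elements. */
__attribute__((target("avx")))
void add_4_floats(const float *a, const float *b, float *out) {
    assert(a && b && out);
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* _mm + 256 + add + pd: 256-bit vectors, 4 double elements. */
__attribute__((target("avx")))
void add_4_doubles(const double *a, const double *b, double *out) {
    assert(a && b && out);
    _mm256_storeu_pd(out, _mm256_add_pd(_mm256_loadu_pd(a), _mm256_loadu_pd(b)));
}
```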
Initialization functions
Initializing with scalar values
| Function | Description |
|---|---|
| _mm256_setzero_ps/pd | Returns a floating-point vector filled with zeros |
| _mm256_setzero_si256 | Returns an integer vector filled with zeros |
| _mm256_set1_ps/pd | Fills a vector with a single floating-point value |
| _mm256_set1_epi8/epi16/epi32/epi64x | Fills a vector with a single integer |
| _mm256_set_ps/pd | Initializes a vector from 8 float or 4 double values (the first argument becomes the highest element) |
| _mm256_set_epi8/epi16/epi32/epi64x | Initializes a vector from integers |
| _mm256_set_m128/m128d/m128i | Initializes a 256-bit vector from two 128-bit vectors |
| _mm256_setr_ps/pd | Initializes a vector from 8 float or 4 double values in reverse order (the first argument becomes the lowest element) |
| _mm256_setr_epi8/epi16/epi32/epi64x | Initializes a vector from integers in reverse order |
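The difference between the set and setr variants is easiest to see by storing both to memory. In this sketch the two hypothetical helpers produce the same array, because _mm256_set_ps takes its arguments from the highest element down while _mm256_setr_ps takes them from the lowest element up. The `target` attribute is a GCC/Clang extension assumed for convenience, and _mm256_storeu_ps is a real intrinsic not covered in these notes.

```c
#include <assert.h>
#include <immintrin.h>

/* Arguments are listed from the highest element (7) down to the lowest (0). */
__attribute__((target("avx")))
void fill_with_set(float *out) {   /* out must have room for 8 floats */
    assert(out);
    _mm256_storeu_ps(out, _mm256_set_ps(7.f, 6.f, 5.f, 4.f, 3.f, 2.f, 1.f, 0.f));
}

/* Arguments are listed from the lowest element (0) up to the highest (7). */
__attribute__((target("avx")))
void fill_with_setr(float *out) {  /* out must have room for 8 floats */
    assert(out);
    _mm256_storeu_ps(out, _mm256_setr_ps(0.f, 1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f));
}
```

After either call, out[i] holds the value i for every i from 0 to 7.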
Loading data from memory
| Function | Description |
|---|---|
| _mm256_load_ps/pd | Loads a floating-point vector from an aligned (32-byte) memory address |
| _mm256_load_si256 | Loads an integer vector from an aligned (32-byte) memory address |
| _mm256_loadu_ps/pd | Loads a floating-point vector from an unaligned memory address |
| _mm256_loadu_si256 | Loads an integer vector from an unaligned memory address |
| _mm_maskload_ps/pd | Loads part of a 128-bit floating-point vector according to a mask |
| _mm256_maskload_ps/pd | Loads part of a 256-bit floating-point vector according to a mask |
| (2)_mm_maskload_epi32/64 | Loads part of a 128-bit integer vector according to a mask |
| (2)_mm256_maskload_epi32/64 | Loads part of a 256-bit integer vector according to a mask |

The last two functions are prefixed with (2), which means they are supported only in AVX2. The same marker is used in the tables that follow.
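A masked load is handy for the tail of an array whose length is not a multiple of 8. The sketch below loads only the first n floats and leaves the remaining elements at zero; an element is loaded when the sign bit of the corresponding 32-bit mask element is set. The helper name load_first_n is hypothetical, and the `target` attribute is a GCC/Clang extension assumed so no compiler flags are needed.

```c
#include <assert.h>
#include <immintrin.h>

/* Loads the first n (0..8) floats from src and zero-fills the rest of out. */
__attribute__((target("avx")))
void load_first_n(const float *src, int n, float *out) {
    assert(src && out && n >= 0 && n <= 8);
    int m[8];
    for (int i = 0; i < 8; ++i)
        m[i] = (i < n) ? -1 : 0;   /* -1 sets the sign bit: load this element */
    __m256i mask = _mm256_loadu_si256((const __m256i *)m);
    __m256 v = _mm256_maskload_ps(src, mask);  /* masked-off elements become 0 */
    _mm256_storeu_ps(out, v);
}
```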
Arithmetic intrinsics
Addition and subtraction
| Function | Description |
|---|---|
| _mm256_add_ps/pd | Adds two floating-point vectors |
| _mm256_sub_ps/pd | Subtracts two floating-point vectors |
| (2)_mm256_add_epi8/16/32/64 | Adds two integer vectors |
| (2)_mm256_sub_epi8/16/32/64 | Subtracts two integer vectors |
| (2)_mm256_adds_epi8/16, (2)_mm256_adds_epu8/16 | Adds two integer vectors with saturation |
| (2)_mm256_subs_epi8/16, (2)_mm256_subs_epu8/16 | Subtracts two integer vectors with saturation |
| _mm256_hadd_ps/pd | Adds adjacent pairs of elements horizontally in two floating-point vectors |
| _mm256_hsub_ps/pd | Subtracts adjacent pairs of elements horizontally in two floating-point vectors |
| (2)_mm256_hadd_epi16/32 | Adds adjacent pairs of elements horizontally in two integer vectors |
| (2)_mm256_hsub_epi16/32 | Subtracts adjacent pairs of elements horizontally in two integer vectors |
| (2)_mm256_hadds_epi16 | Adds adjacent pairs horizontally in two vectors of short values, with saturation |
| (2)_mm256_hsubs_epi16 | Subtracts adjacent pairs horizontally in two vectors of short values, with saturation |
| _mm256_addsub_ps/pd | Alternately subtracts and adds the elements of two floating-point vectors |
A function that takes saturation into account clamps the result to the minimum or maximum value the element type can store. Functions without saturation ignore overflow, so the result wraps around.
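The following sketch contrasts the two behaviours on unsigned 8-bit elements: 200 + 100 does not fit in 8 bits, so the saturating add clamps to 255 while the plain add wraps to 44. The helper names are hypothetical, _mm256_storeu_si256 is a real intrinsic not listed in these notes, and the `target` attribute is a GCC/Clang extension assumed for convenience. AVX2 hardware is required to run it.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Adds a and b in every 8-bit lane with unsigned saturation. */
__attribute__((target("avx2")))
uint8_t add_u8_saturating(uint8_t a, uint8_t b) {
    uint8_t out[32];
    __m256i sum = _mm256_adds_epu8(_mm256_set1_epi8((char)a),
                                   _mm256_set1_epi8((char)b));
    _mm256_storeu_si256((__m256i *)out, sum);
    assert(out[0] == out[31]);   /* every lane holds the same sum */
    return out[0];
}

/* Adds a and b in every 8-bit lane; overflow wraps modulo 256. */
__attribute__((target("avx2")))
uint8_t add_u8_wrapping(uint8_t a, uint8_t b) {
    uint8_t out[32];
    __m256i sum = _mm256_add_epi8(_mm256_set1_epi8((char)a),
                                  _mm256_set1_epi8((char)b));
    _mm256_storeu_si256((__m256i *)out, sum);
    assert(out[0] == out[31]);   /* every lane holds the same sum */
    return out[0];
}
```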
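Horizontal addition and subtraction (illustrated by a figure in the original post) combine adjacent elements within each input vector, rather than corresponding elements of the two inputs. For _mm256_hadd_ps(a, b), each 128-bit half of the result holds a0+a1, a2+a3, b0+b1, b2+b3, taken from the matching half of a and b.
The last intrinsic, _mm256_addsub_ps/pd, subtracts at even positions and adds at odd positions to produce the result vector.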
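A short sketch makes both element layouts concrete. The wrapper names hadd8 and addsub8 are hypothetical, and the `target` attribute is a GCC/Clang extension assumed so the code compiles without flags.

```c
#include <assert.h>
#include <immintrin.h>

/* Horizontal add: pairs of adjacent elements are summed within each
   128-bit half, first from a, then from b. */
__attribute__((target("avx")))
void hadd8(const float *a, const float *b, float *out) {
    assert(a && b && out);
    _mm256_storeu_ps(out, _mm256_hadd_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b)));
}

/* addsub: a - b at even positions, a + b at odd positions. */
__attribute__((target("avx")))
void addsub8(const float *a, const float *b, float *out) {
    assert(a && b && out);
    _mm256_storeu_ps(out, _mm256_addsub_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b)));
}
```

With a = {1, 2, ..., 8} and b = {10, 20, ..., 80}, hadd8 yields {3, 7, 30, 70, 11, 15, 110, 150}. With the same a and b set to all ones, addsub8 yields {0, 3, 2, 5, 4, 7, 6, 9}.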
Multiplication and division
| Function | Description |
|---|---|
| _mm256_mul_ps/pd | Multiplies two floating-point vectors |
| (2)_mm256_mul_epi32, (2)_mm256_mul_epu32 | Multiplies the low 32-bit integer of each 64-bit element, producing four 64-bit products |
| (2)_mm256_mullo_epi16/32 | Multiplies integers and stores the low half of each product |
| (2)_mm256_mulhi_epi16, (2)_mm256_mulhi_epu16 | Multiplies integers and stores the high half of each product |
| (2)_mm256_mulhrs_epi16 | Multiplies signed 16-bit elements, then rounds and scales each 32-bit intermediate product back to 16 bits |
| _mm256_div_ps/pd | Divides one floating-point vector by another |
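The difference between keeping the low half and keeping the full product shows up as soon as a product exceeds 32 bits. In this sketch, 65536 * 65536 equals 2^32: _mm256_mullo_epi32 keeps only the low 32 bits (0), while _mm256_mul_epu32 returns the full 64-bit product. The helper names are hypothetical, and the `target` attribute is a GCC/Clang extension assumed for convenience.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Multiplies in 32-bit lanes and keeps the low 32 bits of each product. */
__attribute__((target("avx2")))
uint32_t mul_low_u32(uint32_t a, uint32_t b) {
    uint32_t out[8];
    __m256i p = _mm256_mullo_epi32(_mm256_set1_epi32((int)a),
                                   _mm256_set1_epi32((int)b));
    _mm256_storeu_si256((__m256i *)out, p);
    assert(out[0] == out[7]);   /* all lanes hold the same product */
    return out[0];
}

/* Multiplies the low 32 bits of each 64-bit element into a 64-bit product. */
__attribute__((target("avx2")))
uint64_t mul_wide_u32(uint32_t a, uint32_t b) {
    uint64_t out[4];
    __m256i p = _mm256_mul_epu32(_mm256_set1_epi32((int)a),
                                 _mm256_set1_epi32((int)b));
    _mm256_storeu_si256((__m256i *)out, p);
    assert(out[0] == out[3]);   /* all lanes hold the same product */
    return out[0];
}
```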
Fused multiply-add (FMA)
| Function | Description |
|---|---|
| (2)_mm_fmadd_ps/pd, (2)_mm256_fmadd_ps/pd | Multiplies two vectors, then adds the third vector to the product (res = a * b + c) |
| (2)_mm_fmsub_ps/pd, (2)_mm256_fmsub_ps/pd | Multiplies two vectors, then subtracts the third vector from the product (res = a * b - c) |
| (2)_mm_fmadd_ss/sd | Multiplies and adds the lowest elements of the vectors (res[0] = a[0] * b[0] + c[0]) |
| (2)_mm_fmsub_ss/sd | Multiplies and subtracts the lowest elements of the vectors (res[0] = a[0] * b[0] - c[0]) |
| (2)_mm_fnmadd_ps/pd, (2)_mm256_fnmadd_ps/pd | Multiplies two vectors, then adds the negated product to the third vector (res = -(a * b) + c) |
| (2)_mm_fnmsub_ps/pd, (2)_mm256_fnmsub_ps/pd | Multiplies two vectors, then subtracts the third vector from the negated product (res = -(a * b) - c) |
| (2)_mm_fnmadd_ss/sd | Multiplies the lowest elements of two vectors, then adds the negated product to the lowest element of the third vector (res[0] = -(a[0] * b[0]) + c[0]) |
| (2)_mm_fnmsub_ss/sd | Multiplies the lowest elements, then subtracts the lowest element of the third vector from the negated product (res[0] = -(a[0] * b[0]) - c[0]) |
| (2)_mm_fmaddsub_ps/pd, (2)_mm256_fmaddsub_ps/pd | Multiplies two vectors, then alternately subtracts and adds the third vector (res = a * b - c at even positions, a * b + c at odd positions) |
| (2)_mm_fmsubadd_ps/pd, (2)_mm256_fmsubadd_ps/pd | Multiplies two vectors, then alternately adds and subtracts the third vector (res = a * b + c at even positions, a * b - c at odd positions) |
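As a minimal sketch of the first and ninth rows, the two hypothetical wrappers below apply _mm256_fmadd_ps and _mm256_fmaddsub_ps to 8 floats. Note that FMA is reported by its own CPU feature flag, separate from AVX2, so the code enables it explicitly through the GCC/Clang `target` attribute (an assumption made so no compiler flags are required).

```c
#include <assert.h>
#include <immintrin.h>

/* out[i] = a[i] * b[i] + c[i] for 8 floats. */
__attribute__((target("avx,fma")))
void fmadd8(const float *a, const float *b, const float *c, float *out) {
    assert(a && b && c && out);
    __m256 r = _mm256_fmadd_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b),
                               _mm256_loadu_ps(c));
    _mm256_storeu_ps(out, r);
}

/* out[i] = a[i] * b[i] - c[i] for even i, a[i] * b[i] + c[i] for odd i. */
__attribute__((target("avx,fma")))
void fmaddsub8(const float *a, const float *b, const float *c, float *out) {
    assert(a && b && c && out);
    __m256 r = _mm256_fmaddsub_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b),
                                  _mm256_loadu_ps(c));
    _mm256_storeu_ps(out, r);
}
```

With every element of a, b, and c set to 2, 3, and 1 respectively, fmadd8 produces 7 in every position, while fmaddsub8 produces 5 at even positions and 7 at odd positions.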
Permuting and shuffling
| Function | Description |
|---|---|
| _mm_permute_ps/pd, _mm256_permute_ps/pd | Selects elements from the input vector according to an 8-bit control value |
| (2)_mm256_permute4x64_pd, (2)_mm256_permute4x64_epi64 | Selects 64-bit elements from the input vector according to an 8-bit control value |
| _mm256_permute2f128_ps/pd | Selects 128-bit blocks from two input vectors according to an 8-bit control value |
| _mm256_permute2f128_si256 | Selects 128-bit blocks from two input vectors according to an 8-bit control value |
| _mm_permutevar_ps/pd, _mm256_permutevar_ps/pd | Selects elements from the input vector according to bits in an integer vector |
| (2)_mm256_permutevar8x32_ps, (2)_mm256_permutevar8x32_epi32 | Selects 32-bit elements (floating-point or integer) using indices in an integer vector |
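As one illustration of index-based selection, the sketch below reverses the order of 8 floats with _mm256_permutevar8x32_ps: element i of the result is taken from the input element named by index i. The helper name reverse8 is hypothetical, and the `target` attribute is a GCC/Clang extension assumed for convenience. AVX2 hardware is required to run it.

```c
#include <assert.h>
#include <immintrin.h>

/* Writes the 8 floats of in to out in reverse order. */
__attribute__((target("avx2")))
void reverse8(const float *in, float *out) {
    assert(in && out);
    /* Result element i takes input element idx[i]; setr lists idx[0] first. */
    __m256i idx = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    __m256 v = _mm256_permutevar8x32_ps(_mm256_loadu_ps(in), idx);
    _mm256_storeu_ps(out, v);
}
```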
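| Function | Description |
|---|---|
| _mm256_shuffle_ps/pd | Selects floating-point elements according to an 8-bit control value |
| (2)_mm256_shuffle_epi8, (2)_mm256_shuffle_epi32 | Selects integer elements: _mm256_shuffle_epi8 takes its control from a byte vector, _mm256_shuffle_epi32 from an 8-bit control value |
| (2)_mm256_shufflelo_epi16, (2)_mm256_shufflehi_epi16 | Shuffles the low or high four 16-bit elements of each 128-bit half according to an 8-bit control value |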
For _mm256_shuffle_pd, only the low 4 bits of the control value are used. If the input vector contains int or float elements, all control bits are used. For _mm256_shuffle_ps, the control value is read as four 2-bit fields: within each 128-bit half, the first two fields select elements from the first vector and the last two fields select elements from the second vector.
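The field layout of _mm256_shuffle_ps can be checked with a fixed control value. In this sketch the control 0x44 (binary 01 00 01 00) picks elements 0 and 1 of a, followed by elements 0 and 1 of b, within each 128-bit half. The wrapper name shuffle_low_pairs is hypothetical, and the `target` attribute is a GCC/Clang extension assumed so no compiler flags are needed.

```c
#include <assert.h>
#include <immintrin.h>

/* Per 128-bit half: out = { a[0], a[1], b[0], b[1] } of that half. */
__attribute__((target("avx")))
void shuffle_low_pairs(const float *a, const float *b, float *out) {
    assert(a && b && out);
    /* Fields from low to high bits: 00 -> a[0], 01 -> a[1], 00 -> b[0], 01 -> b[1]. */
    __m256 r = _mm256_shuffle_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b), 0x44);
    _mm256_storeu_ps(out, r);
}
```

With a = {1, 2, ..., 8} and b = {10, 20, ..., 80}, the result is {1, 2, 10, 20, 5, 6, 50, 60}.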