当前位置:网站首页>After reading the average code written by Microsoft God, I realized that I was still too young
After reading the average code written by Microsoft God, I realized that I was still too young
2022-07-05 01:15:00 【QbitAl】
Bowen From the Aofei temple
qubits | official account QbitAI
Rounding The average of unsigned integers , Flowers can come out ?
see , Microsoft God Raymond Chen A recent long article directly detonated the extranet technology platform , Sparked countless discussions :
Countless people click in with great confidence : Isn't it a simple programming problem for primary school students who divide by two after addition ?
unsigned average(unsigned a, unsigned b)
{
return (a + b) / 2;
}
But follow the great God and dig deep , But gradually stare at the dog ……
It's not that simple to average
Let's start with the method that primary school students will know at the beginning , This simple method has a fatal flaw :
If the length of an unsigned integer is 32 position , So if the two added values are half the maximum length , Then only when adding in the first step , It will happen out of memory .
That is to say average(0x80000000U, 0x80000000U)=0.
But there are many solutions , Most experienced developers First Can think of , Is to limit the length of the number added in advance , Avoid spillovers .
There are two ways :
1、 When you know the larger value of two unsigned integers added , Subtract the smaller value and divide by two , In advance Reduce the length :
unsigned average(unsigned low, unsigned high)
{
return low + (high - low) / 2;
}
2、 Divide two unsigned integers in advance , At the same time through Bitwise AND Correct the lower digits , Ensure that when both integers are odd , The results are still correct .
( By the way , This is a patented method ,2016 Year out )
unsigned average(unsigned a, unsigned b)
{
return (a / 2) + (b / 2) + (a & b & 1);
}
These two are common ideas , Many netizens also said , The fastest thing I can think of is 2016 Patented method .
There are also ways that can be quickly thought of by the majority of netizens SWAR(SIMD within a register):
unsigned average(unsigned a, unsigned b)
{
return (a & b) + (a ^ b) / 2;// variant (a ^ b) + (a & b) * 2
as well as C++ 20 In version std: : midpoint function .
Next , The author puts forward The second way of thinking :
If the unsigned integer is 32 Bit and the size of the local register is 64 position , Or the compiler supports multi word operation , You can force the addition value into long integer data .
unsigned average(unsigned a, unsigned b)
{
// Suppose "unsigned" is a 32-bit type and
// "unsigned long long" is a 64-bit type.
return ((unsigned long long)a + b) / 2;
}
however , Here's a point that needs special attention :
It must be ensured that 64 The first bit of the register 32 All for 0, It won't affect the rest 32 A value .
Like x86-64 and aarch64 These architectures will automatically 32 A value Zero expansion by 64 A value :
// x86-64: Assume ecx = a, edx = b, upper 32 bits unknown
mov eax, ecx ; rax = ecx zero-extended to 64-bit value
mov edx, edx ; rdx = edx zero-extended to 64-bit value
add rax, rdx ; 64-bit addition: rax = rax + rdx
shr rax, 1 ; 64-bit shift: rax = rax >> 1
; result is zero-extended
; Answer in eax
// AArch64 (ARM 64-bit): Assume w0 = a, w1 = b, upper 32 bits unknown
uxtw x0, w0 ; x0 = w0 zero-extended to 64-bit value
uxtw x1, w1 ; x1 = w1 zero-extended to 64-bit value
add x0, x1 ; 64-bit addition: x0 = x0 + x1
ubfx x0, x0, 1, 32 ; Extract bits 1 through 32 from result
; (shift + zero-extend in one instruction)
; Answer in x0
and Alpha AXP、mips64 And other architectures will 32 A value Symbol extension by 64 A value .
This time , You need to add an additional zeroing instruction , For example, through the deletion instruction of carry left two words rldicl:
// Alpha AXP: Assume a0 = a, a1 = b, both in canonical form
insll a0, #0, a0 ; a0 = a0 zero-extended to 64-bit value
insll a1, #0, a1 ; a1 = a1 zero-extended to 64-bit value
addq a0, a1, v0 ; 64-bit addition: v0 = a0 + a1
srl v0, #1, v0 ; 64-bit shift: v0 = v0 >> 1
addl zero, v0, v0 ; Force canonical form
; Answer in v0
// MIPS64: Assume a0 = a, a1 = b, sign-extended
dext a0, a0, 0, 32 ; Zero-extend a0 to 64-bit value
dext a1, a1, 0, 32 ; Zero-extend a1 to 64-bit value
daddu v0, a0, a1 ; 64-bit addition: v0 = a0 + a1
dsrl v0, v0, #1 ; 64-bit shift: v0 = v0 >> 1
sll v0, #0, v0 ; Sign-extend result
; Answer in v0
// Power64: Assume r3 = a, r4 = b, zero-extended
add r3, r3, r4 ; 64-bit addition: r3 = r3 + r4
rldicl r3, r3, 63, 32 ; Extract bits 63 through 32 from result
; (shift + zero-extend in one instruction)
; result in r3
Or directly access larger than the native register SIMD register , Of course , From general register to SIMD Registers must also increase memory consumption .
If the computer's processor supports carry addition , Then you can also use The third way of thinking .
At this time , If the register size is n position , Well, the two one. n The sum of unsigned integers of bits can be understood as n+1 position , adopt RCR( Shift right with carry cycle ) Instructions , You can get the correct average , Without losing the overflow bit .
△ Shift right with carry cycle
// x86-32
mov eax, a
add eax, b ; Add, overflow goes into carry bit
rcr eax, 1 ; Rotate right one place through carry
// x86-64
mov rax, a
add rax, b ; Add, overflow goes into carry bit
rcr rax, 1 ; Rotate right one place through carry
// 32-bit ARM (A32)
mov r0, a
adds r0, b ; Add, overflow goes into carry bit
rrx r0 ; Rotate right one place through carry
// SH-3
clrt ; Clear T flag
mov a, r0
addc b, r0 ; r0 = r0 + b + T, overflow goes into T bit
rotcr r0 ; Rotate right one place through carry
What if the processor does not support the right shift operation with carry cycle ?
Internal circulation can also be used (rotation intrinsic):
unsigned average(unsigned a, unsigned b)
{
#if defined(_MSC_VER)
unsigned sum;
auto carry = _addcarry_u32(0, a, b, &sum);
sum = (sum & ~1) | carry;
return _rotr(sum, 1);
#elif defined(__clang__)
unsigned carry;
sum = (sum & ~1) | carry;
auto sum = __builtin_addc(a, b, 0, &carry);
return __builtin_rotateright32(sum, 1);
#else
#error Unsupported compiler.
#endif
}
The result is ,x86 The code generation under the architecture has not changed much ,MSCver Code generation under architecture gets worse , and arm-thumb2 Of clang Better code generation .
// _MSC_VER
mov ecx, a
add ecx, b ; Add, overflow goes into carry bit
setc al ; al = 1 if carry set
and ecx, -2 ; Clear bottom bit
movzx ecx, al ; Zero-extend byte to 32-bit value
or eax, ecx ; Combine
ror ear, 1 ; Rotate right one position
; Result in eax
// __clang__
mov ecx, a
add ecx, b ; Add, overflow goes into carry bit
setc al ; al = 1 if carry set
shld eax, ecx, 31 ; Shift left 64-bit value
// __clang__ with ARM-Thumb2
movs r2, #0 ; Prepare to receive carry
adds r0, r0, r1 ; Calculate sum with flags
adcs r2, r2 ; r2 holds carry
lsrs r0, r0, #1 ; Shift sum right one position
lsls r1, r2, #31 ; Move carry to bit 31
adds r0, r1, r0 ; Combine
The thinking of Microsoft God
Raymond Chen1992 Joined Microsoft in 1996 , So far, he has served in 25 year , do UEX-Shell, Participate in Windows Development ,Windows Many aspects of the system are initially UI He started the architecture .
He was in MSDN Built on blogThe Old New Thing It is also a well-known pure technology output website in the industry .
The comment area of this blog is also haunted by various gods of Microsoft , Continue to explore .
Someone has proposed a new method , stay MIPS ASM share 36 Cycle :
Some people are aiming at 2016 The patent law of , Instead of using (a / 2) + (b / 2) + (a & b & 1) Methods , Why not just put (a & 1) & ( b & 1 ) ) Put it into the adder as carry to calculate ?
Someone else recommended in the comment area TopSpeed compiler , You can define an inline function by specifying appropriate code bytes and calling conventions , To solve the problem “ The result of multiplication and division is 16 position , The intermediate calculated value is not ” The situation of .
It can only be said , There is no end to learning .
original text :
https://devblogs.microsoft.com/oldnewthing/20220207-00/?p=106223
Reference link :
https://news.ycombinator.com/item?id=30252263
边栏推荐
- Global and Chinese market of veterinary thermometers 2022-2028: Research Report on technology, participants, trends, market size and share
- Maximum number of "balloons"
- Pycharm professional download and installation tutorial
- FEG founder rox:smartdefi will be the benchmark of the entire decentralized financial market
- [development of large e-commerce projects] performance pressure test - Performance Monitoring - heap memory and garbage collection -39
- 【FPGA教程案例9】基于vivado核的时钟管理器设计与实现
- 142. Circular linked list II
- 抓包整理外篇——————状态栏[ 四]
- Basic operations of database and table ----- delete index
- Ruby tutorial
猜你喜欢
ROS command line tool
[pure tone hearing test] pure tone hearing test system based on MATLAB
How to use words to describe breaking change in Spartacus UI of SAP e-commerce cloud
微信小程序:星宿UI V1.5 wordpress系统资讯资源博客下载小程序微信QQ双端源码支持wordpress二级分类 加载动画优化
Basic operation of database and table ----- phased test II
Implementation steps of master detail detail layout mode of SAP ui5 application
Single step debugging of master data reading of SAP commerce cloud products
SAP UI5 应用开发教程之一百零七 - SAP UI5 OverflowToolbar 容器控件介绍的试读版
微信小程序:微群人脉微信小程序源码下载全新社群系统优化版支持代理会员系统功能超高收益
资深测试/开发程序员写下无bug?资历(枷锁)不要惧怕错误......
随机推荐
Basic operations of database and table ----- create index
Actual combat simulation │ JWT login authentication
当产业互联网时代真正发展完善之后,将会在每一个场景见证巨头的诞生
小程序直播 + 电商,想做新零售电商就用它吧!
微信小程序:星宿UI V1.5 wordpress系统资讯资源博客下载小程序微信QQ双端源码支持wordpress二级分类 加载动画优化
dotnet-exec 0.6.0 released
107. Some details of SAP ui5 overflow toolbar container control and resize event processing
Discrete mathematics: Main Normal Form (main disjunctive normal form, main conjunctive normal form)
Basic operations of database and table ----- delete index
I was beaten by the interviewer because I didn't understand the sorting
Apifox (postman + swagger + mock + JMeter), an artifact of full stack development and efficiency improvement
微信小程序:全新独立后台月老办事处一元交友盲盒
Inventory of more than 17 typical security incidents in January 2022
SAP ui5 application development tutorial 107 - trial version of SAP ui5 overflow toolbar container control introduction
Take you ten days to easily complete the go micro service series (IX. link tracking)
If the consumer Internet is compared to a "Lake", the industrial Internet is a vast "ocean"
揭露测试外包公司,关于外包,你或许听到过这样的声音
College degree, what about 33 year old Baoma? I still sell and test, and my monthly income is 13K+
Arbitrum:二维费用
[FPGA tutorial case 9] design and implementation of clock manager based on vivado core