当前位置:网站首页>After reading the average code written by Microsoft God, I realized that I was still too young
After reading the average code written by Microsoft God, I realized that I was still too young
2022-07-05 01:15:00 【QbitAl】
Bowen From the Aofei temple
qubits | official account QbitAI
Rounding The average of unsigned integers , Flowers can come out ?
see , Microsoft God Raymond Chen A recent long article directly detonated the extranet technology platform , Sparked countless discussions :

Countless people click in with great confidence : Isn't it a simple programming problem for primary school students who divide by two after addition ?
unsigned average(unsigned a, unsigned b)
{
return (a + b) / 2;
}But follow the great God and dig deep , But gradually stare at the dog ……
It's not that simple to average
Let's start with the method that primary school students will know at the beginning , This simple method has a fatal flaw :
If the length of an unsigned integer is 32 position , So if the two added values are half the maximum length , Then only when adding in the first step , It will happen out of memory .
That is to say average(0x80000000U, 0x80000000U)=0.
But there are many solutions , Most experienced developers First Can think of , Is to limit the length of the number added in advance , Avoid spillovers .
There are two ways :
1、 When you know the larger value of two unsigned integers added , Subtract the smaller value and divide by two , In advance Reduce the length :
unsigned average(unsigned low, unsigned high)
{
return low + (high - low) / 2;
}2、 Divide two unsigned integers in advance , At the same time through Bitwise AND Correct the lower digits , Ensure that when both integers are odd , The results are still correct .
( By the way , This is a patented method ,2016 Year out )
unsigned average(unsigned a, unsigned b)
{
return (a / 2) + (b / 2) + (a & b & 1);
}These two are common ideas , Many netizens also said , The fastest thing I can think of is 2016 Patented method .
There are also ways that can be quickly thought of by the majority of netizens SWAR(SIMD within a register):
unsigned average(unsigned a, unsigned b)
{
return (a & b) + (a ^ b) / 2;// variant (a ^ b) + (a & b) * 2as well as C++ 20 In version std: : midpoint function .
Next , The author puts forward The second way of thinking :
If the unsigned integer is 32 Bit and the size of the local register is 64 position , Or the compiler supports multi word operation , You can force the addition value into long integer data .
unsigned average(unsigned a, unsigned b)
{
// Suppose "unsigned" is a 32-bit type and
// "unsigned long long" is a 64-bit type.
return ((unsigned long long)a + b) / 2;
}however , Here's a point that needs special attention :
It must be ensured that 64 The first bit of the register 32 All for 0, It won't affect the rest 32 A value .
Like x86-64 and aarch64 These architectures will automatically 32 A value Zero expansion by 64 A value :
// x86-64: Assume ecx = a, edx = b, upper 32 bits unknown
mov eax, ecx ; rax = ecx zero-extended to 64-bit value
mov edx, edx ; rdx = edx zero-extended to 64-bit value
add rax, rdx ; 64-bit addition: rax = rax + rdx
shr rax, 1 ; 64-bit shift: rax = rax >> 1
; result is zero-extended
; Answer in eax
// AArch64 (ARM 64-bit): Assume w0 = a, w1 = b, upper 32 bits unknown
uxtw x0, w0 ; x0 = w0 zero-extended to 64-bit value
uxtw x1, w1 ; x1 = w1 zero-extended to 64-bit value
add x0, x1 ; 64-bit addition: x0 = x0 + x1
ubfx x0, x0, 1, 32 ; Extract bits 1 through 32 from result
; (shift + zero-extend in one instruction)
; Answer in x0and Alpha AXP、mips64 And other architectures will 32 A value Symbol extension by 64 A value .
This time , You need to add an additional zeroing instruction , For example, through the deletion instruction of carry left two words rldicl:
// Alpha AXP: Assume a0 = a, a1 = b, both in canonical form
insll a0, #0, a0 ; a0 = a0 zero-extended to 64-bit value
insll a1, #0, a1 ; a1 = a1 zero-extended to 64-bit value
addq a0, a1, v0 ; 64-bit addition: v0 = a0 + a1
srl v0, #1, v0 ; 64-bit shift: v0 = v0 >> 1
addl zero, v0, v0 ; Force canonical form
; Answer in v0
// MIPS64: Assume a0 = a, a1 = b, sign-extended
dext a0, a0, 0, 32 ; Zero-extend a0 to 64-bit value
dext a1, a1, 0, 32 ; Zero-extend a1 to 64-bit value
daddu v0, a0, a1 ; 64-bit addition: v0 = a0 + a1
dsrl v0, v0, #1 ; 64-bit shift: v0 = v0 >> 1
sll v0, #0, v0 ; Sign-extend result
; Answer in v0
// Power64: Assume r3 = a, r4 = b, zero-extended
add r3, r3, r4 ; 64-bit addition: r3 = r3 + r4
rldicl r3, r3, 63, 32 ; Extract bits 63 through 32 from result
; (shift + zero-extend in one instruction)
; result in r3Or directly access larger than the native register SIMD register , Of course , From general register to SIMD Registers must also increase memory consumption .
If the computer's processor supports carry addition , Then you can also use The third way of thinking .
At this time , If the register size is n position , Well, the two one. n The sum of unsigned integers of bits can be understood as n+1 position , adopt RCR( Shift right with carry cycle ) Instructions , You can get the correct average , Without losing the overflow bit .

△ Shift right with carry cycle
// x86-32
mov eax, a
add eax, b ; Add, overflow goes into carry bit
rcr eax, 1 ; Rotate right one place through carry
// x86-64
mov rax, a
add rax, b ; Add, overflow goes into carry bit
rcr rax, 1 ; Rotate right one place through carry
// 32-bit ARM (A32)
mov r0, a
adds r0, b ; Add, overflow goes into carry bit
rrx r0 ; Rotate right one place through carry
// SH-3
clrt ; Clear T flag
mov a, r0
addc b, r0 ; r0 = r0 + b + T, overflow goes into T bit
rotcr r0 ; Rotate right one place through carryWhat if the processor does not support the right shift operation with carry cycle ?
Internal circulation can also be used (rotation intrinsic):
unsigned average(unsigned a, unsigned b)
{
#if defined(_MSC_VER)
unsigned sum;
auto carry = _addcarry_u32(0, a, b, &sum);
sum = (sum & ~1) | carry;
return _rotr(sum, 1);
#elif defined(__clang__)
unsigned carry;
sum = (sum & ~1) | carry;
auto sum = __builtin_addc(a, b, 0, &carry);
return __builtin_rotateright32(sum, 1);
#else
#error Unsupported compiler.
#endif
}The result is ,x86 The code generation under the architecture has not changed much ,MSCver Code generation under architecture gets worse , and arm-thumb2 Of clang Better code generation .
// _MSC_VER
mov ecx, a
add ecx, b ; Add, overflow goes into carry bit
setc al ; al = 1 if carry set
and ecx, -2 ; Clear bottom bit
movzx ecx, al ; Zero-extend byte to 32-bit value
or eax, ecx ; Combine
ror ear, 1 ; Rotate right one position
; Result in eax
// __clang__
mov ecx, a
add ecx, b ; Add, overflow goes into carry bit
setc al ; al = 1 if carry set
shld eax, ecx, 31 ; Shift left 64-bit value
// __clang__ with ARM-Thumb2
movs r2, #0 ; Prepare to receive carry
adds r0, r0, r1 ; Calculate sum with flags
adcs r2, r2 ; r2 holds carry
lsrs r0, r0, #1 ; Shift sum right one position
lsls r1, r2, #31 ; Move carry to bit 31
adds r0, r1, r0 ; CombineThe thinking of Microsoft God
Raymond Chen1992 Joined Microsoft in 1996 , So far, he has served in 25 year , do UEX-Shell, Participate in Windows Development ,Windows Many aspects of the system are initially UI He started the architecture .

He was in MSDN Built on blogThe Old New Thing It is also a well-known pure technology output website in the industry .
The comment area of this blog is also haunted by various gods of Microsoft , Continue to explore .
Someone has proposed a new method , stay MIPS ASM share 36 Cycle :
Some people are aiming at 2016 The patent law of , Instead of using (a / 2) + (b / 2) + (a & b & 1) Methods , Why not just put (a & 1) & ( b & 1 ) ) Put it into the adder as carry to calculate ?
Someone else recommended in the comment area TopSpeed compiler , You can define an inline function by specifying appropriate code bytes and calling conventions , To solve the problem “ The result of multiplication and division is 16 position , The intermediate calculated value is not ” The situation of .
It can only be said , There is no end to learning .

original text :
https://devblogs.microsoft.com/oldnewthing/20220207-00/?p=106223
Reference link :
https://news.ycombinator.com/item?id=30252263
边栏推荐
- Basic operation of database and table ----- the concept of index
- What happened to those who focused on automated testing?
- How to use words to describe breaking change in Spartacus UI of SAP e-commerce cloud
- Research Report on the overall scale, major producers, major regions, products and application segmentation of agricultural automatic steering system in the global market in 2022
- Four pits in reentrantlock!
- pycharm专业版下载安装教程
- Mongodb series learning notes tutorial summary
- 微信小程序:微群人脉微信小程序源码下载全新社群系统优化版支持代理会员系统功能超高收益
- SAP ui5 application development tutorial 106 - how to improve the readability of SAP ui5 application routing URL trial version
- La jeunesse sans rancune de Xi Murong
猜你喜欢

Nebula Importer 数据导入实践

POAP:NFT的采用入口?

华为百万聘请数据治理专家!背后的千亿市场值得关注

Delaying wages to force people to leave, and the layoffs of small Internet companies are a little too much!

Take you ten days to easily complete the go micro service series (IX. link tracking)

每日刷题记录 (十三)
![[wave modeling 1] theoretical analysis and MATLAB simulation of wave modeling](/img/c4/46663f64b97e7b25d7222de7025f59.png)
[wave modeling 1] theoretical analysis and MATLAB simulation of wave modeling

SAP ui5 application development tutorial 106 - how to improve the readability of SAP ui5 application routing URL trial version

pycharm专业版下载安装教程

Pycharm professional download and installation tutorial
随机推荐
JS implementation determines whether the point is within the polygon range
Daily question brushing record (13)
Database postragesql lock management
RB technology stack
Digital DP template
微信小程序:独立后台带分销功能月老办事处交友盲盒
Check if this is null - checking if this is null
[flutter topic] 64 illustration basic textfield text input box (I) # yyds dry goods inventory #
Are you still writing the TS type code
测试部新来了个00后卷王,上了年纪的我真的干不过了,已经...
Take you ten days to easily complete the go micro service series (IX. link tracking)
Database postragesq role membership
【海浪建模1】海浪建模的理论分析和matlab仿真
【CTF】AWDP总结(Web)
If the consumer Internet is compared to a "Lake", the industrial Internet is a vast "ocean"
La jeunesse sans rancune de Xi Murong
node工程中package.json文件作用是什么?里面的^尖括号和~波浪号是什么意思?
||Interview questions you will encounter
Database postragesq PAM authentication
[Yocto RM]11 - Features