Floating point number exploration
2022-07-25 09:21:00 【halazi100】
Computers use floating-point numbers to approximate real numbers. Specifically, a real number is represented as an integer or fixed-point number (the mantissa) multiplied by a base (usually 2 in computers) raised to an integer power.
How to convert decimal to binary
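Integer part
- Method 1: repeatedly divide the integer part by 2 and read the remainders from bottom to top
59/2 = 29 remainder 1
29/2 = 14 remainder 1
14/2 = 7  remainder 0
7/2  = 3  remainder 1
3/2  = 1  remainder 1
1/2  = 0  remainder 1
0/2  = 0  remainder 0
0/2  = 0  remainder 0
So the 8-bit binary representation of 59 is 0011 1011;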
- Method 2: binary decomposition
Write the decimal number as a sum of integer powers of 2, convert each power to binary, then combine the results;
54 = 2^5 + 2^4 + 2^2 + 2^1
   = 0010 0000 + 0001 0000 + 0000 0100 + 0000 0010
   = 0011 0110
Fractional part
Repeatedly multiply by 2 and take the integer part of each result as the next binary digit
For example, converting 0.25 to binary
0.25*2 = 0.5   0
0.5*2  = 1.0   1
so 0.25 in binary is 0.01
For example, converting 0.4 to binary
0.4*2 = 0.8   0
0.8*2 = 1.6   1
0.6*2 = 1.2   1
0.2*2 = 0.4   0
...
so 0.4 in binary is 0.0110 0110 ..., a pattern that repeats forever. In other words, a binary representation of a decimal fraction is not always exact;
The representation of floating-point numbers
According to the international standard IEEE 754, any binary floating-point number V can be written as (-1)^S * M * 2^E
where
(-1)^S is the sign: V is positive when S=0 and negative when S=1. M is the significand, in the range [1,2). 2^E is the exponent part, with base 2.
For example, decimal 5.0 is 101.0 in binary, which in this form is (-1)^0 * 1.01 * 2^2,
so S=0, M=1.01, E=2.
Likewise, decimal -5.5 is -101.1 in binary, which in this form is (-1)^1 * 1.011 * 2^2,
so S=1, M=1.011, E=2.
The representation of floating-point numbers in memory
IEEE 754 specifies:
For a 32-bit floating-point number (float type), the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.
For a 64-bit floating-point number (double type), the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.
┌─────────────┬──────────────────┬────────────────────┬─────────────────────┐
│ type │ S(sign bit) │ E(Exponent area) │ M(Mantissa area) │
├─────────────┼──────────────────┼────────────────────┼─────────────────────┤
│ float │ 1 bit(31bit) │ 8 bits(23-30bit) │ 23 bits(0-22bit) │
├─────────────┼──────────────────┼────────────────────┼─────────────────────┤
│ double │ 1 bit(63bit) │ 11 bits(52-62bit) │ 52 bits(0-51bit) │
└─────────────┴──────────────────┴────────────────────┴─────────────────────┘
float and double data are represented the same way inside the computer, but because they occupy different amounts of storage, the range and precision of the values they can represent differ.
Sign bit S
The sign bit has only two values, 0 and 1, meaning positive and negative respectively.
Significand M
Because M lies in [1,2), its integer part is always 1. IEEE 754 therefore specifies that when M is stored, the leading 1 is implied and dropped, and only the fractional part is saved.
For example, when storing 1.01, only the fraction 01 is saved; the leading 1 is added back when the value is read.
The purpose is to gain 1 extra significant bit.
A 32-bit float leaves only 23 bits for M, but with the leading 1 dropped it effectively keeps 24 significant bits.
The number of bits in the significand M determines the precision of the data
- float:
2^23 = 8388608, which has 7 digits, so there are at most 7 significant decimal digits; the precision of float is 6-7 significant digits (6 are guaranteed);
- double:
2^52 = 4503599627370496, which has 16 digits, so there are at most 16 significant decimal digits; the precision of double is 15-16 significant digits;
Exponent part E
- The exponent E is stored as an unsigned integer
- If E has 8 bits (float type), it can take the values 0-255,
- If E has 11 bits (double type), it can take the values 0-2047;
The real exponent can clearly be negative, but an E stored as an unsigned integer cannot.
IEEE 754 therefore specifies that the real exponent is stored in memory with a bias added (127 for an 8-bit E, 1023 for an 11-bit E).
For example, for a float
with real exponent E=3, adding 127 gives 130, which is converted to binary and stored as 1000 0010.
- E is neither all 0s nor all 1s
For the floating-point number 5.0
S=0 is stored directly; M=1.01, so the integer 1 is dropped, the fraction 01 is stored, and the remaining low bits are filled with 0; E=2 has 127 added to become 129, which is converted to binary and stored;
so the final binary representation of 5.0 is
0-100 0000 1-010 0000 0000 0000 0000 0000
which in hexadecimal is 40 A0 00 00
For the floating-point number -5.5
S=1 is stored directly; M=1.011, so the integer 1 is dropped, the fraction 011 is stored, and the remaining low bits are filled with 0; E=2 has 127 added to become 129, which is converted to binary and stored;
so the final binary representation of -5.5 is
1-100 0000 1-011 0000 0000 0000 0000 0000
which in hexadecimal is C0 B0 00 00
┌─────────────┬──────────────────┬────────────────────┬─────────────────────┐
│ value │ S(sign bit) │ E(Exponent area) │ M(Mantissa area) │
├─────────────┼──────────────────┼────────────────────┼─────────────────────┤
│ 5.0 │ 0 │ 100 0000 1 │ 010 0000 ... │
├─────────────┼──────────────────┼────────────────────┼─────────────────────┤
│ -5.5 │ 1 │ 100 0000 1 │ 011 0000 ... │
└─────────────┴──────────────────┴────────────────────┴─────────────────────┘
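#include <stdio.h>
#include <string.h>

int main(void)
{
    float f1 = 5.0f;
    float f2 = -5.5f;
    unsigned int u1, u2;
    // Copy the raw bits with memcpy (assumes float and unsigned int are both 4 bytes);
    // casting float* to unsigned int* would violate strict aliasing
    memcpy(&u1, &f1, sizeof u1);
    memcpy(&u2, &f2, sizeof u2);
    printf("%f, 0x%x\n", f1, u1); // 5.000000, 0x40a00000
    printf("%f, 0x%x\n", f2, u2); // -5.500000, 0xc0b00000
    return 0;
}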
- E is all 0s
Take single-precision floating-point numbers as an example.
When the stored E is all 0s, the real exponent is not 0-127 = -127; IEEE 754 defines it as 1-127 = -126, so the exponent part is 2^(-126), a very small number. The significand M also no longer gets the implied leading 1 and becomes a fraction of the form 0.xxxxxx.
This makes it possible to represent ±0 and very small numbers close to 0.
The same applies to double precision, where the real exponent is 1-1023 = -1022.
#include <stdio.h>
#include <string.h>

// Print a float, its raw bits in hexadecimal, and the bits in groups of four
void show_binary(const float f) {
    unsigned int num;
    memcpy(&num, &f, sizeof num); // copy the raw bits without violating strict aliasing
    printf("%.6f, 0x%X: ", f, num);
    const size_t max_size = 8 * sizeof(float);
    int i = (int)max_size;
    while (0 <= --i) {
        printf("%c", ((num >> i) & 0x1) + '0');
        if (0 == (i % 4)) {
            printf(" ");
        }
    }
    printf("\n");
}

int main(void)
{
    float f21 = 5.0f;
    float f22 = -5.5f;
    float f31 = 0.0f;
    float f32 = 0.000001f;
    show_binary(f21); // 5.000000, 0x40A00000: 0100 0000 1010 0000 0000 0000 0000 0000
    show_binary(f22); // -5.500000, 0xC0B00000: 1100 0000 1011 0000 0000 0000 0000 0000
    show_binary(f31); // 0.000000, 0x0: 0000 0000 0000 0000 0000 0000 0000 0000
    show_binary(f32); // 0.000001, 0x358637BD: 0011 0101 1000 0110 0011 0111 1011 1101
    return 0;
}
- E is all 1s
Take single-precision floating-point numbers as an example.
When the stored E is all 1s, subtracting 127 would give a real exponent of 128, one more than the largest normal exponent, so this encoding is reserved for special values. If the significand M is all 0s, the value is positive or negative infinity (the sign is determined by S); if M is not all 0s, the value is NaN (not a number).
The same applies to double precision.
The number of bits in the exponent E determines the range of data that can be represented
Range of the 4-byte int type: [-2^31, 2^31-1];
Range of the 4-byte float type: approximately [-3.4*10^38, 3.4*10^38], i.e. (-2^128, +2^128);
Why can float represent a much larger range than int when both occupy 4 bytes of memory?
The secret
- float cannot represent more distinct values than int, because both have only 32 bits
- The values float can represent are not continuous; there are gaps between adjacent values
- float is only an approximate representation and cannot be treated as an exact number
- Because the memory representation is more complex, floating-point arithmetic is generally more expensive than integer arithmetic, and much slower on hardware without a floating-point unit
Summary
- The memory representation of floating-point types differs from that of integer types
- The memory representation of floating-point types is more complex
- Floating-point types can represent a wider range
- Floating-point types are inexact
- Floating-point operations are generally slower