
[C language] deeply analyze the underlying principle of data storage

2022-07-06 10:24:00 【East-sunrise】

Catalog

One. Preface

Two. Data type introduction

1. Data types and their significance

2. Basic classification of types

Three. The storage of integers in memory

1. Sign-magnitude, ones' complement and two's complement

2. Introduction to big-endian and little-endian

1. What big-endian and little-endian mean

2. Practice

Four. Floating point storage in memory

1. An example

2. Floating point storage rules

Five. Summary

One. Preface

When we write code we deal with all kinds of data all the time: integers, floating-point numbers and so on. But do we really know them? Today we will analyze in depth how data is stored in memory, and work through some examples to deepen our understanding. Studying these underlying principles is like cultivating internal strength: the process may be a little dry, but one day that internal strength will pay off! Let's begin our practice!

Two. Data type introduction

1. Data types and their significance

C has the following basic built-in types. The sizes below are the typical ones on today's mainstream platforms; the C standard only fixes sizeof(char) == 1 and guarantees minimum ranges for the others.

char       // character type          typically 1 byte
short      // short integer           typically 2 bytes
int        // integer                 typically 4 bytes
long       // long integer            4 bytes (32-bit platforms and 64-bit Windows) or 8 bytes (64-bit Linux/macOS)
long long  // longer integer          typically 8 bytes
float      // single-precision float  typically 4 bytes
double     // double-precision float  typically 8 bytes

The meaning of a type:
1. The type determines how much memory is allocated (and that size determines the range of values it can hold).
2. The type determines how the contents of that memory are interpreted.

2. Basic classification of types

Integer family:

char
        unsigned char
        signed char
short
        unsigned short [int]
        signed short [int]
int
        unsigned int
        signed int
long
        unsigned long [int]
        signed long [int]

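Some readers may wonder: why is char an integer type? Isn't it the character type?
Because a character is stored as its character code (for example its ASCII value), which is an integer, char belongs to the integer family.
Also, when we write int a, it is equivalent to signed int a. The same holds for short and long: without an explicit unsigned, they are signed types.
char is the exception: whether plain char is signed or unsigned is implementation-defined (it is signed with most x86 compilers but unsigned on many ARM platforms). That is why char, signed char and unsigned char are listed as three separate types.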

Floating-point family:

float
double
long double

Constructed (composite) types:

> Array types
> Structure types struct
> Enumeration types enum
> Union types union

Seeing this, someone may ask: why is an array type also a constructed type?
Because an array's type is built from its element type and its length. For example, int arr1[5] and int arr2[8] have different types:
the type of arr1 is int[5], while the type of arr2 is int[8].
Likewise, int arr[5] and char arr[5] are certainly of different types.
So for arrays, changing either the element type or the number of elements changes the array's type.

Pointer types ：

int *pi;
char *pc;
float* pf;
void* pv;

Void type:

void denotes the empty type (no type).
It is typically used as a function's return type, in a function's parameter list, and in pointer types (void*).

Three. The storage of integers in memory

Creating a variable means reserving space for it in memory, and the size of that space depends on its type.
Next, let's look at how the data is actually stored in that space.
For example:
int a = 20;
int b = -20;
We know that a is given four bytes of space, but how is the value laid out in those bytes? Let's find out.

1. Sign-magnitude, ones' complement and two's complement

Computers have three binary representations for signed integers: sign-magnitude (original code), ones' complement (inverse code) and two's complement (complement code).
All three consist of a sign bit and value bits. A sign bit of 0 means "positive" and 1 means "negative".
For positive integers, the three representations are identical.
For negative integers, the three representations differ:

Sign-magnitude:
Write the absolute value in binary and set the sign bit according to the sign of the number.
Ones' complement:
Keep the sign bit of the sign-magnitude form unchanged and invert every other bit.
Two's complement:
Add 1 to the ones' complement.

For integers, the data actually stored in memory is the two's complement.

Why?
In a computer system, integer values are represented and stored in two's complement. The reason is that with two's complement the sign bit and the value bits can be processed uniformly;
at the same time, addition and subtraction can be handled the same way (the CPU only has an adder, and a - b is carried out as a + (-b)). In addition, converting two's complement back to sign-magnitude uses the same procedure as the forward conversion (keep the sign bit, invert the other bits, add 1), so no extra hardware circuit is needed.
How can we understand this better?
Let's take an example: suppose the computer needs to compute 1 - 1, that is, 1 + (-1).

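Now let's look at how these values are stored in memory, for example in the debugger's memory window.

We can see that they are stored in two's complement form, but the bytes appear in an unexpected order: on a typical x86 machine, a = 20 (0x00000014) shows up as 14 00 00 00, and b = -20 (0xFFFFFFEC) as ec ff ff ff.

Why is that?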

2. Introduction to big-endian and little-endian

1. What big-endian and little-endian mean

Big-endian (storage) mode: the low-order bytes of the data are stored at the high addresses of memory, and the high-order bytes of the data are stored at the low addresses.

Little-endian (storage) mode: the low-order bytes of the data are stored at the low addresses of memory, and the high-order bytes of the data are stored at the high addresses.

Memory is used from low addresses to high addresses, and whenever we define a variable, space is requested from memory to store its data. Suppose we want to store the hexadecimal number 0x11223344.
Since memory is addressed in units of bytes, this four-byte number occupies four consecutive bytes: 11 is the high-order byte and 44 the low-order byte.
It is like the way we usually write a decimal number such as 520:
we naturally write the high-order digit (5) on the left. Big-endian and little-endian storage differ only in whether the high-order byte of a number goes to the low address or to the high address: a big-endian machine stores 11 22 33 44 from low to high addresses, while a little-endian machine stores 44 33 22 11.
Which mode a machine uses is determined by its hardware design.

2. Practice

Now that we understand byte order, let's deepen our grasp with a question from Baidu's 2015 system engineer written exam:

Briefly describe the concepts of big-endian and little-endian byte order, and design a small program to determine the byte order of the current machine. (10 points)
This problem is not actually complicated. We already know that big-endian and little-endian differ in the order in which the bytes of a value's two's complement are stored. So we can store a test number (the simpler the better, for example 1), read back only the byte at the lowest address, and draw a conclusion from its value.
Code:
#include <stdio.h>

int main()
{
	int a = 1;
	/* Look only at the byte at the lowest address of a. */
	if (*(char*)&a == 1)
		printf("Little-endian storage\n");
	else
		printf("Big-endian storage\n");
	return 0;
}
We choose 1 as the test number because the two's complement of 1 is 00000000 00000000 00000000 00000001 (0x00000001).
On a little-endian machine, the byte at the lowest address of a is 01, so reading it gives 1; otherwise that byte is 00 and the machine is big-endian.
Note that this uses a bit of pointer knowledge:
the type of a pointer determines how much memory is accessed when it is dereferenced (the unit of access).
Dereferencing an int* accesses 4 bytes, but we only want 1 byte, so we convert &a to char* before dereferencing.
You can think of it this way: we look at memory from the perspective of a char, and a char covers exactly one byte, so dereferencing accesses the one byte starting at the first address of a.

Practice 2

Question: what is the result of the code above?
🥲 Didn't expect that? Let me explain it in detail.
First we need to know a rule called integer promotion.
%d expects an argument of type (signed) int. If the value a printed with %d is of type char or short, that is, a type smaller than int (which occupies 4 bytes), integer promotion takes place first.
During integer promotion the type of a is checked. If a is an unsigned type, the high-order bits are filled with 0 up to 32 bits (zero extension). If a is a signed type, the highest bit of its two's complement is treated as the sign bit: if that bit is 1, the high-order bits are filled with 1 up to 32 bits; likewise, if it is 0, they are filled with 0 (sign extension).
With this important concept in mind, the analysis of the problem becomes much simpler.

Practice 3

What will the following code output?

#include <stdio.h>

int main()
{
	unsigned int i;
	for (i = 9; i >= 0; i--)   /* i >= 0 is always true for an unsigned i */
	{
		printf("%u\n", i);
	}
	return 0;   /* never reached */
}

Output:

The code first prints 9 down to 0, and then what we feared happens: an infinite loop! After 0 it continues with 4294967295, 4294967294, ... (assuming a 32-bit unsigned int).
Why? Let me explain.
First, notice that i is an unsigned integer. That means i has no sign bit and all 32 bits are value bits, so we can assert that i is always greater than or equal to 0!
So at first we see the familiar result: the loop prints down to 0.
But note how this for loop works: at the end of each iteration, i-- is executed first, and then the condition is checked to decide whether to enter the loop again.
So when i is 0, decrementing it would give -1. But i is an unsigned integer and can never be negative, and this is where the problem arises.
When i would become -1, let's first write out the binary representations of -1:
10000000 00000000 00000000 00000001 --> sign-magnitude of -1
11111111 11111111 11111111 11111110 --> ones' complement of -1
11111111 11111111 11111111 11111111 --> two's complement of -1
Notably, the two's complement of -1 is all 1s (worth remembering). Since i is unsigned, this bit pattern is read as an unsigned number: 32 ones, i.e. 4294967295 (UINT_MAX), a very large positive number. The condition i >= 0 is therefore still true, and the program is stuck in an infinite loop.

Practice 4

What will the following code output?

#include <stdio.h>
#include <string.h>

int main()
{
    char a[1000];
    int i;
    for (i = 0; i < 1000; i++)
    {
        a[i] = -1 - i;
    }
    printf("%zu\n", strlen(a));   /* strlen returns size_t, so use %zu */
    return 0;
}

Output:

By now, with practice making perfect, can you work out the correct answer? Or have you gotten so used to unexpected output that nothing surprises you? Don't worry! Let's continue the analysis together.

The code finally prints the return value of strlen, so let's first recall how strlen works.
strlen computes the length of a string: it counts how many characters come before '\0' and stops as soon as it finds '\0'. Also, '\0' is itself a character whose ASCII value is 0, so '\0' is equivalent to 0.
With that in mind, let's look at the problem.
The array a has type char. Assuming plain char is signed (as on typical x86 compilers), its range is -128~127. The code assigns a[0] = -1, a[1] = -2, and so on. What happens when the value reaches -128 and keeps going? Let's work it out.
A char has 8 bits, which can represent 2^8 = 256 different values. Start from 0 and keep adding 1: when the 7 value bits (everything except the sign bit) are all 1, we have the largest positive value, 01111111 = 127. Adding 1 once more gives the bit pattern 10000000.
This is a special pattern. In 8-bit two's complement the sign bit carries a weight of -128, so 10000000 means -128 + 0 = -128. It has no sign-magnitude counterpart, because the magnitude 128 needs 8 bits, which is why it is often treated as a special case.
If we keep adding 1 from there, the values go around like a clock: char values cycle through -128~127 in order. (Strictly speaking, converting an out-of-range value such as -129 to a signed char is implementation-defined; mainstream compilers keep the low 8 bits, which produces exactly this wrap-around.) In summary, the range of a signed char is -128~127.
Back to the problem: since the elements are of type char, the values stored in a are -1 -2 -3 -4 ... -128 127 126 ... 2 1 0 -1 -2 ...
Finally, as we said at the beginning, strlen counts the characters that appear before '\0' (that is, 0). Before the first 0 there are 128 negative values (-1 to -128) and 127 positive values (127 down to 1), so the answer is 128 + 127 = 255.

Practice 5

What is the output of the following code?

#include <stdio.h>
unsigned char i = 0;
int main()
{
    for (i = 0; i <= 255; i++)
    {
        printf("hello world\n");
    }
    return 0;
}

As you can probably guess by now, this code runs in an infinite loop.

If you remember that the range of unsigned char is 0~255 and notice right away that i is an unsigned char, the result is not hard to deduce.
Precisely because i always lies in 0~255, the condition i <= 255 is always true (after 255, i++ wraps around to 0), so the for loop never ends. That's an infinite loop!

Well, after several interesting and special problems, do you feel you have a better grasp of data types and the rules of data storage? From these problems we should take away one lesson: in future programming, always think carefully about whether a type is signed or unsigned; otherwise it is easy to end up in an infinite loop by accident!

Four. Floating point storage in memory

Common floating-point numbers:

3.14159
1E10 (scientific notation, meaning 1.0 × 10^10)
The floating-point family includes the float, double and long double types.
The ranges of the floating-point types are defined in <float.h>.

1. An example

Before introducing how floating-point numbers are stored in memory, let's work through a problem. If you can give the answer accurately right now, you have roughly mastered the rules of floating-point storage already.

Question: what is the output of this program?
Wow, is it a little unexpected?
num and *pFloat are the same bytes in memory, so why do the results differ so much when they are interpreted as a floating-point number and as an integer?
To understand this result, we must understand how floating-point numbers are represented inside the computer.

2. Floating point storage rules

According to the international standard IEEE 754 (IEEE: Institute of Electrical and Electronics Engineers), any binary floating-point number V can be expressed in the following form:
V = (-1)^S * M * 2^E
(-1)^S is the sign: when S=0, V is positive; when S=1, V is negative.
M is the significand; for normalized numbers it is greater than or equal to 1 and less than 2.
2^E is the power of two, where E is the exponent.

For example:
Decimal 5.0 is 101.0 in binary, which equals 1.01×2^2.
Then, following the format of V above, S=0, M=1.01, E=2.
Decimal -5.0 is -101.0 in binary, which equals -1.01×2^2. Then S=1, M=1.01, E=2.
IEEE 754 specifies:
For a 32-bit floating-point number, the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.

For a 64-bit floating-point number, the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.

IEEE 754 has some special rules for the significand M and the exponent E.

As mentioned before, 1≤M<2, that is, M can be written in the form 1.xxxxxx, where xxxxxx is the fractional part.
IEEE 754 specifies that when M is stored in the computer, its first digit is assumed to always be 1, so that digit is dropped and only the fractional part xxxxxx is stored. For example, when storing 1.01, only 01 is stored, and the leading 1 is added back when the number is read. The purpose is to save one significant bit. Taking a 32-bit floating-point number as an example, only 23 bits are left for M, but with the leading 1 omitted, it can effectively hold 24 significant bits.

The exponent E is more complicated.

First, the stored E is an unsigned integer (unsigned int).
This means that if E has 8 bits, its range is 0~255; if E has 11 bits, its range is 0~2047. However, in scientific notation the exponent can be negative, so IEEE 754 specifies that a bias must be added to the real value of E before it is stored in memory.

For an 8-bit E the bias is 127; for an 11-bit E the bias is 1023.

For example, the E of 2^10 is 10, so when stored as a 32-bit floating-point number it must be saved as 10+127=137, i.e. 10001001.

Then, when E is read back from memory, there are three cases:

E is neither all 0s nor all 1s

In this case the floating-point number follows the normal rule: subtract 127 (or 1023) from the stored value of E to get the real exponent, then put the leading 1 back in front of the significand M.
For example:
0.5 (1/2) in binary is 0.1. Since the integer part must be 1, the binary point is moved 1 place to the right, giving 1.0*2^(-1). Its stored exponent is -1+127=126, i.e. 01111110. The significand 1.0 with the integer part removed is 0, padded with 0s to 23 bits: 00000000000000000000000. So its binary representation is:
0 01111110 00000000000000000000000

E is all 0s

In this case the real exponent E of the floating-point number is 1-127 (or 1-1023),
and the significand M no longer gets the leading 1; it is reduced to a fraction of the form 0.xxxxxx. This is used to represent ±0 and very small numbers close to 0.

E is all 1s

In this case, if the significand M is all 0s, the value is ± infinity (the sign depends on the sign bit S); if M is not all 0s, the value is NaN (not a number).

That's all on the representation rules of floating-point numbers.

Explaining the earlier problem:

Now let's return to the initial question: why does 0x00000009, when read back as a floating-point number, become 0.000000?
First, split 0x00000009: the first bit gives the sign s=0, the next 8 bits give the exponent E=00000000,
and the last 23 bits give the significand M=000 0000 0000 0000 0000 1001.

9 -> 0000 0000 0000 0000 0000 0000 0000 1001

Because the exponent E is all 0s, this is the second case from the previous section. Therefore the floating-point number V is:
V=(-1)^0 × 0.00000000000000000001001×2^(-126)=1.001×2^(-146)
Clearly, V is a very small positive number close to 0 (it equals 9×2^(-149), about 1.26×10^(-44)), so printed as a decimal with %f it shows 0.000000.

This part teaches us that data stored as a floating-point number should be read back as a floating-point number, and the same goes for other types.
When we read the bytes as a float, we look at them from the perspective of float: the bit pattern is treated as a floating-point number and decoded directly according to the floating-point storage rules.

Now let's look at the second part of the example.
How is the floating-point number 9.0 represented in binary? And what is that bit pattern when read back as a decimal integer?
First, the floating-point number 9.0 equals binary 1001.0, i.e. 1.001×2^3.

9.0 -> 1001.0 -> (-1)^0 × 1.001 × 2^3 -> s=0, M=1.001, E=3+127=130

So the first bit, the sign s, is 0; the significand M is 001 followed by 20 zeros, filling 23 bits; and the exponent E is 3+127=130, i.e. 10000010.
Written in binary, the result is s+E+M, i.e.

0 10000010 001 0000 0000 0000 0000 0000

This 32-bit binary number (0x41100000), read back as a decimal integer, is 1091567616.

Five. Summary

When we write code we deal with data all the time, so we should get to know it better and get along with it well; then we won't be so easily punished by bugs such as infinite loops ~~
That's the end of today's sharing! If anything is wrong, corrections are welcome ~
If you've read this far, why not give it a like, a favorite and a follow ~

Copyright notice
This article was written by [East-sunrise]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/187/202207060911021157.html