Note: after reading this article, you will be able to grasp the essence of pointers.
The core of the C language is the pointer, so the theme of this article is "pointers and the memory model".
You cannot talk about pointers without talking about memory. People who learn pointers fall into two groups: those who do not understand the memory model and those who do.
Those who do not understand it stop at the sentence "a pointer is the address of a variable", and are afraid to use pointers, especially for advanced operations.
Those who understand the memory model use pointers with ease and manipulate bytes at will, to the applause of onlookers.
1. The nature of memory
Programming is, at its core, the manipulation of data, and data lives in memory.
So the better you understand the memory model and how C manages memory, the more clearly you can see how a program actually works, and the further your programming ability will go.
This is not empty talk. Throughout my freshman year I did not dare write C programs longer than a thousand lines, and I strongly resisted writing C at all.
Once a program passed a thousand lines, all sorts of inexplicable memory errors appeared, and it would core dump seemingly at random... with no way to investigate and no obvious cause.
By comparison, I liked Java at the time: whatever you wrote, that kind of crash did not happen. At worst you got an occasional NullPointerException, which is much easier to track down.
Only later, after I understood memory and pointers more deeply, could I gradually write multi-thousand-line projects in C and rarely run into memory problems. (Perhaps too confident.)
"A pointer stores the memory address of a variable" is a sentence that every C book mentions.
So to understand pointers thoroughly, you first have to understand how variables are stored in C, which means understanding memory.
1.1 Memory addressing
A computer's memory is a space for storing data, made up of a series of contiguous storage cells.
Each cell represents 1 bit. To EE students a bit is a high or low voltage level; to CS students it is one of two states, 0 or 1.
Because 1 bit can only represent two states, the designers decided to group 8 bits together and call the group a byte.
The byte is the smallest unit of memory addressing: every byte is given a number, and this number is called its memory address.
This is like giving every unit and every household in a neighborhood a house number: 301, 302, 403, 404, 501...
In everyday life we need each number to be unique, so that a family can be located precisely by its house number.
Likewise, in a computer the number given to each byte must be unique. Only then does each number lead to exactly one byte.
1.2 Memory address space
As said above, every byte in memory gets a unique number, so the range of those numbers determines how much memory the computer can address.
This range is called the memory address space, and it is related to what we usually mean when we say a computer is 32-bit or 64-bit.
The early Intel 8086 and 8088 CPUs had 16-bit registers, so a single 16-bit address can number at most 2^16 bytes = 64 KB of memory.
That is obviously not enough, so the 8086 combined a segment register with an offset to drive a 20-bit address bus, which reaches 2^20 bytes = 1 MB. The 80286 later widened the address bus further, and the extra address line above those 20 bits is known as the A20 line.
When I was writing a mini OS, I also had to enable the A20 line through a BIOS interrupt.
Today's computers, however, are at least 32-bit. 32 bits means the addressable memory range is 2^32 bytes = 4 GB.
So if your computer is 32-bit, installing more than 4 GB of memory does not let you make full use of it.
That covers memory and memory addressing.
1.3 The nature of variables
Now that we have memory, the next question is how variables such as int and double are stored in those 0/1 cells.
In C we define variables like this:
int a = 999;
char c = 'c';
When you write a variable definition, you are really asking for a piece of memory to hold the variable.
An int usually occupies 4 bytes, and the computer represents integers in two's complement (look it up if you are not familiar with it).
The two's complement of 999, written out as 4 bytes, is: 00000000 00000000 00000011 11100111
That is 4 bytes, so four cells are needed to store it, for example with the high-order byte 00000000 at the lowest address and the low-order byte 11100111 at the highest address.
Did you notice that this puts the high-order byte at the low address?
Can it be the other way around?
Of course. This leads to big-endian and little-endian byte order.
Putting the high-order byte at the low memory address is called big-endian.
Conversely, putting the low-order byte at the low memory address is called little-endian.
The above describes how an int is stored in memory. Other types such as float and char work the same way: the value is first converted to its binary encoding (two's complement for signed integers, typically IEEE 754 for floating-point numbers).
Multi-byte types then follow the big-endian or little-endian convention and write their bytes into memory cells one after another.
Remember these two layouts. This is what every variable in a programming language looks like in memory, whether it is an int, a char, a pointer, an array, a struct, an object... they are all stored this way.
2. What is a pointer?
2.1 Where are the variables?
As said above, defining a variable requests a piece of the computer's memory.
So what if we want to know where the variable is?
You can use the operator & to get the address of the variable. The value is the starting address of the memory block the variable occupies.
(PS: this is actually a virtual address, not an address in real physical memory.)
We can print the address:
printf("%p", (void *)&a);
It prints something like this: 0x7ffcad3b8f3c
2.2 The nature of the pointer
As said above, the & operator gives us the memory address of a variable. But how do we express that a value is an address rather than an ordinary number?
In other words, how does C represent the concept of an address?
Yes, with a pointer. You can write:
int *pa = &a;
pa stores the address of a, and is therefore called a pointer to a.
Here I want to discuss a few questions that may seem a little dull:
Why do we need pointers? Can't we just use variable names?
We can, but variable names have their limits.
What is a variable name, really?
It is a symbolic name for the address of a variable. Variable names exist to make programming easier and friendlier for people, but the computer knows nothing about a variable called a; it only knows addresses and instructions.
So when you look at the assembly code compiled from C, you will find that the variable names have disappeared and been replaced by a series of abstract addresses.
You can think of the compiler as automatically maintaining a map that converts each variable name in the program into the address of that variable, and then reads and writes that address.
In other words, there is a mapping table like this that automatically converts variable names to addresses:
a | 0x7ffcad3b8f3c
c | 0x7ffcad3b8f2c
h | 0x7ffcad3b8f4c
....
Well said!
But I still have not shown why pointers are necessary, so here is a question. Look at the code below:
int func(...) { ... };
int main() {
    int a;
    func(...);
};
Suppose I have this requirement:
func must be able to modify the variable a that belongs to main. How can that be done? Inside main, a can be read and written in memory directly through its variable name.
But a is not visible inside func.
You might say: take the address with the & operator and pass the address of a in:
int func(int address) { .... };
int main() {
    int a;
    func(&a);
};
This way func gets the address of a and can read and write it.
In theory that works, but there is a problem:
How does the compiler tell whether an int holds a value of type int or the address of another variable (that is, a pointer)?
If that were left entirely to us programmers to remember, it would add complexity, and the compiler could not detect some kinds of mistakes.
Defining a pointer variable with int * makes it explicit: this is the address of another variable of type int.
The compiler can then catch some errors through type checking.
This is why pointers are needed.
In fact every language has this need. Many languages simply put restraints on the pointer for safety and wrap it up as a reference.
Perhaps you accepted pointers as a matter of course when you learned them, but I hope this lengthy explanation gives you some insight.
Meanwhile, here is a small question:
If a pointer is essentially the starting memory address of a variable, that is, just an integer,
then why does it come in so many types?
For example int pointers and float pointers: does the type affect the information stored in the pointer itself?
When does this type come into play?
2.3 Dereferencing
The answer to the question above lies in dereferencing a pointer.
pa stores the memory address of the variable a. How do we get the value of a through that address?
This operation is called dereferencing. In C, the * operator gives you the contents of the address a pointer points to.
For example, *pa yields the value of a.
We said the pointer stores the starting address of the variable's memory. So how does the compiler know how many bytes to fetch from that starting address?
This is where the pointer's type comes in: the compiler decides how many bytes to fetch according to the type of the element the pointer points to.
For a pointer to int, the compiler generates instructions that fetch four bytes (where int is 4 bytes); for a pointer to char, only one byte; and so on.
As for the memory picture: the pointer pa is first of all a variable itself. It also occupies a piece of memory, and what that memory stores is the starting address of the variable a.
When dereferencing, 4 bytes are fetched starting from that address and then decoded as an int.
2.4 Putting it to flexible use
This part may look simple, but it is the key to understanding pointers deeply.
Two examples illustrate it.
The first:
float f = 1.0;
short c = *(short*)&f;
Can you explain what happens here? Does anything change for f at the memory level?
And what is the value of c? 1?
In fact, at the memory level f does not change at all.
Take the bit pattern of f in memory. The expression takes the first two bytes of f, interprets them as a short, and assigns the result to c.
The detailed process is:
- &f obtains the starting address of f.
- (short*)&f does nothing at run time. The expression only says: "I consider the address of f to hold a variable of type short."
- Finally, when *(short*)&f dereferences it, the compiler takes the first two bytes, interprets them using the short encoding, and assigns the interpreted value to c.
Throughout this process the bit pattern of f does not change; only the way these bits are interpreted changes.
Of course, the final value is definitely not 1. What it actually is, you can work out yourself.
Now the reverse. What happens here?
short c = 1;
float f = *(float*)&c;
The process is the same as above, but while the previous example happened to be harmless, this one is not necessarily so.
Why?
(float*)&c makes us fetch four bytes starting from the address of c and interpret them as a float.
But c is a short and occupies only two bytes, so the access is bound to touch the two bytes after it. That is an out-of-bounds memory access.
Of course, if it is only a read, it will most likely appear to work.
Sometimes, however, you need to write a new value to this area, for example:
*(float*)&c = 1.0;
Then a core dump may occur, that is, the memory access fails.
Even if there is no core dump, the write destroys whatever was in that memory. It is very likely the storage of other variables, and overwriting someone else's data is bound to cause hidden bugs.
Once you understand the above, you will be much more comfortable with pointers.
2.5 A small problem
Now let's look at a problem that a friend in a chat group asked about.
He wrote a floating-point number to a file, read it back, and found that the printed value did not match.
The key part is this:
char buffer[4];
...
printf("%f %x\n", *buffer, *buffer);
He probably thought: buffer is a pointer (an array, to be exact), dereferencing a pointer should yield the value stored there, and the value stored there is the 4 bytes read from the file, that is, the earlier float variable.
Note that all of this is only what he thinks. What the compiler thinks is:
"buffer is a pointer to char, so I will just take the first byte."
It then passes the value of that first byte to printf. %f expects a floating-point argument, but what arrives is a single char (promoted to int), so the printed value has nothing to do with the float that was stored.
That is the whole process.
The root of the mistake is the belief that dereferencing any pointer yields "the value we think is there". The compiler knows nothing of the sort; it interprets the memory strictly according to the pointer's type.
So here it should be changed to:
printf("%f %x\n", *(float*)buffer, *(unsigned int*)buffer);
This is equivalent to telling the compiler explicitly:
"At the place buffer points to I put a float, so interpret it as a float for me." (The %x argument is read as an unsigned int so that it matches its format specifier.)
3. Structures and pointers
A structure contains multiple members. How are these members stored in memory?
For example:
struct fraction {
    int num;   // integer part
    int denom; // fractional part
};
struct fraction fp;
fp.num = 10;
fp.denom = 2;
This is a fixed-point decimal structure. It occupies 8 bytes in memory (ignoring memory alignment here), and its two member fields are stored as follows:
10 goes into the field at offset 0 from the structure's base address, and 2 goes into the field at offset 4.
Now let's do something no normal person would ever do:
((struct fraction*)(&fp.denom))->num = 5;
((struct fraction*)(&fp.denom))->denom = 12;
printf("%d\n", fp.denom); // what is the output?
What does this print? Think about it yourself first.
Next, let me analyze what happens:
First, &fp.denom takes the starting address of the denom field of fp. The cast then treats the 8 bytes starting at that address as a fraction structure.
In this new structure, fp's denom field plays the role of num, and the four bytes just past the end of fp become denom.
Therefore:
((struct fraction*)(&fp.denom))->num = 5
actually changes fp.denom, and
((struct fraction*)(&fp.denom))->denom = 12
assigns 12 to the four bytes just past the end of fp.
Of course, the result of writing to those four bytes is unpredictable. It may crash the program, because that may be where key information of the function call's stack frame is stored, or there may be no write permission there.
Many of the core dumps we hit when first learning C have similar causes.
So the final output is 5.
Why talk about this kind of senseless code?
Just to show that a structure is essentially a bunch of variables packed together, and that accessing a field of a structure means taking the structure's starting address, also called the base address, and adding the field's offset.
In fact, objects in C++ and Java are stored in a similar way. To implement object-oriented features, they add some header information besides the data members, such as the virtual function table pointer in C++.
In fact, we can imitate this in C.
That is why we keep saying C is the foundation: once you really understand C pointers and memory, you can quickly understand the object model and memory layout of other languages.
4. Multi-level pointers
Speaking of multi-level pointers: as a freshman I could understand at most 2 levels. Any more really made me dizzy, and I often wrote wrong code.
If you had shown me something like int ******p, it would have broken me. I suspect many students are in the same situation now.
In fact, multi-level pointers are not that complicated. It is just a pointer to a pointer to a pointer... That simple.
Today I will help you understand the essence of multi-level pointers.
First, let me say this: there is no such thing as a multi-level pointer. A pointer is a pointer; "multi-level pointer" is only a logical concept for our convenience.
Start with the parcel lockers from everyday life.
Everyone has used them. Fengchao lockers or supermarket lockers work like this: each compartment has a number, and once we have the number we can find the corresponding compartment and take out its contents.
The compartment here is a memory cell, the number is the address, and the contents of the compartment correspond to the contents stored in memory.
Suppose I put a book in compartment 03 and tell you the number 03. Then you can use 03 to go and get the book.
Now suppose I put the book in compartment 05, and in compartment 03 I leave only a small note that reads: "The book is in compartment 05".
What would you do?
Open compartment 03, of course, take out the note, and then open compartment 05 as the note says to get the book.
Compartment 03 here is called a pointer, because it holds a note pointing to another compartment (an address) rather than the book itself.
See?
Then suppose I put the book in compartment 07, put a note in compartment 05 saying "The book is in compartment 07", and put a note in compartment 03 saying "The book is in compartment 05".
Here compartment 03 is called a second-level pointer, compartment 05 is called a pointer, and compartment 07 is the ordinary variable we usually use.
Continuing in the same way, you can derive an N-level pointer.
Do you see it now? For the same piece of memory: if it stores the address of another variable it is called a pointer, and if it stores actual content it is called a variable.
int a;
int *pa = &a;
int **ppa = &pa;
int ***pppa = &ppa;
In the code above, pa is called a first-level pointer, which is what we normally just call a pointer, and ppa is a second-level pointer.
No matter how many levels a pointer has, two things are at the core:
- The pointer itself is a variable. It needs memory to store it, and it has its own address.
- The pointer's memory stores the address of the variable it points to.
That is why I say multi-level pointers are a logical concept. A piece of memory holds either actual content or the address of another variable. It is that simple.
How should we read a declaration like int **a?
int ** a can be split into two parts, int* and *a. The * in *a says that a is a pointer variable, and the int* in front says that the pointer variable a can only store the address of a variable of type int*.
Any two-level or multi-level pointer can be split in two like this.
No matter how many levels it has, it is first of all a pointer variable, and being a pointer variable takes just one *. The remaining * characters say what type of variable's address it is allowed to store.
For example, int****a says that the pointer variable a can only store the address of a variable of type int***.
5. Pointers and arrays
5.1 One-dimensional arrays
The array is a basic data structure built into C. A thorough understanding of arrays and how to use them is the basis for developing efficient applications.
Array notation and pointer notation are closely related and can be interchanged in the right context.
For example:
int array[10] = {10, 9, 8, 7};
printf("%d\n", *array);      // prints 10
printf("%d\n", array[0]);    // prints 10
printf("%d\n", array[1]);    // prints 9
printf("%d\n", *(array+1));  // prints 9
int *pa = array;
printf("%d\n", *pa);         // prints 10
printf("%d\n", pa[0]);       // prints 10
printf("%d\n", pa[1]);       // prints 9
printf("%d\n", *(pa+1));     // prints 9
In memory, an array is a contiguous block of memory.
The address of element 0 is called the starting address of the array, and the array name effectively refers to that starting address. When we access an array element through array[1] or *(array + 1), it can be viewed as address[offset], where address is the starting address and offset is the offset. Note, however, that offset is not added to address directly; it is first multiplied by the number of bytes the element type occupies. That is: address + sizeof(int) * offset.
Those who have studied assembly will find this familiar. It is one of the addressing modes in assembly: base-plus-index addressing.
After reading the code above, many students may conclude that pointers and arrays are exactly the same and interchangeable. That is completely wrong.
Although an array name can sometimes be used as a pointer, an array name is not a pointer.
The most typical place to see this is sizeof:
printf("%zu", sizeof(array));
printf("%zu", sizeof(pa));
The first prints 40 (with 4-byte int), because array contains 10 elements of type int. The second prints 4 on a 32-bit machine, which is the size of a pointer.
Why is this so?
From the compiler's point of view, variable names and array names are both symbols. They all have types, and in the end they are all bound to data.
A variable name refers to one piece of data and an array name refers to a set of data. Both have types, so that the length of the data they refer to can be inferred.
Yes, arrays have types too. We can think of int, float, char and so on as basic types, and an array as a slightly more complex type derived from a basic type.
The type of an array consists of the element type and the array length. sizeof computes the length from the type of the variable, and it does so at compile time, not while the program is running.
During compilation the compiler builds a special table that records each variable name together with its data type, address, scope, and so on.
sizeof is an operator, not a function. The length of a symbol passed to sizeof can be looked up in this table.
So applying sizeof to an array name gives the actual length of the array.
pa is just a pointer to int. The compiler has no idea whether it points to one integer or to a whole run of integers.
Although here it points to an array, an array is just a piece of contiguous memory with no start or end markers and no extra information recording how long it is.
So applying sizeof to pa yields only the size of the pointer variable itself.
In other words, the compiler does not associate pa with the array. pa is just a pointer variable, and no matter where it points, sizeof always gives the number of bytes the pointer itself occupies.
5.2 Two-dimensional arrays
Do not imagine that a two-dimensional array is laid out in memory as rows and columns in two dimensions. In fact, two-dimensional arrays, three-dimensional arrays and so on are all syntactic sugar provided by the compiler.
Their storage is essentially no different from a one-dimensional array. For example:
int array[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
array[1][1] = 5;
You might think that in memory array looks like a two-dimensional matrix:
1 2 3
4 5 6
7 8 9
But it actually looks like this:
1 2 3 4 5 6 7 8 9
It is no different from a one-dimensional array: both are one-dimensional, linear sequences.
When we write an access like array[1][1], how does the compiler compute the address of the element we are actually accessing?
To be more general, suppose the array is defined like this:
int array[n][m]
and accessed as: array[a][b]
Then the address of the accessed element is: starting address of array + sizeof(int) * (m * a + b)
This is the essence of two-dimensional arrays in memory. They are the same as one-dimensional arrays, just wrapped in two-dimensional syntactic sugar.
6. The magical void pointer
You have surely seen these uses of void:
void func();
int func1(void);
In these cases, void means there is no return value or that there are no parameters.
A pointer of type void *, however, is a general-purpose pointer that can store the address of data of any type.
The following is a void pointer:
void *ptr;
The greatest use of void pointers is implementing generic programming in C. Any object pointer can be assigned to a void pointer, and a void pointer can be converted back to the original pointer type, and throughout this process the address the pointer actually holds does not change.
For example:
int num;
int *pi = &num;
printf("address of pi: %p\n", (void *)pi);
void* pv = pi;
pi = (int*) pv;
printf("address of pi: %p\n", (void *)pi);
Both times the printed value is the same.
You rarely convert back and forth like this in everyday code, but when you write large software or general-purpose libraries in C, you cannot do without void pointers. They are the cornerstone of generics in C. For example, the sort function qsort in the standard library is declared like this:
void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));
Every place that would involve a concrete element type uses void instead.
void pointers can also be used to implement polymorphism in C, which is rather fun.
But there is something to be aware of:
- A void pointer cannot be dereferenced.
For example:
int num;
void *pv = (void*)&num;
*pv = 4; // error
Why?
Because dereferencing essentially means the compiler uses the pointer's type to fetch N bytes from the memory the pointer points to, and then interprets those N bytes according to that type.
For a pointer of type int *, N is 4 (with 4-byte int), and the bytes are then interpreted as an int.
With void *, however, the compiler does not know whether it points to an int, a double, or a structure, so it cannot dereference a pointer of type void *.
7. Showing off
Many students think C can only do procedural programming. In fact, with pointers we can also simulate objects, inheritance, polymorphism and so on in C.
We can also use void pointers to implement generic programming, the counterpart of generics in Java and templates in C++.