C/C++ Programming Notes: Pointers! Understand pointers through memory, and understand them completely

2020-11-08 23:46:00 【112Q】

Note: after reading this article, you should have a grasp of what a pointer really is.

Pointers are the core of the C language, so the theme of this article is "pointers and the memory model".

You cannot talk about pointers without talking about memory. People who learn pointers fall into two groups: those who do not understand the memory model, and those who do.

The first group never gets past the sentence "a pointer is the address of a variable". They are afraid to use pointers, especially for anything advanced.

Those who understand the memory model use pointers with ease and manipulate bytes at will, to the admiration of everyone watching.

One. The nature of memory

The essence of programming is manipulating data, and data lives in memory.

So the better you understand the memory model and how C manages memory, the more clearly you can see how a program actually runs, and the further your programming ability can go.

Don't dismiss this as empty talk. Throughout my freshman year I did not dare write C programs longer than a thousand lines, and I dreaded writing C at all.

Once a program passed a thousand lines, all sorts of inexplicable memory errors appeared, and every so often it would core dump. I had no way to track the problems down and no idea what caused them.

By comparison, I liked Java at the time: whatever I wrote, nothing like that happened. At worst there was an occasional NullPointerException, which was easy to track down.

Only later, once I understood memory and pointers more deeply, did I gradually become able to write projects of several thousand lines in C and rarely hit memory problems again. (Perhaps I am being overconfident.)

"A pointer stores the memory address of a variable" is a sentence that every C textbook contains.

So to understand pointers thoroughly, you first need to understand where C variables are actually stored: memory.

1.1 Memory addressing

A computer's memory is a space for storing data, made up of a series of contiguous storage cells.

Each cell represents 1 bit. To students of electrical engineering a bit is a high or low voltage level; to students of computer science it is one of two states, 0 or 1.

Because a single bit can represent only two states, bits are grouped in eights, and such a group is called a byte.

The byte is the smallest unit of memory addressing: every byte is given a number, and that number is its memory address.

It is like assigning a door number to every household in a residential block: 301, 302, 403, 404, 501...

In real life the door numbers must be unique so that a household can be located precisely from its number.

Likewise, in a computer the number given to each byte must be unique, so that each number identifies exactly one byte.

1.2 Memory address space

Since every byte in memory gets a unique number, the range of those numbers determines how much memory the computer can address.

This range is called the memory address space, and it is related to what we mean when we call a computer 32-bit or 64-bit.

The early Intel 8086 and 8088 CPUs had 16-bit registers, so a single register could number at most 2^16 bytes = 64 KB of memory. Their address bus was 20 bits wide, and segmentation was used to reach the full 2^20 bytes = 1 MB.

That was obviously not enough. The 80286 later widened the address bus to 24 bits; the 21st address line, A20, is switched on and off by what is known as the A20 gate.

When I was writing a mini OS, I also had to enable the A20 line through a BIOS interrupt.

Today's computers are at least 32-bit, and 32 bits means the addressable range is 2^32 bytes = 4 GB.

So if your computer is 32-bit, installing more than 4 GB of memory does little good, because the extra memory cannot be fully used.

That covers memory and memory addressing.

1.3 The nature of variables

With memory in place, the next question is how variables such as int and double are stored in these cells of 0s and 1s.

In C we define variables like this:

int a = 999;
char c = 'c';

When you write a variable definition, you are really asking for a piece of memory in which to keep the variable.

An int usually occupies 4 bytes, and integers are stored in two's complement form (look it up if the term is new to you).

The two's complement representation of 999 in 32 bits is: 00000000 00000000 00000011 11100111

That is 4 bytes, so four cells are needed. From the lowest address to the highest they hold:

00000000 | 00000000 | 00000011 | 11100111

Notice that the high-order byte sits at the low address.

Can the order be reversed?

Of course. This is where big-endian and little-endian come in.

Putting the high-order byte at the low address is called big-endian.

Conversely, putting the low-order byte at the low address is called little-endian. For 999 the four cells then hold:

11100111 | 00000011 | 00000000 | 00000000

That is how an int is stored in memory. Other types such as float and char work the same way: the value is first converted to its binary encoding (two's complement for integers, typically IEEE 754 for floating point).

For multi-byte types, the bytes are then written to the memory cells in big-endian or little-endian order.

Keep the two layouts above in mind. Every variable in a program looks like this in memory, whether it is an int, a char, a pointer, an array, a structure or an object.

Two. What is a pointer?

2.1 Where are the variables?

As said above, defining a variable means requesting a piece of the computer's memory.

So what if we want to know where a variable is stored?

The & operator gives the address of a variable, which is the starting address of the block of memory the variable occupies.

(PS: this is actually a virtual address, not an address in physical memory.)

We can print the address:

printf("%p\n", (void *)&a);

This prints something like: 0x7ffcad3b8f3c

2.2 The nature of the pointer

We can get a variable's memory address with &. But how do we express that a value is an address rather than an ordinary number?

In other words, how does C represent the concept of an address?

With a pointer. You write:

int *pa = &a;

pa stores the address of a, and we say that pa is a pointer to a.

Here I want to go through a few questions that may seem a little dull.

Why do we need pointers? Can't we just use variable names?

We can, but variable names have their limits.

What is a variable name, really?

It is a symbolic stand-in for the variable's address. Variable names exist to make programming easier for people; the computer knows nothing about a variable called a, only addresses and instructions.

So if you look at the assembly generated from C code, the variable names are gone, replaced by bare addresses.

You can imagine that the compiler maintains a map from each variable name in the program to the variable's address, and then reads and writes that address.

That is, there is a mapping table like this that converts variable names to addresses automatically:

a | 0x7ffcad3b8f3c
c | 0x7ffcad3b8f2c
h | 0x7ffcad3b8f4c
....

Fair enough!

But that still does not show why pointers are necessary. So here is a question. Look at the code below:

int func(...) {
    ...
};
int main() {
    int a;
    func(...);
};

Suppose there is a requirement:

func must be able to modify the variable a in main. Inside main, a can be read and written directly through its name.

But a is not visible inside func.

You might say we can take the address of a with & and pass that in:

int func(int address) {
    ....
};
int main() {
    int a;
    func(&a);
};

Then func has the address of a and can read and write it.

In principle that works. The problem is:

How is the compiler to tell whether an int holds an ordinary int value or the address of another variable (that is, a pointer)?

If it is left entirely to us programmers to remember, complexity creeps in, and the compiler can no longer catch some mistakes for us.

Declaring a pointer variable as int * makes it explicit: this holds the address of another variable of type int.

The compiler can then use type checking to rule out some errors at compile time.

That is why pointers are needed.

In fact every language has this need. Many languages, for the sake of safety, put the pointer in shackles and wrap it up as a reference.

Most of us simply accepted pointers when we first learned them, but I hope this long explanation sheds some light.

Here is a small question to go with it:

Since a pointer is in essence the starting memory address of a variable, that is, an integer,

why do pointers come in so many types?

For example, pointer to int and pointer to float. Does the type affect the information stored in the pointer itself?

And when does the type come into play?

2.3 Dereferencing

The answer to the questions above lies in dereferencing.

pa holds the memory address of the variable a. How do we get the value of a from that address?

This operation is called dereferencing. In C, the * operator gives the content at the address a pointer holds.

For example, *pa yields the value of a.

We said that a pointer stores the starting address of a variable. So how does the compiler know how many bytes to fetch from that address?

This is where the pointer's type comes into play: the compiler decides how many bytes to fetch from the type of the element the pointer points to.

For a pointer to int, the compiler emits instructions that fetch four bytes; for a pointer to char, only one byte; and so on.

In terms of memory:

The pointer pa is first of all a variable. It occupies a piece of memory of its own, and what that memory holds is the starting address of a.

On dereference, 4 bytes are fetched starting at that address and then interpreted according to the encoding of int.

2.4 Putting it to use

This may look simple, but it is the key to understanding pointers deeply.

Two examples illustrate it.

The first:

float f = 1.0;
short c = *(short*)&f;

Can you explain what happens here? Does anything change for f at the memory level?

And what is the value of c? 1?

In fact, at the memory level f does not change at all.

Given the bit pattern of f in memory, the statement takes the first two bytes of f, interprets them as a short, and assigns the result to c.

In detail:

&f obtains the starting address of f
(short*)&f casts that address to a pointer to short

The second step does nothing at run time. The expression merely says:

"In my view, the address of f holds a variable of type short."

Finally, when *(short*)&f is dereferenced, the compiler fetches the first two bytes, interprets them using the encoding of short, and assigns the resulting value to c.

Throughout, the bit pattern of f is unchanged. Only the way those bits are interpreted changes.

The final value is certainly not 1. What it is, you can work out for yourself.

What about the other way round?

short c = 1;
float f = *(float*)&c;

The process is the same as above. But while the previous example happened to work, this one may not.

Why?

*(float*)&c makes us fetch four bytes starting at the address of c and interpret them using the encoding of float.

But c is a short and occupies only two bytes, so the read is bound to touch the two bytes that follow. That is an out-of-bounds memory access.

If it is only a read, there is a good chance nothing visibly goes wrong.

Sometimes, however, a new value has to be written to that region, for example:

*(float*)&c = 1.0;

Then a core dump may occur, meaning the memory access failed.

And even without a core dump, the write destroys whatever was in that memory. It is quite likely the storage of some other variable, and overwriting someone else's data is bound to produce hidden bugs.

Once you understand the above, you will be far more at ease with pointers.

2.5 A small problem

Now let's look at a problem raised by a friend in a chat group.

He wrote a floating-point value to a file, read it back, and found that the printed value did not match.

The crux is here:

char buffer[4];
...
printf("%f %x\n", *buffer, *buffer);

He probably reasoned that buffer is a pointer (an array, to be exact), that dereferencing a pointer yields the value stored there, and that the value stored there is the 4 bytes read from the file, that is, the float he had written earlier.

Note that all of this is what he thinks. The compiler thinks:

"buffer is a pointer to char, so I will fetch just the first byte."

The value of that first byte is then passed to printf as an int. printf sees %f and expects a floating-point argument, but it cannot convert what it was given: it simply interprets whatever it finds where a double should be, so the output is garbage (formally, the behavior is undefined).

That is the whole process.

The root of the error is the belief that dereferencing any pointer yields "the value we have in mind". The compiler knows nothing of what we have in mind; it interprets the memory purely according to the pointer's type.

So the call is changed to:

printf("%f\n", *(float*)buffer);

which tells the compiler explicitly:

"I put a float at the address buffer refers to. Interpret it as a float."

Three. Structures and pointers

A structure contains several members. How are these members laid out in memory?

For example:

struct fraction {
    int num;    // numerator
    int denom;  // denominator
};
struct fraction fp;
fp.num = 10;
fp.denom = 2;

This structure represents a fraction. It occupies 8 bytes in memory (padding for alignment is not considered here). 10 is placed in the member at offset 0 from the structure's base address, and 2 in the member at offset 4.

Now let's do something no sane person would ever do:

((struct fraction*)(&fp.denom))->num = 5;
((struct fraction*)(&fp.denom))->denom = 12;
printf("%d\n", fp.denom); // What does this print?

What does it print? Think about it before reading on.

Here is what happens:

First, &fp.denom takes the starting address of the denom member of fp. The cast then treats the 8 bytes starting at that address as a struct fraction.

In this new structure, fp's denom member plays the role of num, and the four bytes just past the end of fp play the role of denom.

So

((struct fraction*)(&fp.denom))->num = 5

actually modifies fp.denom, while

((struct fraction*)(&fp.denom))->denom = 12

assigns 12 to the four bytes just past fp.

Writing to those four bytes has unpredictable results and may crash the program: they may hold critical stack-frame information for the function call, or the memory there may not be writable.

Many of the core dumps we hit when first learning C come from causes like this.

So the final output is 5.

Why discuss code this pointless?

To show that a structure is in essence just a group of variables packed together, and that a member is accessed through the structure's starting address, also called the base address, plus the member's offset.

Objects in C++ and Java are stored in much the same way. To support object-oriented features they add some header information alongside the data members, such as the virtual function table pointer in C++.

In fact, we can imitate this in C.

This is why we keep saying C is the foundation: once you truly understand pointers and memory in C, you can quickly grasp the object model and memory layout of other languages.

Four. Multi-level pointers

Speaking of multi-level pointers: as a freshman I could follow two levels at most. Anything beyond that made my head spin, and I often wrote wrong code.

Hand me int ******p and I would fall apart. I suspect many students are in the same situation now.

In fact, multi-level pointers are not that complicated. They are just pointers to pointers to pointers... It is very simple.

Let's get to the essence of multi-level pointers.

First, let me state it plainly: there is no such thing as a multi-level pointer. A pointer is a pointer; "multi-level pointer" is only a logical concept for our convenience.

Consider the parcel lockers you see in everyday life.

Everyone has used them. Fengchao parcel lockers and supermarket lockers all work this way: each compartment has a number, and with the number we can find the compartment and take out what is inside.

The compartments are memory cells, the numbers are addresses, and what is inside a compartment corresponds to what is stored in memory.

Suppose I put a book in compartment 03 and tell you the number 03. You can then use 03 to fetch the book.

Now suppose I put the book in compartment 05 and leave only a note in compartment 03 that reads: "The book is in 05."

What would you do?

Open compartment 03, of course, take out the note, and then open compartment 05 as the note says to get the book.

Here compartment 03 is a pointer, because what it holds is a note pointing to another compartment (an address), not the book itself.

See?

Now suppose I put the book in compartment 07, leave a note in compartment 05 saying "The book is in 07", and a note in compartment 03 saying "The book is in 05".

Here compartment 03 is a second-level pointer, compartment 05 is a first-level pointer, and compartment 07 is an ordinary variable.

Continuing in the same way gives an N-level pointer.

So, is it clear? A piece of memory is called a pointer if it stores the address of another variable, and a variable if it stores actual content.

int a;
int *pa = &a;
int **ppa = &pa;
int ***pppa = &ppa;

In the code above, pa is a first-level pointer (what we ordinarily just call a pointer), ppa is a second-level pointer, and pppa is a third-level pointer.

In memory, each of these variables simply holds the address of the one before it.

However many levels there are, two things are central:

A pointer is itself a variable; it needs memory to store it, and it has an address of its own
The memory of a pointer stores the address of the variable it points to

This is why I say multi-level pointers are a logical concept. A piece of memory either holds actual content or holds the address of another variable. It is that simple.

How should a declaration like int **a be read?

int **a can be split into two parts, int* and *a. The * in *a says that a is a pointer variable, and the int* in front says that the pointer variable a can only store the address of a variable of type int*.

Any two-level or multi-level pointer can be split in two like this.

However many levels there are, it is first of all a pointer variable, and one * is what makes it a pointer variable. The remaining *s say what type of variable's address it may store.

For example, int ****a declares a pointer variable a that can only store the address of a variable of type int***.

Five. Pointers and arrays

5.1 One-dimensional arrays

The array is a basic data structure built into C. A thorough understanding of arrays and how to use them is the basis for writing efficient programs.

Array notation and pointer notation are closely related and can be interchanged in the right context.

For example:

int array[10] = {10, 9, 8, 7};
printf("%d\n", *array); // prints 10
printf("%d\n", array[0]); // prints 10
printf("%d\n", array[1]); // prints 9
printf("%d\n", *(array+1)); // prints 9
int *pa = array;
printf("%d\n", *pa); // prints 10
printf("%d\n", pa[0]); // prints 10
printf("%d\n", pa[1]); // prints 9
printf("%d\n", *(pa+1)); // prints 9

In memory, an array is one contiguous block.

The address of element 0 is called the starting address of the array, and in most expressions the array name evaluates to that address. We then access elements with array[1] or *(array + 1).

This can be viewed as address[offset], where address is the starting address and offset is the offset. Note that offset is not added to address directly; it is first multiplied by the number of bytes the element type occupies, giving address + sizeof(int) * offset.

Anyone who has studied assembly will recognize this as one of its addressing modes: base-plus-index addressing.

After reading the code above, many people conclude that pointers and arrays are exactly the same and fully interchangeable. That is completely wrong.

An array name can sometimes be used as a pointer, but an array name is not a pointer.

The most typical case is sizeof:

printf("%zu", sizeof(array));
printf("%zu", sizeof(pa));

The first prints 40, because array contains 10 elements of type int (4 bytes each). The second prints 4 on a 32-bit machine, which is the size of a pointer.

Why is that?

From the compiler's point of view, variable names and array names are both symbols. Each has a type, and each is ultimately bound to data.

A variable name refers to one piece of data, and an array name refers to a group of data (a collection). Each has a type, so that the size of the data it refers to can be deduced.

Yes, arrays have types too. We can regard int, float, char and so on as basic types and an array as a slightly more complex type derived from a basic type.

An array's type is made up of its element type and its length. sizeof computes a size from the type of its operand, and (variable-length arrays aside) it does so during compilation, not while the program is running.

During compilation the compiler builds a dedicated table recording each variable name together with its data type, address, scope and so on.

sizeof is an operator, not a function. The size of a symbol passed to sizeof can be looked up in this table.

So applying sizeof to an array name yields the actual size of the array.

pa is only a pointer to int. The compiler has no idea whether it points to a single integer or to a whole run of integers.

Here it does point to an array, but an array is just a piece of contiguous memory with no start or end markers and no extra information recording its length.

So applying sizeof to pa yields only the size of the pointer variable itself.

In other words, the compiler does not associate pa with the array. pa is just a pointer variable, and wherever it points, sizeof always gives the number of bytes the pointer itself occupies.

5.2 Two-dimensional arrays

Do not imagine that a two-dimensional array is stored in memory in rows and columns. Two-dimensional, three-dimensional and higher arrays are all syntactic sugar provided by the compiler.

Their storage is no different in essence from that of a one-dimensional array. For example:

int array[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
array[1][1] = 5;

You might picture array in memory as a two-dimensional matrix:

1 2 3
4 5 6
7 8 9

But it actually looks like this:

1 2 3 4 5 6 7 8 9

That is no different from a one-dimensional array: everything is laid out linearly.

When we write array[1][1], how does the compiler work out the address of the element being accessed?

To be more general, suppose the array is defined as:

int array[n][m]

and we access array[a][b].

The element lies (m * a + b) elements past the start of the array, so its address is the starting address plus sizeof(int) * (m * a + b) bytes.

This is what a two-dimensional array really is in memory: the same as a one-dimensional array, with syntactic sugar dressing it up in a two-dimensional shape.

Six. The magical void pointer

You have surely seen void used like this:

void func();
int func1(void);

Here void means that there is no return value, or that the function takes no parameters.

A pointer of type void *, however, is a generic pointer that can hold the address of any data type.

The following is a void pointer:

void *ptr;

The greatest use of void pointers is generic programming in C: any object pointer can be assigned to a void pointer, a void pointer can be converted back to the original pointer type, and the address the pointer actually holds does not change in the process.

For example:

int num;
int *pi = &num;
printf("address of pi: %p\n", (void*)pi);
void* pv = pi;
pi = (int*) pv;
printf("address of pi: %p\n", (void*)pi);

Both calls print the same value.

Conversions like this are rare in everyday code, but when you write large software or general-purpose libraries in C you cannot do without void pointers. They are the cornerstone of generics in C. For example, the sort function qsort in the standard library is declared like this:

void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));

Every place that would involve a concrete element type uses void instead.

void pointers can also be used to implement polymorphism in C, which is great fun.

But there is something to watch out for:

A void pointer cannot be dereferenced.

For example:

int num;
void *pv = (void*)&num;
*pv = 4; // error

Why?

Because dereferencing means that the compiler, guided by the pointer's type, fetches N bytes from the memory the pointer points to and interprets those N bytes according to that type.

For a pointer of type int *, N is 4, and the bytes are interpreted as an int.

With void *, though, the compiler does not know whether the pointer points to an int, a double, or a structure, so it cannot dereference a void pointer.

Seven. Showing off

Many people think C can only be used for procedural programming. In fact, with pointers we can also simulate objects, inheritance, polymorphism and more in C.

void pointers can likewise be used for generic programming, the counterpart of generics in Java and templates in C++.

If you are interested in implementing objects, templates and inheritance in C, feel free to like the article and leave a comment.
