Character, String, and Memory Functions in C

1 Characters and character functions

1.1 Characters

     The character type char is one of the most important types in C. Working with characters differs slightly from working with integers and floating-point numbers, so this article introduces how characters are handled in C.

     The characters discussed here are those in the American Standard Code for Information Interchange (hereinafter "ASCII") table. The table assigns a number to each character: 'a' has ASCII code 97, 'A' has 65, '1' has 49, and so on. Because a computer can only store binary data, what is actually stored in memory is the binary form of the character's ASCII code. A char can therefore be treated as a one-byte integer (whether plain char is signed or unsigned is implementation-defined).

Figure 1.1 ASCII table

     Some characters or commands cannot be written directly. A carriage return is one example; another is the single quotation mark, which is itself a character but is also what C uses to delimit a character constant. Such characters are written with an escape sequence: a backslash followed by a specified character. The character after the backslash no longer has its original meaning. For example, 'n' normally means the English letter n, but in '\n' the compiler treats the backslash and n as a single unit, which means a line break.

     Common escape characters and their meanings (source: Baidu Encyclopedia, "Escape character"):

Escape character  ASCII value (decimal)  Meaning
\a                7                      Bell (BEL)
\b                8                      Backspace (BS), moves the current position back one column
\f                12                     Form feed (FF), moves the current position to the start of the next page
\n                10                     Line feed (LF), moves the current position to the start of the next line
\r                13                     Carriage return (CR), moves the current position to the start of the line
\t                9                      Horizontal tab (HT), jumps to the next tab stop
\v                11                     Vertical tab (VT)
\\                92                     Backslash character '\'
\'                39                     Single quotation mark (apostrophe)
\"                34                     Double quotation mark
\?                63                     Question mark
\0                0                      Null character (NUL)
\ddd              1 to 3 octal digits    Any character, written as its octal code
\xhh              hexadecimal digits     Any character, written as its hexadecimal code

     Chinese characters occupy a different number of bytes depending on the encoding, so they are not discussed in this article.

1.2 Character manipulation functions

    C has two main kinds of character functions, both declared in <ctype.h>. Character classification functions are often used to check whether user input is valid. Character conversion functions convert English letters to uppercase or lowercase.

1.2.1 Character classification function

     Common character classification functions are listed in the table below. Each function returns a nonzero value (true) if its argument meets the condition, and 0 (false) otherwise.

Function  Condition
iscntrl   Any control character
isspace   Whitespace character: space ' ', form feed '\f', line feed '\n', carriage return '\r', horizontal tab '\t', vertical tab '\v'
isdigit   Decimal digit 0~9
isxdigit  Hexadecimal digit: all decimal digits, lowercase letters a~f, uppercase letters A~F
islower   Lowercase letter a~z
isupper   Uppercase letter A~Z
isalpha   Letter a~z or A~Z
isalnum   Letter or digit: 0~9, a~z, A~Z
ispunct   Punctuation: any printable character that is not a digit, a letter, or a space
isgraph   Any graphic character (any printable character except the space)
isprint   Any printable character: the graphic characters and the space character

     Note: in ASCII, codes 0 to 31 and code 127 (33 in total) are control characters or communication characters. Control characters include LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), and BEL (bell). Communication characters include SOH (start of heading), EOT (end of transmission), and ACK (acknowledge).

1.2.2 Character conversion function

  • Convert to lowercase:

int tolower(int c);

  • Convert to uppercase:

int toupper(int c);

     For example:

char s = 'a';
printf("%c\n",toupper(s));
printf("%c\n",s);

Figure 1.2

     The output shows that converting a character does not change the character itself; the conversion function only returns the ASCII code of the corresponding uppercase (or lowercase) letter. To change the case of a whole string, traverse the string and apply the conversion function to each character.

2 Strings and string functions

2.1 Strings

     Strictly speaking, C has no string type, so strings are represented with character arrays or written directly as string literals. Since a string is just an array of characters stored in memory as ASCII codes, how does the program know where the string ends? The C standard specifies that a string ends with '\0'.

     There are two ways to define a string:

     Method 1:

char str1[] = "Hello";

     Although Hello has 5 characters, the compiler automatically appends '\0' after the 'o', as shown in the figure below.

Figure 2.1

     Method 2:

char str2[6] = {'H','e','l','l','o','\0'};

     With this form of definition you have to add '\0' at the end by hand; otherwise you have defined a plain character array, not a string.

2.2 String functions

    The string-related functions in C are as follows:

Function  Meaning
strlen    Get the length of a string
strcpy    Copy a string
strcat    Concatenate strings
strcmp    Compare strings
strncpy   Copy at most a specified number of characters
strncat   Append at most a specified number of characters
strncmp   Compare at most a specified number of characters
strstr    Determine whether a string is a substring of another string
strtok    Split a string by the specified separators
strerror  Return the message for an error code

     The following sections introduce these functions one by one and give sample implementations of most of them.

2.2.1 strlen

    strlen: returns the length of a string.

size_t strlen( const char *string );

     The return value is an unsigned integer, so subtracting one strlen result from another is unsafe: the difference can never be negative and wraps around to a very large value instead.

    strlen finds the length of a string. As mentioned above, a string ends with '\0', so we only need to traverse from the start of the string, stop at '\0', and return the number of characters before it.

     This leads to the following code:

#include <assert.h>
#include <stddef.h>

size_t my_strlen1( const char *str )
{
    assert(str);
    size_t count = 0;            // same type as the return value
    while (*str++ && ++count);   // count every character before '\0'
    return count;
}

Or:

size_t my_strlen2( const char *str )
{
    assert(str);
    const char *start = str;
    while (*str++);
    // The loop exits after reading '\0', but the post-increment has already
    // moved str one position past it, so subtract 1.
    return (size_t)(str - 1 - start);
}

Or recursively, without a temporary variable:

size_t my_strlen3( const char *str )
{
    assert(str);
    if (*str == '\0')
    {
        return 0;
    }
    return my_strlen3(str+1)+1;
}

2.2.2 strcpy

    strcpy: string copy function.

char *strcpy( char *strDestination, const char *strSource );

It copies the source string strSource into the destination strDestination.

Note the following when using this function:

  • The destination must have enough space to hold the source string, including its '\0'. The destination must also be modifiable (not const and not a string literal).

  • The source string must end with '\0'; otherwise the copy runs past the end of the source.

  • The '\0' of the source string is copied to the destination as well, where it marks the end of the string.

  • The function returns char* so that calls can be chained.

     A sample implementation:

char* my_strcpy( char *dest, const char *src )
{
    assert(dest);
    assert(src);
    char* ret = dest;
    while(*dest++ = *src++);
    return ret;
}

2.2.3 strcat

    strcat: string concatenation function.

char *strcat( char *strDestination, const char *strSource );

It appends the source string strSource to the end of the destination string strDestination.

     Note the following when using this function:

  • The destination must have enough space to hold its own contents plus the source string.

  • The source string must end with '\0'.

  • Appending starts at the '\0' of the destination string, which is overwritten. A string therefore cannot be appended to itself.

     A sample implementation:

char* my_strcat( char *dest, const char *src )
{
    assert(dest);
    assert(src);
    char* ret = dest;
    while (*dest++);
    dest--;
    while (*dest++ = *src++);
    return ret;
}


2.2.4 strcmp

    strcmp: string comparison function.

int strcmp( const char *string1, const char *string2 );

     Strings themselves have no magnitude; what is compared is the ASCII codes of their characters. If the first character of string1 has a larger ASCII code than the first character of string2, the function returns a number greater than 0. If it is smaller, the function returns a number less than 0. If the two are equal, the second characters are compared, and so on. If both strings end without any difference, the function returns 0.

     A sample implementation:

int my_strcmp( const char *str1, const char *str2 )
{
    assert(str1);
    assert(str2);
    while (*str1 == *str2)
    {
        if (*str1 == '\0')
        {
            return 0;
        }
        str1++;
        str2++;
    }
    // Compare as unsigned char, as the standard strcmp does.
    return (unsigned char)*str1 - (unsigned char)*str2;
}


2.2.5 strncpy

    strncpy: copies a specified number of characters.

char *strncpy( char *strDest, const char *strSource, size_t count );

It copies up to count characters of the source string into the destination.

     Note the following when using this function:

  • If the source string is shorter than count, the rest of the destination is filled with '\0' after the source has been copied, until count characters have been written in total.

  • count must not exceed the size of the destination. If the source string has count or more characters, no '\0' is written, so count should be smaller than the destination size to leave room for a terminator added by hand.

     A sample implementation:

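char* my_strncpy( char *dest, const char *src, size_t n)
{
    assert(dest);
    assert(src);
    char* ret = dest;
    // Copy until n characters are written or the '\0' of src has been copied.
    while(n && (*dest++ = *src++))
    {
        n--;
    }
    if(n)
    {
        // The '\0' just copied has not been counted yet, so the pre-decrement
        // accounts for it before the remaining positions are padded with '\0'.
        while (--n)
        {
            *dest++ ='\0';
        }
    }
    return ret;
}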

2.2.6 strncat

    strncat: appends a specified number of characters.

char *strncat( char *strDest, const char *strSource, size_t count );

It appends up to count characters of the source string to the destination string.

     Note the following when using this function:

  • The destination string must end with '\0'.

  • Appending starts at the '\0' of the destination string. After the characters have been appended, a '\0' is always written at the end.

  • The destination must have room for count more characters plus the final '\0'.

  • If the source string has fewer than count characters, copying stops at its '\0'. Unlike strncpy, strncat does not pad the destination with extra '\0' characters.

     A sample implementation:

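char *my_strncat( char *dest, const char *src, size_t n )
{
    assert(dest);
    assert(src);
    size_t i;
    char* ret = dest;
    // Move to the '\0' at the end of dest.
    while(*dest)
    {
        dest++;
    }
    // Append at most n characters; test i<n first so src[n] is never read.
    for(i=0;i<n && src[i];i++)
    {
        dest[i] = src[i];
    }
    // Always terminate the result with a single '\0'.
    dest[i] = '\0';
    return ret;
}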


2.2.7 strncmp

    strncmp: compares the first n characters of two strings.

int strncmp( const char *string1, const char *string2, size_t count );

     It compares at most the first count characters of the two strings. The principle is the same as strcmp.

2.2.8 strstr

    strstr: determines whether a string is a substring of another string.

char *strstr( const char *string, const char *strCharSet );

It checks whether strCharSet is a substring of string. The return value is a pointer: NULL if it is not a substring, otherwise a pointer to the first occurrence of strCharSet in string.

     How it works: compare the first character of string with the first character of strCharSet. If they differ, compare the second character of string with the first character of strCharSet. If they are equal, go on to compare the third character of string with the second character of strCharSet. If those differ, start over from the third character of string against the first character of strCharSet, and so on.

     A sample implementation:

char *my_strstr( const char *str1, const char *str2)
{
    const char* s1 = str1;
    const char* s2 = str2;
    const char* cp = str1;
    if(*str2 == '\0')
    {
        return (char*)str1;
    }
    while(*cp)
    {
        s1 = cp;
        s2 = str2;
        // Advance while both strings have characters left and they match.
        while(*s1 && *s2 && *s1 == *s2)
        {
            s1++;
            s2++;
        }
        if(*s2 == '\0')   // reached the end of str2: full match starting at cp
        {
            return (char*)cp;
        }
        cp++;
    }
    return NULL;
}


2.2.9 strtok

    strtok: splits a string by the specified separators.

char *strtok( char *strToken, const char *strDelimit );

It splits the string strToken, using the characters in strDelimit as separators.

     Note the following when using this function:

  • strToken contains zero or more tokens separated by one or more of the separator characters in strDelimit.

  • strtok finds the next token in strToken, replaces the separator that ends it with '\0', and returns a pointer to the token. Because the string is modified, strtok is usually run on a copy.

  • If the first argument is not NULL, strtok finds the first token in strToken, saves its position in the string, and returns the starting address of that token.

  • If the first argument is NULL, strtok continues from the saved position in the same string and finds the next token.

  • If there are no more tokens in the string (that is, all tokens have been returned), it returns NULL.

     Usage example:

char str1[] = "123.456.55.88";
char str2[] = ".";
char* p = NULL;
char str3[50];
strcpy(str3,str1);
for (p = strtok(str3, str2); p != NULL; p = strtok(NULL, str2))
{
    printf("%s\n",p);
}


2.2.10 strerror

    strerror: returns the error message that corresponds to an error code.

char* strerror(int errnum);

     When writing a program there are always situations we fail to anticipate. In the parts of a program that may go wrong, we can set up error reporting in advance so that a failure at run time can be located quickly, which makes the program easier to debug. The system predefines a set of errors and error codes. When a library function fails, it stores the error code in the global variable errno (declared in the header errno.h), and passing that code to strerror returns the corresponding message. When no error has occurred, errno is 0.

     For example, a file that does not exist cannot be opened for reading. When fopen fails, we can call strerror to find out why.


FILE* pFile;
pFile = fopen("1.txt","r");
if (pFile == NULL)
{
    printf("Error opening file 1.txt:%s\n", strerror(errno));
}

     Because the file does not actually exist, the output reports that there is no such file, as shown in the figure.

Figure 2.2

     Another function, perror, combines printing with strerror, so the printf line in the code above can be rewritten as:

perror("Error opening file 1.txt");

The two print essentially the same message; perror writes to stderr and puts a colon and a space between our text and the error message.

3 Memory functions

     The string functions can compare, copy, and concatenate, but they work only on strings and are bound by the terminator '\0'. They cannot be used to copy or compare data of other types. The memory functions introduced here are similar to the string functions, but they operate on raw bytes.

3.1 memcpy

    memcpy: memory copy function.

void *memcpy( void *dest, const void *src, size_t count );

It copies count bytes starting at the address src to the address dest, and returns dest when the copy is finished.

     Because it copies memory directly, it can copy data of any type. It is also not limited by '\0': it does not stop at that character but keeps copying until count bytes have been copied.

     When the source and destination regions overlap (for example, when dest lies within the count bytes starting at src), the result may be wrong. Implementations differ between platforms: if bytes are read and written one at a time, part of the overlapping region is overwritten with new data before it has been read, and the result may not be what we expect; if all the data is read first and then written, the result is as expected. The C standard leaves the behavior of memcpy on overlapping regions undefined.

     A sample implementation:

void *my_memcpy1( void *dest, const void *src, size_t num )
{
    assert(dest);
    assert(src);
    void* ret = dest;
    size_t i;   // same type as num, avoids a signed/unsigned comparison
    for (i = 0; i < num; i++)
    {
        *((char*)dest+i) = *((const char*)src+i);
    }
    return ret;
}

Or:

void *my_memcpy2( void *dest, const void *src, size_t num )
{
    assert(dest);
    assert(src);
    void* ret = dest;
    while(num--)
    {
        // Copy one byte, then advance both pointers by one byte.
        *((char*)dest) = *((const char*)src);
        dest = (char*)dest+1;
        src = (const char*)src+1;
    }
    return ret;
}


3.2 memmove

    memmove: memory move function.

void *memmove( void *dest, const void *src, size_t count );

As the name suggests, memmove copies the count bytes starting at src to dest. This is the same job as memcpy. As mentioned above, however, when dest lies within the count bytes starting at src (or src lies within the count bytes starting at dest), that is, when the source and destination regions overlap, memcpy does not guarantee a correct result. memmove exists to solve this problem.

     Analysis: when dest lies within the count bytes starting at src (that is, dest is to the right of src), as shown in the figure:

Figure 3.1

D is the overlapping region. If we copy from front to back, A is first copied to D, overwriting the original data at D; when D is later copied to G, what is actually copied is A's data. If we copy from back to front, first copying D to G, then C to F, and so on, we get the result we want.

When src lies within the count bytes starting at dest (that is, dest is to the left of src), as shown in the figure, the same analysis says to copy from front to back: first copy D to A, then E to B, and so on.

Figure 3.2

Based on this analysis, a sample implementation (the main task is to determine whether dest is to the left or to the right of src):

void *my_memmove( void *dest, const void *src, size_t num )
{
    assert(dest);
    assert(src);
    void* ret = dest;
    if (dest < src)
    {
        while (num--)// Copy from front to back
        {
            *((char*)dest) = *((const char*)src);
            dest = (char*)dest+1;
            src = (const char*)src+1;
        }
    }
    else
    {
        while (num--)// Copy from back to front; num is now the index of the byte to copy
        {
            *((char*)dest+num) = *((const char*)src+num);
        }
    }
    return ret;
}

3.3 memcmp

    memcmp: memory comparison function.

int memcmp( const void *buf1, const void *buf2, size_t count );

It compares the first count bytes of the memory regions buf1 and buf2, byte by byte.

  • If buf1 > buf2 at the first differing byte, it returns a positive number;

  • If all count bytes are equal, it returns 0;

  • If buf1 < buf2 at the first differing byte, it returns a negative number.

     A sample implementation:

int my_memcmp( const void *buf1, const void *buf2, size_t num )
{
    assert(buf1);
    assert(buf2);
    // Bytes are compared as unsigned char.
    const unsigned char *p1 = buf1;
    const unsigned char *p2 = buf2;
    while (num--)
    {
        if (*p1 != *p2)
        {
            return *p1 - *p2;   // first differing byte decides the result
        }
        p1++;
        p2++;
    }
    return 0;   // all num bytes are equal
}


3.4 memset

    memset: memory initialization function.

void *memset( void *dest, int c, size_t count );

It sets each of the count bytes starting at dest to the value c, and returns dest.

Because the assignment is done byte by byte, only the lowest eight bits of c are used, however large c is. For this reason memset is usually used with 0 (all bits 0) or -1 (all bits 1); with other values, elements wider than one byte generally do not end up holding c, which easily leads to mistakes.