
Go data types: strings and underlying character types

2022-06-30 07:52:00 【weixin_ fifty-nine million two hundred and eighty-four thousand】

Strings

Basic usage

In Go, a string is a basic type. By default it is a sequence of characters encoded in UTF-8: an ASCII character takes 1 byte, and other characters take 2 to 4 bytes as needed. A Chinese character, for example, usually takes 3 bytes.
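As a quick check of the byte counts above, the following sketch prints the UTF-8 length of a few single-character strings. The helper name byteLen is hypothetical and only wraps the built-in len; the sample characters are chosen for illustration.

```go
package main

import "fmt"

// byteLen returns the number of bytes in the UTF-8 encoding of s.
// Hypothetical helper: it simply wraps the built-in len.
func byteLen(s string) int {
    return len(s)
}

func main() {
    samples := []string{
        "A",          // ASCII letter
        "\u00e9",     // Latin letter e with acute accent
        "\u4e16",     // a Chinese character
        "\U0001F600", // an emoji outside the Basic Multilingual Plane
    }
    for _, s := range samples {
        fmt.Printf("%q takes %d byte(s)\n", s, byteLen(s))
    }
}
```

The four samples cover each of the possible UTF-8 widths, from 1 to 4 bytes.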

Declaration and initialization

Declaring and initializing a string is straightforward:

var str string               // declare a string variable
str = "Hello World"          // initialize it
str2 := "Hello, Academician" // declare and initialize in one statement

Formatted output

You can use Go's built-in len() function to get the length of a string, and the Printf function from the fmt package to format the output:

fmt.Printf("The length of \"%s\" is %d \n", str, len(str))
fmt.Printf("The first character of \"%s\" is %c.\n", str, str[0])

Escape characters

Go string literals cannot be written with single quotes, which are reserved for single characters. An ordinary string literal is enclosed in double quotes, and a specific character is escaped with a backslash (\), as with the double quotes and the newline in the format strings above. Common escape sequences include:

\n: newline
\r: carriage return
\t: tab
\u or \U: Unicode character
\\: the backslash itself

The print statements above therefore output:

The length of "Hello World" is 11 
The first character of "Hello World" is H.

A string can also contain double quotes without escaping if it is written between backquotes:

label := `Search results for "Golang":`
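To make the difference between the two literal forms concrete, here is a minimal sketch that writes the same text both ways. The function names interpretedLabel and rawLabel are hypothetical, chosen for this example only.

```go
package main

import "fmt"

// interpretedLabel uses a double-quoted literal, so \" and \n are escape sequences.
func interpretedLabel() string {
    return "Search results for \"Golang\":\n"
}

// rawLabel uses a backquoted literal, so the backslash and the n are kept as-is.
func rawLabel() string {
    return `Search results for "Golang":\n`
}

func main() {
    fmt.Printf("%q\n", interpretedLabel())
    fmt.Printf("%q\n", rawLabel())
    // The raw form is one byte longer: \n is two bytes there, one byte when interpreted.
    fmt.Println(len(rawLabel()) - len(interpretedLabel()))
}
```

Inside backquotes nothing is escaped, which is why the double quotes need no backslash there.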

Multiline string

A multiline string can also be built with backquotes (`):

results := `Search results for "Golang":
- Go
- Golang
- Golang Programming
`
fmt.Printf("%s", results)

The output is:

Search results for "Golang":
- Go
- Golang
- Golang Programming

Concatenating with the + operator also works:

results := "Search results for \"Golang\":\n" +
"- Go\n" +
"- Golang\n" +
"- Golang Programming\n"
fmt.Printf("%s", results)

The result is the same, but this form takes many more characters to type and is less elegant than the previous one.

Immutable value type

The individual bytes of a string can be accessed by index, as with an array:

ch := str[0] // the first byte of the string, which here is also its first character

Unlike an array, however, a string in Go is an immutable value type: once initialized, its contents cannot be modified. Consider the following example:

str := "Hello world"
str[0] = 'X' // compile error

The compiler reports an error similar to the following:

cannot assign to str[0]
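One common way around this restriction is to copy the string into a byte slice, modify the copy, and build a new string from it. The sketch below assumes the first character is a single ASCII byte; the helper name replaceFirstByte is hypothetical.

```go
package main

import "fmt"

// replaceFirstByte returns a copy of s whose first byte is replaced by b.
// Hypothetical helper: s itself is never changed, since strings are immutable.
// It assumes the first character of s is a single-byte (ASCII) character.
func replaceFirstByte(s string, b byte) string {
    if len(s) == 0 {
        return s
    }
    buf := []byte(s) // copies the bytes of the string into a mutable slice
    buf[0] = b
    return string(buf) // builds a new string from the modified bytes
}

func main() {
    str := "Hello world"
    changed := replaceFirstByte(str, 'X')
    fmt.Println(str)     // the original string is untouched
    fmt.Println(changed) // the modified copy
}
```

For text whose characters may span several bytes, converting to []rune instead of []byte would be the safer variant of the same idea.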

Character encoding

A Go string is by default a sequence of Unicode characters encoded in UTF-8, so it can contain non-ASCII characters, such as Chinese text, directly in Go code.

Note, however, that if your Go code contains non-ASCII characters, the source file must be saved in UTF-8. This matters especially on Windows, where general-purpose editors often save files in the local encoding by default (in China this may be GBK rather than UTF-8). Overlooking this leads to unexpected problems at compile time and at run time.

Encoding conversion is a very common requirement when processing text documents (TXT, XML, HTML and so on), but Go itself only supports UTF-8 and Unicode. For other encodings, the Go standard library has no built-in transcoding support.
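Although the standard library does not transcode, it can at least tell whether input is valid UTF-8 before a program processes it. The following sketch uses utf8.Valid for that check; the helper name isUTF8 and the sample bytes are assumptions made for this example.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// isUTF8 reports whether data consists entirely of valid UTF-8.
// Hypothetical helper wrapping utf8.Valid.
func isUTF8(data []byte) bool {
    return utf8.Valid(data)
}

func main() {
    utf8Bytes := []byte("\u4e2d")  // a Chinese character encoded in UTF-8 (3 bytes)
    gbkBytes := []byte{0xD6, 0xD0} // the same character as encoded in GBK (2 bytes)

    fmt.Println(isUTF8(utf8Bytes))
    fmt.Println(isUTF8(gbkBytes))
}
```

A check like this only detects the problem; converting GBK text to UTF-8 still needs a package from outside the standard library.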

String operations

String concatenation

Go has rich built-in support for strings. Common operations include concatenation, getting the length, and getting a character at a given position. Length and indexing were covered above; concatenation only needs the + operator:

str = str + ", Application development"
str += ", Application development" // shorthand for the statement above, with exactly the same effect

Note also that when a concatenation is too long for one line, the + operator must appear at the end of the previous line, otherwise the compiler reports an error:

str = str +
        ", Application development"

String slicing

In Go, substrings are obtained through string slicing:

str := "hello, world"
str1 := str[:5]  // substring before index 5 (exclusive)
str2 := str[7:]  // substring from index 7 (inclusive) to the end
str3 := str[0:5] // substring from index 0 (inclusive) to index 5 (exclusive)
fmt.Println("str1:", str1)
fmt.Println("str2:", str2)
fmt.Println("str3:", str3)

A Go slice range can be understood by analogy with intervals in mathematics: it is closed on the left and open on the right. For example, str[0:5] covers the elements in [0, 5), str[:5] also covers [0, 5) (indexes start at 0), and str[7:] covers [7, len(str)). The last is half-open as well: omitting the end index means len(str), and the largest valid index is len(str)-1.

The code above therefore prints:

str1: hello
str2: world
str3: hello

In summary, a string is sliced by joining the start and end indexes with a colon. The number before the colon is the start index, and omitting it means starting from 0. The number after the colon is the end index (not the length of the substring), and omitting it means running to the end of the string. So str[:] yields the complete string.
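One detail worth illustrating is that slice indexes are byte offsets, not character positions. The sketch below, with a hypothetical helper named sub, slices both an ASCII string and a string containing two Chinese characters.

```go
package main

import "fmt"

// sub returns the bytes of s in the half-open range [start, end).
// Hypothetical helper wrapping the slice expression s[start:end].
func sub(s string, start, end int) string {
    return s[start:end]
}

func main() {
    ascii := "hello, world"
    fmt.Println(sub(ascii, 0, 5))           // same as ascii[:5]
    fmt.Println(sub(ascii, 7, len(ascii)))  // same as ascii[7:]

    mixed := "Hello, \u4e16\u754c" // 7 ASCII bytes followed by two 3-byte characters
    fmt.Println(sub(mixed, 7, 10))        // the three bytes of the first Chinese character
    fmt.Println(len(sub(mixed, 7, len(mixed)))) // byte length of the Chinese part
}
```

Choosing an end index that falls inside a multi-byte character, such as 8 here, would cut the character apart, so offsets for non-ASCII text need to land on character boundaries.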

Go strings also support comparison, testing whether a character or substring is contained, finding the index of a substring, replacement, case conversion, trimming and other operations, most of them through the standard library strings package.

String traversal

Go supports two ways to traverse a string.

The first is to traverse it as a byte array:

str := "Hello, 世界"
n := len(str)
for i := 0; i < n; i++ {
    ch := str[i] // index into the string by byte; ch has type byte
    fmt.Println(i, ch)
}

The output of this example is:

0 72
1 101
2 108
3 108
4 111
5 44
6 32
7 228
8 184
9 150
10 231
11 149
12 140

As you can see, the length of this string is 13, even though intuitively it has only 9 characters. This is because each Chinese character takes 3 bytes in UTF-8 rather than 1.

The second is to traverse it by Unicode character:

str := "Hello, 世界"
for i, ch := range str {
    fmt.Println(i, ch) // ch has type rune
}

The output is:

0 72
1 101
2 108
3 108
4 111
5 44
6 32
7 19990
10 30028

This time 9 characters are printed, because when a string is traversed by Unicode character each character has type rune rather than byte.
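The two counts can also be obtained without writing a loop. This sketch compares the built-in len with utf8.RuneCountInString from the standard library; the helper name counts is hypothetical.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// counts returns the byte length and the character (rune) count of s.
// Hypothetical helper combining len and utf8.RuneCountInString.
func counts(s string) (byteLen, runeLen int) {
    return len(s), utf8.RuneCountInString(s)
}

func main() {
    // 7 ASCII characters followed by two Chinese characters.
    b, r := counts("Hello, \u4e16\u754c")
    fmt.Println(b, r) // 13 9
}
```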

At this point you may be wondering how Go stores strings under the hood, and why the two traversal methods give different results. Let's break it down briefly.

Underlying character types

Go provides dedicated types for the individual characters in a string. Two character types are supported:

One is byte, which represents the value of a single byte of a UTF-8 encoding (it is an alias for uint8, and the two are equivalent, since each occupies exactly 1 byte of memory);
The other is rune, which represents a single Unicode character (it is an alias for int32, since it occupies exactly 4 bytes of memory; for operations on runes, see the unicode package in the Go standard library).
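The alias relationship can be seen directly in code: values move between byte and uint8, or rune and int32, without any conversion. The sketch below also converts a string to both slice types; the helper name split is hypothetical.

```go
package main

import "fmt"

// split returns the bytes and the runes of s.
// Hypothetical helper wrapping the two built-in conversions.
func split(s string) ([]byte, []rune) {
    return []byte(s), []rune(s)
}

func main() {
    var b byte = 'A'
    var u uint8 = b // allowed without conversion: byte and uint8 are the same type
    var r rune = '\u4e16'
    var i int32 = r // allowed without conversion: rune and int32 are the same type
    fmt.Printf("%T %T %T %T\n", b, u, r, i)

    bs, rs := split("\u4e16\u754c") // two Chinese characters
    fmt.Println(len(bs), len(rs))   // number of bytes, number of characters
}
```

The %T verb prints the underlying names, uint8 and int32, because byte and rune are aliases rather than distinct types.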

The difference between UTF-8 and Unicode

At this point we need to distinguish UTF-8 from Unicode.

Unicode is a character set that covers the characters of all the world's languages. Comparable character sets include ASCII (only 128 characters) and ISO 8859-1 (256 characters, covering the Western European Latin alphabets). In the broad sense, Unicode covers both the character set and its encoding rules, such as UTF-8 and UTF-16. (UTF8MB4 is MySQL's name for full 4-byte UTF-8, and GBK is a separate Chinese encoding rather than a Unicode encoding.)

UTF-8 is therefore one way of implementing the Unicode character set: it encodes Unicode characters according to a particular rule. Specifically, UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character; an English letter takes 1 byte, for example, and a Chinese character takes 3. To hold any character in a fixed amount of memory, Go uses the maximum UTF-8 length of 4 bytes for a single character (the rune type). In Go, the unicode/utf8 package converts between UTF-8 and Unicode.
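As an illustration of that conversion, the following sketch encodes a code point to UTF-8 bytes and decodes it back using utf8.EncodeRune and utf8.DecodeRuneInString. The helper names encode and decodeFirst are hypothetical.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// encode returns the UTF-8 bytes for a single Unicode code point.
// Hypothetical helper around utf8.EncodeRune.
func encode(r rune) []byte {
    buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest UTF-8 sequence
    n := utf8.EncodeRune(buf, r)
    return buf[:n]
}

// decodeFirst returns the first code point in s and its width in bytes.
// Hypothetical helper around utf8.DecodeRuneInString.
func decodeFirst(s string) (rune, int) {
    return utf8.DecodeRuneInString(s)
}

func main() {
    b := encode('\u4e16')
    fmt.Printf("% x\n", b) // the bytes in hexadecimal

    r, size := decodeFirst("\u4e16\u754c")
    fmt.Printf("%U %d\n", r, size) // the code point and how many bytes it used
}
```

The fixed 4-byte rune and the variable-length byte sequence are two views of the same character, and these two functions move between them.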

So from the perspective of the Unicode character set, each character of a string is an independent unit, but from the perspective of the UTF-8 encoding, one character may be encoded as several bytes.

The len function returns the byte length of a string, so traversing a string by index up to that length looks at it from the UTF-8 encoding perspective. Traversing a string with the range keyword looks at it from the Unicode character set perspective. That is why the two give different results.

To keep the language simple, most Go APIs assume that strings are UTF-8 encoded.

Converting Unicode code points to printable characters

To convert a Unicode code point into the corresponding character, use a string conversion:

str := "Hello, 世界"
for i, ch := range str {
    fmt.Println(i, string(ch))
}

The corresponding output is:

0 H
1 e
2 l
3 l
4 o
5 ,
6  
7 世
10 界

The individual bytes of a UTF-8 encoding cannot be converted this way. English characters are fine, because each is a single byte, but Chinese characters come out garbled: a Chinese character needs three bytes, and converting any one of them on its own does not produce the character.
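The garbling can be reproduced in a few lines. In this sketch the helper name byteAsString is hypothetical; it converts one byte of a string on its own, which Go treats as a code point of that numeric value.

```go
package main

import "fmt"

// byteAsString converts the single byte at index i of s to a string.
// Hypothetical helper: the byte value is treated as a Unicode code point.
func byteAsString(s string, i int) string {
    return string(s[i])
}

func main() {
    str := "Hello, \u4e16\u754c" // 7 ASCII bytes followed by two 3-byte characters

    fmt.Println(byteAsString(str, 0)) // an ASCII byte is a complete character
    fmt.Println(byteAsString(str, 7)) // first byte of a Chinese character alone: an unrelated character
    fmt.Println(str[7:10])            // the three bytes together: the intended character
}
```

The byte at index 7 has the value 0xE4, so converting it alone yields the character U+00E4 rather than the Chinese character it belongs to.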

Copyright notice

This article was written by [weixin_ fifty-nine million two hundred and eighty-four thousand]. When reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/181/202206300743154684.html