Strings and underlying character types in Go
2022-06-30 07:52:00 【weixin_ fifty-nine million two hundred and eighty-four thousand】
Strings
Basic usage
In Go, string is a basic type. By default a string is a sequence of UTF-8 encoded characters: an ASCII character takes 1 byte, and other characters take 2 to 4 bytes as needed. Chinese characters, for example, usually take 3 bytes.
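The byte sizes mentioned above can be checked with a small sketch. The helper name byteLen is made up for this illustration, and the characters are written as \u escapes so the result does not depend on how the source file is saved.

```go
package main

import "fmt"

// byteLen is a hypothetical helper: len on a string counts bytes, not characters.
func byteLen(s string) int {
	return len(s)
}

func main() {
	fmt.Println(byteLen("A"))          // ASCII letter: 1
	fmt.Println(byteLen("\u00e9"))     // accented Latin letter: 2
	fmt.Println(byteLen("\u4e16"))     // Chinese character: 3
	fmt.Println(byteLen("\U0001F600")) // emoji: 4
}
```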
Declaration and initialization
Declaring and initializing a string is straightforward:
var str string // declare a string variable
str = "Hello World" // assign its initial value
str2 := "Hello, Academician" // declare and initialize in one statement
Formatted output
You can use Go's built-in len() function to get the length of a string (in bytes), and the Printf function from the fmt package to format the output:
fmt.Printf("The length of \"%s\" is %d \n", str, len(str))
ch := str[0] // first byte of the string
fmt.Printf("The first character of \"%s\" is %c.\n", str, ch)
Escape characters
Go string literals cannot be written with single quotes (single quotes denote a single character). An ordinary string literal is enclosed in double quotes, and special characters are escaped with a backslash, as we did with the double quotes and the line break in the strings above. Common escape sequences are:
- \n: newline
- \r: carriage return
- \t: tab
- \u or \U: Unicode character
- \\: the backslash itself
The print code above therefore outputs:
The length of "Hello World" is 11
The first character of "Hello World" is H.
Alternatively, a string delimited by backquotes (`) can contain double quotes without escaping:
label := `Search results for "Golang":`
Multiline strings
Multiline strings can also be built with backquotes:
results := `Search results for "Golang":
- Go
- Golang
- Golang Programming
`
fmt.Printf("%s", results)
The result is:
Search results for "Golang":
- Go
- Golang
- Golang Programming
Of course, you can also build the same string with the + operator:
results := "Search results for \"Golang\":\n" +
"- Go\n" +
"- Golang\n" +
"- Golang Programming\n"
fmt.Printf("%s", results)
The result is the same, but you have to type many more characters, so it is less elegant than the backquote form.
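One difference between the two forms is worth a quick sketch: inside backquotes, escape sequences are not processed, whereas inside double quotes they are. The function name literals below is made up for this illustration.

```go
package main

import "fmt"

// literals is a hypothetical helper returning the same characters written
// as a backquoted literal and as a double-quoted literal.
func literals() (raw, interpreted string) {
	raw = `a\nb`         // backquotes: backslash and n stay as two literal characters
	interpreted = "a\nb" // double quotes: \n becomes a single newline character
	return raw, interpreted
}

func main() {
	raw, interpreted := literals()
	fmt.Println(len(raw), len(interpreted)) // 4 3
	fmt.Printf("%q\n", raw)                 // shows the escaped backslash
	fmt.Printf("%q\n", interpreted)         // shows the newline escape
}
```

This is why the backquoted version in the text contains real line breaks, while the concatenated version has to spell out each \n.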
Immutable value type
Individual bytes of a string can be read with an index expression, just like array elements:
ch := str[0] // take the first byte of the string
Unlike an array, however, a Go string is an immutable value type: once initialized, its contents cannot be modified. For example:
str := "Hello world"
str[0] = 'X' // compile error
The compiler reports an error similar to:
cannot assign to str[0]
Character encoding
A Go string is by default a UTF-8 encoded sequence of Unicode characters, so it can contain non-ASCII characters; text such as 「Hello, 世界」 can appear directly in Go source code.
Note, however, that if your Go code contains non-ASCII characters, the source file must be saved as UTF-8. On Windows in particular, many editors save files in the local encoding by default (in China this may be GBK rather than UTF-8). Overlooking this can lead to unexpected behavior when compiling and running.
Converting between encodings is a very common requirement when processing text documents (TXT, XML, HTML and so on). However, Go only supports UTF-8 and Unicode by default; the standard library has no built-in transcoding support for other encodings.
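Although the standard library does not transcode other encodings, it can at least detect text that is not valid UTF-8. The following is a minimal sketch using utf8.ValidString from the unicode/utf8 package; the helper name isUTF8 and the sample bytes are made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isUTF8 is a hypothetical helper that reports whether s is valid UTF-8.
func isUTF8(s string) bool {
	return utf8.ValidString(s)
}

func main() {
	fmt.Println(isUTF8("Hello, \u4e16\u754c")) // true: properly encoded text
	// Two raw bytes that do not form a valid UTF-8 sequence, as might be
	// read from a file saved in a legacy encoding.
	fmt.Println(isUTF8("\xca\xc0")) // false
}
```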
String manipulation
String concatenation
Go provides a rich set of built-in string operations. Common ones include concatenation, getting the length, and accessing a given character. Getting the length and accessing characters were covered above; concatenation only needs the + operator:
str = str + ", Application development"
str += ", Application development" // shorthand for the statement above, with exactly the same effect
Also note that if a concatenation is long enough to need a line break, the + operator must appear at the end of the previous line, otherwise the compiler reports an error:
str = str +
    ", Application development"
String slicing
In Go, substrings are obtained by slicing a string:
str := "hello, world"
str1 := str[:5] // substring before index 5 (exclusive)
str2 := str[7:] // substring from index 7 (inclusive) to the end
str3 := str[0:5] // substring from index 0 (inclusive) to index 5 (exclusive)
fmt.Println("str1:", str1)
fmt.Println("str2:", str2)
fmt.Println("str3:", str3)
A Go slice range can be understood by analogy with intervals in mathematics: it is half-open, closed on the left and open on the right. For example, str[0:5] covers the indexes [0, 5); str[:5] also covers [0, 5), since indexes start at 0; and str[7:] covers [7, len(str)), because an omitted end index defaults to len(str).
The code above therefore prints:
str1: hello
str2: world
str3: hello
In summary, a string is sliced with a start index and an end index joined by a colon. The number before the colon is the start index (omitted means 0); the number after it is the end index (omitted means the end of the string), not the length of the substring. So str[:] yields the complete string.
In addition, Go supports string comparison, checking whether a string contains a given character or substring, finding the index of a substring, replacement, case conversion, trimming, and other operations.
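Most of these operations live in the standard library's strings package, which the text does not name explicitly. The following sketch shows a few of them; the helper summarize and the sample input are made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// summarize is a hypothetical helper that applies several strings functions to s.
func summarize(s string) (trimmed string, hasGo bool, idx int, upper string) {
	trimmed = strings.TrimSpace(s)          // drop leading and trailing whitespace
	hasGo = strings.Contains(trimmed, "Go") // substring test
	idx = strings.Index(trimmed, "Golang")  // byte index of the first match, -1 if absent
	upper = strings.ToUpper(trimmed)        // case conversion
	return
}

func main() {
	trimmed, hasGo, idx, upper := summarize("  Hello, Golang  ")
	fmt.Println(trimmed, hasGo, idx, upper)

	fmt.Println(strings.ReplaceAll(trimmed, "Golang", "Go")) // replacement
	fmt.Println("apple" < "banana")                          // comparison operators work on strings
}
```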
String traversal
Go supports two ways of traversing a string.
The first is to traverse it as a byte array:
str := "Hello, 世界"
n := len(str)
for i := 0; i < n; i++ {
    ch := str[i] // index into the string; ch has type byte
    fmt.Println(i, ch)
}
The output of this example is:
0 72
1 101
2 108
3 108
4 111
5 44
6 32
7 228
8 184
9 150
10 231
11 149
12 140
As you can see, the length of this string is 13, although intuitively it has only 9 characters. This is because each Chinese character occupies 3 bytes in UTF-8 rather than 1.
The second is to traverse it by Unicode character:
str := "Hello, 世界"
for i, ch := range str {
    fmt.Println(i, ch) // ch has type rune
}
The output is:
0 72
1 101
2 108
3 108
4 111
5 44
6 32
7 19990
10 30028
This time 9 characters are printed, because when traversing by Unicode character each element is a rune rather than a byte. The index i is the byte offset at which each character starts, which is why it jumps from 7 to 10.
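The two counts can also be obtained directly, without a loop. This is a small sketch using utf8.RuneCountInString from the unicode/utf8 package; the helper name counts is made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper returning the byte length and the
// character (rune) count of s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	nBytes, nRunes := counts("Hello, \u4e16\u754c") // "Hello, 世界"
	fmt.Println(nBytes, nRunes)                     // 13 9
}
```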
At this point you may be a little confused, and curious about how Go stores strings under the hood and why the two traversal methods give different results. Let's go through it briefly.
Underlying character types
Go provides dedicated types for the individual characters of a string. Two character types are supported:
- byte, which represents the value of a single byte of the UTF-8 encoding (it is an alias for uint8; the two are equivalent, and each occupies exactly 1 byte of memory);
- rune, which represents a single Unicode character (it is an alias for int32, which occupies exactly 4 bytes of memory; for operations on rune values, see the unicode package in the Go standard library).
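The two types can be seen side by side in a short sketch. It also shows one common way around string immutability: convert to a []rune, modify the copy, and convert back. The helper replaceFirst is made up for this illustration.

```go
package main

import "fmt"

// replaceFirst is a hypothetical helper: it returns a copy of s whose first
// character is replaced by r. The original string is left untouched.
func replaceFirst(s string, r rune) string {
	rs := []rune(s) // copy the characters into a mutable slice
	if len(rs) == 0 {
		return s
	}
	rs[0] = r
	return string(rs)
}

func main() {
	s := "Hello, \u4e16\u754c" // "Hello, 世界"

	var b byte = s[0]         // indexing a string yields a byte
	var r rune = []rune(s)[7] // indexing a []rune yields a whole character

	fmt.Printf("%T %d\n", b, b) // uint8 72
	fmt.Printf("%T %d\n", r, r) // int32 19990

	fmt.Println(len([]byte(s)), len([]rune(s)))   // 13 9
	fmt.Println(replaceFirst("Hello world", 'X')) // Xello world
}
```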
The difference between UTF-8 and Unicode
At this point we need to distinguish UTF-8 from Unicode.
Unicode is a character set that aims to cover the characters of all the world's languages. Comparable character sets include ASCII (128 characters) and ISO 8859-1 (256 characters, covering the Western European Latin letters). In the broad sense, Unicode includes both the character set and its encoding rules, such as UTF-8, UTF-16 and UTF-32.
UTF-8 is therefore one way of implementing the Unicode character set: it encodes Unicode characters according to specific rules. UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character; English letters take 1 byte and Chinese characters take 3 bytes, for example. A rune, by contrast, always occupies a fixed 4 bytes, the maximum length of a UTF-8 encoded character. In Go, the unicode/utf8 package converts between UTF-8 bytes and Unicode characters.
So from the perspective of the Unicode character set, each character of a string is an independent unit, whereas from the perspective of the UTF-8 encoding, a single character may be encoded as several bytes.
The len function returns the byte length of a string, so traversing a string as a byte array takes the UTF-8 encoding perspective; traversing a string with the range keyword takes the Unicode character set perspective. That is why the two give different results.
To keep the language simple, most Go APIs assume that strings are UTF-8 encoded.
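The conversion between UTF-8 bytes and Unicode characters mentioned above can be sketched with the unicode/utf8 package. The loop below does by hand roughly what range does for a string; the helper names decodeAll and encode are made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll is a hypothetical helper: it walks s one UTF-8 sequence at a
// time and returns the decoded Unicode code points.
func decodeAll(s string) []rune {
	var out []rune
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:]) // decode the sequence starting at byte i
		out = append(out, r)
		i += size // advance by the number of bytes consumed
	}
	return out
}

// encode is a hypothetical helper returning the UTF-8 bytes of r.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest UTF-8 sequence
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	fmt.Println(decodeAll("A\u4e16")) // [65 19990]
	fmt.Println(encode(0x4e16))       // [228 184 150]
	fmt.Println(utf8.RuneLen('A'), utf8.RuneLen(0x4e16)) // 1 3
}
```

The bytes 228 184 150 and the code point 19990 match the values printed by the two traversal examples earlier.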
Converting Unicode code points to printable characters
To convert a Unicode code point to the corresponding character, use a string conversion:
str := "Hello, 世界"
for i, ch := range str {
    fmt.Println(i, string(ch))
}
The corresponding output is:
0 H
1 e
2 l
3 l
4 o
5 ,
6
7 世
10 界
Individual UTF-8 bytes cannot be converted this way. It works for English characters, because each one is a single byte, but Chinese characters come out garbled: a Chinese character is encoded as three bytes, and converting any one of those bytes on its own produces garbage.
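The garbling can be reproduced with a short sketch. Converting a single byte with string() treats the byte value as a code point of its own, so byte 228 becomes the character U+00E4 rather than part of 世. The helper names fromByte and fromBytes are made up for this illustration.

```go
package main

import "fmt"

// fromByte is a hypothetical helper: it converts the single byte at index i.
func fromByte(s string, i int) string {
	return string(s[i])
}

// fromBytes is a hypothetical helper: it keeps the bytes i..j together.
func fromBytes(s string, i, j int) string {
	return string([]byte(s)[i:j])
}

func main() {
	s := "Hello, \u4e16\u754c" // "Hello, 世界"

	fmt.Println(fromByte(s, 0))       // H: ASCII bytes convert correctly
	fmt.Println(fromByte(s, 7))       // ä: byte 228 read as code point U+00E4
	fmt.Println(fromBytes(s, 7, 10))  // 世: all three bytes converted together
	fmt.Println(string([]rune(s)[7])) // 世: converting the rune works directly
}
```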