Go data types: strings and the underlying character types
2022-06-30 07:52:00 【weixin_ fifty-nine million two hundred and eighty-four thousand】
Strings
Basic usage
In Go, string is a basic type. By default a string is a UTF-8 encoded character sequence: an ASCII character takes 1 byte, and other characters take 2 to 4 bytes as needed. A Chinese character, for example, usually takes 3 bytes.
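As a rough illustration of these byte widths, the sketch below measures a few one-character strings with len(). The helper name byteLen is hypothetical and only wraps the built-in; the characters are written as \u escapes so the example does not depend on how the source file is saved.

```go
package main

import "fmt"

// byteLen reports how many bytes s occupies in its UTF-8 encoding.
// Hypothetical helper for illustration; it simply wraps len().
func byteLen(s string) int {
	return len(s)
}

func main() {
	fmt.Println(byteLen("A"))          // ASCII letter: 1 byte
	fmt.Println(byteLen("\u00e9"))     // accented Latin letter (U+00E9): 2 bytes
	fmt.Println(byteLen("\u4e16"))     // Chinese character (U+4E16): 3 bytes
	fmt.Println(byteLen("\U0001F600")) // emoji (U+1F600): 4 bytes
}
```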
Declaration and initialization
Declaring and initializing a string is straightforward:
var str string // declare a string variable
str = "Hello world" // initialize the variable
str2 := "你好,学院君" // declare and initialize in one step
Format output
You can also use Go's built-in len() function to get the length of a string, and the Printf function from the fmt package to format the output:
ch := str[0] // first byte of the string
fmt.Printf("The length of \"%s\" is %d \n", str, len(str))
fmt.Printf("The first character of \"%s\" is %c.\n", str, ch)
Escape characters
Go string literals cannot be written with single quotes; ordinary string literals must be enclosed in double quotes. To escape a specific character, use a backslash (\), just as the double quotes and the newline were escaped in the strings above. Common escape sequences are:
- \n: newline
- \r: carriage return
- \t: tab
- \u or \U: Unicode character
- \\: the backslash itself
So the output of the print code above is:
The length of "Hello world" is 11
The first character of "Hello world" is H.
In addition, you can include " in a string without escaping it by using backquotes (`):
label := `Search results for "Golang":`
Multiline strings
Multiline strings can also be built with backquotes:
results := `Search results for "Golang":
- Go
- Golang
- Golang Programming
`
fmt.Printf("%s", results)
The results are as follows:
Search results for "Golang":
- Go
- Golang
- Golang Programming
Of course, using the + operator also works:
results := "Search results for \"Golang\":\n" +
"- Go\n" +
"- Golang\n" +
"- Golang Programming\n"
fmt.Printf("%s", results)
The result is the same, but you have to type many more characters, and it is less elegant than the backquoted version.
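To check that the two spellings really produce the same value, here is a minimal sketch that builds the text both ways and compares them. The function names rawResults and concatResults are hypothetical, chosen only for this illustration.

```go
package main

import "fmt"

// rawResults builds the text with a raw (backquoted) string literal.
func rawResults() string {
	return `Search results for "Golang":
- Go
- Golang
- Golang Programming
`
}

// concatResults builds the same text with escape sequences and the + operator.
func concatResults() string {
	return "Search results for \"Golang\":\n" +
		"- Go\n" +
		"- Golang\n" +
		"- Golang Programming\n"
}

func main() {
	// Both literals describe the same sequence of bytes.
	fmt.Println(rawResults() == concatResults())
}
```

Inside backquotes nothing is escaped, so a raw string cannot itself contain a backquote; in that case the double-quoted form is still needed.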
Immutable value type
Although the characters in a string can be accessed by index, like an array:
ch := str[0] // get the first character (byte) of the string
unlike an array, a string in Go is an immutable value type: once initialized, its contents cannot be modified. Take the following example:
str := "Hello world"
str[0] = 'X' // compile error
The compiler reports an error similar to the following:
cannot assign to str[0]
Character encoding
Strings in Go are by default UTF-8 encoded Unicode character sequences, so they can contain non-ANSI characters; for example, 「Hello, 学院君」 can appear in Go code.
Note, however, that if your Go code contains non-ANSI characters, the source file must be saved as UTF-8. Editors on Windows in particular often save files in the local encoding by default (in China this may be GBK rather than UTF-8), and overlooking this can lead to unexpected problems when compiling and running.
Encoding conversion is a very common requirement when processing text documents (such as TXT, XML, HTML), but by default Go only supports UTF-8 and Unicode. For other encodings, the Go standard library has no built-in transcoding support.
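Returning to the point that strings are both immutable and UTF-8 encoded: when a changed copy is needed, one common approach is to convert the string to []rune, modify the slice, and build a new string from it. The sketch below assumes a hypothetical helper named replaceFirstRune; it is one possible way to do this, not the only one.

```go
package main

import "fmt"

// replaceFirstRune returns a copy of s whose first character is r.
// Hypothetical helper: it converts to []rune so that multi-byte
// characters are handled as whole units, then builds a new string.
func replaceFirstRune(s string, r rune) string {
	rs := []rune(s)
	if len(rs) == 0 {
		return s
	}
	rs[0] = r
	return string(rs)
}

func main() {
	str := "Hello world"
	changed := replaceFirstRune(str, 'X')
	fmt.Println(str)     // the original string is untouched
	fmt.Println(changed) // a new string with the first character replaced
}
```

Converting to []byte would also work for ASCII-only text, but it would split a multi-byte character into separate bytes.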
String manipulation
String concatenation
Go provides rich built-in string functionality. Common operations include concatenation, getting the length, and accessing a specific character. Getting the length and accessing characters were covered above; concatenation only needs the + operator:
str = str + ", Application development"
str += ", Application development" // shorthand for the statement above, with exactly the same effect
Also note that if a long expression has to be broken across lines, the + operator must appear at the end of the previous line; otherwise the compiler reports an error:
str = str +
	", Application development"
String slicing
In Go, substrings can be obtained through string slicing:
str := "hello, world"
str1 := str[:5] // substring before index 5 (not included)
str2 := str[7:] // substring from index 7 (included) onward
str3 := str[0:5] // substring from index 0 (included) to index 5 (not included)
fmt.Println("str1:", str1)
fmt.Println("str2:", str2)
fmt.Println("str3:", str3)
Go's slice ranges can be understood by analogy with intervals in mathematics: they are left-closed and right-open. For example, str[0:5] covers the index interval [0, 5), str[:5] is also [0, 5) (indexes start from 0), and str[7:] covers [7, len(str)). Omitting the end index simply means the slice runs to the end of the string; the interval is still half-open, because len(str) itself is not a valid index.
So the code above prints:
str1: hello
str2: world
str3: hello
In summary, a string is sliced with a start index and an end index joined by a colon. The number before the colon is the start index, and omitting it means starting from 0. The number after the colon is the end index (not the length of the substring), and omitting it means going to the end of the string. So str[:] yields the complete string.
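The slicing rules can be checked with a short sketch. The wrapper substr is a hypothetical name used only here; it makes the start and end indexes explicit so the omitted-index forms can be compared against them.

```go
package main

import "fmt"

// substr returns s[start:end]: the bytes from index start (included)
// up to index end (excluded). Hypothetical wrapper for illustration.
func substr(s string, start, end int) string {
	return s[start:end]
}

func main() {
	str := "hello, world"
	fmt.Println(substr(str, 0, 5) == str[:5])        // omitted start means 0
	fmt.Println(substr(str, 7, len(str)) == str[7:]) // omitted end means len(str)
	fmt.Println(substr(str, 0, len(str)) == str[:])  // both omitted: whole string
	fmt.Println(len(str[2:5]))                       // length is end minus start
}
```

Note that the indexes count bytes, so slicing a string that contains multi-byte characters can cut a character in half.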
In addition, Go supports string comparison, checking whether a string contains a given character or substring, finding the index of a substring, replacement, case conversion, trimming and other operations, most of them through the standard library's strings package.
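A few of these operations, sketched with functions from the standard strings package. The helper normalize is a hypothetical name that just chains two of the calls; the sample strings are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize trims surrounding whitespace and lower-cases s.
// Hypothetical helper combining two strings-package calls.
func normalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

func main() {
	s := "hello, world"
	fmt.Println(strings.Compare("a", "b"))            // negative: "a" sorts before "b"
	fmt.Println(strings.Contains(s, "world"))         // substring test
	fmt.Println(strings.Index(s, "world"))            // byte index of the substring
	fmt.Println(strings.ReplaceAll(s, "world", "Go")) // replacement returns a new string
	fmt.Println(strings.ToUpper(s))                   // case conversion
	fmt.Println(normalize("  Hello  "))               // trim, then lower-case
}
```

Because strings are immutable, each of these functions returns a new value and leaves its argument unchanged.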
String traversal
Go supports two ways to traverse a string.
One is to traverse it as a byte array:
str := "Hello, 世界"
n := len(str)
for i := 0; i < n; i++ {
	ch := str[i] // get the byte at index i; ch has type byte
	fmt.Println(i, ch)
}
The output of this example is:
0 72
1 101
2 108
3 108
4 111
5 44
6 32
7 228
8 184
9 150
10 231
11 149
12 140
As you can see, the length of this string is 13, although intuitively it should only have 9 characters. This is because each Chinese character occupies 3 bytes in UTF-8 rather than 1 byte.
The other is to traverse it by Unicode character:
str := "Hello, 世界"
for i, ch := range str {
	fmt.Println(i, ch) // ch has type rune
}
The output is:
0 72
1 101
2 108
3 108
4 111
5 44
6 32
7 19990
10 30028
This time 9 characters are printed, because when traversing by Unicode character each element has type rune rather than byte. The index jumps from 7 to 10 because it is the byte offset at which each character starts.
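The difference between the two traversals can be condensed into a small sketch. The helper runeStarts is a hypothetical name; it collects the indexes a range loop reports, using the string Hello, 世界 written with \u escapes.

```go
package main

import "fmt"

// runeStarts returns the byte index at which each character of s begins,
// as reported by a range loop. Hypothetical helper for illustration.
func runeStarts(s string) []int {
	var starts []int
	for i := range s {
		starts = append(starts, i)
	}
	return starts
}

func main() {
	str := "Hello, \u4e16\u754c" // "Hello, 世界"
	starts := runeStarts(str)
	fmt.Println(len(str))    // number of bytes
	fmt.Println(len(starts)) // number of characters
	fmt.Println(starts)      // the last two indexes are 3 bytes apart
}
```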
You may be a little confused at this point and wonder how Go stores strings under the hood, and why different traversal methods give different results. Let's briefly explain.
Underlying character types
Go provides separate types for the individual characters in a string. Two character types are supported:
- byte, which represents a single byte of the UTF-8 encoding (it is an alias for uint8; the two are equivalent, since each occupies exactly 1 byte of memory);
- rune, which represents a single Unicode character (it is an alias for int32, since it occupies exactly 4 bytes of memory; for rune-related operations, see the unicode package in the Go standard library).
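The alias relationship can be seen directly in code: a byte can be assigned to a uint8, and a rune to an int32, without any conversion. The function names asUint8 and asInt32 below are hypothetical and exist only to make that visible.

```go
package main

import "fmt"

// asUint8 returns b unchanged; it compiles without a conversion
// because byte is an alias for uint8.
func asUint8(b byte) uint8 {
	return b
}

// asInt32 returns r unchanged; it compiles without a conversion
// because rune is an alias for int32.
func asInt32(r rune) int32 {
	return r
}

func main() {
	var b byte = 'H'
	var r rune = '\u4e16'
	fmt.Printf("%T %T\n", b, r) // the verb %T prints the underlying type names
	fmt.Println(asUint8(b), asInt32(r))
}
```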
The difference between UTF-8 and Unicode
At this point we need to distinguish between UTF-8 and Unicode.
Unicode is a character set that aims to cover the characters of all the world's languages. Similar notions include the ASCII character set (only 128 characters) and the ISO 8859-1 character set (which covers the Western European Latin letters). In the broad sense, Unicode includes both the character set and its encoding rules, such as UTF-8 and UTF-16 (utf8mb4 is MySQL's name for full 4-byte UTF-8). GBK, by contrast, is a separate Chinese encoding and not a Unicode encoding.
So UTF-8 is one implementation of the Unicode character set: it encodes Unicode characters in a specific way. Concretely, UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character; for example, English characters take 1 byte and Chinese characters usually take 3 bytes. A decoded Unicode character, by contrast, occupies a fixed amount of memory: 4 bytes, the maximum UTF-8 length. In Go you can use the unicode/utf8 package to convert between UTF-8 and Unicode.
So from the perspective of the Unicode character set, each character of a string is an independent unit; from the perspective of UTF-8 encoding, a character may be encoded as more than one byte.
The len function returns the byte length of a string, so when we traverse a string as a byte array we are looking at it from the UTF-8 encoding perspective. When we traverse it with the range keyword, we are looking at it from the Unicode character set perspective. That is why the results differ.
To keep the language simple, most Go APIs assume that strings are UTF-8 encoded.
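The conversion between UTF-8 bytes and Unicode code points mentioned above can be sketched with the unicode/utf8 package. The helpers firstRune and encode are hypothetical names wrapping real functions from that package (DecodeRuneInString, EncodeRune), and the sample string is Hello, 世界 written with \u escapes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune decodes the first UTF-8 encoded character of s and returns
// the code point together with the number of bytes it occupied.
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

// encode returns the UTF-8 bytes for a single code point.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest encoding
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	str := "Hello, \u4e16\u754c" // "Hello, 世界"
	fmt.Println(len(str), utf8.RuneCountInString(str)) // bytes vs characters
	r, size := firstRune(str[7:])
	fmt.Println(r, size)   // code point and its encoded width
	fmt.Println(encode(r)) // the same bytes seen in the byte-wise traversal
}
```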
Converting Unicode code points to printable characters
To convert a Unicode code point to the corresponding character, use the string function:
str := "Hello, 世界"
for i, ch := range str {
	fmt.Println(i, string(ch))
}
The corresponding output is:
0 H
1 e
2 l
3 l
4 o
5 ,
6 
7 世
10 界
Individual UTF-8 bytes cannot be converted this way. It works for English characters, because each one is a single byte, but Chinese characters come out garbled: each one needs three bytes, so converting a single byte does not produce the original character.
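A minimal sketch of that last point, assuming two hypothetical helpers: byteAsString converts one byte on its own, while bytesAsString converts a complete byte sequence.

```go
package main

import "fmt"

// byteAsString converts a single byte with string(), which treats the value
// as a code point rather than as a fragment of a UTF-8 sequence.
func byteAsString(b byte) string {
	return string(b)
}

// bytesAsString converts a whole byte sequence, so multi-byte characters survive.
func bytesAsString(bs []byte) string {
	return string(bs)
}

func main() {
	str := "Hello, \u4e16\u754c" // "Hello, 世界"
	fmt.Println(byteAsString(str[0]))             // ASCII byte: same character
	fmt.Println(byteAsString(str[7]))             // first byte of a 3-byte character: a different character
	fmt.Println(bytesAsString([]byte(str[7:10]))) // all three bytes together decode correctly
}
```

The byte 228 on its own is interpreted as code point U+00E4, which is why the second line prints an unrelated character.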