[rust notes] 15 string and text (Part 1)
2022-07-05 06:05:00 【phial03】
15 - Strings and text
15.1-Unicode
15.1.1-ASCII、Latin-1 And Unicode
Unicode and ASCII share the same code points for all ASCII characters (0 to 0x7f).
Unicode calls the code point range 0 to 0xff the Latin-1 code block (ISO/IEC 8859-1), so Unicode is a superset of Latin-1.
Converting Latin-1 to Unicode:
fn latin1_to_char(latin1: u8) -> char { latin1 as char }
Converting Unicode to Latin-1:
fn char_to_latin1(c: char) -> Option<u8> { if c as u32 <= 0xff { Some(c as u8) } else { None } }
15.1.2-UTF-8
- Rust's String and str types represent text using the UTF-8 encoding. UTF-8 encodes each character as a sequence of 1 to 4 bytes.
- UTF-8 places restrictions on well-formed sequences:
  - For a given code point, only the shortest encoding is considered well formed. You cannot spend 4 bytes on a code point that fits in 3.
  - Well-formed UTF-8 must not encode the numbers 0xd800 to 0xdfff, or any number greater than 0x10ffff.
- Important properties of UTF-8:
  - UTF-8 encodes the code points 0 to 0x7f as the bytes 0 to 0x7f, so a range of bytes holding ASCII text is already valid UTF-8. A UTF-8 string that contains only ASCII characters is likewise valid ASCII. The same is not true of Latin-1, whose bytes above 0x7f are not valid UTF-8 on their own.
  - By looking at the upper bits of any byte, you can tell whether it is the first byte of some character's UTF-8 encoding or a byte from the middle of one.
  - The leading bits of the first byte alone tell you the full length of the encoding.
  - No encoding is longer than 4 bytes, so processing UTF-8 never requires unbounded loops, which helps when handling untrusted data.
  - In well-formed UTF-8, you can always tell where character encodings begin and end, even when starting from an arbitrary byte. First bytes are always distinct from the bytes that follow them.
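The byte-level properties listed above can be sketched in a few lines. This is only an illustration: the helper names `utf8_len_from_first_byte` and `is_continuation_byte` are invented for this example and are not standard library functions.

```rust
/// Returns the total length in bytes of a UTF-8 encoding, judged only
/// from its first byte, or None if the byte cannot start an encoding.
/// (Hypothetical helper for illustration; not part of std.)
fn utf8_len_from_first_byte(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7f => Some(1), // 0xxxxxxx: plain ASCII
        0xc2..=0xdf => Some(2), // 110xxxxx (0xc0 and 0xc1 would be overlong)
        0xe0..=0xef => Some(3), // 1110xxxx
        0xf0..=0xf4 => Some(4), // 11110xxx (anything higher exceeds 0x10ffff)
        _ => None,              // a continuation byte, or never valid
    }
}

/// True for bytes of the form 10xxxxxx, which only appear after a first byte.
/// (Hypothetical helper for illustration; not part of std.)
fn is_continuation_byte(byte: u8) -> bool {
    byte & 0b1100_0000 == 0b1000_0000
}
```

For any `char`, the length reported for its first encoded byte should agree with `char::len_utf8`.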
15.1.5 - Text directionality
- Some scripts are written from left to right. Unicode stores characters in the order in which they are normally written or read.
- Other scripts are written from right to left. In that case the first bytes of a string encode the character that is written at the far right.
15.2 - Characters (char)
- The char type is a 32-bit value holding a Unicode code point. Its range is 0 to 0xd7ff, or 0xe000 to 0x10ffff.
- char implements Copy and Clone, along with the usual traits for comparison, hashing, and formatting.
15.2.1 - Character classification: methods for detecting character categories
- ch.is_numeric(): a numeric character. This covers the Unicode general categories Number; digit, Number; letter, and Number; other.
- ch.is_alphabetic(): an alphabetic character, as defined by Unicode's "Alphabetic" derived property.
- ch.is_alphanumeric(): either numeric or alphabetic, as defined above.
- ch.is_whitespace(): a whitespace character, as defined by the Unicode character property "WSpace=Y".
- ch.is_control(): a control character, as defined by Unicode's "Other, control" general category.
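A small sketch of how these classification methods might be combined. The function name `describe` and the order of the checks are choices made for this example only.

```rust
/// Labels a character using the classification methods on char.
/// Checks run in a fixed order, so a character that satisfies several
/// tests gets the first matching label. (Illustrative helper, not part of std.)
fn describe(ch: char) -> &'static str {
    if ch.is_control() {
        "control"
    } else if ch.is_whitespace() {
        "whitespace"
    } else if ch.is_numeric() {
        "numeric"
    } else if ch.is_alphabetic() {
        "alphabetic"
    } else {
        "other"
    }
}
```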
15.2.2 - Handling digits
- ch.to_digit(radix): decides whether ch is an ASCII digit in base radix. If so, it returns Some(num), where num is a u32; otherwise it returns None. radix ranges from 2 to 36. When radix > 10, ASCII letters are treated as digits with values 10 to 35.
- std::char::from_digit(num, radix): converts the u32 digit value num to a char, returning an Option<char>. When radix > 10, the resulting ch may be a lowercase letter.
- ch.is_digit(radix): returns true if ch is an ASCII digit in base radix. Equivalent to ch.to_digit(radix) != None.
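As a rough sketch, to_digit can drive a small number parser. `parse_in_radix` is a hypothetical helper written for this note; real code would normally use `u32::from_str_radix` instead.

```rust
/// Parses `text` as an unsigned number in the given radix (2 to 36),
/// one character at a time via char::to_digit.
/// Returns None on an empty string, a bad radix, a bad digit, or overflow.
/// (Hypothetical helper for illustration.)
fn parse_in_radix(text: &str, radix: u32) -> Option<u32> {
    // to_digit panics for a radix above 36, so reject bad radixes first.
    if !(2..=36).contains(&radix) || text.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for ch in text.chars() {
        let digit = ch.to_digit(radix)?;
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}
```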
15.2.3 - Character case conversion
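- ch.is_lowercase(): tells whether ch is a lowercase letter.
- ch.is_uppercase(): tells whether ch is an uppercase letter.
- ch.to_lowercase(): converts ch to lowercase.
- ch.to_uppercase(): converts ch to uppercase.
Both conversion methods return an iterator rather than a single char, because the converted form can consist of more than one character.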
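Because the case conversion methods return iterators, collecting them into a String is a convenient way to see the whole result. The wrapper names below are made up for this sketch.

```rust
/// Collects the uppercase form of a single character into a String.
/// (Illustrative wrapper, not part of std.)
fn to_upper_string(ch: char) -> String {
    ch.to_uppercase().collect()
}

/// Collects the lowercase form of a single character into a String.
/// (Illustrative wrapper, not part of std.)
fn to_lower_string(ch: char) -> String {
    ch.to_lowercase().collect()
}
```

The German letter 'ß' is a common example: its uppercase form is the two-character string "SS".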
15.2.4 - Conversions to and from integers
- The as operator converts a char to any integer type, silently masking off the upper bits if the target type is too narrow.
- The as operator converts any u8 value to a char. char also implements From<u8>.
- For wider integers, std::char::from_u32 is recommended; it returns Option<char>.
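A minimal sketch of these conversions. The function names `low_byte` and `checked_char` are invented for this example; the calls inside them are the standard ones described above.

```rust
/// Keeps only the low 8 bits of the code point, exactly as `ch as u8` does.
/// (Illustrative helper.)
fn low_byte(ch: char) -> u8 {
    ch as u8
}

/// Converts an integer to a char only when it is a valid code point for char,
/// i.e. not in 0xd800..=0xdfff and not above 0x10ffff. (Illustrative helper.)
fn checked_char(code: u32) -> Option<char> {
    std::char::from_u32(code)
}
```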
15.3-String And str
Rust's String and str types hold only well-formed UTF-8.
- The String type is a resizable buffer that holds a string. It is essentially a wrapper around Vec<u8>.
- The str type operates on string text in place.
- Dereferencing a String yields a &str, so every method defined on str can be called directly on a String.
- Text-handling methods index text by byte offset and measure length in bytes, not in characters.
The rest of these notes use variable names to indicate types:
Variable name: assumed type
- string: String
- slice: &str, or something that dereferences to &str, such as String or Rc<String>
- ch: char
- n: usize, a length
- i, j: usize, a byte offset
- range: a range of usize byte offsets, either fully bounded (i..j), partially bounded (i.. or ..j), or unbounded (..)
- pattern: any pattern type: char, String, &str, &[char], or FnMut(char) -> bool
15.3.1 - Creating String values
- String::new(): returns a fresh, empty string. No heap buffer is allocated yet; one is allocated later as needed.
- String::with_capacity(n): returns a fresh, empty string with a heap buffer of at least n bytes already allocated.
- slice.to_string(): allocates a fresh String whose contents are a copy of slice. Commonly used to create a String from a string literal.
- iter.collect(): builds a String by concatenating all the items of an iterator (char, &str, or String values). For example, removing whitespace from a string:
let spacey = "man hat tan";
let spaceless: String = spacey.chars().filter(|c| !c.is_whitespace()).collect();
assert_eq!(spaceless, "manhattan");
- slice.to_owned(): returns a copy of slice as a freshly allocated String. The str type cannot implement Clone because it is unsized, so this method is the way to get the effect of cloning.
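The following sketch shows one way with_capacity might be used to avoid repeated reallocation while building a string. `join_with_commas` is a hypothetical helper; in practice `items.join(", ")` does the same job.

```rust
/// Joins the items with ", ", reserving the required capacity up front.
/// (Hypothetical helper for illustration; `[&str]::join` already exists in std.)
fn join_with_commas(items: &[&str]) -> String {
    let text_len: usize = items.iter().map(|item| item.len()).sum();
    let separators_len = 2 * items.len().saturating_sub(1);
    let mut out = String::with_capacity(text_len + separators_len);
    for (index, item) in items.iter().enumerate() {
        if index > 0 {
            out.push_str(", ");
        }
        out.push_str(item);
    }
    out
}
```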
15.3.2 - Simple inspection: getting basic information from string slices
- slice.len(): returns the length of slice in bytes.
- slice.is_empty(): returns true when slice.len() == 0.
- slice[range]: returns a slice borrowing the given portion of slice.
- You cannot index a string slice with a single position, as in slice[i]. Instead, create a chars iterator over the slice and let it parse one character's UTF-8:
let par = "rust he";
assert_eq!(par[6..].chars().next(), Some('e'));
- slice.split_at(i): returns a tuple of two shared slices borrowed from slice: slice[..i] and slice[i..].
- slice.is_char_boundary(i): returns true if byte offset i falls on a character boundary and is therefore a valid offset into slice.
- Slices can be compared for equality, ordered, and hashed.
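Slicing at an offset that is not a character boundary panics, so a cautious lookup could go through `str::get`, which returns None instead. `char_at` below is a hypothetical helper sketched for this note.

```rust
/// Returns the character that starts at byte offset `i`, or None if `i`
/// is out of range, at the very end, or in the middle of a character.
/// (Hypothetical helper for illustration.)
fn char_at(slice: &str, i: usize) -> Option<char> {
    // str::get returns None instead of panicking on a bad offset.
    slice.get(i..)?.chars().next()
}
```

With a non-ASCII character in the text, byte offsets and character positions no longer line up, which is the situation this helper is meant to handle.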
15.3.3 - Appending and inserting text into a String
- string.push(ch): appends the character ch to the end of string.
- string.push_str(slice): appends the full contents of slice.
- string.extend(iter): appends all the items produced by the iterator iter to the string. The iterator can produce char, &str, or String values.
- string.insert(i, ch): inserts the character ch at byte offset i in string. All characters after i are shifted over to make room.
- string.insert_str(i, slice): inserts the full contents of slice at byte offset i in string.
- String implements std::fmt::Write, so the write! and writeln! macros can append formatted text to a String. Their return type is Result, so inside a function that returns a compatible Result you can handle errors by adding the ? operator at the end:
use std::fmt::Write;
let mut letter = String::new();
writeln!(letter, "Whose {} these are I think I know", "rutabagas")?;
- The + operator: when the left operand is a String and the right operand is a string slice, it concatenates them.
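The appending and inserting methods can be combined as in the sketch below. The function `build_greeting` and its message text are invented for this example.

```rust
use std::fmt::Write;

/// Builds a greeting using several of the append and insert methods.
/// (Hypothetical function for illustration.)
fn build_greeting(name: &str, count: u32) -> Result<String, std::fmt::Error> {
    let mut text = String::new();
    text.push('H');
    text.push_str("ello");
    text.extend([",", " "].iter().copied()); // an iterator of &str items
    text.insert_str(text.len(), name); // inserting at the end appends
    write!(text, "! You have {} new messages", count)?;
    text.insert(0, '>'); // shifts everything else over by one byte
    Ok(text + ".")
}
```

Writing to a String through write! is not expected to fail in practice, but the macro still returns a Result, which is why the function propagates it with ?.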
15.3.4 - Removing text
- string.shrink_to_fit(): frees unused buffer memory, for example after content has been removed from the string.
- string.clear(): resets string to the empty string.
- string.truncate(n): discards all characters after byte offset n, leaving string at most n bytes long.
- string.pop(): removes the last character from string, if any, and returns it as an Option<char>.
- string.remove(i): removes the character at byte offset i from string and returns it. The characters after it are shifted forward.
- string.drain(range): given a range of byte indices, returns an iterator over the characters in that range and removes them from the string once the iterator is dropped.
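A short sketch of drain in use. `take_range` is a hypothetical helper; note that, like the other byte-offset methods, drain panics if the range does not fall on character boundaries.

```rust
/// Removes the given byte range from `text` and returns the removed part.
/// Panics if the range is out of bounds or not on character boundaries.
/// (Hypothetical helper for illustration.)
fn take_range(text: &mut String, range: std::ops::Range<usize>) -> String {
    text.drain(range).collect()
}
```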
15.3.5 - Conventions for searching and iterating
The standard library functions for searching and iterating over text follow these naming conventions:
- Most operations process text from left to right.
- Operations whose names start with r work from right to left; for example, rsplit is the right-to-left counterpart of split. Changing the direction can affect not only the order in which values are produced, but also the values themselves.
- Iterators whose names end with n limit themselves to a given number of matches.
- Iterators whose names end with _indices produce, together with their usual iteration values, the byte offsets in the slice at which those values appear.
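To illustrate how direction changes the values themselves, here is a sketch that uses rsplitn to separate a file extension. `split_extension` is a hypothetical helper (real code would more likely use `std::path::Path`).

```rust
/// Splits a file name at its last '.', returning (stem, extension).
/// Returns None if the name contains no '.'.
/// (Hypothetical helper for illustration.)
fn split_extension(file_name: &str) -> Option<(&str, &str)> {
    // Scanning from the right with a limit of 2 splits at the last '.' only.
    let mut parts = file_name.rsplitn(2, '.');
    let extension = parts.next()?;
    let stem = parts.next()?;
    Some((stem, extension))
}
```

The same limit applied from the left, with splitn, would split at the first '.' instead and produce different pieces.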
15.3.6 - Patterns for searching text
Patterns:
- When a standard library function needs to search, match, split, or trim text, it accepts several different types of argument to say what to look for. These types are called patterns.
- A pattern can be any type that implements the std::str::pattern::Pattern trait.
The standard library supports four main kinds of pattern:
- A char as a pattern matches that character.
- A &String, &str, or &&str as a pattern matches a substring equal to the pattern.
- A FnMut(char) -> bool closure as a pattern matches a single character for which the closure returns true.
- A &[char] as a pattern (a slice of char values) matches any single character that appears in the list:
let code = "\t function noodle() { ";
assert_eq!(code.trim_start_matches(&[' ', '\t'] as &[char]), "function noodle() { ");
- The as operator converts the reference to the character array literal into a &[char]. In the Rust version these notes were written against, &[char; n], a reference to a fixed-size array of n characters, is not itself a pattern type.
- &[' ', '\t'] as &[char] can also be written &[' ', '\t'][..].
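The four kinds of pattern can be tried side by side with the same method. The function `first_match_offsets` is made up for this sketch and simply calls `str::find` once per pattern kind.

```rust
/// Runs str::find with one pattern of each kind and returns the byte
/// offsets of the first matches. (Hypothetical function for illustration.)
fn first_match_offsets(text: &str) -> [Option<usize>; 4] {
    [
        text.find(','),                           // a char
        text.find("fn"),                          // a &str
        text.find(|c: char| c.is_ascii_digit()),  // a closure
        text.find(&['(', ')'][..]),               // a &[char]
    ]
}
```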
15.3.7 - Search and replace
- slice.contains(pattern): returns true if slice contains a match for pattern.
- slice.starts_with(pattern) and slice.ends_with(pattern): return true if the initial or final text of slice matches pattern.
assert!("2017".starts_with(char::is_numeric));
- slice.find(pattern) and slice.rfind(pattern): return Some(i) if slice contains a match for pattern, where i is the byte offset of the match, and None otherwise. find returns the first match, rfind the last.
- slice.replace(pattern, replacement): returns a new String in which every match for pattern is replaced with replacement.
- slice.replacen(pattern, replacement, n): the same as above, but replaces at most the first n matches.
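Since find and rfind return byte offsets, their results can be used directly for slicing. `after_last` is a hypothetical helper sketched for this note.

```rust
/// Returns the text that follows the last occurrence of `separator`,
/// or None if the separator does not occur at all.
/// (Hypothetical helper for illustration.)
fn after_last<'a>(text: &'a str, separator: &str) -> Option<&'a str> {
    // rfind yields the byte offset where the last match starts.
    let start = text.rfind(separator)? + separator.len();
    Some(&text[start..])
}
```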
15.3.8 - Iterating over text
- slice.chars(): returns an iterator over the characters of slice.
- slice.char_indices(): returns an iterator over the characters of slice together with their byte offsets.
assert_eq!("élan".char_indices().collect::<Vec<_>>(), vec![(0, 'é'), (2, 'l'), (3, 'a'), (4, 'n')]);
- slice.bytes(): returns an iterator over the individual bytes of slice, exposing the UTF-8 encoding.
assert_eq!("élan".bytes().collect::<Vec<_>>(), vec![195, 169, b'l', b'a', b'n']);
- slice.lines(): returns an iterator over the lines of slice. Lines are terminated by \n or \r\n. Each item produced is a &str borrowed from slice, and does not include the terminator.
- slice.split(pattern): returns an iterator over the portions of slice separated by matches of pattern. Two adjacent matches, or a match at the beginning or end of slice, produce an empty string.
- slice.rsplit(pattern): the same as above, but scans slice from end to start.
- slice.split_terminator(pattern) and slice.rsplit_terminator(pattern): the same as the two methods above, except that pattern is treated as a terminator rather than a separator. If pattern matches at the very end of slice, the iterator does not produce an empty slice for the empty string between that match and the end of the slice.
- slice.splitn(n, pattern) and slice.rsplitn(n, pattern): like split and rsplit, but the string is divided into at most n slices, at the first (or last) n-1 matches of pattern.
- slice.split_whitespace(): returns an iterator over the whitespace-separated portions of slice. A run of consecutive whitespace characters counts as a single separator. Leading and trailing whitespace is ignored. Whitespace here means the same thing as in char::is_whitespace.
- slice.matches(pattern) and slice.rmatches(pattern): return an iterator over the matches for pattern found in slice.
- slice.match_indices(pattern) and slice.rmatch_indices(pattern): the same as above, but the items produced are (offset, match) pairs, where offset is the byte offset at which the match begins and match is the matching slice.
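Several of these iterators fit together naturally when parsing simple line-oriented text. The sketch below assumes a made-up "key = value" format; `parse_key_value` and `parse_config` are hypothetical names.

```rust
/// Parses one "key = value" line. Only the first '=' separates key and
/// value, so the value itself may contain '='.
/// (Hypothetical helper for an assumed format.)
fn parse_key_value(line: &str) -> Option<(&str, &str)> {
    let mut parts = line.splitn(2, '=');
    let key = parts.next()?.trim();
    let value = parts.next()?.trim();
    if key.is_empty() {
        None
    } else {
        Some((key, value))
    }
}

/// Collects all well-formed "key = value" lines, skipping the rest.
fn parse_config(text: &str) -> Vec<(&str, &str)> {
    text.lines().filter_map(parse_key_value).collect()
}
```

Because lines handles both \n and \r\n, the same function works on text from either Unix or Windows files.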
15.3.9 - Trimming
- Trimming a string:
- Removes content, usually whitespace, from the beginning and end of the string.
- Often used to clean up indented text read from a file, or unexpected whitespace at the end of a line, so that the result is tidier.
- slice.trim(): returns a subslice of slice without the leading and trailing whitespace.
- slice.trim_start(): omits only the leading whitespace (formerly trim_left, now deprecated).
- slice.trim_end(): omits only the trailing whitespace (formerly trim_right, now deprecated).
- slice.trim_matches(pattern): returns a subslice of slice without the matches of pattern at the beginning and end.
- slice.trim_start_matches(pattern): removes only the matches at the beginning of the slice (formerly trim_left_matches).
- slice.trim_end_matches(pattern): removes only the matches at the end of the slice (formerly trim_right_matches).
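A brief sketch of trimming with a pattern. `strip_leading_zeros` is a hypothetical helper, and keeping a single "0" for an all-zero input is a design choice made for this example.

```rust
/// Removes leading '0' characters from a string of digits, but keeps a
/// single "0" when the input consists only of zeros.
/// (Hypothetical helper for illustration.)
fn strip_leading_zeros(digits: &str) -> &str {
    let trimmed = digits.trim_start_matches('0');
    if trimmed.is_empty() && !digits.is_empty() {
        "0"
    } else {
        trimmed
    }
}
```

The trimming methods never allocate: each one returns a subslice borrowed from the original text.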
15.3.10 - String case conversion
- slice.to_uppercase(): returns a freshly allocated String holding the text of slice converted to uppercase. The result is not necessarily the same length as slice.
- slice.to_lowercase(): the same as above, but the text of slice is converted to lowercase.
15.3.11 - Parsing other types from strings
- All the common types implement the std::str::FromStr trait, which provides a standard method for parsing a value from a string slice:
pub trait FromStr: Sized {
    type Err;
    fn from_str(s: &str) -> Result<Self, Self::Err>;
}
- std::net::IpAddr, an enum holding either an IPv4 or an IPv6 internet address, implements FromStr as well:
use std::net::IpAddr;
use std::str::FromStr;
let address = IpAddr::from_str("fe80::0000:3ea9:f4ff:fe34:7a50")?;
assert_eq!(address, IpAddr::from([0xfe80, 0, 0, 0, 0x3ea9, 0xf4ff, 0xfe34, 0x7a50]));
- String slices have a parse method that parses the slice into any type you like. The target type usually has to be spelled out in the call:
let address = "fe80::0000:3ea9:f4ff:fe34:7a50".parse::<IpAddr>()?;
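Here is a small sketch of parse combined with error propagation. `sum_fields` is a hypothetical function, and the comma-separated input format is an assumption made for the example.

```rust
use std::num::ParseIntError;

/// Parses a comma-separated list of integers and returns their sum.
/// The first field that fails to parse aborts the whole computation.
/// (Hypothetical function for illustration.)
fn sum_fields(record: &str) -> Result<i64, ParseIntError> {
    record
        .split(',')
        .map(|field| field.trim().parse::<i64>())
        .sum() // summing Results stops at the first Err
}
```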
15.3.12 - Convert other types to strings
- Types that implement the std::fmt::Display trait can be used with the {} format specifier in the format! macro.
  - For smart pointer types: if T implements Display, then Box<T>, Rc<T>, and Arc<T> do too. Their displayed form is simply that of the value they point to.
  - Containers such as Vec and HashMap do not implement Display.
- If a type implements Display, the standard library automatically implements the std::string::ToString trait for it:
  - The trait's only method is to_string.
  - For your own types, it is recommended to implement Display rather than ToString.
- The public types of the standard library implement the std::fmt::Debug trait:
  - It takes a value and formats it as a string, for use in debugging.
  - A string produced by Debug can be printed with the {:?} format specifier of format!.
  - Custom types can implement Debug too. Deriving it is recommended:
#[derive(Copy, Clone, Debug)]
struct Complex { r: f64, i: f64 }
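As an illustration of implementing Display by hand, the sketch below reuses the Complex struct from these notes. The "r + ii" output format is a choice made for this example, not something prescribed by the standard library.

```rust
use std::fmt;

#[derive(Copy, Clone, Debug)]
struct Complex {
    r: f64,
    i: f64,
}

impl fmt::Display for Complex {
    /// Writes the value in the form "r + ii" or "r - ii".
    /// (The format is an assumption made for this sketch.)
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let sign = if self.i < 0.0 { '-' } else { '+' };
        write!(f, "{} {} {}i", self.r, sign, self.i.abs())
    }
}
```

Once Display is implemented, to_string comes for free through the blanket ToString implementation.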
15.3.13 - Borrowing as other text-like types
- Slices and String implement AsRef<str>, AsRef<[u8]>, AsRef<Path>, and AsRef<OsStr>. Using these traits as bounds on your own parameter types lets callers pass slices or strings directly, even when the function really needs one of the other types.
- Slices and String also implement the std::borrow::Borrow<str> trait. HashMap and BTreeMap use Borrow so that String works well as a key type in a table.
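Both traits can be seen at work in the following sketch. The function names `extension_of` and `score_for` are hypothetical.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Accepts anything that can be viewed as a Path: &str, String, &Path, ...
/// (Hypothetical function for illustration.)
fn extension_of<P: AsRef<Path>>(path: P) -> Option<String> {
    path.as_ref().extension()?.to_str().map(str::to_owned)
}

/// Looks up a String-keyed table with a plain &str. This works because
/// String implements Borrow<str>. (Hypothetical function for illustration.)
fn score_for(scores: &HashMap<String, u32>, name: &str) -> Option<u32> {
    scores.get(name).copied()
}
```

Without Borrow, the lookup would need to allocate a temporary String just to perform the comparison.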
15.3.14 - Accessing text as UTF-8 (text represented as bytes)
- slice.as_bytes(): borrows the bytes of slice as a &[u8]. Since the reference is not mutable, the bytes are guaranteed to remain well-formed UTF-8.
- string.into_bytes(): takes ownership of string and returns its bytes by value as a Vec<u8>. Since string no longer exists afterwards, the bytes no longer need to stay well-formed UTF-8, and the caller is free to modify them.
15.3.15 - Producing text from UTF-8 data
- str::from_utf8(byte_slice): takes a &[u8] slice of bytes and returns a Result: Ok(&str) if byte_slice contains well-formed UTF-8, and an error otherwise.
- String::from_utf8(vec): constructs a string from a Vec<u8> passed by value.
  - If vec holds well-formed UTF-8, from_utf8 returns Ok(string), where string has taken ownership of vec and uses it as its buffer.
  - If the bytes are not well-formed UTF-8, it returns Err(e), where e is a FromUtf8Error value. Calling e.into_bytes() gives back the original vector vec, so the original value is not lost when the conversion fails.
let good_utf8: Vec<u8> = vec![0xe9, 0x8c, 0x86];
assert!(String::from_utf8(good_utf8).is_ok());
let bad_utf8: Vec<u8> = vec![0x9f, 0xf0, 0xa6, 0x80];
let result = String::from_utf8(bad_utf8); // fails
assert!(result.is_err());
assert_eq!(result.unwrap_err().into_bytes(), vec![0x9f, 0xf0, 0xa6, 0x80]);
- String::from_utf8_lossy(byte_slice): constructs text from a shared slice of bytes, &[u8]. The conversion always succeeds, replacing any ill-formed UTF-8 with Unicode replacement characters. The return value is a Cow<str> that either borrows a &str from byte_slice or owns a freshly allocated String.
- String::from_utf8_unchecked: wraps a Vec<u8> up as a String and returns it, without checking. The bytes must be well-formed UTF-8. It can only be used in an unsafe block.
- str::from_utf8_unchecked: takes a &[u8] and returns it as a &str, again without checking whether the bytes are well-formed UTF-8. It can likewise only be used in an unsafe block.
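One way these functions might be combined: try the cheap, strict conversion first, and fall back to the lossy one only if it fails. `decode` is a hypothetical helper sketched for this note.

```rust
/// Converts bytes to a String, reusing the buffer when the bytes are
/// well-formed UTF-8 and substituting U+FFFD for ill-formed input otherwise.
/// (Hypothetical helper for illustration.)
fn decode(bytes: Vec<u8>) -> String {
    match String::from_utf8(bytes) {
        // Success: the Vec's buffer becomes the String's buffer.
        Ok(text) => text,
        // Failure: recover the original bytes and convert lossily.
        Err(err) => String::from_utf8_lossy(&err.into_bytes()).into_owned(),
    }
}
```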
15.3.16 - Putting off allocation
fn get_name() -> String {
std::env::var("USER").unwrap_or("whoever you are".to_string())
}
println!("Greetings, {}!", get_name());
The example above greets the user. It works on Unix, but on Windows the user name is stored in the USERNAME variable, so this code cannot retrieve the user name there.
std::env::var returns a String, so get_name sometimes needs to return an owned String and sometimes a &'static str.
The std::borrow::Cow (clone-on-write) type handles this case: it can hold either owned data or borrowed data.
use std::borrow::Cow;
fn get_name() -> Cow<'static, str> {
    std::env::var("USER")
        .map(|v| Cow::Owned(v))
        .unwrap_or(Cow::Borrowed("whoever you are"))
}
println!("Greetings, {}!", get_name());
- If reading the USER environment variable succeeds, map returns the resulting String as a Cow::Owned.
- If it fails, unwrap_or returns the static &str as a Cow::Borrowed.
- As long as T implements the std::fmt::Display trait, displaying a Cow<'a, T> produces the same result as displaying a T.
std::borrow::Cow is also useful when you may or may not need to modify some text you have borrowed:
- When no modification is needed, you can keep borrowing it.
- Cow's to_mut method makes sure the Cow is a Cow::Owned, applying the value's ToOwned implementation if necessary, and then returns a mutable reference to the value. Memory is therefore allocated only when necessary.
fn get_title() -> Option<&'static str> { ... }
let mut name = get_name();
if let Some(title) = get_title() {
    name.to_mut().push_str(", ");
    name.to_mut().push_str(title);
}
println!("Greetings, {}!", name);
The standard library gives Cow<'a, str> some special support for strings. It provides From and Into conversions from both String and &str, so get_name above can be written as:
fn get_name() -> Cow<'static, str> {
    std::env::var("USER")
        .map(|v| v.into())
        .unwrap_or("whoever you are".into())
}
Cow<'a, str> also implements std::ops::Add and std::ops::AddAssign for strings, so the get_title() check can be shortened to:
if let Some(title) = get_title() {
    name += ", ";
    name += title;
}
Because a String can be the target of the write! macro, the code above is also equivalent to:
use std::fmt::Write;
if let Some(title) = get_title() {
    write!(name.to_mut(), ", {}", title).unwrap();
}
Not every Cow<..., str> has to have a 'static lifetime: until a copy is needed, a Cow can keep borrowing previously computed text.
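A minimal sketch of a non-'static Cow, assuming a made-up task of expanding tabs. `tabs_to_spaces` is a hypothetical function; it borrows its input when nothing needs to change and allocates only when a replacement is required.

```rust
use std::borrow::Cow;

/// Replaces each tab with four spaces. Lines without tabs are returned
/// as a borrow of the input, so no allocation happens for them.
/// (Hypothetical function for illustration.)
fn tabs_to_spaces(line: &str) -> Cow<'_, str> {
    if line.contains('\t') {
        Cow::Owned(line.replace('\t', "    "))
    } else {
        Cow::Borrowed(line)
    }
}
```

Callers can treat the result as a &str in both cases, and call to_mut or into_owned only if they need to modify or keep it.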
15.3.17 - Strings as generic collections
String implements std::default::Default and std::iter::Extend:
- default returns an empty string.
- extend can append characters, string slices, or strings to the end of a string.
The &str type also implements Default:
- It returns an empty slice.
- This is handy in some corner cases, for example when deriving Default for a struct that contains string slices.
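Because String implements both Default and Extend, it can be used as the target of generic iterator adaptors such as Iterator::partition. The names `Label` and `split_by_case` below are invented for this sketch.

```rust
/// A struct with a string slice field can derive Default because
/// &str implements Default. (Hypothetical type for illustration.)
#[derive(Debug, Default, PartialEq)]
struct Label<'a> {
    name: &'a str,
    note: String,
}

/// Splits the letters of `text` into (uppercase, everything else).
/// partition needs its output type to implement Default and Extend<char>,
/// which String does. (Hypothetical function for illustration.)
fn split_by_case(text: &str) -> (String, String) {
    text.chars()
        .filter(|c| c.is_alphabetic())
        .partition(|c| c.is_uppercase())
}
```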
See Programming Rust (by Jim Blandy and Jason Orendorff, Chinese translation by Li Songfeng), Chapter 17.