[rust notes] 15 string and text (Part 1)
2022-07-05 06:05:00 【phial03】
15 - Strings and text
15.1-Unicode
15.1.1 - ASCII, Latin-1, and Unicode
Unicode and ASCII use the same code points for every ASCII character (0 ~ 0x7f).
Unicode calls the code point range 0 ~ 0xff the Latin-1 code block (ISO/IEC 8859-1), so Unicode is a superset of Latin-1.
Converting Latin-1 to Unicode:
    fn latin1_to_char(latin1: u8) -> char { latin1 as char }
Converting Unicode to Latin-1:
    fn char_to_latin1(c: char) -> Option<u8> { if c as u32 <= 0xff { Some(c as u8) } else { None } }
15.1.2-UTF-8
- Rust's String and str types represent text using the UTF-8 encoding. UTF-8 encodes a character as a sequence of 1 to 4 bytes.
- UTF-8 places two restrictions on well-formed sequences:
  - For a given code point, only the shortest encoding is well formed; you cannot spend 4 bytes encoding a code point that fits in 3.
  - Well-formed UTF-8 must not encode the numbers 0xd800 ~ 0xdfff, or numbers greater than 0x10ffff.
- Important properties of UTF-8:
  - UTF-8 encodes code points 0 to 0x7f as the bytes 0 to 0x7f, so a range of bytes holding ASCII text is valid UTF-8. Conversely, UTF-8 text that contains only ASCII characters is valid ASCII. The same does not hold for Latin-1: for example, Latin-1 encodes é as the single byte 0xe9, which UTF-8 would read as the first byte of a three-byte encoding.
  - From the upper bits of any byte, you can tell whether it is the first byte of some character's encoding or a byte from the middle of one.
  - The leading bits of the first byte alone give the total length of the encoding.
  - No encoding is longer than 4 bytes, so UTF-8 processing never needs unbounded loops, which helps when handling untrusted data.
  - In well-formed UTF-8, you can always tell where character encodings begin and end, even starting from an arbitrary byte, because first bytes and following bytes are always distinct.
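The bit-level properties above can be checked with a small sketch. The helpers is_continuation and encoded_len are hypothetical names written for this note, not standard library functions; the byte ranges in encoded_len follow the two well-formedness restrictions listed earlier.

```rust
/// Returns true if `byte` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
fn is_continuation(byte: u8) -> bool {
    (byte & 0b1100_0000) == 0b1000_0000
}

/// Reads the total encoding length from a first byte.
/// Returns None for continuation bytes and for bytes that cannot start an encoding.
fn encoded_len(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7f => Some(1), // 0xxxxxxx
        0xc2..=0xdf => Some(2), // 110xxxxx (0xc0 and 0xc1 would be overlong)
        0xe0..=0xef => Some(3), // 1110xxxx
        0xf0..=0xf4 => Some(4), // 11110xxx, capped so the code point stays <= 0x10ffff
        _ => None,
    }
}

fn main() {
    for ch in "aé錆🦀".chars() {
        let mut buf = [0u8; 4];
        let encoded: &str = ch.encode_utf8(&mut buf);
        let bytes = encoded.as_bytes();
        // The first byte announces the length; every later byte is a continuation byte.
        assert_eq!(encoded_len(bytes[0]), Some(ch.len_utf8()));
        assert!(bytes[1..].iter().all(|&b| is_continuation(b)));
        println!("{}: {:02x?}", ch, bytes);
    }
}
```

Real code would normally rely on str::from_utf8 or char::len_utf8 rather than decoding first bytes by hand.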
15.1.5 - Text directionality
- Some scripts are written from left to right. Unicode stores characters in the order in which they would normally be written or read.
- Other scripts are written from right to left. For such text, the first bytes of a string hold the encoding of the character that is written at the far right.
15.2 - character (char)
- The char type is a 32-bit value holding a Unicode code point.
- Its range is 0 to 0xd7ff, or 0xe000 to 0x10ffff.
- char implements Copy and Clone, along with all the usual traits for comparison, hashing, and formatting.
15.2.1 - Character classification: methods for testing character categories
- ch.is_numeric(): a numeric character. The Rust documentation defines this through the Unicode general categories Nd (Number, decimal digit), Nl (Number, letter), and No (Number, other).
- ch.is_alphabetic(): an alphabetic character, as given by Unicode's "Alphabetic" derived property.
- ch.is_alphanumeric(): a numeric or alphabetic character, that is, either of the two categories above.
- ch.is_whitespace(): a whitespace character, as given by the Unicode character property "WSpace=Y".
- ch.is_control(): a control character, from Unicode's "Other, control" general category.
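As a quick illustration of the classification methods, here is a minimal sketch. The classify helper is a hypothetical function made up for this note; it simply counts how many characters of a string fall into three of the categories.

```rust
/// Counts the alphabetic, numeric, and whitespace characters in `text`.
fn classify(text: &str) -> (usize, usize, usize) {
    let alphabetic = text.chars().filter(|c| c.is_alphabetic()).count();
    let numeric = text.chars().filter(|c| c.is_numeric()).count();
    let whitespace = text.chars().filter(|c| c.is_whitespace()).count();
    (alphabetic, numeric, whitespace)
}

fn main() {
    assert!('4'.is_numeric());
    assert!('q'.is_alphabetic());
    assert!('8'.is_alphanumeric());
    assert!('\n'.is_whitespace());
    assert!('\u{85}'.is_control()); // NEXT LINE, a C1 control character
    assert_eq!(classify("ab 12\t"), (2, 2, 2));
}
```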
15.2.2 - Handling digits
- ch.to_digit(radix): decides whether ch is an ASCII digit in base radix. If so, it returns Some(num), where num is a u32; otherwise it returns None. radix ranges over 2 ~ 36. If radix > 10, ASCII letters of either case are treated as digits with the values 10 ~ 35.
- std::char::from_digit(num, radix): converts the u32 digit value num to a char, returning Option<char>. If radix > 10, the resulting character may be a lowercase letter.
- ch.is_digit(radix): returns true if ch is an ASCII digit in base radix. Equivalent to ch.to_digit(radix) != None.
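The digit methods can be combined into a tiny hand-written parser. This is only a sketch: parse_hex is a hypothetical helper for illustration, and in practice u32::from_str_radix does the same job.

```rust
/// Parses a string of hexadecimal digits, returning None on a non-digit or on overflow.
fn parse_hex(text: &str) -> Option<u32> {
    let mut value: u32 = 0;
    for ch in text.chars() {
        let digit = ch.to_digit(16)?; // None for anything outside 0-9, a-f, A-F
        value = value.checked_mul(16)?.checked_add(digit)?;
    }
    Some(value)
}

fn main() {
    assert_eq!('F'.to_digit(16), Some(15));
    assert_eq!('8'.to_digit(8), None); // 8 is not an octal digit
    assert_eq!(std::char::from_digit(15, 16), Some('f'));
    assert!('7'.is_digit(8));
    assert_eq!(parse_hex("1F"), Some(31));
}
```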
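15.2.3 - Character case conversion
- ch.is_lowercase(): tests whether ch is a lowercase letter.
- ch.is_uppercase(): tests whether ch is an uppercase letter.
- ch.to_lowercase(): converts ch to lowercase.
- ch.to_uppercase(): converts ch to uppercase.
- The two conversion methods return iterators over characters rather than a single char, because the converted form of some characters is more than one character long.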
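A short sketch of why the case conversions yield iterators: some characters map to more than one character. The upper helper is a hypothetical convenience wrapper, not part of the standard library.

```rust
/// Collects the uppercase form of `ch` into a String.
fn upper(ch: char) -> String {
    ch.to_uppercase().collect()
}

fn main() {
    assert!('a'.is_lowercase());
    assert!('A'.is_uppercase());
    assert_eq!(upper('s'), "S");
    // German sharp s uppercases to two characters.
    assert_eq!(upper('ß'), "SS");
    // Turkish dotted capital I lowercases to 'i' plus a combining dot above.
    assert_eq!('İ'.to_lowercase().collect::<String>(), "i\u{307}");
}
```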
15.2.4 - Conversions to and from integers
- The as operator converts a char to any integer type, masking off the upper bits if the target type is too narrow.
- The as operator converts any u8 value to a char. char also implements From<u8>.
- For wider integers, use std::char::from_u32, which returns Option<char>.
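The conversions above might be exercised as follows. The helper latin1_bytes_to_string is a hypothetical name; it relies on the fact from section 15.1.1 that every Latin-1 byte is also a Unicode code point.

```rust
/// Decodes Latin-1 bytes into a String, one char per byte.
fn latin1_bytes_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    assert_eq!('*' as i32, 42);
    assert_eq!('ಠ' as u16, 0xca0);
    assert_eq!('ಠ' as i8, -0x60); // U+0CA0 truncated to eight bits, signed

    assert_eq!(char::from(66u8), 'B');
    assert_eq!(std::char::from_u32(0x41), Some('A'));
    assert_eq!(std::char::from_u32(0xd800), None); // surrogates are not chars

    assert_eq!(latin1_bytes_to_string(&[0x63, 0x61, 0x66, 0xe9]), "café");
}
```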
15.3 - String and str
- Rust's String and str types hold only well-formed UTF-8.
- The String type provides a resizable buffer for holding a string. It is essentially a wrapper around Vec<u8>.
- The str type operates on string text in place.
- String dereferences to &str, so every method defined on str can be called directly on a String.
- The text-handling methods index text by byte offset and measure length in bytes, not in characters.
The method descriptions below use variable names to indicate types:
- string: String
- slice: &str, or something that dereferences to &str, such as String or Rc<String>
- ch: char
- n: usize, a length
- i, j: usize, a byte offset
- range: a range of usize byte offsets, either fully bounded i..j, partially bounded i.. or ..j, or unbounded ..
- pattern: any pattern type: char, String, &str, &[char], or FnMut(char) -> bool
15.3.1 - Creating String values
- String::new(): returns a fresh, empty string. No buffer is allocated on the heap yet; one is allocated later as needed.
- String::with_capacity(n): returns a fresh, empty string with a heap buffer of at least n bytes already allocated.
- slice.to_string(): commonly used to create a String from a string literal. Allocates a brand-new String whose contents are a copy of slice.
- iter.collect(): builds a String by concatenating all of an iterator's items (char, &str, or String values). For example, removing the spaces from a string:
    let spacey = "man hat tan";
    let spaceless: String = spacey.chars().filter(|c| !c.is_whitespace()).collect();
    assert_eq!(spaceless, "manhattan");
- slice.to_owned(): returns a copy of slice as a freshly allocated String. str is unsized and cannot implement Clone, so to_owned is how a slice is cloned into an owned value.
15.3.2 - Simple inspection: getting basic information from string slices
- slice.len(): returns the length of slice in bytes.
- slice.is_empty(): returns true when slice.len() == 0.
- slice[range]: returns a slice borrowing the given portion of slice.
- A string slice cannot be indexed by a single position, as in slice[i]. Instead, produce a chars iterator over the slice and let it parse one character's UTF-8:
    let par = "rust he";
    assert_eq!(par[6..].chars().next(), Some('e'));
- slice.split_at(i): returns a tuple of two shared slices borrowed from slice: slice[..i] and slice[i..].
- slice.is_char_boundary(i): returns true if the byte offset i falls on a boundary between characters, and is therefore usable as an offset into slice.
- Slices can be compared for equality, ordered, and hashed.
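Because offsets are in bytes, slicing at an arbitrary number can land inside a character and panic. A minimal sketch of one way to guard against that, using a hypothetical helper split_at_boundary that backs up to the nearest character boundary:

```rust
/// Splits `text` at the largest character boundary that is <= `max_bytes`.
fn split_at_boundary(text: &str, max_bytes: usize) -> (&str, &str) {
    let mut i = max_bytes.min(text.len());
    // Offset 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(i) {
        i -= 1;
    }
    text.split_at(i)
}

fn main() {
    let s = "élan";
    assert_eq!(s.len(), 5); // bytes: é takes two
    assert_eq!(s.chars().count(), 4); // characters
    assert!(!s.is_char_boundary(1)); // inside the encoding of é
    assert_eq!(split_at_boundary(s, 1), ("", "élan"));
    assert_eq!(split_at_boundary(s, 3), ("él", "an"));
}
```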
15.3.3 - Appending and inserting text in a String
- string.push(ch): appends the character ch to the end of the string.
- string.push_str(slice): appends the full contents of slice.
- string.extend(iter): appends every item produced by the iterator iter to the string. The iterator can produce char, &str, or String values.
- string.insert(i, ch): inserts the character ch into the string at byte offset i. All characters after i are shifted back to make room.
- string.insert_str(i, slice): inserts the full contents of slice into the string at byte offset i.
- String implements std::fmt::Write, so the write! and writeln! macros can append formatted text to a String. They return a Result, so add the ? operator at the end to handle errors:
    use std::fmt::Write;
    let mut letter = String::new();
    writeln!(letter, "Whose {} these are I think I know", "rutabagas")?;
- The + operator: when the left operand is a String (taken by value) and the right operand is a &str, + concatenates them.
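The appending methods can be seen working together in the sketch below. join_with_commas is a hypothetical helper written for this note; the standard library already offers join on slices for real use.

```rust
use std::fmt::Write; // lets write! target a String

/// Builds a comma-separated list by appending to a single String.
fn join_with_commas(items: &[&str]) -> Result<String, std::fmt::Error> {
    let mut out = String::new();
    for (n, item) in items.iter().enumerate() {
        if n > 0 {
            out.push_str(", ");
        }
        write!(out, "{}", item)?;
    }
    Ok(out)
}

fn main() -> Result<(), std::fmt::Error> {
    let mut s = String::from("con");
    s.push('t'); // "cont"
    s.push_str("ent"); // "content"
    s.insert_str(0, "dis"); // "discontent"
    s.insert(3, '-'); // "dis-content"
    s.extend("ed".chars()); // "dis-contented"
    assert_eq!(s, "dis-contented");

    let t = s + "!"; // String + &str: takes s by value and appends to it
    assert_eq!(t, "dis-contented!");

    assert_eq!(join_with_commas(&["a", "b", "c"])?, "a, b, c");
    Ok(())
}
```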
15.3.4 - Removing text
- string.shrink_to_fit(): releases unused buffer capacity, for example after removing text from the string.
- string.clear(): resets the string to the empty string.
- string.truncate(n): discards all characters after byte offset n.
- string.pop(): removes the last character from the string and returns it as an Option<char>.
- string.remove(i): removes the character at byte offset i from the string and returns it; the following characters shift forward.
- string.drain(range): returns an iterator over the given range of byte indices, and removes those characters from the string once the iterator is dropped.
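A short sketch of the removal methods in action. drop_front is a hypothetical helper showing that a drain iterator can simply be dropped when only the removal is wanted.

```rust
/// Removes the first `n` bytes of `s` in place.
/// Panics if `n` does not fall on a character boundary.
fn drop_front(s: &mut String, n: usize) {
    s.drain(..n); // dropping the iterator right away still removes the range
}

fn main() {
    let mut s = String::from("chocolate");
    s.truncate(5); // "choco"
    assert_eq!(s.pop(), Some('o')); // "choc"
    assert_eq!(s.remove(1), 'h'); // "coc"
    assert_eq!(s, "coc");

    let mut t = String::from("chocolate");
    let drained: String = t.drain(3..6).collect();
    assert_eq!(drained, "col");
    assert_eq!(t, "choate");

    drop_front(&mut t, 3);
    assert_eq!(t, "ate");
}
```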
15.3.5 - Conventions for searching and iterating
The standard library functions for searching and iterating over text follow these naming conventions:
- Most operations process text from start to end (left to right).
- Operations whose names start with r work from end to start; for example, rsplit is the end-to-start counterpart of split. Changing the direction can affect not only the order in which values are produced but also the values themselves.
- Iterators whose names end with n limit themselves to a given number of matches.
- Iterators whose names end with _indices produce, together with their usual iteration values, the byte offsets at which those values appear in the slice.
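The naming conventions can be seen side by side in a small sketch. The path string and the file_name helper are made up for illustration.

```rust
/// Returns the text after the last '/' (the whole input if there is none).
fn file_name(path: &str) -> &str {
    // rsplit scans from the end, so its first item is the last component.
    path.rsplit('/').next().unwrap_or(path)
}

fn main() {
    let path = "usr/local/bin";

    // The r prefix reverses the direction of the scan.
    assert_eq!(path.split('/').collect::<Vec<_>>(), ["usr", "local", "bin"]);
    assert_eq!(path.rsplit('/').collect::<Vec<_>>(), ["bin", "local", "usr"]);

    // The n suffix limits the pieces; here the direction changes the values too.
    assert_eq!(path.splitn(2, '/').collect::<Vec<_>>(), ["usr", "local/bin"]);
    assert_eq!(path.rsplitn(2, '/').collect::<Vec<_>>(), ["bin", "usr/local"]);

    // The _indices suffix adds byte offsets.
    assert_eq!(path.match_indices('/').collect::<Vec<_>>(), [(3, "/"), (9, "/")]);

    assert_eq!(file_name(path), "bin");
}
```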
15.3.6 - Patterns for searching text
Patterns:
- When a standard library function needs to search, match, split, or trim text, it accepts several different types of argument to describe what to look for. These types are called patterns.
- A pattern can be any type that implements the std::str::pattern::Pattern trait.
The standard library supports 4 main kinds of pattern:
- A char as a pattern matches that character.
- A String (passed as &String), &str, or &&str as a pattern matches a substring equal to the pattern.
- A FnMut(char) -> bool closure as a pattern matches a single character for which the closure returns true.
- A &[char] as a pattern (a slice of char values) matches any single character that appears in the list:
    let code = "\t function noodle() { ";
    assert_eq!(code.trim_start_matches(&[' ', '\t'] as &[char]), "function noodle() { ");
- The as operator converts the character array literal to a &[char]. In the Rust version the book targets, &[char; n], a reference to a fixed-size array of n characters, is not a pattern type; newer compilers also accept char arrays directly.
- &[' ', '\t'] as &[char] can also be written &[' ', '\t'][..].
- trim_start_matches is the current name of the method the book calls trim_left_matches, which is now deprecated.
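To make the four kinds of pattern concrete, here is a small sketch that passes each kind to the same searching methods. The sample text and the first_separator helper are hypothetical.

```rust
/// Finds the byte offset of the first ',' or ';' in `text`.
fn first_separator(text: &str) -> Option<usize> {
    text.find(&[',', ';'][..]) // a &[char] pattern
}

fn main() {
    let text = "one, two;three";

    // A char pattern.
    assert_eq!(text.find(','), Some(3));
    // A &str pattern.
    assert_eq!(text.find("two"), Some(5));
    // A function or closure pattern.
    assert_eq!(text.find(char::is_whitespace), Some(4));
    assert_eq!(text.find(|c: char| c == ';'), Some(8));
    // A &[char] pattern matches any character in the slice.
    let seps: &[char] = &[',', ';'];
    assert_eq!(text.split(seps).collect::<Vec<_>>(), ["one", " two", "three"]);

    assert_eq!(first_separator(text), Some(3));
}
```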
15.3.7 - Search and replace
- slice.contains(pattern): returns true if slice contains a match for pattern.
- slice.starts_with(pattern) and slice.ends_with(pattern): return true if the initial or final text of slice matches pattern.
    assert!("2017".starts_with(char::is_numeric));
- slice.find(pattern) and slice.rfind(pattern): return Some(i) if slice contains a match for pattern, where i is the byte offset at which the match begins.
- slice.replace(pattern, replacement): returns a new String formed by replacing every match for pattern with replacement.
- slice.replacen(pattern, replacement, n): does the same, but replaces at most the first n matches.
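A minimal sketch of searching and replacing. The normalize helper is a hypothetical example of chaining replace calls, each of which allocates a new String.

```rust
/// Converts Windows line endings to "\n" and expands each tab to four spaces.
fn normalize(text: &str) -> String {
    text.replace("\r\n", "\n").replace('\t', "    ")
}

fn main() {
    let word = "banana";
    assert!(word.contains("nan"));
    assert!(word.ends_with('a'));
    assert_eq!(word.find('a'), Some(1));
    assert_eq!(word.rfind('a'), Some(5));
    assert_eq!(word.replace("an", "AN"), "bANANa");
    assert_eq!(word.replacen("an", "AN", 1), "bANana");

    assert_eq!(normalize("a\r\n\tb"), "a\n    b");
}
```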
15.3.8 - Iterating over text
- slice.chars(): returns an iterator over the characters of slice.
- slice.char_indices(): returns an iterator over the characters of slice and their byte offsets.
    assert_eq!("élan".char_indices().collect::<Vec<_>>(), vec![(0, 'é'), (2, 'l'), (3, 'a'), (4, 'n')]);
- slice.bytes(): returns an iterator over the individual bytes of slice, exposing the UTF-8 encoding.
    assert_eq!("élan".bytes().collect::<Vec<_>>(), vec![195, 169, b'l', b'a', b'n']);
- slice.lines(): returns an iterator over the lines of slice. Lines are terminated by \n or \r\n. Each item produced is a &str borrowed from slice, and does not include the terminator.
- slice.split(pattern): returns an iterator over the portions of slice separated by matches of pattern. Two adjacent matches, or a match at the beginning or end of slice, produce an empty string.
- slice.rsplit(pattern): does the same, but scans slice and matches from the end toward the start.
- slice.split_terminator(pattern) and slice.rsplit_terminator(pattern): like the two methods above, except that pattern is treated as a terminator rather than a separator. If pattern matches at the very end of slice, the iterator does not produce an empty slice for the empty string between that match and the end of the slice.
- slice.splitn(n, pattern) and slice.rsplitn(n, pattern): like split and rsplit, but they split the string into at most n slices, at the first through (n-1)th matches of pattern.
- slice.split_whitespace(): returns an iterator over the whitespace-separated portions of slice. A run of consecutive whitespace characters counts as a single separator, and trailing whitespace is ignored. Whitespace here means the same as in char::is_whitespace.
- slice.matches(pattern) and slice.rmatches(pattern): return an iterator over the matches for pattern found in slice.
- slice.match_indices(pattern) and slice.rmatch_indices(pattern): do the same, but the items produced are (offset, match) pairs, where offset is the byte offset at which the match begins and match is the matching slice.
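The difference between a separator and a terminator is easiest to see in code. In this sketch, items is a hypothetical helper and the sample strings are made up.

```rust
/// Splits a ';'-terminated list into its items.
fn items(text: &str) -> Vec<&str> {
    text.split_terminator(';').collect()
}

fn main() {
    // As a separator, a trailing ';' leaves an empty final piece.
    assert_eq!("a;b;".split(';').collect::<Vec<_>>(), ["a", "b", ""]);
    // As a terminator, it does not.
    assert_eq!(items("a;b;"), ["a", "b"]);

    // lines accepts both \n and \r\n and strips the terminator.
    let text = "one\r\ntwo\nthree";
    assert_eq!(text.lines().collect::<Vec<_>>(), ["one", "two", "three"]);

    // Runs of whitespace act as one separator.
    let spaced = "  hello   world \n";
    assert_eq!(spaced.split_whitespace().collect::<Vec<_>>(), ["hello", "world"]);

    assert_eq!("abcabc".matches("bc").count(), 2);
}
```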
15.3.9 - Trimming
- Trimming a string:
  - Removes text, usually whitespace, from the beginning and end of the string.
  - It is often used to clean up indented text read from a file, or stray whitespace at the end of a line, so that the result is tidier.
- slice.trim(): returns a subslice of slice that omits any leading and trailing whitespace.
- slice.trim_start(): omits only leading whitespace. The book calls this trim_left, a name that is now deprecated.
- slice.trim_end(): omits only trailing whitespace. The older, deprecated name is trim_right.
- slice.trim_matches(pattern): returns a subslice of slice that omits all matches of pattern at the beginning and end.
- slice.trim_start_matches(pattern): removes matches only at the beginning of the slice (older name: trim_left_matches).
- slice.trim_end_matches(pattern): removes matches only at the end of the slice (older name: trim_right_matches).
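A brief sketch of the trimming methods, using the current method names. strip_comment_marker is a hypothetical helper for illustration.

```rust
/// Removes leading '/' characters and surrounding whitespace from a comment line.
fn strip_comment_marker(line: &str) -> &str {
    line.trim_start_matches('/').trim()
}

fn main() {
    let raw = "\t*.rs  ";
    assert_eq!(raw.trim(), "*.rs");
    assert_eq!(raw.trim_start(), "*.rs  ");
    assert_eq!(raw.trim_end(), "\t*.rs");

    assert_eq!("001990".trim_start_matches('0'), "1990");
    assert_eq!("xxhixx".trim_matches('x'), "hi");

    assert_eq!(strip_comment_marker("// note  "), "note");
}
```

All of these return subslices of the original text, so none of them allocates.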
15.3.10 - String case conversion
- slice.to_uppercase(): returns a freshly allocated String holding the text of slice converted to uppercase. The result is not necessarily the same length as slice.
- slice.to_lowercase(): the same, but the text of slice is converted to lowercase.
15.3.11 - Parsing other types from strings
- All the usual types implement the std::str::FromStr trait, which provides a standard way to parse a value from a string slice:
    pub trait FromStr: Sized {
        type Err;
        fn from_str(s: &str) -> Result<Self, Self::Err>;
    }
- std::net::IpAddr, the enum type that holds an IPv4 or IPv6 internet address, implements FromStr as well:
    use std::net::IpAddr;
    use std::str::FromStr;
    let address = IpAddr::from_str("fe80::0000:3ea9:f4ff:fe34:7a50")?;
    assert_eq!(address, IpAddr::from([0xfe80, 0, 0, 0, 0x3ea9, 0xf4ff, 0xfe34, 0x7a50]));
- String slices have a parse method that parses the slice into any type that implements FromStr. The target type must be spelled out in the call:
    let address = "fe80::0000:3ea9:f4ff:fe34:7a50".parse::<IpAddr>()?;
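As a sketch of parse in everyday use, the following combines it with split_once. parse_size and its "WIDTHxHEIGHT" input format are hypothetical, chosen only to show type inference picking the FromStr implementation.

```rust
use std::net::IpAddr;

/// Parses a "WIDTHxHEIGHT" string such as "640x480".
fn parse_size(text: &str) -> Option<(u32, u32)> {
    let (w, h) = text.split_once('x')?;
    // The return type tells parse to produce u32 values.
    Some((w.parse().ok()?, h.parse().ok()?))
}

fn main() {
    assert_eq!("3628800".parse::<usize>(), Ok(3628800));
    assert_eq!("128.5625".parse::<f64>(), Ok(128.5625));
    assert!("not a number".parse::<i32>().is_err());

    let addr: IpAddr = "::1".parse().unwrap();
    assert!(addr.is_loopback());

    assert_eq!(parse_size("640x480"), Some((640, 480)));
}
```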
15.3.12 - Converting other types to strings
- Types that implement the std::fmt::Display trait can be used with the {} format specifier in the format! macro.
  - For smart pointer types, if T implements Display, then Box<T>, Rc<T>, and Arc<T> do too: their displayed form is simply that of their referent.
  - Containers such as Vec and HashMap do not implement Display.
- If a type implements Display, the standard library automatically implements the std::string::ToString trait for it:
  - The trait's only method is to_string.
  - For your own types, implement Display rather than ToString.
- The standard library's common types implement the std::fmt::Debug trait:
  - It takes a value and formats it as a string, for use when debugging.
  - The string Debug produces can be printed with the format! macro's {:?} format specifier.
  - Custom types can implement Debug too, preferably by deriving it:
    #[derive(Copy, Clone, Debug)]
    struct Complex { r: f64, i: f64 }
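To show how Display, ToString, and Debug relate, here is a sketch that implements Display for the Complex struct from the notes. The "r + i" output layout is an assumption chosen for this example, not something the source prescribes.

```rust
use std::fmt;

#[derive(Copy, Clone, Debug, PartialEq)]
struct Complex {
    r: f64,
    i: f64,
}

// Implementing Display also provides to_string through the blanket ToString impl.
impl fmt::Display for Complex {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let sign = if self.i < 0.0 { '-' } else { '+' };
        write!(f, "{} {} {}i", self.r, sign, self.i.abs())
    }
}

fn main() {
    let z = Complex { r: 1.5, i: -2.0 };
    assert_eq!(z.to_string(), "1.5 - 2i");
    assert_eq!(format!("{}", z), "1.5 - 2i");
    // The derived Debug output shows the struct's fields.
    assert_eq!(format!("{:?}", z), "Complex { r: 1.5, i: -2.0 }");
}
```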
15.3.13 - Borrowing as other text types
- Slices and String implement AsRef<str>, AsRef<[u8]>, AsRef<Path>, and AsRef<OsStr>. Functions that use these traits as bounds on their parameter types let you pass slices or strings to them directly, even when what the function really needs is some other type.
- Slices and String also implement the std::borrow::Borrow<str> trait. HashMap and BTreeMap use Borrow to make String work well as a key type in a table.
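A minimal sketch of both traits at work. The shout function and the ages table are hypothetical examples.

```rust
use std::collections::HashMap;

/// Accepts &str, String, &String, or anything else that implements AsRef<str>.
fn shout<S: AsRef<str>>(text: S) -> String {
    text.as_ref().to_uppercase()
}

fn main() {
    assert_eq!(shout("hi"), "HI");
    assert_eq!(shout(String::from("hi")), "HI");

    let mut ages: HashMap<String, u32> = HashMap::new();
    ages.insert("ada".to_string(), 36);
    // Because String implements Borrow<str>, a plain &str can look up a
    // String key without allocating a temporary String.
    assert_eq!(ages.get("ada"), Some(&36));
    assert_eq!(ages.get("bob"), None);
}
```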
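15.3.14 - Accessing text as UTF-8 (the bytes that represent the text)
- slice.as_bytes(): borrows the bytes of slice as a &[u8]. Because the reference is not mutable, the bytes are guaranteed to remain well-formed UTF-8.
- string.into_bytes(): takes ownership of the String and returns the string's bytes by value as a Vec<u8>. Since the string no longer exists, the bytes no longer need to stay well-formed UTF-8, and the caller is free to modify them.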
15.3.15 - Producing text from UTF-8 data
- str::from_utf8(byte_slice): takes a &[u8] byte slice and returns a Result: Ok(&str) if byte_slice contains well-formed UTF-8, or an error otherwise.
- String::from_utf8(vec): constructs a string from the Vec<u8> passed in by value.
  - If vec holds well-formed UTF-8, from_utf8 returns Ok(string), where string has taken ownership of vec and uses it as its buffer.
  - If the bytes are not well-formed UTF-8, it returns Err(e), where e is a FromUtf8Error value. Calling e.into_bytes() gives back the original vector vec, so a failed conversion does not lose the original value.
    let good_utf8: Vec<u8> = vec![0xe9, 0x8c, 0x86];
    assert_eq!(String::from_utf8(good_utf8).ok(), Some("錆".to_string()));
    let bad_utf8: Vec<u8> = vec![0x9f, 0xf0, 0xa6, 0x80];
    let result = String::from_utf8(bad_utf8); // fails
    assert!(result.is_err());
    assert_eq!(result.unwrap_err().into_bytes(), vec![0x9f, 0xf0, 0xa6, 0x80]);
- String::from_utf8_lossy(byte_slice): constructs text from a shared byte slice &[u8] and returns a Cow<str>: a borrowed &str if the bytes are well-formed UTF-8, or an owned String in which each ill-formed sequence has been replaced by the Unicode replacement character.
- String::from_utf8_unchecked: wraps a Vec<u8> in a String and returns it without checking. The bytes must be well-formed UTF-8. It can be called only in an unsafe block.
- str::from_utf8_unchecked: takes a &[u8] and returns it as a &str, again without checking whether the bytes are well-formed UTF-8. It too can be called only in an unsafe block.
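The checked and lossy conversions can be compared in a short sketch. needed_repair is a hypothetical helper that reports whether from_utf8_lossy had to build a repaired copy.

```rust
use std::borrow::Cow;

/// Returns true if lossy decoding had to allocate a repaired String.
fn needed_repair(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Owned(_))
}

fn main() {
    let good = [0xe9, 0x8c, 0x86];
    assert_eq!(std::str::from_utf8(&good), Ok("錆"));
    assert!(!needed_repair(&good)); // well-formed input is only borrowed

    let bad = [b'h', b'i', 0xff]; // 0xff never appears in UTF-8
    assert!(std::str::from_utf8(&bad).is_err());
    // The ill-formed byte becomes U+FFFD, the replacement character.
    assert_eq!(String::from_utf8_lossy(&bad), "hi\u{fffd}");
    assert!(needed_repair(&bad));
}
```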
15.3.16 - Deferring allocation
fn get_name() -> String {
std::env::var("USER").unwrap_or("whoever you are".to_string())
}
println!("Greetings, {}!", get_name());
The example above greets the user. It works on Unix, but on Windows the user name is stored in the USERNAME variable instead, so the lookup fails and the fallback text is used.
std::env::var returns a String, so get_name sometimes has an owned String to return and sometimes only a &'static str. Returning String forces an allocation even for the static text.
std::borrow::Cow (clone-on-write) avoids this, because it can hold either owned data or borrowed data:
    use std::borrow::Cow;
    fn get_name() -> Cow<'static, str> {
        std::env::var("USER")
            .map(|v| Cow::Owned(v))
            .unwrap_or(Cow::Borrowed("whoever you are"))
    }
    println!("Greetings, {}!", get_name());
- If reading the USER environment variable succeeds, map returns the resulting string as a Cow::Owned.
- If it fails, unwrap_or returns the static &str as a Cow::Borrowed.
- As long as T implements the std::fmt::Display trait, displaying a Cow<'a, T> produces the same result as displaying T.
std::borrow::Cow is also useful when you may or may not need to modify text that you have borrowed.
- While no change is needed, you can keep borrowing it.
- Cow's to_mut method makes sure the Cow is Cow::Owned, applying the value's ToOwned implementation if necessary, and then returns a mutable reference to the value.
    fn get_title() -> Option<&'static str> { ... }
    let mut name = get_name();
    if let Some(title) = get_title() {
        name.to_mut().push_str(", ");
        name.to_mut().push_str(title);
    }
    println!("Greetings, {}!", name);
This way, memory is allocated only when it is actually needed.
The standard library gives Cow<'a, str> some special support for strings. It provides From and Into conversions from both String and &str, so get_name above can be written more simply:
    fn get_name() -> Cow<'static, str> {
        std::env::var("USER")
            .map(|v| v.into())
            .unwrap_or("whoever you are".into())
    }
Cow<'a, str> also implements std::ops::Add and std::ops::AddAssign for strings, so the get_title() check can be shortened to:
    if let Some(title) = get_title() {
        name += ", ";
        name += title;
    }
Because a String can be the target of the write! macro, the code above is also equivalent to:
    use std::fmt::Write;
    if let Some(title) = get_title() {
        write!(name.to_mut(), ", {}", title).unwrap();
    }
Not every Cow<..., str> has to have the 'static lifetime: a Cow can keep borrowing previously computed text right up until a copy becomes necessary.
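The point about non-'static lifetimes can be sketched with a function that borrows from its argument. expand_tabs is a hypothetical helper: it allocates only when the input actually contains a tab.

```rust
use std::borrow::Cow;

/// Replaces each tab with four spaces, borrowing the input when nothing changes.
fn expand_tabs(text: &str) -> Cow<'_, str> {
    if text.contains('\t') {
        Cow::Owned(text.replace('\t', "    "))
    } else {
        Cow::Borrowed(text)
    }
}

fn main() {
    assert!(matches!(expand_tabs("plain"), Cow::Borrowed(_)));
    assert!(matches!(expand_tabs("a\tb"), Cow::Owned(_)));
    assert_eq!(expand_tabs("a\tb"), "a    b");

    // to_mut turns a borrowed Cow into an owned one on first modification.
    let mut name: Cow<'static, str> = Cow::Borrowed("whoever you are");
    name.to_mut().push_str(", Esq.");
    assert!(matches!(name, Cow::Owned(_)));
    assert_eq!(name, "whoever you are, Esq.");
}
```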
15.3.17 - Strings as generic collections
- String implements both std::default::Default and std::iter::Extend:
  - default returns an empty string.
  - extend can append characters, string slices, or strings to the end of a string.
- The &str type also implements Default:
  - It returns an empty slice.
  - This is handy in some corner cases, for example when deriving Default for a struct that contains string slices.
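A small sketch of both traits. The Label struct and the concat helper are hypothetical names used only for illustration.

```rust
/// Deriving Default works because both &str and usize implement Default.
#[derive(Debug, Default, PartialEq)]
struct Label<'a> {
    text: &'a str,
    count: usize,
}

/// Concatenates string slices by extending an initially empty String.
fn concat(parts: &[&str]) -> String {
    let mut out = String::default();
    out.extend(parts.iter().copied());
    out
}

fn main() {
    assert_eq!(Label::default(), Label { text: "", count: 0 });

    let mut s = String::default();
    s.extend(vec!["a", "b"]); // string slices
    s.extend("cd".chars()); // characters
    s.extend(vec![String::from("e")]); // owned strings
    assert_eq!(s, "abcde");

    assert_eq!(concat(&["x", "y", "z"]), "xyz");
}
```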
See 《Rust Programming 》( Jim - Brandy 、 Jason, - By orendov , Translated by lisongfeng ) Chapter 17