Regular expressions: Syntax

2022-06-27 02:16:00 Live up to your youth

A regular expression is a text pattern made up of ordinary characters (such as the letters a to z) and special characters (called "metacharacters"). The pattern describes one or more strings to match when searching text. A regular expression acts as a template that matches a character pattern against the string being searched.

Regular expressions are constructed the same way as arithmetic expressions: small expressions are combined with metacharacters and operators to create larger ones. A component of a regular expression can be a single character, a character set, a character range, a choice between characters, or any combination of these.
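As a minimal sketch of this composition, the following Python example (using the standard `re` module) builds a date pattern out of three smaller expressions. The variable names and the date format are assumptions chosen for illustration; they do not come from the article.

```python
import re

# Small building blocks (all names here are illustrative).
year = r"\d{4}"                    # exactly four digits
month = r"(0[1-9]|1[0-2])"         # 01-09 or 10-12
day = r"(0[1-9]|[12]\d|3[01])"     # 01-09, 10-29, or 30-31

# Combine the pieces with literal hyphens into one larger expression.
date_pattern = re.compile(f"{year}-{month}-{day}")

match = date_pattern.search("Posted on 2022-06-27 at 02:16")
found = match.group(0) if match else None
print(found)  # 2022-06-27
```

Each piece can be read and tested on its own, which is usually easier than debugging one long pattern.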

Ordinary characters

Ordinary characters include all printable and nonprintable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.

Character	Description
x|y	Matches x or y.
[xyz]	Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz]	Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z]	Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
[^a-z]	Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character outside the range "a" to "z".
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a non-digit character. Equivalent to [^0-9].
\w	Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W	Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn	Matches n, where n is a hexadecimal escape code. The code must be exactly two digits long. For example, "\x41" matches "A", and "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num	Matches num, where num is a positive integer. This is a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n	Identifies either an octal escape code or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape code.
\nm	Identifies either an octal escape code or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal character m. If neither condition holds, \nm matches the octal value nm, where n and m are octal digits (0-7).
\nml	Matches the octal escape code nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un	Matches n, where n is a Unicode character written as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
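The table entries can be tried out directly. The sketch below uses Python's `re` module as a stand-in engine; the sample strings other than "plain" are assumptions added for illustration. The `re.ASCII` flag is used so that \d and \w behave like the ASCII classes given in the table.

```python
import re

# [abc] matches the "a" in "plain"; [^abc] matches the "p".
first_in_set = re.search(r"[abc]", "plain").group(0)
first_not_in_set = re.search(r"[^abc]", "plain").group(0)

# With re.ASCII, \d is [0-9] and \w is [A-Za-z0-9_].
digits = re.findall(r"\d", "a1b22", flags=re.ASCII)
words = re.findall(r"\w+", "foo_bar baz-9", flags=re.ASCII)

# \x41 is the two-digit hex escape for "A"; \u00A9 is the copyright sign.
hex_match = re.fullmatch(r"\x41", "A") is not None
unicode_match = re.fullmatch(r"\u00A9", "\u00a9") is not None

# (.)\1 matches two consecutive identical characters.
doubled = re.search(r"(.)\1", "balloon").group(0)

print(first_in_set, first_not_in_set)  # a p
print(digits, words)                   # ['1', '2', '2'] ['foo_bar', 'baz', '9']
print(doubled)                         # ll
```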

Nonprinting characters

Nonprinting characters can also be part of regular expressions .

Character	Description
\cx	Matches the control character indicated by x.
\f	Matches a form feed. Equivalent to \x0c and \cL.
\n	Matches a newline. Equivalent to \x0a and \cJ.
\r	Matches a carriage return. Equivalent to \x0d and \cM.
\s	Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t	Matches a tab. Equivalent to \x09 and \cI.
\v	Matches a vertical tab. Equivalent to \x0b and \cK.
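A short Python sketch of these escapes follows; the sample text is an assumption made up for the example. The \cx form is left out because Python's `re` module does not accept it, and `re.ASCII` is passed so that \s is limited to the ASCII whitespace characters listed in the table.

```python
import re

text = "name:\tAda\r\nrole:\tengineer\n"

# \t matches the tab; \x09 is the same character written as a hex escape.
tabs = re.findall(r"\t", text)
tabs_hex = re.findall(r"\x09", text)

# \S+ collects runs of non-whitespace characters.
tokens = re.findall(r"\S+", text)

# With re.ASCII, \s behaves like the class [ \f\n\r\t\v] from the table.
whitespace_a = re.findall(r"\s", text, flags=re.ASCII)
whitespace_b = re.findall(r"[ \f\n\r\t\v]", text)

print(tokens)                        # ['name:', 'Ada', 'role:', 'engineer']
print(whitespace_a == whitespace_b)  # True
```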

Special characters

Special characters are characters that carry a special meaning in a pattern. To match one of them literally, you must first "escape" it, that is, put a backslash (\) in front of it.

Character	Description
$	Matches the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before "\n" or "\r". To match a literal $, use \$.
( )	Mark the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*	Matches the preceding subexpression zero or more times. To match a literal *, use \*.
+	Matches the preceding subexpression one or more times. To match a literal +, use \+.
.	Matches any single character except the newline \n. To match a literal ., use \. .
[	Marks the beginning of a bracket expression. To match a literal [, use \[.
?	Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match a literal ?, use \?.
\	Marks the next character as a special character, a literal character, a backreference, or an octal escape.
^	Matches the start of the input string, unless it is used inside a bracket expression, where it negates the character set. To match a literal ^, use \^.
{	Marks the beginning of a quantifier expression. To match a literal {, use \{.
|	Indicates a choice between two alternatives. To match a literal |, use \|.
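The effect of escaping is easy to see by comparing an escaped and an unescaped metacharacter. The following Python sketch uses invented sample strings; `re.escape` is the standard-library helper that adds the backslashes automatically.

```python
import re

price_text = "Total (USD): $3.50 + tax"

# Unescaped, "." matches any character except a newline.
loose = re.findall(r"3.50", "3.50 3x50")
# Escaped, "\." matches only a literal period.
strict = re.findall(r"3\.50", "3.50 3x50")

# Escaping $, ( and ) so that they match themselves.
amount = re.search(r"\$\d+\.\d+", price_text).group(0)
label = re.search(r"\(USD\)", price_text).group(0)

# re.escape adds the backslashes when the literal text is not known in advance.
literal = re.escape("a+b*c?")
escaped_match = re.fullmatch(literal, "a+b*c?") is not None

print(loose, strict)   # ['3.50', '3x50'] ['3.50']
print(amount, label)   # $3.50 (USD)
```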

Quantifiers

Quantifiers (also called qualifiers) specify how many times a given component of a regular expression must occur for the match to succeed. There are six of them: *, +, ?, {n}, {n,}, and {n,m}.

Character	Description
*	Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" as well as "zoo". * is equivalent to {0,}.
+	Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?	Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do", the "does" in "does", and the "do" in "doxy". ? is equivalent to {0,1}.
{n}	n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
{n,}	n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m}	m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the two numbers.
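The examples from the table can be checked with a few lines of Python. This is a sketch using the standard `re` module; the helper name `full_matches` is hypothetical, and the last line adds a non-greedy variant to show the effect of a trailing ?.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = full_matches(r"zo*", ["z", "zo", "zoo"])
plus = full_matches(r"zo+", ["z", "zo", "zoo"])

# "do(es)?" finds "does" in "does" and only "do" in "doxy".
in_does = re.search(r"do(es)?", "does").group(0)
in_doxy = re.search(r"do(es)?", "doxy").group(0)

exact_in_bob = re.search(r"o{2}", "Bob")                 # no match
exact_in_food = re.search(r"o{2}", "food").group(0)
at_least_two = re.search(r"o{2,}", "foooood").group(0)
first_three = re.search(r"o{1,3}", "fooooood").group(0)

# A trailing ? makes the quantifier non-greedy: it takes as few as allowed.
lazy = re.search(r"o{1,3}?", "fooooood").group(0)

print(star, plus)                 # ['z', 'zo', 'zoo'] ['zo', 'zoo']
print(at_least_two, first_three)  # ooooo ooo
print(lazy)                       # o
```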

Anchors

Anchors (also called locators) fix a regular expression to the beginning or end of a line. They describe the boundaries of strings or words: ^ and $ refer to the start and end of a string, \b describes the leading or trailing boundary of a word, and \B denotes a non-word boundary.

Character	Description
^	Matches the position at the start of the input string. If the RegExp object's Multiline property is set, ^ also matches the position after \n or \r.
$	Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before \n or \r.
\b	Matches a word boundary, for example the position between a word and a space.
\B	Matches a non-word boundary.
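The sketch below illustrates the four anchors in Python. It assumes that Python's `re.MULTILINE` flag plays the same role as the Multiline property mentioned in the table; the sample strings are invented for the example.

```python
import re

text = "first line\nsecond line"

# Without the multiline flag, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With it, ^ and $ also match right after and right before each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)
ends_multiline = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \b requires a word boundary, so only the standalone "cat" is found.
whole_word = re.findall(r"\bcat\b", "cat concat cats")
# \B requires the absence of a boundary, so only the "cat" inside a word is found.
inside_word = re.findall(r"\Bcat\B", "cat concatenate")

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(ends_multiline)    # ['line', 'line']
print(whole_word)        # ['cat']
print(inside_word)       # ['cat']
```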

Alternation

Enclose all the alternatives in parentheses () and separate adjacent alternatives with |. The parentheses form a capturing group, and the text matched by each group is saved.
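A small Python sketch of alternation inside a capturing group follows. The file name and the "gray/grey" strings are assumptions chosen for illustration. Note how `re.findall` returns the saved group contents rather than the whole match once a capturing group is present.

```python
import re

# The group captures whichever alternative matched.
extension_pattern = re.compile(r"\.(jpg|png|gif)$")
extension = extension_pattern.search("photo.png").group(1)

# With a capturing group, findall returns what the group saved.
saved = re.findall(r"gr(a|e)y", "gray grey griy")
# group(0) of each match object still gives the full matched text.
full = [m.group(0) for m in re.finditer(r"gr(a|e)y", "gray grey griy")]

print(extension)  # png
print(saved)      # ['a', 'e']
print(full)       # ['gray', 'grey']
```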

Parentheses have a side effect: the text they match is cached (captured). Placing ?: at the start of the group, before the first alternative, removes this side effect.
?: is one of the non-capturing elements. The other two are ?= and ?!, which do more than suppress capturing. The former is a positive lookahead: it matches the search string at any position where the pattern in the parentheses begins to match. The latter is a negative lookahead: it matches at any position where the pattern in the parentheses does not begin to match.

1. exp1(?=exp2): finds exp1 that is followed by exp2.
2. (?<=exp2)exp1: finds exp1 that is preceded by exp2.
3. exp1(?!exp2): finds exp1 that is not followed by exp2.
4. (?<!exp2)exp1: finds exp1 that is not preceded by exp2.
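The non-capturing group and the four lookaround forms listed above can be sketched in Python as follows. The sample text "foobar foobaz xbar" and the helper name `starts` are assumptions for illustration; the helper reports the start index of each match, which makes it visible which occurrence was selected.

```python
import re

# (?:...) groups the alternatives without capturing,
# so findall returns the whole matches.
non_capturing = re.findall(r"gr(?:a|e)y", "gray grey")

text = "foobar foobaz xbar"

def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

followed = starts(r"foo(?=bar)")       # "foo" followed by "bar"
preceded = starts(r"(?<=foo)bar")      # "bar" preceded by "foo"
not_followed = starts(r"foo(?!bar)")   # "foo" not followed by "bar"
not_preceded = starts(r"(?<!foo)bar")  # "bar" not preceded by "foo"

print(non_capturing)  # ['gray', 'grey']
print(followed)       # [0]
print(preceded)       # [3]
print(not_followed)   # [7]
print(not_preceded)   # [15]
```

In every lookaround case the matched text is only exp1; the text checked by the lookahead or lookbehind is not consumed.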

Backreferences

One of the simplest and most useful applications of backreferences is finding two identical adjacent words in text. Backreferences can also be used to break a Uniform Resource Identifier (URI) into its components.
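Both uses can be sketched in Python. The patterns, the sample sentence, and the example URI below are assumptions written for this illustration, not taken from the article. The first pattern refers back to group 1 inside the pattern itself; the URI example reads the captured groups after the match.

```python
import re

sentence = "Is the the cost of of gasoline going up up?"

# \b(\w+)\s+\1\b: a word, whitespace, then the same word again via \1.
repeated = re.compile(r"\b(\w+)\s+\1\b")
doubled_words = [m.group(1) for m in repeated.finditer(sentence)]

# In the replacement string, \1 refers to the captured word,
# so each doubled word collapses to one copy.
deduplicated = repeated.sub(r"\1", sentence)

# Capturing groups split a URI into scheme, host, port, and path.
uri_pattern = re.compile(r"(\w+)://([^/:]+)(:\d*)?([^# ]*)")
parts = uri_pattern.match("https://www.example.com:80/html/index.html").groups()

print(doubled_words)  # ['the', 'of', 'up']
print(deduplicated)   # Is the cost of gasoline going up?
print(parts)          # ('https', 'www.example.com', ':80', '/html/index.html')
```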

Copyright notice
This article was written by [Live up to your youth]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/178/202206270200106868.html