regular expression
2022-06-13 02:18:00 【Researcher-Du】
Source: https://github.com/ziishaned/learn-regex
What is a regular expression?
A regular expression is a group of letters and symbols used to find text that fits a specific pattern.
A regular expression is a pattern that is matched against a subject string from left to right.
"Regular expression" is a mouthful, so the term is usually abbreviated to "regex" or "regexp".
Regular expressions are used to replace text within a string, validate forms, extract substrings based on a pattern match, and so on.
Imagine you are writing an application and want to set rules for usernames: a username may contain letters, numbers, underscores and hyphens, and its length is limited so that it doesn't look ugly.
We use the following regular expression to validate a username:
The regular expression above accepts john_doe, jo-hn_doe and john12_as.
It does not match Jo, because that string contains an uppercase letter and is too short.
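The pattern itself does not appear in this text. As a rough illustration, here is a Python sketch using a hypothetical pattern that fits the description (lowercase letters, digits, underscores and hyphens); the length bounds of 3 to 15 and the helper name `is_valid_username` are assumptions, not taken from the source.

```python
import re

# Hypothetical pattern: the character set follows the description above,
# but the 3..15 length bounds are an assumption.
USERNAME_RE = re.compile(r"^[a-z0-9_-]{3,15}$")


def is_valid_username(name: str) -> bool:
    """Return True when the whole string satisfies the naming rule."""
    return USERNAME_RE.fullmatch(name) is not None


for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
```

With these assumed bounds, the first three names are accepted and Jo is rejected, which agrees with the behaviour described above.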
Contents
- 1. Basic match
- 2. Metacharacters
- 3. Shorthand character sets
- 4. Zero-width assertions (lookahead and lookbehind)
- 5. Flags
- 6. Greedy vs. lazy matching
- Extras
- Contributing
- License
1. Basic match
A regular expression is simply a pattern used to search a text; the simplest patterns are plain sequences of letters and digits.
For example, the regular expression the means: the letter t, followed by the letter h, followed by the letter e.
"the" => The fat cat sat on the mat.
The regular expression 123 matches the string 123. The pattern is compared with the input string character by character.
Regular expressions are normally case-sensitive, so the regular expression The does not match the string the.
"The" => The fat cat sat on the mat.
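The two examples above can be checked with a short Python sketch, assuming the standard `re` module; the variable names are illustrative only.

```python
import re

sentence = "The fat cat sat on the mat."

# Literal matching is case-sensitive: "the" only matches the lowercase word.
lower_hits = [m.start() for m in re.finditer(r"the", sentence)]
# "The" only matches the capitalized word at the start of the sentence.
upper_hits = [m.start() for m in re.finditer(r"The", sentence)]

print(lower_hits, upper_hits)
```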
2. Metacharacters
Metacharacters are the building blocks of regular expressions.
Metacharacters do not stand for themselves; each has a special meaning. Some metacharacters take on a special meaning only when written inside square brackets. The metacharacters are as follows:
Metacharacter | Description |
---|---|
. | Period. Matches any single character except a line break. |
[ ] | Character class. Matches any character contained between the square brackets. |
[^ ] | Negated character class. Matches any character not contained between the square brackets. |
* | Matches 0 or more repetitions of the preceding symbol. |
+ | Matches 1 or more repetitions of the preceding symbol. |
? | Makes the preceding symbol optional. |
{n,m} | Braces. Matches at least n but not more than m repetitions of the preceding symbol. |
(xyz) | Character group. Matches the characters xyz in that exact order. |
| | Alternation. Matches either the characters before or the characters after the symbol. |
\ | Escape character. Allows the reserved characters [ ] ( ) { } . * + ? ^ $ \ | to be matched literally. |
^ | Matches the beginning of the input. |
$ | Matches the end of the input. |
2.1 The dot operator .
The period . is the simplest example of a metacharacter. It matches any single character except a line break.
For example, the regular expression .ar means: any character, followed by the letter a, followed by the letter r.
".ar" => The car parked in the garage.
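A small Python sketch of the same example, assuming the standard `re` module. The second check uses a made-up input to show that the dot does not match a line break by default.

```python
import re

sentence = "The car parked in the garage."

# "." stands for any single character except a line break.
dot_matches = re.findall(r".ar", sentence)

# A newline directly before "ar" is not accepted by "." (default flags).
newline_matches = re.findall(r".ar", "\nar")

print(dot_matches, newline_matches)
```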
2.2 Character sets
Character sets are also called character classes.
Square brackets are used to specify a character set.
A hyphen inside the square brackets specifies a range of characters.
The order of the characters inside the square brackets does not matter.
For example, the regular expression [Tt]he matches both the and The.
"[Tt]he" => The car parked in the garage.
A period inside square brackets means a literal period.
The regular expression ar[.] matches the letters a and r followed by a period.
"ar[.]" => A garage is a good place to park a car.
2.2.1 Negated character sets
In general, the caret ^ represents the start of the string, but when it is written right after the opening square bracket it negates the character set.
For example, the regular expression [^c]ar matches any character except c, followed by ar.
"[^c]ar" => The car parked in the garage.
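The character-set examples of section 2.2 can be tried out with the following Python sketch, which assumes the standard `re` module; the variable names are illustrative.

```python
import re

parked = "The car parked in the garage."
garage = "A garage is a good place to park a car."

# [Tt] accepts either case of the first letter.
either_case = re.findall(r"[Tt]he", parked)

# Inside brackets the period is literal, so only "ar." at the end matches.
literal_dot = re.findall(r"ar[.]", garage)

# [^c] accepts any character except "c" in front of "ar".
not_c = re.findall(r"[^c]ar", parked)

print(either_case, literal_dot, not_c)
```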
2.3 Repetitions
The metacharacters +, * and ? specify how many times a subpattern can occur.
These metacharacters behave differently in different situations.
2.3.1 The star *
The symbol * matches zero or more repetitions of whatever precedes it.
For example, the regular expression a* matches zero or more consecutive a characters. When * follows a character set, it repeats the whole set: the regular expression [a-z]* matches any run of lowercase letters in a line.
"[a-z]*" => The car parked in the garage #21.
Combined with the metacharacter ., the symbol * matches any string of characters: .*
The symbol * can also be used with the whitespace shorthand \s. For example, the expression \s*cat\s* matches the string cat preceded and followed by zero or more whitespace characters.
"\s*cat\s*" => The fat cat sat on the concatenation.
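A Python sketch of both star examples, assuming the standard `re` module. Because `[a-z]*` may also match the empty string, `re.findall` returns empty matches as well; filtering them out is a choice made here for readability.

```python
import re

# [a-z]* can match zero characters, so drop the empty matches.
lowercase_runs = [
    m for m in re.findall(r"[a-z]*", "The car parked in the garage #21.") if m
]

# \s* absorbs optional whitespace on both sides of "cat".
cat_matches = re.findall(r"\s*cat\s*", "The fat cat sat on the concatenation.")

print(lowercase_runs, cat_matches)
```

Note that the second pattern also finds the cat inside concatenation, since nothing in the pattern requires a word boundary.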
2.3.2 The plus +
The symbol + matches one or more repetitions of the preceding character.
For example, the regular expression c.+t matches the letter c, followed by at least one character, followed by the letter t. Because + is greedy, the t that ends the match is the last t in the sentence.
"c.+t" => The fat cat sat on the mat.
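The greedy behaviour of this example can be seen in a short Python sketch, assuming the standard `re` module.

```python
import re

sentence = "The fat cat sat on the mat."

# ".+" grabs as much as it can, then backs off to the last "t".
plus_match = re.search(r"c.+t", sentence).group(0)

print(plus_match)
```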
2.3.3 The question mark ?
In regular expressions the metacharacter ? makes the preceding character optional, that is, it matches zero or one occurrence of it.
For example, the regular expression [T]?he matches the strings he and The.
"[T]he" => The car is parked in the garage.
"[T]?he" => The car is parked in the garage.
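A Python sketch comparing the two patterns, assuming the standard `re` module.

```python
import re

sentence = "The car is parked in the garage."

# Without "?", the capital T is required.
required_t = re.findall(r"[T]he", sentence)

# With "?", the T is optional, so the "he" inside "the" matches too.
optional_t = re.findall(r"[T]?he", sentence)

print(required_t, optional_t)
```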
2.4 Braces {}
In regular expressions, braces {} are a quantifier: they specify how many times a character or a group of characters can be repeated.
For example, the regular expression [0-9]{2,3} matches at least 2 but not more than 3 digits in the range 0 to 9.
"[0-9]{2,3}" => The number was 9.9997 but we rounded it off to 10.0.
The second number can be omitted.
For example, the regular expression [0-9]{2,} matches 2 or more digits.
"[0-9]{2,}" => The number was 9.9997 but we rounded it off to 10.0.
If the comma is omitted as well, the pattern is repeated a fixed number of times.
For example, the regular expression [0-9]{3} matches exactly 3 digits.
"[0-9]{3}" => The number was 9.9997 but we rounded it off to 10.0.
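The three brace forms can be compared side by side in a Python sketch, assuming the standard `re` module.

```python
import re

sentence = "The number was 9.9997 but we rounded it off to 10.0."

two_to_three = re.findall(r"[0-9]{2,3}", sentence)  # 2 or 3 digits
two_or_more = re.findall(r"[0-9]{2,}", sentence)    # at least 2 digits
exactly_three = re.findall(r"[0-9]{3}", sentence)   # exactly 3 digits

print(two_to_three, two_or_more, exactly_three)
```

In 9997 the {2,3} form takes the first three digits and cannot use the remaining single 7, which is why it yields 999 while {2,} yields 9997.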
2.5 Capturing groups (...)
A capturing group is a sub-pattern written inside parentheses (...). Its contents are treated as a single unit, much like parentheses in mathematics. For example, the regular expression (ab)* matches zero or more repetitions of ab. Without the parentheses, the expression ab* matches a followed by zero or more repetitions of b. As mentioned before, a quantifier such as {} repeats the preceding character; placed after a group (...), it repeats the whole group.
The alternation metacharacter | can also be used inside a group. For example, the regular expression (c|g|p)ar matches car, gar or par.
"(c|g|p)ar" => The car is parked in the garage.
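A Python sketch of the difference between `(ab)*` and `ab*`, plus the alternation example, assuming the standard `re` module. The test strings `ababab` and `abbb` are made up for illustration.

```python
import re

# The quantifier applies to the whole group, or only to "b" without it.
grouped = re.fullmatch(r"(ab)*", "ababab") is not None
ungrouped = re.fullmatch(r"ab*", "ababab") is not None
ungrouped_b = re.fullmatch(r"ab*", "abbb") is not None

sentence = "The car is parked in the garage."
# group(0) is the whole match; findall returns what the group captured.
whole_matches = [m.group(0) for m in re.finditer(r"(c|g|p)ar", sentence)]
captured = re.findall(r"(c|g|p)ar", sentence)

print(grouped, ungrouped, ungrouped_b, whole_matches, captured)
```

In Python, `re.findall` returns only the captured group when the pattern contains exactly one group, which is why `finditer` with `group(0)` is used to get the whole match.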
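A recent example from handling a bulk mailing: the two parts of each name string needed to be swapped:
Xiaodong Chen
Dawei Li
Dong Wang
Find and replace in Sublime Text:
(.{1,2}) (.{1,2})
$2 $1
Note that .{1,2} matches only one or two characters, which suits names written in Chinese characters; romanized names such as the ones shown here would need a pattern like (\S+) (\S+).
After replacement:
Chen Xiaodong
Li Dawei
Wang Dong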
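The same swap can be sketched in Python with `re.sub`. This is an illustration under assumptions, not the Sublime Text setup above: the pattern `(\S+) (\S+)` is swapped in so that romanized names of any length work, and Python writes backreferences as `\2 \1` rather than `$2 $1`.

```python
import re

names = ["Xiaodong Chen", "Dawei Li"]

# Group 1 is the first word, group 2 the second; the replacement
# emits them in reverse order.
swapped = [re.sub(r"^(\S+) (\S+)$", r"\2 \1", name) for name in names]

print(swapped)
```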
2.6 Alternation |
The vertical bar | defines an alternation, which works like a choice between several expressions.
For example, the regular expression (T|t)he|car matches either (T|t)he or car.
"(T|t)he|car" => The car is parked in the garage.
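A Python sketch of this alternation, assuming the standard `re` module; `group(0)` is used so that the whole match is returned rather than only the inner group.

```python
import re

sentence = "The car is parked in the garage."

# Top-level "|" chooses between "(T|t)he" and "car".
alternation = [m.group(0) for m in re.finditer(r"(T|t)he|car", sentence)]

print(alternation)
```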
2.7 Escaping special characters
The backslash \ is used in regular expressions to escape the character that follows it. This makes it possible to match the reserved characters { } [ ] / \ + * . $ ^ | ? literally: to use one of them as a matching character, put a backslash \ in front of it.
For example, the regular expression . matches any character except a line break. To match a literal . in the input, write \.
In the example below, \.? matches an optional period.
"(f|c|m)at\.?" => The fat cat sat on the mat.
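A Python sketch of escaping, assuming the standard `re` module. The `10.0` and `10x0` strings are made-up inputs, and `re.escape` is shown as one way to escape a literal string automatically.

```python
import re

sentence = "The fat cat sat on the mat."

# "\.?" allows an optional literal period after "at".
with_period = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", sentence)]

# An unescaped "." accepts any character; an escaped one only a period.
unescaped_ok = re.fullmatch(r"10.0", "10x0") is not None
escaped_ok = re.fullmatch(r"10\.0", "10x0") is not None

# re.escape adds the backslashes for you.
auto_escaped = re.escape("10.0")

print(with_period, unescaped_ok, escaped_ok, auto_escaped)
```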
2.8 Anchors
In regular expressions, anchors are used to check whether the matching symbol is the first or the last symbol of the input string. There are two anchors: ^ marks the beginning and $ marks the end.
2.8.1 The caret ^
The caret ^ checks whether the matching character is the first character of the input string.
For example, applying the regular expression ^a to the string abc matches a. The regular expression ^b matches nothing, because the string abc does not start with b.
As another example, ^(T|t)he matches The or the only at the start of the string.
"(T|t)he" => The car is parked in the garage.
"^(T|t)he" => The car is parked in the garage.
Example: with each of the following lines treated as a separate input, ^23 matches only the line that starts with 23:
123
23
+233
a23
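Both caret examples can be checked with a Python sketch, assuming the standard `re` module and treating each of the four lines as its own input string.

```python
import re

lines = ["123", "23", "+233", "a23"]

# Keep only the inputs that begin with "23".
starts_with_23 = [line for line in lines if re.search(r"^23", line)]

sentence = "The car is parked in the garage."
unanchored = [m.group(0) for m in re.finditer(r"(T|t)he", sentence)]
anchored = [m.group(0) for m in re.finditer(r"^(T|t)he", sentence)]

print(starts_with_23, unanchored, anchored)
```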
2.8.2 The dollar sign $
Like the caret ^, the dollar sign $ is an anchor; it checks whether the matching character is the last character of the input string.
For example, the regular expression (at\.)$ matches at. only at the end of the string.
"(at\.)" => The fat cat. sat. on the mat.
"(at\.)$" => The fat cat. sat. on the mat.
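A Python sketch of the dollar anchor, assuming the standard `re` module; it records where each match starts so the anchored and unanchored results can be compared.

```python
import re

sentence = "The fat cat. sat. on the mat."

# Without "$", every "at." in the sentence matches.
unanchored_starts = [m.start() for m in re.finditer(r"(at\.)", sentence)]

# With "$", only the "at." at the very end matches.
anchored_starts = [m.start() for m in re.finditer(r"(at\.)$", sentence)]

print(unanchored_starts, anchored_starts)
```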
3. Shorthand character sets
Regular expressions provide shorthands for commonly used character sets:
Shorthand | Description |
---|---|
. | Any character except a line break |
\w | Matches alphanumeric characters and the underscore, equivalent to [a-zA-Z0-9_] |
\W | Matches non-alphanumeric characters (symbols), equivalent to [^\w] |
\d | Matches a digit: [0-9] |
\D | Matches a non-digit: [^\d] |
\s | Matches a whitespace character, equivalent to [\t\n\f\r\p{Z}] |
\S | Matches a non-whitespace character: [^\s] |
\f | Matches a form feed |
\n | Matches a line feed |
\r | Matches a carriage return |
\t | Matches a tab |
\v | Matches a vertical tab |
\p | Matches CR/LF (equivalent to \r\n), the DOS line terminator; this is engine-dependent, and in many engines \p instead introduces a Unicode property class such as \p{Z} |
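A few of these shorthands in a Python sketch, assuming the standard `re` module; the sample text is made up. In Python 3, `\w`, `\d` and `\s` are Unicode-aware for `str` patterns unless `re.ASCII` is passed, so they cover more than the ASCII ranges listed in the table.

```python
import re

text = "user_42 logged in\tat 10:05"

words = re.findall(r"\w+", text)    # runs of letters, digits, underscore
digits = re.findall(r"\d+", text)   # runs of digits
fields = re.split(r"\s+", text)     # split on spaces and tabs
non_word = re.findall(r"\W", text)  # single non-word characters

print(words, digits, fields, non_word)
```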
4. Zero-width assertions (lookahead and lookbehind)
Lookahead and lookbehind assertions, together called lookarounds, are a kind of non-capturing group: they test for a pattern, but the text they test is not included in the match and is not counted as a capture group.
A lookaround is used when a pattern must be preceded or followed by another pattern; the match result does not include that other pattern, which acts only as a constraint.
For example, to get all the numbers that are preceded by the $ character, we can use the positive lookbehind (?<=\$)[0-9\.]*.
This expression matches zero or more of the characters 0,1,2,3,4,5,6,7,8,9 and . that come right after a $ character.
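A Python sketch of this lookbehind, assuming the standard `re` module; the sample text is hypothetical.

```python
import re

# Hypothetical input: two prices with "$" and one number without it.
text = "Cost $4.44 and $10.88, not 5.00"

# The lookbehind requires a "$" just before the match but leaves it out.
prices = re.findall(r"(?<=\$)[0-9\.]*", text)

print(prices)
```

The trailing 5.00 is skipped because no $ precedes it.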
The zero-width assertions are as follows:
Symbol | Description |
---|---|
?= | Positive lookahead: the pattern must follow |
?! | Negative lookahead: the pattern must not follow |
?<= | Positive lookbehind: the pattern must precede |
?<! | Negative lookbehind: the pattern must not precede |
4.1 Positive lookahead ?=...
A positive lookahead ?=... asserts that the first part of the expression must be followed by the expression defined in the lookahead.
The returned match contains only the text matched by the first part of the expression.
A positive lookahead is written in parentheses, with a question mark and an equals sign right after the opening parenthesis: (?=...).
The lookahead expression is written after the equals sign inside the parentheses.
For example, the regular expression (T|t)he(?=\sfat) matches The or the, and the positive lookahead (?=\sfat) requires that The or the be immediately followed by a space and the word fat.
"(T|t)he(?=\sfat)" => The fat cat sat on the mat.
4.2 Negative lookahead ?!...
A negative lookahead ?! keeps only the matches that are not followed by the pattern defined in the assertion. It is written like a positive lookahead, except that the = is replaced by !, that is, (?!...).
The regular expression (T|t)he(?!\sfat) matches The or the only when it is not followed by a space and the word fat.
"(T|t)he(?!\sfat)" => The fat cat sat on the mat.
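The two lookahead examples can be compared in a Python sketch, assuming the standard `re` module; each result records the start position and the matched text.

```python
import re

sentence = "The fat cat sat on the mat."

# Positive lookahead: keep "The"/"the" only when " fat" follows.
followed_by_fat = [
    (m.start(), m.group(0)) for m in re.finditer(r"(T|t)he(?=\sfat)", sentence)
]

# Negative lookahead: keep "The"/"the" only when " fat" does not follow.
not_followed_by_fat = [
    (m.start(), m.group(0)) for m in re.finditer(r"(T|t)he(?!\sfat)", sentence)
]

print(followed_by_fat, not_followed_by_fat)
```

In both cases the matched text is only The or the; the lookahead part is checked but never included in the match.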
4.3 Positive lookbehind ?<=...
A positive lookbehind, written (?<=...), keeps only the matches that are preceded by the pattern defined in the assertion.
For example, the regular expression (?<=(T|t)he\s)(fat|mat) matches fat or mat only when it comes right after The or the followed by a space.
"(?<=(T|t)he\s)(fat|mat)" => The fat cat sat on the mat.
4.4 Negative lookbehind ?<!...
A negative lookbehind, written (?<!...), keeps only the matches that are not preceded by the pattern defined in the assertion.
For example, the regular expression (?<!(T|t)he\s)(cat) matches cat only when it does not come right after The or the followed by a space.
"(?<!(T|t)he\s)(cat)" => The cat sat on cat.
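A Python sketch of both lookbehind examples, assuming the standard `re` module. Python's `re` requires lookbehind patterns to have a fixed width, which holds here because both alternatives of `(T|t)` are one character long.

```python
import re

# Positive lookbehind: "fat"/"mat" must come right after "The " or "the ".
preceded = [
    m.group(0)
    for m in re.finditer(r"(?<=(T|t)he\s)(fat|mat)", "The fat cat sat on the mat.")
]

# Negative lookbehind: "cat" must NOT come right after "The " or "the ".
not_preceded_starts = [
    m.start() for m in re.finditer(r"(?<!(T|t)he\s)(cat)", "The cat sat on cat.")
]

print(preceded, not_preceded_starts)
```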
5. Flags
Flags are also called modifiers because they modify the output of a regular expression.
Flags can be used in any order or combination, and they are an integral part of the regular expression.
Flag | Description |
---|---|
i | Case insensitive: matching ignores case. |
g | Global search: searches for the pattern throughout the input string. |
m | Multiline: the anchors ^ and $ match at the beginning and end of each line. |
5.1 Case insensitive
The modifier i makes matching case-insensitive.
For example, the regular expression /The/gi means: the uppercase letter T, followed by h, followed by e. The trailing i flag tells the engine to ignore case, so the expression matches both the and The; the g flag makes the search global.
"The" => The fat cat sat on the mat.
"/The/gi" => The fat cat sat on the mat.
5.2 Global search
The modifier g performs a global match, that is, it returns all matches instead of stopping after the first one.
For example, the regular expression /.(at)/g means: any character except a line break, followed by a and t, returning every match in the input.
"/.(at)/" => The fat cat sat on the mat.
"/.(at)/g" => The fat cat sat on the mat.
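The /pattern/flags notation above is the JavaScript-style form. As a rough Python equivalent (an assumption about how one might port the examples, using the standard `re` module): case-insensitivity is the `re.IGNORECASE` flag, and there is no g flag, so the choice is between `re.search` for the first match and `re.finditer` or `re.findall` for all of them.

```python
import re

sentence = "The fat cat sat on the mat."

# i flag: re.IGNORECASE
case_sensitive = re.findall(r"The", sentence)
case_insensitive = re.findall(r"The", sentence, flags=re.IGNORECASE)

# No g flag in Python: search() stops at the first match,
# finditer() walks through every match.
first_only = re.search(r".(at)", sentence).group(0)
all_matches = [m.group(0) for m in re.finditer(r".(at)", sentence)]

print(case_sensitive, case_insensitive, first_only, all_matches)
```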
5.3 Multiline
The modifier m performs multiline matching.
As described earlier, the anchors ^ and $ check whether the pattern is at the beginning or the end of the input string. To make them work at the beginning and end of each line, use the m flag.
For example, the regular expression /.at(.)?$/gm means: any character except a line break, followed by a lowercase a and a lowercase t, optionally followed by any character except a line break, and then the end of the line. Because of the m flag, the expression now matches at the end of each line.
"/.at(.)?$/" => The fat cat sat on the mat.
"/.at(.)?$/gm" => The fat cat sat on the mat.
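The effect of the m flag only shows up on input that spans several lines. The Python sketch below assumes the sample sentence is split across three lines (the line breaks are an assumption made for this illustration) and uses `re.MULTILINE` from the standard `re` module.

```python
import re

# Assumed three-line version of the sample sentence.
text = "The fat\ncat sat\non the mat."
pattern = r".at(.)?$"

# Without the flag, "$" only matches at the end of the whole string.
without_m = [m.group(0) for m in re.finditer(pattern, text)]

# With re.MULTILINE, "$" also matches at the end of every line.
with_m = [m.group(0) for m in re.finditer(pattern, text, flags=re.MULTILINE)]

print(without_m, with_m)
```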
6. Greedy vs. lazy matching
By default, regular expressions match greedily, meaning that a match is as long as possible. Adding ? after a quantifier makes it lazy, so that the match is as short as possible.
"/(.*at)/" => The fat cat sat on the mat.
"/(.*?at)/" => The fat cat sat on the mat.
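A Python sketch contrasting the two modes, assuming the standard `re` module.

```python
import re

sentence = "The fat cat sat on the mat."

# Greedy: ".*" runs to the last "at" in the sentence.
greedy = re.search(r"(.*at)", sentence).group(0)

# Lazy: ".*?" stops at the first "at" it can reach.
lazy = re.search(r"(.*?at)", sentence).group(0)

print(greedy)
print(lazy)
```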
Contributing
- Report issues
- Open a pull request
- Spread the word
- Contact me directly at [email protected]
License
MIT Zeeshan Ahmad