当前位置:网站首页>[regular expression series] greedy and non greedy patterns
[regular expression series] greedy and non greedy patterns
2022-06-30 05:56:00 【bckBCK】
【 Regular expression series 】 Greedy and non greedy patterns
Preface
This paper belongs to One of the regular expression series , For more, please go to Regular expression series
Greedy patterns and non greedy patterns are important characteristics of regular matching In understanding the difference between greed and non greed , According to the example , Step by step
The outline
- Introduction to matching rules
- A quick understanding of greedy and non greedy patterns
- Practice with examples
- Backtracking phenomenon and matching failure
Introduction to matching rules
var str='aabcab';
var reg=/ab/;
var res=str.match(reg);
// ab index by 1
console.log(res);
To quickly understand regular matching rules , Try to understand the above example first
The matching step is like this :
- initial
index=0
, Match to charactera
- Next, match the next character
a
, But because ofaa
and/ab/
Mismatch , So this match failed index
Move to the next , from1
Start , It's a rematcha
- Next, match the next character
b
, Just in time with/ab/
matching , Therefore, the match was successful , Back toab
,index=1
- If the regular match is followed by
g
This keyword , Will continue to start the next group of matching ( But in this case there is nog
, So there is only one set of results )
The main points of
- The first match has the highest priority
A detailed explanation of this point is : For example, the first matching character is a
, Assume that there is no matching failure in the subsequent matching . Then it will always match , Until the match is complete , in other words index=0
It won't change , Until the match is complete ( If a match fails and cannot be traced ,index
Will continue to move down )
This applies to the following greedy and non greedy modes ( And the priority is higher than them ), So remember
A quick understanding of greedy and non greedy patterns
Greedy match pattern
Definition
Regular expressions to match , Will match as many qualified content as possible
identifier
+`,`?`,`*`,`{n}`,`{n,}`,`{n,m}
When the match , If the above identifier is encountered , Represents greedy matching , Will match as much content as possible
Example
var str='aacbacbc';
var reg=/a.*b/;
var res=str.match(reg);
// aacbacb index by 0
console.log(res);
In the above example , Match to first a
after , Begin to match .*
, Because it's the greedy model , It will always match back , Until the last one that meets the conditions b
until , So the matching result is aacbacb
Example 2
var str='aacbacbc';
var reg=/ac.*b/;
var res=str.match(reg);
// acbacb index by 1
console.log(res);
The first match is a
, Then match the next character a
when , And regular do not match , So the match failed ,index
Move to 1
, Then the match was successful ac
, Continue to match , Because it's the greedy model , Match as many results as possible , Until the last one that meets the requirements b
until , So the matching result is acbacb
Non greedy matching pattern
Definition
Regular expressions to match , Will try to match as few qualified content as possible in other words , Once a match is found to meet the requirements , Match immediately , Instead of continuing to match ( Unless there is g
, Open the next set of matching )
identifier
+?`,`??`,`*?`,`{n}?`,`{n,}?`,`{n,m}?
You can see , The identifiers of non greedy patterns are very regular , Is to add a after the identifier of greedy mode ?
Example
var str='aacbacbc';
var reg=/a.*?b/;
var res=str.match(reg);
// aacb index by 0
console.log(res);
In the above example , Match to first a
after , Begin to match .*?
, Because of the non greedy model , It matches the first one b
after , It's a match , So the matching result is aacb
Why aacb
instead of acb
Well ? Because I mentioned a priority rule that is being matched : The first match has the highest priority first a
It's a match , As long as there is no matching failure , It will always match , Until the match is successful
Example 2
var str='aacbacbc';
var reg=/ac.*?b/;
var res=str.match(reg);
// acb index by 1
console.log(res);
Match first a
, Next, match the second a
when , The match failed index
Turn into 1
, Continue matching ac
success , Continue matching b
, Because of the non greedy model , here acb
The minimum requirements for regularity have been met , So the match was successful , The result is acb
Example 3
var str='aacbacbc';
var reg=/a.*?/;
var res=str.match(reg);
// a index by 0
console.log(res);
var reg2=/a.*/;
var res2=str.match(reg2);
// aacbacbc index by 0
console.log(res2);
This example is an example of 1 A supplement to , You can find , When there is no b
when , Because of the non greedy model , Match to first a
The direct matching is successful The latter greedy pattern will match all
Practice with examples
After a preliminary understanding of greedy mode and non greedy mode , You can deepen your understanding through practice
extract HTML
Medium Div
label
Give a HTML
character string , as follows
Other elements
<div><span> user :<span/><span> Zhang San <span/></div>
<div><span> password :<span/><span>123456<span/></div>
Other elements
demand : Extract div
The contents of the package ( Include div
The label itself ), And store each result in an array
Code : This is accomplished by global matching of non greedy patterns , as follows
var reg=/<div>.*?<\/div>/g;
var res=str.match(reg);
// ["<div><span> user :<span/><span> Zhang San <span/></div>", "<div><span> password :<span/><span>123456<span/></div>"]
console.log(res);
Detailed explanation : Two knowledge points are used ,.*?
Non greedy pattern matching and g
The global matching
- .*?<\/div> Means only one match at a time div, This ensures that every div Will not cross the border
- final
g
Represents global matching , That is, after the first match is successful , Will put the matching results into the array , Then from the nextindex
Start matching new results again
in addition : Suppose you use /<div>.*<\/div>/g
Match greedy patterns , The result is
["<div><span> user :<span/><span> Zhang San <span/></div><div><span> password :<span/><span>123456<span/></div>"]
Because the greedy pattern matches the first one <div>
Then it will match the following characters greedily , Until the last one who meets the conditions </div>
until , This leads to the transfer of all div
The labels are all matched together
Take two ""
The substring in , It can no longer contain ""
The example is quoted from : Regular expressions Detailed explanation of greedy and non greedy patterns
"The phrase \"regular expression\" is called \"Regex\" for short"
demand : Extract substring between two quotation marks , You can no longer include quotation marks , For example, the above extraction result should be : "regular expression"
And "Regex"
( Every end of "
Followed by a space )
Wrong solution : Through the following non greedy matching ( Please note the space )
var str='"The phrase \"regular expression\" is called \"Regex\" for short"';
var reg=/".*?" /g;
var res=str.match(reg);
// ['"The phrase "regular expression" ', '"Regex" ']
console.log(res);
You can see , The above match is exactly the wrong match , This regular matches to the first one that meets the condition "+ Space
Then it stops automatically
The correct solution : Use greedy patterns for matching
var reg=/"[^"]*" /g;
var res=str.match(reg);
// ['"regular expression" ', '"Regex" ']
console.log(res);
In this match
- From the first
"
Begin to match , Next to the12
When a ("r
Of"
), dissatisfaction[^"]
, Not satisfied with the future"+ Space
, So the match failed ,index
Move to the next , Start the next match - The second match is from
"r
Of"
Start , All the way ton" Space
OfSpace
, This group just matched successfully ( Because it finally conforms to the regular" Space
), It's a good match"regular expression" Space
- The third match matches
"Regex" Space
( The process will not be repeated ) - To the end , There's only one left
"
Direct matching failed ( Because first of all, it must conform to"
Can begin to struggle to match ) - thus , Regular matching ends , The match is successful , And in line with expectations
Last : This example is relatively complicated , For better understanding , You can refer to the article in the quotation source , There is an introduction to the principle in addition , There is also a failure to match non greedy patterns in the reference article , Backtrack the characteristics that affect performance, etc. to analyze and explain the principle
Backtracking phenomenon and matching failure
Have you really understood the greedy model and the non greedy model ?
Backtracking phenomenon
I don't know about the last example mentioned above to flash back
Does this word have a concept ? The above example still refers to the example in the source for analysis
Original string
"Regex"
Greedy matching process analysis
".*"
- first
"
Take control , Match regular"
, The match is successful , Control to.*
.*
After taking control , Match the next characters ,.
Represents matching any character ,*
Means matching or not , This is the identifier of greedy mode , Will try to match first , So next from1
In positionR
Begin to match , Successfully matched in turnR
,e
,g
,e
,x
, Then continue to match the last character"
, The match is successful , At this point, the match has reached the end of the string , therefore.*
Match over , Give the controller to the last in the regular expression"
"
After taking control , Because it has reached the end of the string , So the match failed , Look forward to the status available for backtracking , Control to.*
,.*
Give up a character"
, And give control to"
, At this point, the matching is successful- thus , The entire regular expression is matched , The matching result is
"Regex"
, The matching process is traced back to1
Time
Non greedy matching expression
".*?"
- first
"
Take control , Match regular"
, The match is successful , Control to.*?
.*?
After taking control , Because this is an identifier in non greedy mode , Therefore, in the case of matching and mismatching, the priority will be mismatching , So try not to match anything , Hand over control"
, hereindex
stay1
It's about (R
At the end of the character )"
After taking control , Begin to match1
SituatedR
, Matching failure , Look forward to the status available for backtracking , Control to.*?
,.*?
Eat one character ,index
here we are2
It's about , And give control to"
"
After taking control , Begin to match2
Situatede
, Matching failure , Repeat the above backtracking process , until.*?
Eat intox
character , And then give control to”
"
After taking control , Begin to match6
Situated"
, The match is successful- thus , The entire regular expression is matched , The matching result is ”Regex”, The matching process is traced back to
5
Time
Optimize the removal of backtracking
In the above greedy matching , There is a backtracking phenomenon , In fact, you can also prevent backtracking by optimizing expressions , such as
"[^"]*"
In this expression, a sub expression is constructed -[]
Medium ^"
, Its function is to exclude "
matching , such *
The greedy match will not take the initiative to eat "
, In the end, it will be "
matching "
, The match is successful , No backtracking
summary
From the above analysis, it can be seen that , In the case of a successful match , The greedy model has less backtracking ( It can be verified by more experiments ), Therefore, in application , In the case of not being very proficient in regularity , The matching of greedy patterns can be given priority , This can avoid many performance problems
Match the failure case
The above retrospective analysis is based on the success of matching , What if the match fails ?
var str = '"Regex'
var reg = /"[^"]*"/g;
In this original character , No best "
, So matching will fail , Its process is roughly as follows
"
matching"
, next[]
Of^"
And*
matchingR
,e
,g
,e
,x
- Then at the end ,
"
Take control , Because in the end , Start backtracking - The result of successive backtracking is
*
Give upx
,e
,g
,e
,R
, until*
Can no longer yield characters , The first round of matching failed - next
index
Start moving down , In turn"
matchingR
,e
,g
,e
,x
They all failed , There was no match until the end , Therefore, the regular expression matching failed this time , There is no match for the result ( Or returnnull
)
What about the non greedy model ?
/"[^"]*?"/g
"
matching"
, next*
Try not to match ,"
matchingR
, Failure , Then go back ,*
buy inR
- The next step is similar to the previous step ,
*
Go back and eat in turne
,g
,e
,x
, All the way to the end ,*
Look back again when you want to eat , We have reached the end of the string , Unable to continue , So the first round of matching failed - next
index
Start moving down , In turn"
matchingR
,e
,g
,e
,x
They all failed , returnnull
summary
Through the example of matching failure, we can see the difference between greedy and non greedy patterns . Greed is Eat first , Go back and give up
, Non greed is Ignore first , Go back and eat
and , In case of a match failure , Greedy mode also has a lot of backtracking ( Of course, non greed has always been a lot of backtracking )
however , In practice, it can be optimized by sub expression , For example, build ^xxx
, The matching can fail in advance when the matching fails to meet the conditions , So there will be less backtracking
var str = '"cccccc'
var reg = /"[^"c]*"/g;
This directly excludes c
, therefore *
I won't eat it , The match failed directly , A lot of backtracking is reduced ( Of course , The above is just the simplest example , The actual situation is more complicated )
Write it at the end
Regular matching , Greedy mode and non greedy mode can be seen at first glance , It's easy to understand , But a real in-depth understanding requires mastering the principles of regularity , also , When you really understand them , It's not just about writing regular expressions , It's a high-performance regular expression , For example, it is easier to write high-performance expressions after understanding the backtracking feature in non greedy patterns
边栏推荐
- On line assignment of financial cost management in the 22nd spring of Western Polytechnic University [Full Score answer]
- Database SQL language 05 SQL exercise
- Stack overflow caused by C # using protobuf stack overflow
- MySQL数据库用户管理
- Database SQL language 06 single line function
- luoguP2756 飞行员配对方案问题(最大流)
- [GPU] basic operation
- Inno setup the simplest user-defined interface effect
- 如何制作CSR(Certificate Signing Request)文件?
- Xi'an Jiaotong 21st autumn "computerized accounting" online homework answer sheet (I) [standard answer]
猜你喜欢
Sword finger offer 18 Delete the node of the linked list
Solidity - 安全 - 重入攻击(Reentrancy)
Dynamic programming -- gliding wing of the strange thief Kidd
Cisco VXLAN配置
English grammar_ Adjective / adverb Level 3 - superlative
Do you know how to show the health code in only 2 steps
Shenzhou ares tx6 boot logo modification tutorial
STM32F103 series controlled OLED IIC 4-pin
C语言基础小操作
[chestnut sugar GIS] global mapper - how to assign the elevation value of the grid to the point
随机推荐
Xi'an Jiaotong 21st autumn online expansion resources of online trade and marketing (III) [standard answer]
Switch to software testing and report to the training class for 3 months. It's a high paying job. Is it reliable?
MySQL事物
二十四、输入输出设备模型(串口/键盘/磁盘/打印机/总线/中断控制器/DMA和GPU)
[exercise] basic practice letter graph of Blue Bridge Cup
Xijiao 21 autumn "motor and drive" online homework answer sheet (I) [standard answer]
2022年,谁在推动音视频产业的新拐点?
Attempt to redefine 'timeout' at line 2 solution
MySQL存储系统
Cisco VXLAN配置
Who is promoting the new inflection point of audio and video industry in 2022?
[Blue Bridge Road -- bug free code] DS1302 time module code analysis
Ultra simple STM32 RTC alarm clock configuration
The definition of strain was originally from stretch_ Ratio started
抓取手机端变体组合思路设想
Delete the repeating elements in the sorting list (simple questions)
At the age of 32, I fell into a middle-aged crisis and finally quit naked...
MySQL advanced (Advanced SQL statement)
Mysql database user management
09- [istio] istio service entry