当前位置:网站首页>[regular expression series] greedy and non greedy patterns
[regular expression series] greedy and non greedy patterns
2022-06-30 05:56:00 【bckBCK】
【 Regular expression series 】 Greedy and non greedy patterns
Preface
This paper belongs to One of the regular expression series , For more, please go to Regular expression series
Greedy patterns and non greedy patterns are important characteristics of regular matching In understanding the difference between greed and non greed , According to the example , Step by step
The outline
- Introduction to matching rules
- A quick understanding of greedy and non greedy patterns
- Practice with examples
- Backtracking phenomenon and matching failure
Introduction to matching rules
var str='aabcab';
var reg=/ab/;
var res=str.match(reg);
// ab index by 1
console.log(res);
To quickly understand regular matching rules , Try to understand the above example first
The matching step is like this :
- initial
index=0, Match to charactera - Next, match the next character
a, But because ofaaand/ab/Mismatch , So this match failed indexMove to the next , from1Start , It's a rematcha- Next, match the next character
b, Just in time with/ab/matching , Therefore, the match was successful , Back toab,index=1 - If the regular match is followed by
gThis keyword , Will continue to start the next group of matching ( But in this case there is nog, So there is only one set of results )
The main points of
- The first match has the highest priority
A detailed explanation of this point is : For example, the first matching character is a, Assume that there is no matching failure in the subsequent matching . Then it will always match , Until the match is complete , in other words index=0 It won't change , Until the match is complete ( If a match fails and cannot be traced ,index Will continue to move down )
This applies to the following greedy and non greedy modes ( And the priority is higher than them ), So remember
A quick understanding of greedy and non greedy patterns
Greedy match pattern
Definition
Regular expressions to match , Will match as many qualified content as possible
identifier
+`,`?`,`*`,`{n}`,`{n,}`,`{n,m}
When the match , If the above identifier is encountered , Represents greedy matching , Will match as much content as possible
Example
var str='aacbacbc';
var reg=/a.*b/;
var res=str.match(reg);
// aacbacb index by 0
console.log(res);
In the above example , Match to first a after , Begin to match .*, Because it's the greedy model , It will always match back , Until the last one that meets the conditions b until , So the matching result is aacbacb
Example 2
var str='aacbacbc';
var reg=/ac.*b/;
var res=str.match(reg);
// acbacb index by 1
console.log(res);
The first match is a, Then match the next character a when , And regular do not match , So the match failed ,index Move to 1, Then the match was successful ac, Continue to match , Because it's the greedy model , Match as many results as possible , Until the last one that meets the requirements b until , So the matching result is acbacb
Non greedy matching pattern
Definition
Regular expressions to match , Will try to match as few qualified content as possible in other words , Once a match is found to meet the requirements , Match immediately , Instead of continuing to match ( Unless there is g, Open the next set of matching )
identifier
+?`,`??`,`*?`,`{n}?`,`{n,}?`,`{n,m}?
You can see , The identifiers of non greedy patterns are very regular , Is to add a after the identifier of greedy mode ?
Example
var str='aacbacbc';
var reg=/a.*?b/;
var res=str.match(reg);
// aacb index by 0
console.log(res);
In the above example , Match to first a after , Begin to match .*?, Because of the non greedy model , It matches the first one b after , It's a match , So the matching result is aacb
Why aacb instead of acb Well ? Because I mentioned a priority rule that is being matched : The first match has the highest priority first a It's a match , As long as there is no matching failure , It will always match , Until the match is successful
Example 2
var str='aacbacbc';
var reg=/ac.*?b/;
var res=str.match(reg);
// acb index by 1
console.log(res);
Match first a, Next, match the second a when , The match failed index Turn into 1, Continue matching ac success , Continue matching b, Because of the non greedy model , here acb The minimum requirements for regularity have been met , So the match was successful , The result is acb
Example 3
var str='aacbacbc';
var reg=/a.*?/;
var res=str.match(reg);
// a index by 0
console.log(res);
var reg2=/a.*/;
var res2=str.match(reg2);
// aacbacbc index by 0
console.log(res2);
This example is an example of 1 A supplement to , You can find , When there is no b when , Because of the non greedy model , Match to first a The direct matching is successful The latter greedy pattern will match all
Practice with examples
After a preliminary understanding of greedy mode and non greedy mode , You can deepen your understanding through practice
extract HTML Medium Div label
Give a HTML character string , as follows
Other elements
<div><span> user :<span/><span> Zhang San <span/></div>
<div><span> password :<span/><span>123456<span/></div>
Other elements
demand : Extract div The contents of the package ( Include div The label itself ), And store each result in an array
Code : This is accomplished by global matching of non greedy patterns , as follows
var reg=/<div>.*?<\/div>/g;
var res=str.match(reg);
// ["<div><span> user :<span/><span> Zhang San <span/></div>", "<div><span> password :<span/><span>123456<span/></div>"]
console.log(res);
Detailed explanation : Two knowledge points are used ,.*? Non greedy pattern matching and g The global matching
- .*?<\/div> Means only one match at a time div, This ensures that every div Will not cross the border
- final
gRepresents global matching , That is, after the first match is successful , Will put the matching results into the array , Then from the nextindexStart matching new results again
in addition : Suppose you use /<div>.*<\/div>/g Match greedy patterns , The result is
["<div><span> user :<span/><span> Zhang San <span/></div><div><span> password :<span/><span>123456<span/></div>"]
Because the greedy pattern matches the first one <div> Then it will match the following characters greedily , Until the last one who meets the conditions </div> until , This leads to the transfer of all div The labels are all matched together
Take two "" The substring in , It can no longer contain ""
The example is quoted from : Regular expressions Detailed explanation of greedy and non greedy patterns
"The phrase \"regular expression\" is called \"Regex\" for short"
demand : Extract substring between two quotation marks , You can no longer include quotation marks , For example, the above extraction result should be : "regular expression" And "Regex"( Every end of " Followed by a space )
Wrong solution : Through the following non greedy matching ( Please note the space )
var str='"The phrase \"regular expression\" is called \"Regex\" for short"';
var reg=/".*?" /g;
var res=str.match(reg);
// ['"The phrase "regular expression" ', '"Regex" ']
console.log(res);
You can see , The above match is exactly the wrong match , This regular matches to the first one that meets the condition "+ Space Then it stops automatically
The correct solution : Use greedy patterns for matching
var reg=/"[^"]*" /g;
var res=str.match(reg);
// ['"regular expression" ', '"Regex" ']
console.log(res);
In this match
- From the first
"Begin to match , Next to the12When a ("rOf"), dissatisfaction[^"], Not satisfied with the future"+ Space, So the match failed ,indexMove to the next , Start the next match - The second match is from
"rOf"Start , All the way ton" SpaceOfSpace, This group just matched successfully ( Because it finally conforms to the regular" Space), It's a good match"regular expression" Space - The third match matches
"Regex" Space( The process will not be repeated ) - To the end , There's only one left
"Direct matching failed ( Because first of all, it must conform to"Can begin to struggle to match ) - thus , Regular matching ends , The match is successful , And in line with expectations
Last : This example is relatively complicated , For better understanding , You can refer to the article in the quotation source , There is an introduction to the principle in addition , There is also a failure to match non greedy patterns in the reference article , Backtrack the characteristics that affect performance, etc. to analyze and explain the principle
Backtracking phenomenon and matching failure
Have you really understood the greedy model and the non greedy model ?
Backtracking phenomenon
I don't know about the last example mentioned above to flash back Does this word have a concept ? The above example still refers to the example in the source for analysis
Original string
"Regex"
Greedy matching process analysis
".*"
- first
"Take control , Match regular", The match is successful , Control to.* .*After taking control , Match the next characters ,.Represents matching any character ,*Means matching or not , This is the identifier of greedy mode , Will try to match first , So next from1In positionRBegin to match , Successfully matched in turnR,e,g,e,x, Then continue to match the last character", The match is successful , At this point, the match has reached the end of the string , therefore.*Match over , Give the controller to the last in the regular expression""After taking control , Because it has reached the end of the string , So the match failed , Look forward to the status available for backtracking , Control to.*,.*Give up a character", And give control to", At this point, the matching is successful- thus , The entire regular expression is matched , The matching result is
"Regex", The matching process is traced back to1Time
Non greedy matching expression
".*?"
- first
"Take control , Match regular", The match is successful , Control to.*? .*?After taking control , Because this is an identifier in non greedy mode , Therefore, in the case of matching and mismatching, the priority will be mismatching , So try not to match anything , Hand over control", hereindexstay1It's about (RAt the end of the character )"After taking control , Begin to match1SituatedR, Matching failure , Look forward to the status available for backtracking , Control to.*?,.*?Eat one character ,indexhere we are2It's about , And give control to""After taking control , Begin to match2Situatede, Matching failure , Repeat the above backtracking process , until.*?Eat intoxcharacter , And then give control to”"After taking control , Begin to match6Situated", The match is successful- thus , The entire regular expression is matched , The matching result is ”Regex”, The matching process is traced back to
5Time
Optimize the removal of backtracking
In the above greedy matching , There is a backtracking phenomenon , In fact, you can also prevent backtracking by optimizing expressions , such as
"[^"]*"
In this expression, a sub expression is constructed -[] Medium ^", Its function is to exclude " matching , such * The greedy match will not take the initiative to eat ", In the end, it will be " matching ", The match is successful , No backtracking
summary
From the above analysis, it can be seen that , In the case of a successful match , The greedy model has less backtracking ( It can be verified by more experiments ), Therefore, in application , In the case of not being very proficient in regularity , The matching of greedy patterns can be given priority , This can avoid many performance problems
Match the failure case
The above retrospective analysis is based on the success of matching , What if the match fails ?
var str = '"Regex'
var reg = /"[^"]*"/g;
In this original character , No best ", So matching will fail , Its process is roughly as follows
"matching", next[]Of^"And*matchingR,e,g,e,x- Then at the end ,
"Take control , Because in the end , Start backtracking - The result of successive backtracking is
*Give upx,e,g,e,R, until*Can no longer yield characters , The first round of matching failed - next
indexStart moving down , In turn"matchingR,e,g,e,xThey all failed , There was no match until the end , Therefore, the regular expression matching failed this time , There is no match for the result ( Or returnnull)
What about the non greedy model ?
/"[^"]*?"/g
"matching", next*Try not to match ,"matchingR, Failure , Then go back ,*buy inR- The next step is similar to the previous step ,
*Go back and eat in turne,g,e,x, All the way to the end ,*Look back again when you want to eat , We have reached the end of the string , Unable to continue , So the first round of matching failed - next
indexStart moving down , In turn"matchingR,e,g,e,xThey all failed , returnnull
summary
Through the example of matching failure, we can see the difference between greedy and non greedy patterns . Greed is Eat first , Go back and give up , Non greed is Ignore first , Go back and eat
and , In case of a match failure , Greedy mode also has a lot of backtracking ( Of course, non greed has always been a lot of backtracking )
however , In practice, it can be optimized by sub expression , For example, build ^xxx, The matching can fail in advance when the matching fails to meet the conditions , So there will be less backtracking
var str = '"cccccc'
var reg = /"[^"c]*"/g;
This directly excludes c, therefore * I won't eat it , The match failed directly , A lot of backtracking is reduced ( Of course , The above is just the simplest example , The actual situation is more complicated )
Write it at the end
Regular matching , Greedy mode and non greedy mode can be seen at first glance , It's easy to understand , But a real in-depth understanding requires mastering the principles of regularity , also , When you really understand them , It's not just about writing regular expressions , It's a high-performance regular expression , For example, it is easier to write high-performance expressions after understanding the backtracking feature in non greedy patterns
边栏推荐
- VLAN access mode
- Sword finger offer 18 Delete the node of the linked list
- On line assignment of financial cost management in the 22nd spring of Western Polytechnic University [Full Score answer]
- How to print pthread_ t - How to print pthread_ t
- The average salary of software testing in 2022 has been released. Have you been averaged?
- Navigate back to fragmentpageradapter - & gt; Fragment is empty - navigating back to fragmentpageradapter - & gt; fragments are empty
- Sword finger offer 22 The penultimate node in the linked list
- UML tools
- 声网,站在物联网的“土壤”里
- Qt之QListView的简单使用(含源码+注释)
猜你喜欢

Today, Ali came out with 35K. It's really sandpaper that wiped my ass. it showed me my hand

Who is promoting the new inflection point of audio and video industry in 2022?

Implementation of property management system with ssm+ wechat applet

Create priority queue

Inno setup the simplest user-defined interface effect

Sword finger offer 22 The penultimate node in the linked list
![[ansible series] fundamentals 02 module debug](/img/99/c53be8e2a42c7cb5b4a9a7ef4ad98c.jpg)
[ansible series] fundamentals 02 module debug

Here comes the nearest chance to Ali
![[MD editing required] welcome to the CSDN markdown editor](/img/6c/df2ebb3e39d1e47b8dd74cfdddbb06.gif)
[MD editing required] welcome to the CSDN markdown editor

Switch to software testing and report to the training class for 3 months. It's a high paying job. Is it reliable?
随机推荐
Answer sheet for online assignment of "motor and drive" of Xijiao 21 autumn (IV) [standard answer]
Promise知识点拾遗
What indicators should safety service engineers pay attention to in emergency response?
Leetcode search insert location
luoguP2756 飞行员配对方案问题(最大流)
MySQL高级SQL语句
STM32F103 series controlled OLED IIC 4-pin
Database SQL language 03 sorting and paging
Promise knowledge points
SSL证书续费相关问题详解
At the beginning of 2022, people who are ready to change jobs should pay attention to
ECS deployment web project
动态规划--怪盗基德的滑翔翼
What do you think of the deleted chat records? How to restore the deleted chat records on wechat?
剑指 Offer 22. 链表中倒数第k个节点
We strongly recommend more than a dozen necessary plug-ins for idea development
[untitled] user defined function
VLAN access mode
Rotating box target detection mmrotate v0.3.1 getting started
Codeforces B. MEX and Array