Regular expressions: \b and \B, understanding word boundary matching in simple terms
2022-07-24 03:16:00 【Qi Shuo】
When using regular expressions, you will sometimes need to match certain characters only at a precise position, for example only as a whole word. That is where \b and \B come in handy. These two metacharacters are hard to use well without understanding them, so this article explains how they work.
1 What \b and \B mean
| Metacharacters | meaning |
|---|---|
| \b | Matches a word boundary |
| \B | Matches a non-word boundary |
| \w | Matches a letter, digit, or underscore. Equivalent to [A-Za-z0-9_] |
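The equivalence of \w and [A-Za-z0-9_] depends on the regex engine. As a minimal sketch, assuming Python's `re` module (the sample characters are arbitrary choices, not from the article), the table's definition holds under the `re.ASCII` flag, while the default mode for `str` patterns also accepts other Unicode letters and digits:

```python
import re

# With re.ASCII, \w matches only [A-Za-z0-9_], as in the table above.
ascii_word = re.compile(r"\w", re.ASCII)
# Without the flag, Python 3 str patterns also treat other Unicode
# letters and digits as \w.
unicode_word = re.compile(r"\w")

# Arbitrary sample characters; "\u00e9" is the letter e with an acute accent.
samples = ["a", "Z", "7", "_", "-", " ", "\u00e9"]

ascii_hits = [ch for ch in samples if ascii_word.fullmatch(ch)]
unicode_hits = [ch for ch in samples if unicode_word.fullmatch(ch)]

print(ascii_hits)
print(unicode_hits)
```

Because \b and \B are defined in terms of \w, the same flag also changes where boundaries fall in non-ASCII text.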
These definitions read like formal documentation, but they are accurate. Once understood, they turn out to be concise and complete, with nothing superfluous, though they reward careful reading.
To understand these two metacharacters, two concepts need to be clarified:
- Word: this involves another metacharacter, \w. As the table above shows, \w stands for a letter, digit, or underscore. A word is a run of one or more \w characters, that is, any combination of letters, digits, and underscores.
- Boundary: in regular expressions, a boundary is not an actual character but a position, a concept rather than something that exists in the text. It is what \b and \B stand for: the position between any two adjacent characters.
With these two concepts in place, the meaning of \b and \B can be stated:
- \b, word boundary: matches a boundary where the character on one side is \w and the character on the other side is not \w
- \B, non-word boundary: matches every boundary that \b does not, where both characters are \w or both characters are not \w
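To make the two definitions concrete, here is a minimal sketch in Python's `re` module that lists every position where \b and \B match. The sample string is a hypothetical choice. Note that the start and end of the string also count as positions, and Python treats the missing neighbour there as if it were not \w.

```python
import re

text = "hi, you"  # hypothetical sample string

# \b and \B are zero-width: each match is an empty string at some position.
word_boundaries = [m.start() for m in re.finditer(r"\b", text)]
non_word_boundaries = [m.start() for m in re.finditer(r"\B", text)]

print(word_boundaries)      # positions where exactly one neighbour is \w
print(non_word_boundaries)  # every other position
```

Every position from 0 to `len(text)` lands in exactly one of the two lists, which is what "\B is everything that \b is not" means in practice.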
First, some notation, expressed as regular expressions:
- A character that is \w: \w
- A character that is not \w: [^\w]
As shown in the figure below, every dashed vertical line in the figure is a boundary.
In the figure, each boundary has \w on one side and [^\w] on the other. That is:
- \w on the left, [^\w] on the right
- [^\w] on the left, \w on the right

\B covers every boundary other than \b. Since \b means a boundary with \w on one side and [^\w] on the other, \B is the opposite case, namely:
- \w on both sides
- [^\w] on both sides
In short:
| Boundary type | Left character | Right character |
|---|---|---|
| \b | \w | [^\w] |
| \b | [^\w] | \w |
| \B | \w | \w |
| \B | [^\w] | [^\w] |
Let 1 stand for a character that is \w and 0 for a character that is [^\w]. In a computer's binary terms, the table becomes:
| Boundary type | Left character | Right character |
|---|---|---|
| \b | 1 | 0 |
| \b | 0 | 1 |
| \B | 1 | 1 |
| \B | 0 | 0 |
Two binary digits can only form these four combinations. The following conclusions can therefore be drawn:
- \b is a boundary whose left and right character types are different
- \B is a boundary whose left and right character types are the same
Note: there are only two character types (\w and not \w). Everything else can be forgotten; just remember these two conclusions.
At this point, the tables above should make clear what a word boundary \b and a non-word boundary \B are.
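The 1/0 table amounts to an exclusive-or of the two neighbouring character types. The sketch below implements that rule by hand and checks it against Python's `re` module. The helper names `is_word` and `boundary_type` and the sample string are hypothetical, and positions outside the string are assumed to count as 0 (not \w), which is how Python handles the start and end.

```python
import re
import string

# The characters the article defines as \w: [A-Za-z0-9_]
WORD_CHARS = set(string.ascii_letters + string.digits + "_")

def is_word(text, i):
    # 1 if text[i] exists and is a \w character, else 0.
    return int(0 <= i < len(text) and text[i] in WORD_CHARS)

def boundary_type(text, pos):
    # Classify the boundary just before text[pos] using the XOR rule.
    left = is_word(text, pos - 1)
    right = is_word(text, pos)
    return r"\b" if left ^ right else r"\B"

text = "a1_ +b"  # hypothetical sample string
classified = [boundary_type(text, p) for p in range(len(text) + 1)]

# Cross-check against the regex engine (re.ASCII keeps \w to [A-Za-z0-9_]).
regex_b = {m.start() for m in re.finditer(r"\b", text, re.ASCII)}
agrees = all((classified[p] == r"\b") == (p in regex_b)
             for p in range(len(text) + 1))

print(classified)
print(agrees)
```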
2 Usage examples
This section applies the conclusions drawn in Section 1:
- \b is a boundary whose left and right character types are different
- \B is a boundary whose left and right character types are the same
The examples below are analyzed on that basis.
2.1 Simple examples
2.1.1 Example 1

For the first \b in the pattern \bat\b: the character to its right is \w, so the character to its left must be [^\w].
For the second \b in the pattern \bat\b: the character to its left is \w, so the character to its right must be [^\w].
The match result is therefore as shown in the figure.
2.1.2 Example 2

For the \B in the pattern \Bat\b: the character to its right is \w, so the character to its left must also be \w.
For the \b in the pattern \Bat\b: the character to its left is \w, so the character to its right must be [^\w].
The match result is therefore as shown in the figure.
2.1.3 Example 3

For the first \B in the pattern \Bat\B: the character to its right is \w, so the character to its left must also be \w.
For the second \B in the pattern \Bat\B: the character to its left is \w, so the character to its right must also be \w.
The match result is therefore as shown in the figure.
2.1.4 Example 4

For the \b in the pattern \bat\B: the character to its right is \w, so the character to its left must be [^\w].
For the \B in the pattern \bat\B: the character to its left is \w, so the character to its right must also be \w.
The match result is therefore as shown in the figure.
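The four patterns from Examples 1 to 4 can be tried side by side. This is a minimal sketch using Python's `re` module. The test string is a hypothetical one chosen so that each pattern matches exactly one occurrence of "at"; it is not the string used in the figures.

```python
import re

# Hypothetical test string, not the one shown in the figures.
text = "at bat attic batter"

patterns = [r"\bat\b", r"\Bat\b", r"\Bat\B", r"\bat\B"]

results = {}
for pattern in patterns:
    # Record the start index of every match of this pattern.
    results[pattern] = [m.start() for m in re.finditer(pattern, text)]
    print(pattern, results[pattern])
```

Each "at" is picked up by exactly one pattern: the standalone word by \bat\b, the end of "bat" by \Bat\b, the middle of "batter" by \Bat\B, and the start of "attic" by \bat\B.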
2.2 Complex examples
If the simple examples above make sense, the following ones should not be difficult.
2.2.1 Example 5

Because % is [^\w], the characters on both sides of the match must also be [^\w].

This one should need no further explanation.
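The behaviour around a non-\w character such as % can be checked with a short sketch. The pattern \B%\B is an assumption inferred from the explanation above, and the test string is hypothetical; the figures may use different ones. The pattern \b%\b is included for contrast.

```python
import re

# Hypothetical test string, not the one shown in the figures.
text = "50% off, a%b, %%"

# \B next to % (a non-\w character) requires a non-\w neighbour,
# or the start or end of the string.
non_word_neighbours = [m.start() for m in re.finditer(r"\B%\B", text)]

# \b next to % requires a \w neighbour on that side.
word_neighbours = [m.start() for m in re.finditer(r"\b%\b", text)]

print(non_word_neighbours)
print(word_neighbours)
```

The % in "50%" matches neither pattern, because it has a \w character on its left and a non-\w character on its right.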
2.2.2 Example 6

Because of the two \B, the target thank must have \w on both sides.

Because of \b, the character to the left of r must be [^\w]; because of \B, the character to the right of = must be [^\w].

Because r is \w, \b requires the character to the left of r to be [^\w];
because = is [^\w], \b requires the character to the right of = to be \w.
Neither regex== in the string has [^\w] on its left and \w on its right, so the match fails.

The second regex== in the string has \w on its left and \w on its right, so the match succeeds.
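The regex== cases can be reproduced with a short sketch. The three patterns are inferred from the explanations above (the figures themselves are not available here), and the test string is a hypothetical one containing two occurrences of regex==.

```python
import re

# Hypothetical test string with two occurrences of "regex==".
text = "regex== and myregex==1"

patterns = [r"\bregex==\B", r"\bregex==\b", r"\Bregex==\b"]

results = {}
for pattern in patterns:
    # Record the start index of every match of this pattern.
    results[pattern] = [m.start() for m in re.finditer(pattern, text)]
    print(pattern, results[pattern])
```

The first occurrence has a non-\w neighbour (or nothing) on both sides, so only \bregex==\B finds it. The second sits between "y" and "1", both \w, so only \Bregex==\b finds it. No occurrence has [^\w] on the left and \w on the right, so \bregex==\b finds nothing.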