Introduction and underlying analysis of regular expressions
2022-07-06 06:36:00 【Yaya yaya】
Regular expressions
Introduction to regular expressions
A regular expression (often abbreviated in code as regex, regexp or RE) is a concept from computer science. Regular expressions are commonly used to search for, and replace, text that matches a given pattern (rule).
Many programming languages support string manipulation with regular expressions. For example, Perl has a powerful regular expression engine built in. The concept was first popularized by Unix tools such as sed and grep. "Regular expression" is usually abbreviated to "regex"; the singular forms are regexp and regex, and the plural forms are regexps, regexes and regexen.
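As a small illustration of the two uses just mentioned, searching and replacing, here is a minimal sketch in Java (the language used in the rest of this article). It relies only on java.util.regex.Pattern and Matcher; the class name SearchAndReplace and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchAndReplace {
    // Search: returns true if any substring of text matches regex
    public static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Replace: substitutes every substring matching regex with replacement
    // ('$' and '\' in the replacement string have special meaning in replaceAll)
    public static String replaceMatches(String regex, String text, String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "Perl 5, sed and grep";
        System.out.println(containsMatch("\\d", text));        // true
        System.out.println(replaceMatches("\\d+", text, "N")); // Perl N, sed and grep
    }
}
```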
Basic usage of regular expressions
Suppose a web crawler has fetched the following paragraph from a website. How can we quickly extract the English words or the numbers from it? This is exactly the kind of job regular expressions are for.
Python was designed by Guido van Rossum of the Dutch Society for Mathematics and Computer Science Research in the early 1990s as a replacement for a language called ABC. [1] Python provides efficient high-level data structures and supports simple, effective object-oriented programming. Its syntax, dynamic typing and interpreted nature make it a language for scripting and rapid application development on most platforms, [2] and as new versions add new language features, it is increasingly used to develop independent, large-scale projects. [3] The Python interpreter is easy to extend: new functions and data types can be written in C or C++ (or other languages callable from C). [4] Python can also be used as an extension language in customizable software. Python's rich standard library provides source code or machine code for every major platform. [4] In October 2021, Tiobe, the compiler of a programming-language popularity index, crowned Python the most popular programming language, placing it above Java, C and JavaScript for the first time in 20 years.
package regularexpression;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {
    public static void main(String[] args) {
        // Suppose a crawler has fetched this content
        String context = "Python was designed by Guido van Rossum of the Dutch Society for Mathematics and "
                + "Computer Science Research in the early 1990s as a replacement for a language called ABC. [1] "
                + "Python provides efficient high-level data structures and supports simple, effective "
                + "object-oriented programming. Its syntax, dynamic typing and interpreted nature make it "
                + "a language for scripting and rapid application development on most platforms, [2] and as "
                + "new versions add new language features, it is increasingly used for independent, large-scale "
                + "projects. [3] The Python interpreter is easy to extend with new functions and data types "
                + "written in C or C++ (or other languages callable from C). [4] Python can also be used as an "
                + "extension language in customizable software. Python's rich standard library provides source "
                + "code or machine code for every major platform. [4] In October 2021, Tiobe crowned Python the "
                + "most popular programming language, placing it above Java, C and JavaScript for the first "
                + "time in 20 years. [16]";
        // Goal: extract all English words in the text
        // First, compile the regular expression into a Pattern object
        Pattern pattern = Pattern.compile("[a-zA-Z]+");
        // Create a Matcher that applies the pattern's rule to the context text
        Matcher matcher = pattern.matcher(context);
        // find() returns true if it locates another match, false when none is left
        while (matcher.find()) {
            // The matched text is available through matcher.group(0)
            System.out.println("Found: " + matcher.group(0));
        }
    }
}
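The find()/group(0) loop above can be wrapped in a reusable helper that takes the pattern as a parameter, which makes it easy to try the other patterns in this article. This is a minimal sketch; FindAll and findAll are hypothetical names, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAll {
    // Collects every substring of text that matches regex, in order of appearance
    public static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {           // advance to the next match; false when none is left
            matches.add(matcher.group(0)); // group(0) is the whole match
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Python 3 and Java 17";
        System.out.println(findAll("[a-zA-Z]+", text)); // [Python, and, Java]
        System.out.println(findAll("[0-9]+", text));    // [3, 17]
    }
}
```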
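Running the program prints each run of consecutive ASCII letters on its own line, so every English word in the text is extracted.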
To extract the numbers in the text instead, simply replace the statement
Pattern pattern = Pattern.compile("[a-zA-Z]+");
with
Pattern pattern = Pattern.compile("[0-9]+");
This prints each run of digits in the text, such as 1990, the citation numbers in brackets and 2021.
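To capture only the four-digit years, replace
Pattern pattern = Pattern.compile("[0-9]+");
with the following, where \d matches any single digit (inside a Java string literal it must be written as \\d, because the backslash itself has to be escaped):
Pattern pattern = Pattern.compile("\\d\\d\\d\\d");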
Now only runs of four consecutive digits match, so the program prints the years 1990 and 2021.
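The three patterns used so far can be compared side by side. The sketch below also tries \\d{4}, a quantifier shorthand equivalent to \\d\\d\\d\\d that the original article does not use; the class name DigitPatterns and the sample sentence are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitPatterns {
    // Collects every substring of text that matches regex
    public static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group(0));
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Designed in 1990 [1]; ranked first in 2021 after 20 years.";
        System.out.println(findAll("[0-9]+", text));       // [1990, 1, 2021, 20]
        System.out.println(findAll("\\d\\d\\d\\d", text)); // [1990, 2021]
        System.out.println(findAll("\\d{4}", text));       // [1990, 2021]
    }
}
```

Because \\d\\d\\d\\d has no boundaries, it also matches the first four digits of a longer number such as 123456; wrapping it as \\b\\d{4}\\b restricts matches to standalone four-digit numbers.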
Analysis
What matcher.find() does
- Following the pattern, it locates the next substring that matches the rule.
- Once found, it records the start index of the substring in the matcher object's int[] groups field (groups[0] = 31) and the end index + 1 in groups[1] (groups[1] = 35).
  Setting a breakpoint here shows that the Matcher does have a groups array; after stepping over find(), groups[0] = 31 and groups[1] = 35, which is exactly the position of 1990 in the text being searched (the exact values depend on the input).
- It also records oldLast, the end index + 1 of the substring (35), so the next call to find() starts matching from position 35.
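The groups array and oldLast field are private internals of java.util.regex.Matcher, so ordinary code cannot read them directly. The public methods start() and end() report the same start index and end index + 1 of the current match, so the behavior described above can be observed without a debugger. A minimal sketch; FindPositions and matchBounds are hypothetical names, and the indexes refer to the sample sentence, not to the article's text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindPositions {
    // Returns the [start, end) bounds of every match, in order
    public static List<int[]> matchBounds(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<int[]> bounds = new ArrayList<>();
        while (matcher.find()) {
            // start() = index of the first character, end() = index of the last character + 1
            bounds.add(new int[] {matcher.start(), matcher.end()});
        }
        return bounds;
    }

    public static void main(String[] args) {
        String text = "Designed in 1990, crowned in 2021.";
        for (int[] b : matchBounds("\\d\\d\\d\\d", text)) {
            System.out.println(b[0] + " " + b[1] + " " + text.substring(b[0], b[1]));
        }
        // 12 16 1990
        // 29 33 2021
    }
}
```

On the input "12345678" the same pattern yields the bounds [0, 4) and [4, 8), confirming that each call to find() resumes where the previous match ended.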
How matcher.group(0) works
Let's look at the source code of group(int):
public String group(int group) {
    if (this.first < 0) {
        throw new IllegalStateException("No match found");
    } else if (group >= 0 && group <= this.groupCount()) {
        return this.groups[group * 2] != -1 && this.groups[group * 2 + 1] != -1 ? this.getSubSequence(this.groups[group * 2], this.groups[group * 2 + 1]).toString() : null;
    } else {
        throw new IndexOutOfBoundsException("No group " + group);
    }
}
- Using the positions recorded in groups[0] = 31 and groups[1] = 35, group(0) extracts the substring from context and returns it, that is, the characters whose indexes lie in the half-open interval [31, 35).
- If find() is called again, the same rules apply.
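Both branches of group() in the source above can be checked from the outside: group(0) should equal text.substring(start(), end()), and calling group() before any successful find() should throw IllegalStateException (the first < 0 branch). A minimal sketch with hypothetical class and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupZero {
    // Returns true if, for every match, group(0) equals the slice text[start, end)
    public static boolean groupZeroIsSlice(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            String slice = text.substring(matcher.start(), matcher.end());
            if (!slice.equals(matcher.group(0))) {
                return false;
            }
        }
        return true;
    }

    // Returns true if calling group(0) before any find() throws IllegalStateException
    public static boolean groupBeforeFindThrows(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        try {
            matcher.group(0);
            return false;
        } catch (IllegalStateException e) {
            return true; // no match has been recorded yet
        }
    }

    public static void main(String[] args) {
        String text = "Designed in 1990, crowned in 2021.";
        System.out.println(groupZeroIsSlice("\\d\\d\\d\\d", text)); // true
        System.out.println(groupBeforeFindThrows("\\d+", text));    // true
    }
}
```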
Now change
Pattern pattern = Pattern.compile("\\d\\d\\d\\d");
into a grouped pattern, where the first pair of parentheses forms group 1 and the second pair forms group 2:
Pattern pattern = Pattern.compile("(\\d\\d)(\\d\\d)");
- Following the pattern, it locates the next substring that matches the rule.
- Once found, it records the start index of the whole match in the matcher's int[] groups field (groups[0] = 31) and the end index + 1 in groups[1] (groups[1] = 35).
- It records the substring matched by the first group (): groups[2] = 31 and groups[3] = 33. The match is 1990, so the first group matches 19; since 1 is at index 31 and 9 is at index 32, groups[2] = 31 and groups[3] = 32 + 1 = 33.
- It records the substring matched by the second group (): groups[4] = 33 and groups[5] = 35.
- With more groups the layout continues in the same way: group g is stored in groups[2 * g] and groups[2 * g + 1].
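The start(g) and end(g) methods of Matcher expose each group's bounds, so the layout described above can be rebuilt without a debugger. This sketch only reconstructs the slots for groups 0 through groupCount(); the class name GroupBounds and the sample sentence (where 1990 starts at index 12, not 31) are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupBounds {
    // For the first match, builds layout[2*g] = start(g) and layout[2*g + 1] = end(g)
    // for g = 0..groupCount(); returns an empty array if nothing matches
    public static int[] firstMatchLayout(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return new int[0];
        }
        int[] layout = new int[(matcher.groupCount() + 1) * 2];
        for (int g = 0; g <= matcher.groupCount(); g++) {
            layout[2 * g] = matcher.start(g);   // start index of group g
            layout[2 * g + 1] = matcher.end(g); // end index + 1 of group g
        }
        return layout;
    }

    public static void main(String[] args) {
        String text = "Designed in 1990, crowned in 2021.";
        System.out.println(Arrays.toString(firstMatchLayout("(\\d\\d)(\\d\\d)", text)));
        // [12, 16, 12, 14, 14, 16]
    }
}
```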
Change the code in the while loop to:
while (matcher.find()) {
    // The whole match is in matcher.group(0)
    System.out.println("Found: " + matcher.group(0));
    System.out.println("Group 1 matched: " + matcher.group(1));
    System.out.println("Group 2 matched: " + matcher.group(2));
}
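For each year, the program now prints the full match followed by its two halves: 1990, then 19 and 90; 2021, then 20 and 21.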
Summary: if the regular expression contains () groups, group(0) is the entire matched substring, group(1) is the part matched by the first group, group(2) is the part matched by the second group, and so on. Be careful: the group index must not exceed the number of groups in the pattern, otherwise group() throws an IndexOutOfBoundsException.
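A short sketch of the summary: groupCount() tells how many groups the pattern defines, and asking for a larger index hits the IndexOutOfBoundsException branch of group(). GroupLimits, allGroups and throwsOutOfRange are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupLimits {
    // Returns group(0) .. group(groupCount()) of the first match (empty list if nothing matches)
    public static List<String> allGroups(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> parts = new ArrayList<>();
        if (matcher.find()) {
            for (int g = 0; g <= matcher.groupCount(); g++) {
                parts.add(matcher.group(g)); // may be null for a group that did not participate
            }
        }
        return parts;
    }

    // Returns true if group(index) throws IndexOutOfBoundsException for the first match
    public static boolean throwsOutOfRange(String regex, String text, int index) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            throw new IllegalArgumentException("pattern does not match the text");
        }
        try {
            matcher.group(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = "crowned in 2021";
        System.out.println(allGroups("(\\d\\d)(\\d\\d)", text));          // [2021, 20, 21]
        System.out.println(throwsOutOfRange("(\\d\\d)(\\d\\d)", text, 3)); // true
    }
}
```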