Regular expression (4)

2022-07-28 15:37:00 【WHJ226】

A regular expression (abbreviated "re" or "regex") is a pattern that describes a set of strings. Regular expressions are commonly used to find and replace text that follows a given rule.

1. Regular expression syntax

1.1 Line locators

Character	Description
^	Matches the beginning of the string
$	Matches the end of the string
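
As a small illustration of the two line locators, the sketch below checks where a pattern is allowed to match. The sample text and variable names are made up for this example.

```python
import re

text = "my_school is big"  # sample text, chosen for illustration

# ^ anchors the pattern to the start of the string.
starts_with_my = re.search(r"^my_", text) is not None
# $ anchors the pattern to the end of the string.
ends_with_big = re.search(r"big$", text) is not None
# "school" occurs in the text, but not at the very start, so ^school fails.
starts_with_school = re.search(r"^school", text) is not None

print(starts_with_my, ends_with_big, starts_with_school)  # True True False
```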

1.2 Metacharacters

Metacharacter	Description
.	Matches any character except a newline
\w	Matches a letter, digit, underscore, or Chinese character
\s	Matches any whitespace character
\d	Matches a digit
\b	Matches a word boundary (the beginning or end of a word)
\n	Matches a line break
\t	Matches a tab
\W	Matches any character that is not a letter, digit, underscore, or Chinese character
\D	Matches any non-digit character
\S	Matches any non-whitespace character
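
The following sketch applies a few of these metacharacters to one sample string. The string itself is invented for the example; note how \b treats the underscore as part of a word.

```python
import re

text = "Room 42, floor_3\tEast"  # sample text, chosen for illustration

digits = re.findall(r"\d", text)             # every single digit
words = re.findall(r"\w+", text)             # runs of letters, digits, underscores
spaces = re.findall(r"\s", text)             # spaces and the tab
non_words = re.findall(r"\W+", text)         # runs of everything that \w rejects
# "floor" is followed by "_", which is a word character, so there is no
# word boundary after it and the pattern finds nothing.
floor_word = re.findall(r"\bfloor\b", text)
east_word = re.findall(r"\bEast\b", text)

print(digits)      # ['4', '2', '3']
print(words)       # ['Room', '42', 'floor_3', 'East']
print(spaces)      # [' ', ' ', '\t']
print(non_words)   # [' ', ', ', '\t']
print(floor_word)  # []
print(east_word)   # ['East']
```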

1.3 Quantifiers

Quantifier	Description
?	Matches zero or one time
+	Matches one or more times
*	Matches zero or more times
{n}	Matches exactly n times
{n,}	Matches n or more times
{n,m}	Matches from n to m times
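
To see how each quantifier changes what is accepted, the sketch below tests four sample strings against several patterns. The helper matching() is a hypothetical name introduced here, and it uses re.fullmatch() (not covered in this article) so that the whole string must fit the pattern.

```python
import re

samples = ["a", "ab", "abb", "abbb"]  # sample inputs, chosen for illustration

def matching(pattern):
    """Return the samples that the pattern matches in full (hypothetical helper)."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(matching(r"ab?"))      # ['a', 'ab']              b zero or one time
print(matching(r"ab+"))      # ['ab', 'abb', 'abbb']    b one or more times
print(matching(r"ab*"))      # ['a', 'ab', 'abb', 'abbb']
print(matching(r"ab{2}"))    # ['abb']                  b exactly twice
print(matching(r"ab{2,}"))   # ['abb', 'abbb']          b two or more times
print(matching(r"ab{1,2}"))  # ['ab', 'abb']            b one to two times
```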

1.4 Other characters

#[...] Matches any one character in the character set, for example [abcde], [123456], [a-zA-Z], [0-9]
#[^...] Matches any one character that is not in the character set
#a|b Matches either a or b
#r or R Raw string: prefixing the pattern string with r or R makes it a raw string, so backslashes are not treated as Python escape characters
#.*   Greedy matching (matches as many characters as possible)
#.*?  Lazy (non-greedy) matching (matches as few characters as possible)
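
A short sketch of these constructs follows, with sample strings invented for the example. The greedy and lazy patterns differ only by the trailing ?, yet they split the same input very differently.

```python
import re

html = "<b>one</b><b>two</b>"  # sample text, chosen for illustration

# Greedy: .* runs to the last </b>, so everything is one match.
greedy = re.findall(r"<b>.*</b>", html)
# Lazy: .*? stops at the first </b> it can, giving one match per tag pair.
lazy = re.findall(r"<b>.*?</b>", html)

vowels = re.findall(r"[aeiou]", "school")        # characters in the set
non_digits = re.findall(r"[^0-9]+", "a1bc23d")   # runs of characters not in the set
either = re.findall(r"cat|dog", "cat, dog, cow") # one alternative or the other

print(greedy)      # ['<b>one</b><b>two</b>']
print(lazy)        # ['<b>one</b>', '<b>two</b>']
print(vowels)      # ['o', 'o']
print(non_digits)  # ['a', 'bc', 'd']
print(either)      # ['cat', 'dog']

# In a raw string the backslash is kept as-is: r"\n" is two characters,
# while "\n" is a single newline character.
print(len(r"\n"), len("\n"))  # 2 1
```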

2. Match string

2.1 match()

The match() method tries to match the pattern only at the beginning of a string.

The syntax is as follows:

re.match(pattern,string,[flags])
#pattern: the pattern string
#string: the string to match against
#flags: optional; flags that control the matching mode, for example:
#  re.S (or re.DOTALL): "." matches every character, including line breaks
#  re.I: case-insensitive matching
#  re.X: ignore unescaped whitespace and comments in the pattern string

For example:

import re  # import the module
pattern = r'my_\w+'  # pattern string: matches text starting with my_
string1 = 'MY__SCHOOL my_school'  # string to match 1
string2 = ' School MY__SCHOOL my_school'  # string to match 2
match1 = re.match(pattern,string1,re.I)  # match, ignoring case
match2 = re.match(pattern,string2,re.I)  # match, ignoring case
print(match1)
print(match2)

The output is as follows:

<re.Match object; span=(0, 10), match='MY__SCHOOL'>
None

In the first result, span=(0, 10) gives the match position (indices 0 through 9; the end index 10 is exclusive) and match='MY__SCHOOL' is the matched text. The second result is None because match() only tries to match at the beginning of the string: string2 does not start with the pattern, so no further positions are tried and None is returned.
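
The flags listed in the syntax above can be tried out with a sketch like the following. The sample text and variable names are made up for illustration; combining flags with | is standard behavior of the re module.

```python
import re

text = "first line\nSECOND line"  # sample text, chosen for illustration

# Without re.S, "." does not match the newline, so the pattern cannot reach "SECOND".
no_dotall = re.match(r"first.*SECOND", text)
# With re.S (re.DOTALL), "." also matches the newline.
with_dotall = re.match(r"first.*SECOND", text, re.S)

# re.I makes the match case-insensitive.
ignore_case = re.match(r"FIRST", text, re.I)

# Several flags can be combined with the | operator.
combined = re.match(r"FIRST.*second", text, re.I | re.S)

# re.X lets the pattern be spread over lines with comments;
# unescaped whitespace inside the pattern is ignored.
verbose = re.match(r"""
    first   # the literal word "first"
    \s      # exactly one whitespace character
    line    # the literal word "line"
""", text, re.X)

print(no_dotall)                 # None
print(repr(with_dotall.group())) # 'first line\nSECOND'
print(ignore_case.group())       # first
print(repr(combined.group()))    # 'first line\nSECOND'
print(verbose.group())           # first line
```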

The match object returned by match() also provides the following methods and attributes:

import re  # import the module
pattern = r'my_\w+'  # pattern string: matches text starting with my_
string = 'MY__SCHOOL my_school'  # string to match
match = re.match(pattern,string,re.I)  # match, ignoring case
print('Match result:',match)
print('Match start position:',match.start())
print('Match end position:',match.end())
print('Match position tuple:',match.span())
print('String being matched:',match.string)
print('Matched text:',match.group())

The output is as follows:

Match result: <re.Match object; span=(0, 10), match='MY__SCHOOL'>
Match start position: 0
Match end position: 10
Match position tuple: (0, 10)
String being matched: MY__SCHOOL my_school
Matched text: MY__SCHOOL
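
Because match() returns None when nothing matches, calling .group() directly on the result raises AttributeError in that case. A minimal sketch of guarding against this is shown below; the function name leading_my_name is hypothetical and introduced only for this example.

```python
import re

def leading_my_name(text):
    """Return the my_... name at the start of text, or None (hypothetical helper)."""
    match = re.match(r"my_\w+", text, re.I)
    if match is None:
        # No match at the beginning: there is no match object to query.
        return None
    return match.group()

print(leading_my_name("MY__SCHOOL my_school"))          # MY__SCHOOL
print(leading_my_name(" School MY__SCHOOL my_school"))  # None
```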

2.2 search()

The search() method scans the entire string and returns a match object for the first place where the pattern matches. If the pattern does not match anywhere in the string, it returns None.

The syntax is as follows:

re.search(pattern,string,[flags])
#pattern: the pattern string
#string: the string to match against
#flags: optional; flags that control the matching mode, for example:
#  re.S (or re.DOTALL): "." matches every character, including line breaks
#  re.I: case-insensitive matching
#  re.X: ignore unescaped whitespace and comments in the pattern string

For example:

import re  # import the module
pattern = r'my_\w+'  # pattern string: matches text starting with my_
string1 = 'MY__SCHOOL my_school'  # string to match 1
string2 = ' School MY__SCHOOL my_school'  # string to match 2
match1 = re.search(pattern,string1,re.I)  # search, ignoring case
match2 = re.search(pattern,string2,re.I)  # search, ignoring case
print(match1)
print(match2)

The output is as follows:

<re.Match object; span=(0, 10), match='MY__SCHOOL'>
<re.Match object; span=(8, 18), match='MY__SCHOOL'>

The match object returned by search() provides the same methods and attributes:

import re  # import the module
pattern = r'my_\w+'  # pattern string: matches text starting with my_
string = 'MY__SCHOOL my_school'  # string to match
match = re.search(pattern,string,re.I)  # search, ignoring case
print('Match result:',match)
print('Match start position:',match.start())
print('Match end position:',match.end())
print('Match position tuple:',match.span())
print('String being matched:',match.string)
print('Matched text:',match.group())

The output is as follows:

Match result: <re.Match object; span=(0, 10), match='MY__SCHOOL'>
Match start position: 0
Match end position: 10
Match position tuple: (0, 10)
String being matched: MY__SCHOOL my_school
Matched text: MY__SCHOOL
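
The difference between search() and match() can be summarized with a sketch like the one below (the sample text is made up for illustration). It also shows that the start and end positions are ordinary slice indices into the original string.

```python
import re

text = "School MY__SCHOOL my_school"  # sample text, chosen for illustration
pattern = r"my_\w+"

found = re.search(pattern, text, re.I)     # scans the whole string
at_start = re.match(pattern, text, re.I)   # only looks at position 0
anchored = re.search(r"^" + pattern, text, re.I)  # ^ restricts search() to the start as well

print(found.span())                        # (7, 17)
print(text[found.start():found.end()])     # MY__SCHOOL (same as found.group())
print(at_start)                            # None
print(anchored)                            # None
```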

2.3 findall()

The findall() method searches the entire string for every substring that matches the pattern and returns the results as a list. If nothing matches, it returns an empty list.

The syntax is as follows:

re.findall(pattern,string,[flags])
#pattern: the pattern string
#string: the string to match against
#flags: optional; flags that control the matching mode, for example:
#  re.S (or re.DOTALL): "." matches every character, including line breaks
#  re.I: case-insensitive matching
#  re.X: ignore unescaped whitespace and comments in the pattern string

For example:

import re  # import the module
pattern = r'my_\w+'  # pattern string: matches text starting with my_
string1 = 'MY__SCHOOL my_school'  # string to match 1
string2 = ' School MY__SCHOOL my_school'  # string to match 2
match1 = re.findall(pattern,string1)  # find all matches, case-sensitive
match2 = re.findall(pattern,string2,re.I)  # find all matches, ignoring case
print(match1)
print(match2)

The output is as follows:

['my_school']
['MY__SCHOOL', 'my_school']
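
Two details of findall() are worth a quick sketch: it returns an empty list rather than None when nothing matches, and when the pattern contains one capturing group (parentheses, which this article does not otherwise cover) the list holds the group contents instead of the whole matches. The sample text is invented for the example.

```python
import re

text = "MY__SCHOOL my_school my_home"  # sample text, chosen for illustration

whole = re.findall(r"my_\w+", text)       # whole matches, case-sensitive
nothing = re.findall(r"your_\w+", text)   # no match anywhere
suffixes = re.findall(r"my_(\w+)", text)  # only the part inside the parentheses

print(whole)     # ['my_school', 'my_home']
print(nothing)   # []
print(suffixes)  # ['school', 'home']
```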

2.4 sub()

The sub() method replaces the parts of a string that match the pattern.

The syntax is as follows:

re.sub(pattern,repl,string,count,flags)
#pattern: the pattern string
#repl: the replacement string
#string: the string to search and replace in
#count: optional; the maximum number of replacements (by default, all matches are replaced)
#flags: optional; flags that control the matching mode

For example:

import re
pattern1 = r'__'
pattern2 = r'oo'
string1 = 'MY__SCHOOL my__school'  # string to match 1
string2 = ' School MY__SCHOOL my__school'  # string to match 2
result1 = re.sub(pattern1,'**',string1)  # replace every '__' with '**'
result2 = re.sub(pattern1,'**',string1,1)  # replace only the first '__' with '**'
result3 = re.sub(pattern2,'**',string2)  # replace every 'oo' with '**' (case-sensitive)
print(result1)
print(result2)
print(result3)

The output is as follows:

MY**SCHOOL my**school
MY**SCHOOL my__school
 Sch**l MY__SCHOOL my__sch**l
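
The optional count and flags arguments can also be passed by keyword, which reads more clearly and avoids mixing the two up; recent Python releases also discourage passing them positionally. A minimal sketch, with a sample string made up for illustration:

```python
import re

text = "MY__SCHOOL my__school"  # sample text, chosen for illustration

# _+ matches one or more underscores, so each run becomes a single dash.
all_runs = re.sub(r"_+", "-", text)
# count=1 limits the replacement to the first match.
first_run = re.sub(r"_+", "-", text, count=1)
# flags=re.I makes the pattern match both SCHOOL and school.
any_case = re.sub(r"school", "home", text, flags=re.I)

print(all_runs)   # MY-SCHOOL my-school
print(first_run)  # MY-SCHOOL my__school
print(any_case)   # MY__home my__home
```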

2.5 replace()

The replace() method also performs string replacement. Unlike sub(), it is a method of the string itself and treats its first argument as literal text, not as a regular expression.

The syntax is as follows:

string.replace(pattern,repl,count)
#string: the string to search and replace in
#pattern: the literal substring to be replaced
#repl: the replacement string
#count: optional; the maximum number of replacements (by default, all occurrences are replaced)

For example:

pattern1 = '__'
pattern2 = 'oo'
string1 = 'MY__SCHOOL my__school'  # string to search 1
string2 = ' School MY__SCHOOL my__school'  # string to search 2
result1 = string1.replace(pattern1,'**')  # replace every '__' with '**'
result2 = string1.replace(pattern1,'**',1)  # replace only the first '__' with '**'
result3 = string2.replace(pattern2,'**')  # replace every 'oo' with '**' (case-sensitive)
print(result1)
print(result2)
print(result3)

The output is as follows:

MY**SCHOOL my**school
MY**SCHOOL my__school
 Sch**l MY__SCHOOL my__sch**l
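
The two replacement methods diverge as soon as the text to replace contains a regex metacharacter. The sketch below uses a sample string invented for the example; re.escape() is a standard re function that is not otherwise covered in this article.

```python
import re

text = "a.b.c"  # sample text, chosen for illustration

literal = text.replace(".", "-")             # "." is plain text here
as_regex = re.sub(".", "-", text)            # "." is a metacharacter: every character matches
escaped = re.sub(r"\.", "-", text)           # escaping restores the literal meaning
auto_escaped = re.sub(re.escape("."), "-", text)  # re.escape() adds the backslash for us

print(literal)       # a-b-c
print(as_regex)      # -----
print(escaped)       # a-b-c
print(auto_escaped)  # a-b-c
```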

3. Splitting strings

The split() method splits a string wherever the regular expression matches and returns the pieces as a list.

The syntax is as follows:

re.split(pattern,string,[maxsplit],flags)
#pattern: the pattern string (the separator)
#string: the string to split
#maxsplit: optional; the maximum number of splits
#flags: optional; flags that control the matching mode

For example:

import re  # import the module
pattern1 = '[?]'  # separator: ?
pattern2 = '[@]'  # separator: @
pattern3 = r'[?@]'  # separator: ? or @ (inside [...] no | is needed; a | there would be a literal character)
string1 = 'MY?SCHOOL?my?school'  # string to split 1
string2 = ' School @MY@SCHOOL?my?@school'  # string to split 2
match1 = re.split(pattern1,string1)  # split on ?
match2 = re.split(pattern1,string2)  # split on ?
match3 = re.split(pattern2,string1)  # split on @
match4 = re.split(pattern2,string2)  # split on @
match5 = re.split(pattern3,string1)  # split on ? or @
match6 = re.split(pattern3,string2)  # split on ? or @
print(match1)
print(match2)
print(match3)
print(match4)
print(match5)
print(match6)

The output is as follows:

['MY', 'SCHOOL', 'my', 'school']
[' School @MY@SCHOOL', 'my', '@school']
['MY?SCHOOL?my?school']
[' School ', 'MY', 'SCHOOL?my?', 'school']
['MY', 'SCHOOL', 'my', 'school']
[' School ', 'MY', 'SCHOOL', 'my', '', 'school']
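
The empty string in the last result appears because two separators sit next to each other. A sketch of two related options follows, using sample strings made up for illustration: maxsplit limits how many splits are made, and adding + to the separator pattern treats a run of separators as one.

```python
import re

text = "MY?SCHOOL?my?school"  # sample text, chosen for illustration

# maxsplit=1 splits only at the first separator; the rest stays in one piece.
limited = re.split(r"[?]", text, maxsplit=1)

# Adjacent separators produce empty strings between them...
with_empty = re.split(r"[?@]", "my?@school")
# ...unless the pattern matches the whole run of separators at once.
merged = re.split(r"[?@]+", "MY?@SCHOOL@@my")

print(limited)     # ['MY', 'SCHOOL?my?school']
print(with_empty)  # ['my', '', 'school']
print(merged)      # ['MY', 'SCHOOL', 'my']
```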

Copyright notice
This article was written by [WHJ226]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/209/202207281432476459.html