
Regular expression (4)

2022-07-28 15:37:00 WHJ226

Contents

1. Regular expression syntax

1.1 Line locators

1.2 Metacharacters

1.3 Quantifiers

1.4 Other characters

2. Match string

2.1 match()

2.2 search()

2.3 findall()

2.4 sub()

2.5 replace()

3. Split string


A regular expression (often abbreviated to regex, or re after the Python module of that name) is a pattern that describes a set of strings. It is commonly used to find and replace text that follows certain rules.

1. Regular expression syntax

1.1 Line locators

Line locators
Character   Description
^           Matches the beginning of the string
$           Matches the end of the string
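
As a minimal sketch of the two line locators, the snippet below anchors a pattern at the start and at the end of a string. The sample strings are made up for illustration.

```python
import re

# ^ anchors the pattern at the start of the string, $ at the end.
starts_with_my = re.search(r'^my', 'my_school')        # 'my' is at position 0
not_at_start = re.search(r'^my', 'at my_school')       # 'my' is not at the start
ends_with_school = re.search(r'school$', 'my_school')  # 'school' is at the end

print(starts_with_my is not None)   # True
print(not_at_start)                 # None
print(ends_with_school.span())      # (3, 9)
```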

1.2 Metacharacters

Metacharacters
Metacharacter   Description
.               Matches any character except a newline
\w              Matches a letter, digit, or underscore (for str patterns this includes Chinese and other Unicode letters)
\s              Matches any whitespace character
\d              Matches a digit
\b              Matches a word boundary (the beginning or end of a word)
\n              Matches a newline
\t              Matches a tab
\W              Matches any character that is not a letter, digit, or underscore
\D              Matches any non-digit character
\S              Matches any non-whitespace character
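
The following sketch tries a few of these metacharacters with re.findall() and re.search(). The sample text is an assumption chosen only to contain digits, a space, a tab, and an underscore.

```python
import re

text = 'Room 101\tfloor_3'

digits = re.findall(r'\d', text)    # every single digit
words = re.findall(r'\w+', text)    # runs of letters, digits, and underscores
gaps = re.findall(r'\s', text)      # whitespace characters (a space and a tab)
whole_word = re.search(r'\bfloor_3\b', text)  # 'floor_3' as a whole word

print(digits)               # ['1', '0', '1', '3']
print(words)                # ['Room', '101', 'floor_3']
print(gaps)                 # [' ', '\t']
print(whole_word.group())   # floor_3
```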

1.3 Quantifiers

Quantifiers
Quantifier   Description
?            Matches zero or one time
+            Matches one or more times
*            Matches zero or more times
{n}          Matches exactly n times
{n,}         Matches n or more times
{n,m}        Matches at least n and at most m times
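
A short sketch of how each quantifier changes what is matched. The input strings are illustrative only.

```python
import re

sample = 'a1b22c333'

optional_u = re.findall(r'colou?r', 'color colour')  # ? : the 'u' may be absent
one_or_more = re.findall(r'\d+', sample)             # + : runs of one or more digits
zero_or_more = re.findall(r'ab*', 'a ab abb')        # * : 'a' followed by any number of 'b'
exactly_two = re.findall(r'\d{2}', sample)           # {n}   : exactly two digits
two_or_more = re.findall(r'\d{2,}', sample)          # {n,}  : two or more digits
one_to_two = re.findall(r'\d{1,2}', sample)          # {n,m} : one or two digits

print(optional_u)    # ['color', 'colour']
print(one_or_more)   # ['1', '22', '333']
print(zero_or_more)  # ['a', 'ab', 'abb']
print(exactly_two)   # ['22', '33']
print(two_or_more)   # ['22', '333']
print(one_to_two)    # ['1', '22', '33', '3']
```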

1.4 Other characters

#[...]  Matches any one character in the set, for example [abcde], [123456], [a-zA-Z], [0-9]
#[^...] Matches any one character that is not in the set
#a|b    Matches a or b
#r or R Raw string prefix: putting r or R before the pattern string makes it a raw string, so Python does not treat backslashes as escape characters
#.*     Greedy matching (matches as many characters as possible)
#.*?    Lazy (non-greedy) matching (matches as few characters as possible)
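
The difference between greedy and lazy matching is easiest to see side by side. The sketch below also exercises character sets, alternation, and the raw string prefix; all inputs are made-up examples.

```python
import re

html = '<b>bold</b><i>italic</i>'

greedy = re.findall(r'<.*>', html)   # .* runs to the last '>' in the string
lazy = re.findall(r'<.*?>', html)    # .*? stops at the first '>' it can

vowels = re.findall(r'[aeiou]', 'school')        # any one character from the set
non_digits = re.findall(r'[^0-9]+', 'ab12cd')    # runs of characters outside the set
either = re.findall(r'cat|dog', 'cat dog bird')  # 'cat' or 'dog'

# A raw string keeps the backslash, so r'\d' is the same as '\\d'.
same_pattern = (r'\d' == '\\d')

print(greedy)        # ['<b>bold</b><i>italic</i>']
print(lazy)          # ['<b>', '</b>', '<i>', '</i>']
print(vowels)        # ['o', 'o']
print(non_digits)    # ['ab', 'cd']
print(either)        # ['cat', 'dog']
print(same_pattern)  # True
```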

2. Match string

2.1 match()

The match() method tries to match the pattern at the beginning of a string. It returns a Match object on success and None otherwise.

The syntax is as follows:

re.match(pattern, string, [flags])
#pattern: the pattern string
#string: the string to match against
#flags: optional; flags that control how matching is performed, for example:
#       re.S (or re.DOTALL) makes . match every character, including newlines
#       re.I makes matching case-insensitive
#       re.X ignores unescaped whitespace and comments in the pattern string

For example:

import re  # import the re module
pattern = r'my_\w+'  # pattern: 'my_' followed by one or more word characters
string1 = 'MY__SCHOOL my_school'  # string to match 1
string2 = ' School MY__SCHOOL my_school'  # string to match 2
match1 = re.match(pattern, string1, re.I)  # case-insensitive match
match2 = re.match(pattern, string2, re.I)  # case-insensitive match
print(match1)
print(match2)

The output is as follows:

<re.Match object; span=(0, 10), match='MY__SCHOOL'>
None

In the first result, span=(0, 10) gives the match position: the match starts at index 0 and ends just before index 10, so it covers characters 0 through 9. match='MY__SCHOOL' is the matched text. The second result is None because match() only matches at the beginning of the string: string2 does not begin with text that fits the pattern, so the rest of the string is not searched and None is returned.
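
The flags listed in the syntax above can be combined with the | operator. The following is a small sketch of re.S and re.X in use; the sample strings are assumptions made for illustration.

```python
import re

text = 'first line\nsecond line'

# Without re.S, '.' stops at the newline, so the pattern cannot reach 'second'.
without_dotall = re.match(r'first.*second', text)
# With re.S (re.DOTALL), '.' also matches the newline.
with_dotall = re.match(r'first.*second', text, re.S)

# re.X lets the pattern carry whitespace and comments; re.I ignores case.
verbose = re.match(r'''
    my_     # literal prefix
    \w+     # one or more word characters
''', 'MY_school', re.I | re.X)

print(without_dotall)             # None
print(repr(with_dotall.group()))  # 'first line\nsecond'
print(verbose.group())            # MY_school
```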

The Match object returned by match() also provides the following attributes and methods:

import re  # import the re module
pattern = r'my_\w+'  # pattern: 'my_' followed by one or more word characters
string = 'MY__SCHOOL my_school'  # string to match
match = re.match(pattern, string, re.I)  # case-insensitive match
print('Match result:', match)  # the Match object itself
print('Match start position:', match.start())
print('Match end position:', match.end())
print('Match position tuple:', match.span())
print('String that was searched:', match.string)
print('Matched data:', match.group())

The output is as follows:

Match result: <re.Match object; span=(0, 10), match='MY__SCHOOL'>
Match start position: 0
Match end position: 10
Match position tuple: (0, 10)
String that was searched: MY__SCHOOL my_school
Matched data: MY__SCHOOL
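
Because match() returns None when nothing matches, calling start() or group() on the result directly raises AttributeError in that case. One way to guard against this is sketched below; first_match is a hypothetical helper name introduced only for this illustration.

```python
import re

def first_match(pattern, text, flags=0):
    """Return the text matched at the start of `text`, or None if there is no match."""
    m = re.match(pattern, text, flags)
    # A Match object is truthy, so this only calls group() on a real match.
    return m.group() if m else None

print(first_match(r'my_\w+', 'MY__SCHOOL my_school', re.I))   # MY__SCHOOL
print(first_match(r'my_\w+', ' School MY__SCHOOL', re.I))     # None
```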

2.2 search()

The search() method scans the entire string and returns the first place where the pattern matches. If a match is found anywhere in the string, a Match object is returned; otherwise None is returned.

The syntax is as follows:

re.search(pattern, string, [flags])
#pattern: the pattern string
#string: the string to search
#flags: optional; flags that control how matching is performed, for example:
#       re.S (or re.DOTALL) makes . match every character, including newlines
#       re.I makes matching case-insensitive
#       re.X ignores unescaped whitespace and comments in the pattern string

For example:

import re  # import the re module
pattern = r'my_\w+'  # pattern: 'my_' followed by one or more word characters
string1 = 'MY__SCHOOL my_school'  # string to search 1
string2 = ' School MY__SCHOOL my_school'  # string to search 2
match1 = re.search(pattern, string1, re.I)  # case-insensitive search
match2 = re.search(pattern, string2, re.I)  # case-insensitive search
print(match1)
print(match2)

The output is as follows:

<re.Match object; span=(0, 10), match='MY__SCHOOL'>
<re.Match object; span=(8, 18), match='MY__SCHOOL'>

The Match object returned by search() provides the same attributes and methods:

import re  # import the re module
pattern = r'my_\w+'  # pattern: 'my_' followed by one or more word characters
string = 'MY__SCHOOL my_school'  # string to search
match = re.search(pattern, string, re.I)  # case-insensitive search
print('Match result:', match)  # the Match object itself
print('Match start position:', match.start())
print('Match end position:', match.end())
print('Match position tuple:', match.span())
print('String that was searched:', match.string)
print('Matched data:', match.group())

The output is as follows:

Match result: <re.Match object; span=(0, 10), match='MY__SCHOOL'>
Match start position: 0
Match end position: 10
Match position tuple: (0, 10)
String that was searched: MY__SCHOOL my_school
Matched data: MY__SCHOOL
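
To make the contrast with match() concrete, here is a brief sketch on a made-up string. It also shows that anchoring the pattern with ^ makes search() behave like match() when no multiline flag is used.

```python
import re

text = 'go to my_school'
pattern = r'my_\w+'

at_start = re.match(pattern, text)        # only tries position 0
anywhere = re.search(pattern, text)       # scans forward for the first match
anchored = re.search(r'^my_\w+', text)    # ^ restricts search() to the start

print(at_start)          # None
print(anywhere.span())   # (6, 15)
print(anchored)          # None

# The span can be used to slice the matched text out of the original string.
start, end = anywhere.span()
print(text[start:end])   # my_school
```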

2.3 findall()

The findall() method searches the entire string for every non-overlapping match of the pattern and returns the matches as a list. If nothing matches, it returns an empty list.

The syntax is as follows:

re.findall(pattern, string, [flags])
#pattern: the pattern string
#string: the string to search
#flags: optional; flags that control how matching is performed, for example:
#       re.S (or re.DOTALL) makes . match every character, including newlines
#       re.I makes matching case-insensitive
#       re.X ignores unescaped whitespace and comments in the pattern string

For example:

import re  # import the re module
pattern = r'my_\w+'  # pattern: 'my_' followed by one or more word characters
string1 = 'MY__SCHOOL my_school'  # string to search 1
string2 = ' School MY__SCHOOL my_school'  # string to search 2
match1 = re.findall(pattern, string1)  # case-sensitive search
match2 = re.findall(pattern, string2, re.I)  # case-insensitive search
print(match1)
print(match2)

The output is as follows:

['my_school']
['MY__SCHOOL', 'my_school']
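
One detail worth knowing: when the pattern contains capturing groups, findall() returns the groups rather than the whole match. The sketch below varies the pattern from the example above to show this; the variations are illustrative and not part of the original example.

```python
import re

text = 'MY__SCHOOL my_school'

# No groups: each list item is the whole match.
no_groups = re.findall(r'my_+[a-z]+', text, re.I)
# One group: each item is the text captured by that group.
one_group = re.findall(r'my_+([a-z]+)', text, re.I)
# Several groups: each item is a tuple of the captured texts.
two_groups = re.findall(r'(my)_+([a-z]+)', text, re.I)
# (?:...) groups without capturing, so whole matches are returned again.
non_capturing = re.findall(r'(?:my)_+[a-z]+', text, re.I)

print(no_groups)       # ['MY__SCHOOL', 'my_school']
print(one_group)       # ['SCHOOL', 'school']
print(two_groups)      # [('MY', 'SCHOOL'), ('my', 'school')]
print(non_capturing)   # ['MY__SCHOOL', 'my_school']
```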

2.4 sub()

The sub() method replaces the parts of a string that match a pattern and returns the new string.

The syntax is as follows:

re.sub(pattern, repl, string, [count], [flags])
#pattern: the pattern string
#repl: the replacement string
#string: the string to search and replace in
#count: optional; the maximum number of replacements; all matches are replaced by default
#flags: optional; flags that control how matching is performed

For example:

import re  # import the re module
pattern1 = r'__'
pattern2 = r'oo'
string1 = 'MY__SCHOOL my__school'  # string to process 1
string2 = ' School MY__SCHOOL my__school'  # string to process 2
result1 = re.sub(pattern1, '**', string1)  # replace every '__' with '**'
result2 = re.sub(pattern1, '**', string1, count=1)  # replace only the first '__'
result3 = re.sub(pattern2, '**', string2)  # replace every lowercase 'oo' with '**'
print(result1)
print(result2)
print(result3)

The output is as follows:

MY**SCHOOL my**school
MY**SCHOOL my__school
 Sch**l MY__SCHOOL my__sch**l
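
The replacement does not have to be a fixed string. As a sketch, repl can refer back to captured groups with \1, \2, and so on, or it can be a function that receives each Match object. The patterns here are illustrative assumptions.

```python
import re

text = 'MY__SCHOOL my__school'

# Backreferences: swap the parts on either side of '__'.
swapped = re.sub(r'([A-Za-z]+)__([A-Za-z]+)', r'\2__\1', text)

# Function as repl: upper-case every run of lowercase letters.
uppercased = re.sub(r'[a-z]+', lambda m: m.group().upper(), text)

# count limits the number of replacements.
first_only = re.sub(r'__', '-', text, count=1)

print(swapped)      # SCHOOL__MY school__my
print(uppercased)   # MY__SCHOOL MY__SCHOOL
print(first_only)   # MY-SCHOOL my__school
```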

2.5 replace()

The replace() method also replaces text, but it is a built-in string method rather than part of the re module: it replaces a literal substring and does not interpret regular expressions.

The syntax is as follows:

string.replace(old, new, [count])
#string: the string to search and replace in
#old: the literal substring to be replaced (not a regular expression)
#new: the replacement string
#count: optional; the maximum number of replacements; all occurrences are replaced by default

For example:

# replace() is a str method, so the re module is not needed here
pattern1 = r'__'
pattern2 = r'oo'
string1 = 'MY__SCHOOL my__school'  # string to process 1
string2 = ' School MY__SCHOOL my__school'  # string to process 2
result1 = string1.replace(pattern1, '**')  # replace every '__' with '**'
result2 = string1.replace(pattern1, '**', 1)  # replace only the first '__'
result3 = string2.replace(pattern2, '**')  # replace every lowercase 'oo' with '**'
print(result1)
print(result2)
print(result3)

The output is as follows:

MY**SCHOOL my**school
MY**SCHOOL my__school
 Sch**l MY__SCHOOL my__sch**l
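
The two examples give the same output only because '__' and 'oo' contain no regex metacharacters. A short sketch on a made-up string shows where str.replace() and re.sub() diverge:

```python
import re

text = 'a.b.c a1b'

literal_dot = text.replace('.', '-')           # '.' is just a dot here
regex_dot = re.sub(r'.', '-', text)            # '.' matches every character
escaped_dot = re.sub(re.escape('.'), '-', text)  # re.escape makes it literal again

literal_digit = text.replace(r'\d', '#')       # no literal backslash-d, so nothing changes
regex_digit = re.sub(r'\d', '#', text)         # \d matches the digit

print(literal_dot)     # a-b-c a1b
print(regex_dot)       # ---------
print(escaped_dot)     # a-b-c a1b
print(literal_digit)   # a.b.c a1b
print(regex_digit)     # a.b.c a#b
```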

3. Split string

The split() method splits a string wherever the regular expression matches and returns the pieces as a list.

The syntax is as follows:

re.split(pattern, string, [maxsplit], [flags])
#pattern: the pattern string that describes the separator
#string: the string to split
#maxsplit: optional; the maximum number of splits; no limit by default
#flags: optional; flags that control how matching is performed

For example:

import re  # import the re module
pattern1 = '[?]'  # separator: '?'
pattern2 = '[@]'  # separator: '@'
pattern3 = r'[?@]'  # separator: '?' or '@' (inside [...] a | would be a literal character, so it is not needed)
string1 = 'MY?SCHOOL?my?school'  # string to split 1
string2 = ' School @MY@SCHOOL?my?@school'  # string to split 2
match1 = re.split(pattern1, string1)  # split string1 on '?'
match2 = re.split(pattern1, string2)  # split string2 on '?'
match3 = re.split(pattern2, string1)  # split string1 on '@'
match4 = re.split(pattern2, string2)  # split string2 on '@'
match5 = re.split(pattern3, string1)  # split string1 on '?' or '@'
match6 = re.split(pattern3, string2)  # split string2 on '?' or '@'
print(match1)
print(match2)
print(match3)
print(match4)
print(match5)
print(match6)

The output is as follows:

['MY', 'SCHOOL', 'my', 'school']
[' School @MY@SCHOOL', 'my', '@school']
['MY?SCHOOL?my?school']
[' School ', 'MY', 'SCHOOL?my?', 'school']
['MY', 'SCHOOL', 'my', 'school']
[' School ', 'MY', 'SCHOOL', 'my', '', 'school']
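
The empty string in the last result appears because two separators sit next to each other. The sketch below shows two ways to avoid it, along with maxsplit and a capturing group that keeps the separators; the inputs are shortened, made-up variants of the strings above.

```python
import re

text = 'MY?SCHOOL?my?school'

# maxsplit=1 splits only at the first separator.
split_once = re.split(r'[?]', text, maxsplit=1)

# A capturing group keeps the separators in the result list.
with_separators = re.split(r'([?])', text)

# Adjacent separators produce an empty string between them.
adjacent = re.split(r'[?@]', 'my?@school')
# Option 1: treat a run of separators as one by adding +.
merged = re.split(r'[?@]+', 'my?@school')
# Option 2: filter the empty strings out afterwards.
filtered = [part for part in adjacent if part]

print(split_once)        # ['MY', 'SCHOOL?my?school']
print(with_separators)   # ['MY', '?', 'SCHOOL', '?', 'my', '?', 'school']
print(adjacent)          # ['my', '', 'school']
print(merged)            # ['my', 'school']
print(filtered)          # ['my', 'school']
```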

Original site

Copyright notice
This article was written by [WHJ226]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/209/202207281432476459.html