Regular expression

2022-07-04 10:45:00 She was your flaw

Regular expressions

I. Matching symbols

"""
1. The re module
    re is the Python module that supports regular expressions.
    The fullmatch function:
    fullmatch(pattern, string) - matches the pattern against the entire string; returns None if the match fails.
"""

"""
2. Regular expression syntax
A regular expression is a tool that simplifies complex string problems.
The main work in writing one is using regex symbols to describe the rule that the target strings follow.

Regex in Python: r'pattern'
Regex in JavaScript: /pattern/
"""

from re import fullmatch

"""
1) Ordinary characters
    Any symbol that has no special function or meaning in a regex is an ordinary character.
    An ordinary character in a regex stands for itself.
"""
# Match a string made up of the three characters a, b and c
re_str = r'abc'
print(fullmatch(re_str, 'abc'))
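Because `fullmatch` returns either a match object or `None`, its result can be used directly as a yes/no test. The sketch below wraps that idea in a small helper; the name `is_exact` is made up for illustration and is not part of the `re` module.

```python
from re import fullmatch


def is_exact(pattern: str, text: str) -> bool:
    """Return True only when the whole text matches the pattern (hypothetical helper)."""
    return fullmatch(pattern, text) is not None


print(is_exact(r'abc', 'abc'))    # True
print(is_exact(r'abc', 'abcd'))   # False: the trailing 'd' is not covered by the pattern
```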

"""
2) . - matches any single character (except a newline, by default)
Note: one . matches exactly one character
"""

# Match a string of length three whose first character is a and last character is c, with any character in between
re_str = r'a.c'
print(fullmatch(re_str, 'abc'))
print(fullmatch(re_str, 'a+c'))
print(fullmatch(re_str, 'a good c'))    # None (more than one character between a and c)

re_str = r'abc...'
print(fullmatch(re_str, 'abcm./'))
print(fullmatch(re_str, 'abcm\t/'))
print(fullmatch(re_str, 'abc G/'))

"""
3) \d - matches any digit character
"""

re_str = r'a\d\dc'
print(fullmatch(re_str, 'a78c'))
print(fullmatch(re_str, 'a00c'))

"""
4) \s - matches any whitespace character
Whitespace characters: space, newline (\n), tab (\t)
"""

re_str = r'a\sb'
print(fullmatch(re_str, 'a b'))
print(fullmatch(re_str, 'a\nb'))
print(fullmatch(re_str, 'a\tb'))
print(fullmatch(re_str, 'a  b'))    # None (two spaces, but one \s matches only one character)

"""
5) \w - matches any letter, digit or underscore (of limited use: in Python 3 it also matches other Unicode word characters, such as Chinese characters)
"""

"""
6)
\D - matches any non-digit character
\S - matches any non-whitespace character
"""

print(fullmatch(r'a\Sb\D', 'a>b='))
print(fullmatch(r'a\Sb\D', 'a b='))         # None
print(fullmatch(r'a\Sb\D', 'a>b0'))         # None
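The note on `\w` has no example of its own, so here is a small sketch of why it can surprise: with an ordinary `str` pattern, `\w` is Unicode-aware. The `re.ASCII` flag and the explicit set `[a-zA-Z0-9_]` are two ways to narrow it; the variable names are only illustrative.

```python
import re

# In a str pattern, \w also accepts non-ASCII word characters (here U+4E2D, a Chinese character)
unicode_hit = re.fullmatch(r'\w', '\u4e2d')
# re.ASCII restricts \w to letters a-z, A-Z, digits and underscore
ascii_hit = re.fullmatch(r'\w', '\u4e2d', flags=re.ASCII)
# An explicit character set gives the same restriction without a flag
explicit_hit = re.fullmatch(r'[a-zA-Z0-9_]', '_')

print(unicode_hit is not None)    # True
print(ascii_hit is not None)      # False
print(explicit_hit is not None)   # True
```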

"""
7) [character set] - matches any one character in the set
Note: one [] matches exactly one character
[several ordinary characters] - for example [abc] matches a, b or c
[a set containing a symbol that starts with \] - for example [\dabc] matches any digit, a, b or c

[a set with a hyphen between two characters] - the hyphen means "from ... to ..." (note: the code point of the character before the hyphen must be smaller than that of the character after it)
For example:
[a-z] - matches any lowercase letter
[a-d] - matches any one of a, b, c, d
[A-Z] - matches any uppercase letter
[1-9] - matches any digit character from 1 to 9
[\u4e00-\u9fa5] - matches any Chinese character
[a-zA-Z], [A-Za-z] - matches any letter
[a-z123] - matches any lowercase letter, or 1, 2 or 3
[a-z\d] - matches any lowercase letter or any digit
"""

re_str = r'a[xym]b'
print(fullmatch(re_str, 'axb'))
print(fullmatch(re_str, 'ayb'))
print(fullmatch(re_str, 'amb'))
print(fullmatch(re_str, 'azb'))     # None

re_str = r'a[16]b'
print(fullmatch(re_str, 'a1b'))
print(fullmatch(re_str, 'a6b'))

re_str = r'a[a\db]b'
print(fullmatch(re_str, 'a1b'))
print(fullmatch(re_str, 'aab'))
print(fullmatch(re_str, 'abb'))

print(fullmatch(r'x[a-z]y', 'xmy'))

print(fullmatch(r'x[a-zA-Z]y', 'xmy'))
print(fullmatch(r'x[a-zA-Z]y', 'xKy'))

print(fullmatch(r'x[a-zA-Z*&]y', 'x*y'))
print(fullmatch(r'x[a-zA-Z*&]y', 'xMy'))

print(fullmatch(r'x[0-9]y', 'x5y'))

print(fullmatch(r'x[-09]y', 'x-y'))
print(fullmatch(r'x[-09]y', 'x0y'))
print(fullmatch(r'x[-09]y', 'x9y'))
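The difference between a range hyphen and a literal hyphen, and the rule that a range must run from the smaller code point to the larger one, can be checked with a short sketch. The helper `is_valid_pattern` is a hypothetical name used only here.

```python
import re

# A hyphen between two characters builds a range; a leading hyphen is a literal '-'
print(re.fullmatch(r'[0-9]', '5') is not None)    # True: 5 lies in the range 0..9
print(re.fullmatch(r'[-09]', '5') is not None)    # False: only '-', '0' and '9' are listed
print(re.fullmatch(r'[-09]', '-') is not None)    # True


def is_valid_pattern(pattern: str) -> bool:
    """Return False when the re module rejects the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(is_valid_pattern(r'[a-z]'))   # True
print(is_valid_pattern(r'[z-a]'))   # False: the range runs backwards
```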

"""
8) [^character set] - matches any one character that is not in the set
[^abc] - matches any character except a, b and c
[^a-z] - matches any character except lowercase letters

Note: inside [], - and ^ have a special function only in specific positions (- between two characters, ^ at the very start); anywhere else inside [] they are ordinary characters.
"""

print(fullmatch(r'a[^\u4e00-\u9fa5]c', 'a yes c'))   # None
print(fullmatch(r'a[^a-zA-Z]c', 'aKc'))          # None
print(fullmatch(r'a[^a-zA-Z]c', 'a1c'))
print(fullmatch(r'a[a-z^]c', 'a^c'))

II. Detection symbols

from re import fullmatch, findall

"""
Detection symbols do not affect the length of the matched string. Given that the rest of the pattern matches, they check whether the position where the symbol sits meets the symbol's requirement.
How to read them: first remove the detection symbols and see whether the pattern can match. If it cannot, the whole match fails. If it can, check whether each detection symbol's position meets its requirement.

1. \b - checks whether the position is a word boundary
    Word boundary - anything that separates two words counts as a word boundary, for example: whitespace, punctuation, the start of the string, the end of the string
"""

re_str = r'abc\b123'
print(fullmatch(re_str, 'abc123'))      # None
print(fullmatch(re_str, 'abc 123'))     # None

re_str = r'abc,\b123'
print(fullmatch(re_str, 'abc,123'))

print(fullmatch(r'abc\s\b123', 'abc 123'))
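To see that `\b` checks a position without consuming a character, compare the span of a match with the length of the word. This is a minimal sketch with made-up sample strings.

```python
from re import search

# \b consumes nothing: the span covers only the three letters of 'cat'
hit = search(r'\bcat\b', 'a cat sat')
print(hit.group(), hit.span())    # cat (2, 5)

# Inside 'concatenate' the letters c-a-t are surrounded by word characters, so \b fails
miss = search(r'\bcat\b', 'concatenate')
print(miss)                       # None
```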

# findall(pattern, string) - gets all substrings of the string that match the pattern
str1 = '12ksksj78ss 34Antibiotic,89 try7and proved90 56 Jiangsu Province23'
result1 = findall(r'\d\d', str1)
print(result1)           # ['12', '78', '34', '89', '90', '56', '23']

result2 = findall(r'\d\d\b', str1)
print(result2)          # ['89', '90', '56', '23']

result3 = findall(r'\b\d\d\b', str1)
print(result3)          # ['89', '56']

"""
2. \B - checks whether the position is not a word boundary
"""

result3 = findall(r'\d\d\B', str1)
print(result3)

"""
3. ^ - checks whether the position is the start of the string
"""

re_str = r'^\d\d'
print(fullmatch(re_str, '12'))      # <re.Match object; span=(0, 2), match='12'>
print(findall(r'^\d\d',  str1))     # ['12']

"""
4. $ - checks whether the position is the end of the string
"""

re_str = r'\d\d$'
print(fullmatch(re_str, '67'))      # <re.Match object; span=(0, 2), match='67'>
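With `fullmatch` the pattern already has to cover the whole string, so `^` and `$` add little. Their effect shows more clearly with `search`, which may otherwise match anywhere. A small sketch with an invented sample string:

```python
from re import search

text = '12abc34'
start = search(r'^\d\d', text)     # two digits that sit at the very beginning
end = search(r'\d\d$', text)       # two digits that sit at the very end
both = search(r'^\d\d$', text)     # the whole string would have to be two digits

print(start.group())   # 12
print(end.group())     # 34
print(both)            # None
```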

III. Number of matches

from re import fullmatch, match, findall

"""
1. * - matches 0 or more times (any number of times)
    Usage: matching symbol followed by *
    a* - matches any number of a
    \d* - matches any number of digit characters
"""
print(fullmatch(r'a*b', 'b'))
print(fullmatch(r'a*b', 'aab'))
print(fullmatch(r'a*b', 'aaaaaaab'))
print(fullmatch(r'\d*b', '245899b'))
print(fullmatch(r'[abc]*x', 'aabccax'))

"""
2. + - matches 1 or more times (at least once)
"""

print(fullmatch(r'a+b', 'b'))           # None
print(fullmatch(r'a+b', 'ab'))
print(fullmatch(r'a+b', 'aaaaab'))

"""
3. ? - matches 0 or 1 time
"""

re_str = r'[-+]?[1-9]\d'
print(fullmatch(re_str, '-12'))
print(fullmatch(r'a?b', 'b'))
print(fullmatch(r'a?b', 'ab'))
print(fullmatch(r'a?b', 'aab'))     # None

"""
4. {}
{N} - matches exactly N times
{M,N} - matches M to N times
{,N} - matches at most N times
{M,} - matches at least M times
"""

print(fullmatch(r'\d{3}', '786'))

print(fullmatch(r'\d{3,5}', '899'))
print(fullmatch(r'\d{3,5}', '8092'))
print(fullmatch(r'\d{3,5}', '89129'))
print(fullmatch(r'\d{3,5}', '8578990'))     # None

print(fullmatch(r'\d{,5}', '802'))
print(fullmatch(r'\d{,5}', '0921'))
print(fullmatch(r'\d{,5}', '081898'))       # None

print(fullmatch(r'\d{3,}', '0976'))
print(fullmatch(r'\d{3,}', '31221'))
print(fullmatch(r'\d{3,}', '82'))           # None

# Note: a symbol that sets the number of matches must be preceded by a matching symbol
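The quantifiers can be combined in one pattern. As a rough sketch, the pattern below describes the shape of a date such as 2022-7-4; the name `date_like` is illustrative, and the pattern checks only the digit counts, not whether the date exists.

```python
from re import fullmatch

# Four digits, a hyphen, one or two digits, a hyphen, one or two digits
date_like = r'\d{4}-\d{1,2}-\d{1,2}'

print(fullmatch(date_like, '2022-7-4') is not None)     # True
print(fullmatch(date_like, '2022-07-04') is not None)   # True
print(fullmatch(date_like, '22-07-04') is not None)     # False: the year needs exactly 4 digits
```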

"""
5. Greedy and non-greedy
When the number of repetitions is not fixed, matching is either greedy or non-greedy; greedy is the default.
Given that the match succeeds, greedy picks the largest possible number of repetitions and non-greedy picks the smallest.

*, +, ?, {M,N}, {M,}, {,N} - greedy
*?, +?, ??, {M,N}?, {M,}?, {,N}? - non-greedy
"""

print(match(r'\d{3}', '234hj Regulatory body '))
print(match(r'a.*b', 'asmmdb Regulatory body '))     # asmmdb
print(match(r'a.*?b', 'asmmdb Regulatory body '))    # asmmdb

# 'asb', 'asbmmb' and 'asbmmbdb' could all match; greedy matching takes the one with the most repetitions
print(match(r'a.*b', 'asbmmbdb Regulatory body '))       # asbmmbdb
print(match(r'a.*?b', 'asbmmbdb Regulatory body '))      # asb
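The same contrast shows up when collecting several pieces with `findall`. In this sketch (the tag-like sample text is invented), the greedy version swallows everything between the first opening and the last closing marker, while the non-greedy version stops at the nearest one.

```python
from re import findall

text = '<b>one</b><b>two</b>'

greedy = findall(r'<b>.*</b>', text)    # .* stretches to the last '</b>'
lazy = findall(r'<b>.*?</b>', text)     # .*? stops at the first '</b>' it can

print(greedy)   # ['<b>one</b><b>two</b>']
print(lazy)     # ['<b>one</b>', '<b>two</b>']
```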

IV. Groups and branches

from re import fullmatch

"""
1. () - grouping
    Use 1: treat the content of () as a single unit and operate on it as a whole, for example to control how many times the whole unit repeats
    Use 2: \M repeats whatever the M-th group matched earlier; M starts at 1
    Use 3: capturing (covered with findall)
"""
str1 = '79gt34er23sd'
print(fullmatch(r'\d\d[a-z]{2}\d\d[a-z]{2}\d\d[a-z]{2}', str1))
print(fullmatch(r'(\d\d[a-z]{2}){3}', str1))

str1 = r'abababab'
print(fullmatch(r'(ab)+', str1))


print(fullmatch(r'(\d{2})abc\1', '89abc89'))
print(fullmatch(r'(\d{2})abc\1', '89abc34'))        # None

"""print(fullmatch(r'\d{2}abc\1', '89abc89'))"""   # re.error( error )

print(fullmatch(r'(\d{3})([a-z]{3})-\2', '234ams-ams'))
print(fullmatch(r'(\d{3})([a-z]{3})-\1', '234ams-234'))
print(fullmatch(r'(\d{3})([a-z]{3})-\2\1', '234ams-ams234'))
print(fullmatch(r'(\d{3})([a-z]{3})-\1{2}', '234ams-234234'))
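A common practical use of `\M` is spotting an accidentally doubled word. The following is only a sketch with an invented sentence; it combines a group, a back-reference and `\b`.

```python
import re

# (\w+) captures a word, then ' \1' demands the same word again after one space
doubled = re.search(r'\b(\w+) \1\b', 'this is is a test')

print(doubled.group())    # is is
print(doubled.group(1))   # is
print(doubled.span())     # (5, 10)
```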

"""print(fullmatch(r'(\d{3})-\2([a-z]{3})', '234ams-ams'))"""     # re.error( error )

"""
2. | - branch
pattern1|pattern2 - first try pattern1; if it matches, the match succeeds; if it fails, try pattern2
"""

# Goal: match both 'abc98' and 'abcMKP'
print(fullmatch(r'abc\d{2}|abc[A-Z]{3}', 'abcKMP'))
print(fullmatch(r'abc(\d{2}|[A-Z]{3})', 'abcMKP'))
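The reason the second form needs parentheses is that `|` splits the entire pattern unless a group limits it. A short sketch of what goes wrong when the shared prefix is written only once without a group (the variable names are illustrative):

```python
from re import fullmatch

ungrouped = r'abc\d{2}|[A-Z]{3}'      # reads as: ('abc' + two digits) OR (three capitals)
grouped = r'abc(\d{2}|[A-Z]{3})'      # reads as: 'abc' + (two digits OR three capitals)

print(fullmatch(ungrouped, 'abcMKP'))               # None
print(fullmatch(ungrouped, 'MKP') is not None)      # True
print(fullmatch(grouped, 'abcMKP') is not None)     # True
print(fullmatch(grouped, 'MKP'))                    # None
```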

"""
3. Escape symbol
Put \ in front of a symbol with a special function to remove that function and turn it into an ordinary symbol
"""

print(fullmatch(r'\+\d{3}', '+234'))
print(fullmatch(r'\[\d{3}\]', '[234]'))
print(fullmatch(r'\\dabc', r'\dabc'))

# A standalone symbol with a special function also loses that function when placed inside []
print(fullmatch(r'[+*?|()^$.]abc', '$abc'))
print(fullmatch(r'[\^abc\-z\]]123', ']123'))
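When the text to be matched literally comes from elsewhere, escaping each symbol by hand is error-prone. The standard `re.escape` function adds the backslashes automatically; the sketch below uses an invented sample string.

```python
import re

literal = '1+1=2?'

# Used as a pattern directly, '+' and '?' act as quantifiers, so the text does not match itself
unescaped = re.fullmatch(literal, literal)
# re.escape turns the special symbols into ordinary characters
escaped = re.fullmatch(re.escape(literal), literal)

print(unescaped)             # None
print(escaped is not None)   # True
```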

V. The re module

import re

"""
1. compile(pattern) - compiles a regular expression and returns a regular expression object

fullmatch(pattern, string)
pattern_object.fullmatch(string)
"""

re_obj = re.compile(r'\d{3}')
print(re_obj.fullmatch('234'))

print(re.fullmatch(r'\d{3}', '234'))
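A compiled object is mostly convenient when the same pattern is applied to many strings. A minimal sketch, with a made-up list of candidate strings:

```python
import re

three_digits = re.compile(r'\d{3}')
codes = ['234', '12a', '0071', '999']

# Reuse the one compiled object for every candidate
valid = [code for code in codes if three_digits.fullmatch(code)]
print(valid)   # ['234', '999']
```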

"""
2.
fullmatch(pattern, string) - matches the pattern against the whole string (exact match); returns None on failure and a match object on success.
match(pattern, string) - matches at the start of the string (checks whether the beginning of the string fits the pattern); returns None on failure and a match object on success.
"""

result = re.fullmatch(r'\d{3}', '345')
print(result)       # <re.Match object; span=(0, 3), match='345'>   --    A match object 

"""
1) Get the matched string
    match_object.group() / match_object.group(0) - the text matched by the whole pattern
    match_object.group(N) - the text matched by the N-th group
"""

"""
2) Get the position of the match in the original string
match_object.span() - returns a tuple (start index, end index); the end index is exclusive
match_object.span(N) - the same for the N-th group
"""

print(result.span())
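The notes on `group(N)` and `span(N)` can be tried on a pattern that actually contains groups. This is a sketch with an invented sample string; it also shows that slicing with the span gives back the matched text because the end index is exclusive.

```python
import re

text = 'id: 345-ab'
m = re.search(r'(\d{3})-([a-z]{2})', text)

print(m.group())     # 345-ab   (same as m.group(0))
print(m.group(1))    # 345
print(m.group(2))    # ab
print(m.span())      # (4, 10)
print(m.span(2))     # (8, 10)

start, end = m.span()
print(text[start:end])   # 345-ab
```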

"""
3. search(pattern, string) - finds the first substring that matches the pattern; returns None or a match object
"""

result = re.search(r'\d{3}', 'djrbj324ok,378')
print(result)       # <re.Match object; span=(5, 8), match='324'>
print(result.group())       # 324

"""
4. findall(pattern, string) - gets all substrings that match the pattern and returns them as a list; the list elements are the matched substrings (when the pattern has no groups)
If the pattern has exactly one group: each list element is the text matched by that group
If the pattern has two or more groups: each list element is a tuple holding the text matched by each group
"""

result = re.findall(r'\d{2}', '34ssd908 On computer 23,udh89, try 89123')
print(result)   # ['34', '90', '23', '89', '89', '12']

result = re.findall(r'(\d{2})\D', '34ssd908 On computer 23,udh89, try 89123')
print(result)   # ['34', '08', '23', '89']

result = re.findall(r'((\d[a-z]){2})', '2m4m Driver 9k0o try 3k5l--')
print(result)       # [('2m4m', '4m'), ('9k0o', '0o'), ('3k5l', '5l')]

result = re.findall(r'(\d{2})-([a-z]{3})', '23-msn The data is 98-kop Christmas delivery ')
print(result)       # [('23', 'msn'), ('98', 'kop')]
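Sometimes a group is needed only to repeat a unit, and the whole match is still wanted from `findall`. One option, not covered in the notes above, is the non-capturing group `(?:...)` from the standard `re` syntax. A small sketch for comparison:

```python
import re

text = '2m4m Driver 9k0o try 3k5l--'

# A capturing group makes findall return only what the group held last
captured = re.findall(r'(\d[a-z]){2}', text)
# A non-capturing group still repeats the unit, but findall returns the whole match
whole = re.findall(r'(?:\d[a-z]){2}', text)

print(captured)   # ['4m', '0o', '5l']
print(whole)      # ['2m4m', '9k0o', '3k5l']
```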

"""
5. finditer(pattern, string) - gets all substrings that match the pattern and returns an iterator whose elements are match objects
"""

result = re.finditer(r'(\d{2})-([a-z]{3})', '23-msn The data is 98-kop Christmas delivery ')
print(result)
r1 = next(result)
print(r1, r1.group(), r1.group(1), r1.group(2))
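Instead of calling `next` by hand, the iterator is usually consumed with a `for` loop. A sketch that collects the position and both groups of every match (the sample text is invented):

```python
import re

text = '23-msn and 98-kop'
found = []
for m in re.finditer(r'(\d{2})-([a-z]{3})', text):
    # Each m is a match object, so span() and group(N) are available
    found.append((m.span(), m.group(1), m.group(2)))

print(found)   # [((0, 6), '23', 'msn'), ((11, 17), '98', 'kop')]
```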

"""
6. split(pattern, string) - cuts the string at every substring that matches the pattern
re.split(pattern, string, N) - cuts the string at the first N substrings that match the pattern
"""

result = re.split(r'\d+', ' Yes 9564s Twin horizon 09 Century Oriental and 3d Cry cry 2 try ')
print(result)
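The limited form of `split` is easiest to see on a short string. In this sketch the limit is passed by its keyword name, `maxsplit`, which is the parameter name in the standard `re.split` signature.

```python
import re

text = 'a1b22c333d'

everything = re.split(r'\d+', text)               # cut at every run of digits
first_two = re.split(r'\d+', text, maxsplit=2)    # cut only at the first two runs

print(everything)   # ['a', 'b', 'c', 'd']
print(first_two)    # ['a', 'b', 'c333d']
```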

"""
7. sub(pattern, string1, string2) - replaces every substring of string2 that matches the pattern with string1
sub(pattern, string1, string2, N) - replaces only the first N matching substrings of string2 with string1
"""

result = re.sub(r'\d+', '*', ' Yes 9564s Twin horizon 09 Century Oriental and 3d Cry cry 2 try ')
print(result)
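Two details of `sub` are worth a quick sketch: the limit is the `count` parameter, and the replacement string may refer back to groups with `\1`, `\2` and so on. The sample strings are invented.

```python
import re

# Replace only the first two runs of digits
limited = re.sub(r'\d+', '*', 'a1b22c333d', count=2)
print(limited)    # a*b*c333d

# Swap the two captured parts by referring to the groups in the replacement
swapped = re.sub(r'(\d{2})-([a-z]{3})', r'\2-\1', '23-msn and 98-kop')
print(swapped)    # msn-23 and kop-98
```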

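message = "f u c k you!  It's a mess , you TM Don't you see ?SB"
# badLanguage.txt is expected to hold a regular expression that matches the words to mask
with open('badLanguage.txt', encoding='utf-8') as f:
    re_str = f.read()
re_str = r'(?i)%s' % re_str
result = re.sub(re_str, '*', message)
print(result)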

"""
8. The flags parameter
Every function above accepts a flags parameter that sets regex options
1) Whether . can match a newline: re.S
Default (no re.S): . cannot match \n
With re.S (single-line mode): . can match \n
Note: re.M (multi-line mode) does not change .; it only makes ^ and $ match at the start and end of every line

flags=re.S  <==>  r'(?s)pattern'

2) Ignore case: re.I
flags=re.I  <==>  r'(?i)pattern'

flags=re.S|re.I  <==>  r'(?si)pattern'
"""

print(re.fullmatch(r'a.b', 'a\nb', flags=re.M))     # None
print(re.fullmatch(r'a.b', 'a\nb'))                 # None
print(re.fullmatch(r'a.b', 'a\nb', flags=re.S))
print(re.fullmatch(r'(?s)a.b', 'a\nb'))

print('---------------------------------------------------')
print(re.fullmatch(r'abc', 'abc'))
print(re.fullmatch(r'abc', 'Abc'))      # None
print(re.fullmatch(r'abc', 'ABc', flags=re.I))
print(re.fullmatch(r'(?i)abc', 'ABc'))

print(re.fullmatch(r'a.b', 'A\nb', flags=re.S|re.I))
print(re.fullmatch(r'(?is)a.b', 'A\nb'))
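The `re.M` call above returns `None` because `re.M` does not change what `.` matches. What it does change is where `^` and `$` may match, as this small sketch with an invented three-line string shows.

```python
import re

text = '12\n34\n56'

# Without re.M, ^ matches only at the start of the whole string
first_only = re.findall(r'^\d+', text)
# With re.M, ^ also matches right after every newline
every_line = re.findall(r'^\d+', text, flags=re.M)

print(first_only)    # ['12']
print(every_line)    # ['12', '34', '56']

# . still refuses \n under re.M; re.S is the flag that lifts that restriction
print(re.fullmatch(r'.+', text, flags=re.M))               # None
print(re.fullmatch(r'.+', text, flags=re.S) is not None)   # True
```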

Homework

Use regular expressions to complete the following exercises:

I. Multiple-choice questions (one or more answers may be correct)

  1. Regular expressions that can fully match both the string "(010)-62661617" and the string "01062661617" include (A, B, D)

A. r"\(?\d{3}\)?-?\d{8}"
B. r"[0-9()-]+"
C. r"[0-9(-)]*\d*"
D. r"[(]?\d*[)-]*\d*"

  2. Regular expressions that can fully match both the string "back" and the string "back-end" include (A, B, C, D)
    A. r"\w{4}-\w{3}|\w{4}"
    B. r"\w{4}|\w{4}-\w{3}"
    C. r"\S+-\S+|\S+"
    D. r"\w*\b-\b\w*|\w*"

  3. Regular expressions that can fully match the strings "go go" and "kitty kitty" but cannot fully match "go kitty" include (A, D)
    A. r"\b(\w+)\b\s+\1\b"
    B. r"\w{2,5}\s*\1"
    C. r"(\S+) \s+\1"
    D. r"(\S{2,5})\s{1,}\1"

  4. Regular expressions that can match the string "aab" but not "aaab" or "aaaab" include (B, C)
    A. r"a*?b"
    B. r"a{,2}b"
    C. r"aa??b"
    D. r"aaa??b"
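The answer to the last question can be checked mechanically. The sketch below assumes that "match" means an exact match with `fullmatch`, which the question does not state outright; the helper name `qualifies` is made up.

```python
from re import fullmatch

options = {'A': r'a*?b', 'B': r'a{,2}b', 'C': r'aa??b', 'D': r'aaa??b'}


def qualifies(pattern: str) -> bool:
    """True when the pattern accepts 'aab' and rejects both 'aaab' and 'aaaab'."""
    return (fullmatch(pattern, 'aab') is not None
            and fullmatch(pattern, 'aaab') is None
            and fullmatch(pattern, 'aaaab') is None)


answers = [label for label, pattern in options.items() if qualifies(pattern)]
print(answers)   # ['B', 'C']
```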

II. Programming questions

1. User name matching

Requirements: 1. The user name may contain only digits, letters and underscores

2. It must not start with a digit

3. Its length must be between 6 and 16 characters

from re import fullmatch

username = 'ss3568ahh'
result = fullmatch(r'[a-zA-Z_][A-Za-z\d_]{5,15}', username)
if result:
    print(f'User name {username} is valid')
else:
    print(f'User name {username} is invalid')
2. Password matching

Requirements: 1. It must not contain any of the special symbols !@#¥%^&*

2. It must start with a letter

3. Its length must be between 6 and 12 characters

from re import fullmatch

pw = 'dka4i53'
result = fullmatch(r'[a-zA-Z][^!@#¥%^&*]{5,11}', pw)
if result:
    print(f'Password {pw} is valid')
else:
    print(f'Password {pw} is invalid')
3. Match an IP address in IPv4 format
    Hint: IP addresses range from 0.0.0.0 to 255.255.255.255
"""
0 - 255 split into ranges: 0-9; 10-99; 100-199; 200-249; 250-255
0-9: \d
10-99: [1-9]\d
100-199: 1\d{2}
200-249: 2[0-4]\d
250-255: 25[0-5]
Regex for 0-255: (\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])
Regex for an IP: ((\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}(\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])
"""

ip = '255.5.255.255'
result = fullmatch(r'((\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}(\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])', ip)
if result:
    print(f'{ip} is valid')
else:
    print(f'{ip} is invalid')
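Because the octet pattern appears four times, it can be written once and reused. The sketch below builds the same regex from a named piece and tries it on a few edge cases; `OCTET`, `IPV4` and `is_ipv4` are illustrative names, and non-capturing groups are used only to keep the pattern free of unused captures.

```python
from re import fullmatch

# One number from 0 to 255, without leading zeros
OCTET = r'(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])'
# Three "octet plus dot" units followed by a final octet
IPV4 = rf'(?:{OCTET}\.){{3}}{OCTET}'


def is_ipv4(text: str) -> bool:
    """Return True when text is a dotted IPv4 address (hypothetical helper)."""
    return fullmatch(IPV4, text) is not None


for candidate in ['0.0.0.0', '255.255.255.255', '256.1.1.1', '01.2.3.4', '1.2.3']:
    print(candidate, is_ipv4(candidate))
```

Only the first two candidates are accepted: 256 is out of range, 01 has a leading zero, and 1.2.3 has too few parts.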
4. Extract the numbers from user input (positive or negative, integer or decimal) and sum them
For example: "-3.14good87nice19bye" =====> -3.14 + 87 + 19 = 102.86
import re

str1 = '-3.14good87nice19bye'
result = re.findall(r'-?\d+\.?\d*', str1)

print(sum([float(x) for x in result]))  # 102.86
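The pattern above also accepts a number that ends in a bare dot, such as `3.`. A slightly stricter variant, offered only as one possible sketch, requires digits after the dot and also allows a leading plus sign; `sum_numbers` is a made-up helper name.

```python
import re


def sum_numbers(text: str) -> float:
    """Add up every signed integer or decimal found in text (hypothetical helper)."""
    # Optional sign, digits, then an optional fractional part that must contain digits
    numbers = re.findall(r'[-+]?\d+(?:\.\d+)?', text)
    return sum(float(n) for n in numbers)


print(round(sum_numbers('-3.14good87nice19bye'), 2))   # 102.86
```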
5. Verify that the input consists only of Chinese characters

    str1 = ' Gentle his return '
    result = fullmatch(r'[\u4e00-\u9fa5]+', str1)
    print(result)    # None for this sample, because it contains spaces and Latin letters rather than Chinese characters
    
6. Match an integer or a decimal (positive or negative)

    """
    Integer: [-+]?([1-9]\d*|0)
        valid: -0, 2, 9, +19, 342, 34523, +0
        invalid: 00, 09, 00087, --9, ++23, -+23
    Decimal: [-+]?(0|[1-9]\d*)\.\d+
        valid: 0.23, 23.98, 2.00, 0.00
        invalid: 003.9, 0., 34.
    Integer or decimal: [-+]?([1-9]\d*|0)(\.\d+)?
    """
    str1 = input('Please enter an integer or a decimal: ')
    result = fullmatch(r'[-+]?([1-9]\d*|0)(\.\d+)?', str1)
    print(result)
    
7. Verify whether the entered user name and QQ number are valid and print a corresponding message

    Requirement:
    The user name must consist of letters, digits or underscores and be 6 to 20 characters long

    user_name = input('Please enter a user name: ')
    user_QQ = input("Please enter the user's QQ number: ")
    if not re.fullmatch(r'(?i)[a-z\d_]{6,20}', user_name):
        print('The user name is invalid')
    else:
        print('The user name is valid')
    if not re.fullmatch(r'[1-9]\d{4,11}', user_QQ):
        print('The QQ number is invalid')
    else:
        print('The QQ number is valid')
    
8. Split a long string: take out each line of a poem separately

    poem = ' The bright moon in front of the window , The frost on the ground . look at the bright moon , Bow your head and think of your hometown .'
    # Split at commas and periods, then drop surrounding spaces and empty pieces
    result = [part.strip() for part in re.split(r'[,.]', poem) if part.strip()]
    print(result)
    
Copyright notice
This article was written by [She was your flaw]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202141415425065.html