当前位置:网站首页>Regular expression
Regular expression
2022-07-04 10:45:00 【She was your flaw】
Regular expressions
One 、 Match symbols
“”"
- re modular
re The module is python A module that supports regular expressions
fullmatch function :
fullmatch( Regular expressions , character string ) – Let the regular expression match the specified string exactly , If the match fails, return None;
“”"
“”"
2. Regular grammar
Regular expressions – Regular expressions are a tool to simplify complex string problems .
The main work of writing regular expressions : Use regular symbols to describe the rules of related strings .
python Regular representation :r’ Regular expressions ’
js Regular :/ Regular expressions /
“”"
from re import fullmatch
“”"
- Ordinary character ( Common symbols )
In regular, symbols other than those with special functions or special meanings ;
Ordinary characters represent the symbol itself in regular .
“”"
# Matching a string has three characters a, b and c
re_str = r'abc'
print(fullmatch(re_str, 'abc'))
“”"
2) . – Matches any character
Be careful : One . Only one arbitrary character can be matched
“”"
# Match a character , The length is three , The first character is a, The last character is c,a and c Between is any character
re_str = r'a.c'
print(fullmatch(re_str, 'abc'))
print(fullmatch(re_str, 'a+c'))
print(fullmatch(re_str, 'a good c'))
re_str = r'abc...'
print(fullmatch(re_str, 'abcm./'))
print(fullmatch(re_str, 'abcm\t/'))
print(fullmatch(re_str, 'abc G/'))
“”"
3) \d – Match any number character
“”"
re_str = r'a\d\dc'
print(fullmatch(re_str, 'a78c'))
print(fullmatch(re_str, 'a00c'))
“”"
4) \s – Match any blank character
Blank character : Space 、 enter (\n)、 tabs (\t)
“”"
re_str = r'a\sb'
print(fullmatch(re_str, 'a b'))
print(fullmatch(re_str, 'a\nb'))
print(fullmatch(re_str, 'a\tb'))
print(fullmatch(re_str, 'a b')) # None( Mismatch )
“”"
- \w – Match any letter 、 Numbers or underscores ( It fails to work well )
“”"
- \w – Match any letter 、 Numbers or underscores ( It fails to work well )
“”"
6)
\D – Match any non numeric character
\S – Match any non white space character
“”"
print(fullmatch(r'a\Sb\D', 'a>b='))
print(fullmatch(r'a\Sb\D', 'a b=')) # None
print(fullmatch(r'a\Sb\D', 'a>b0')) # None
“”"
7) [ Character set ] – Match any character in the character set
Be careful : One [] Only one character can be matched
[ Multiple ordinary characters ] – for example [abc], Can match a perhaps b perhaps c
[ contain \ Special symbol character set at the beginning ] – for example :[\dabc], You can match any number or a perhaps b perhaps c
[ A character set containing a minus sign between two characters ] – The minus sign at this time means who goes to who ( Be careful : The encoding of the characters before the minus sign must be smaller than that after the minus sign )
for example :
[a-z] – Match any lowercase letter
[a-d] – matching a、b、c、d Any character in
[A-Z] - Match any capital letter
[1-9] – matching 1 To 9 Any numeric character in
[\u4e00-\u9fa5] – Match any Chinese character
[a-zA-Z]、[A-Za-z] – Match any letter
[a-z123] – Match any lowercase letter , perhaps 1 perhaps 2 perhaps 3
[a-z\d] – Match any lowercase letter or any number
“”"
re_str = r'a[xym]b'
print(fullmatch(re_str, 'axb'))
print(fullmatch(re_str, 'ayb'))
print(fullmatch(re_str, 'amb'))
print(fullmatch(re_str, 'azb')) # None
re_str = r'a[16]b'
print(fullmatch(re_str, 'a1b'))
print(fullmatch(re_str, 'a6b'))
re_str = r'a[a\db]b'
print(fullmatch(re_str, 'a1b'))
print(fullmatch(re_str, 'aab'))
print(fullmatch(re_str, 'abb'))
print(fullmatch(r'x[a-z]y', 'xmy'))
print(fullmatch(r'x[a-zA-Z]y', 'xmy'))
print(fullmatch(r'x[a-zA-Z]y', 'xKy'))
print(fullmatch(r'x[a-zA-Z*&]y', 'x*y'))
print(fullmatch(r'x[a-zA-Z*&]y', 'xMy'))
print(fullmatch(r'x[0-9]y', 'x5y'))
print(fullmatch(r'x[-09]y', 'x-y'))
print(fullmatch(r'x[-09]y', 'x0y'))
print(fullmatch(r'x[-09]y', 'x9y'))
“”"
8) [^ Character set ] – Match any character that is not in the character set
[^abc] – Match except a、b、c Any character other than
[^a-z] – Match any character except lowercase letters
Be careful :[] Medium - and ^ Only when it is placed in the designated position can it have special functions , Otherwise, in the [] Chinese is a common character .
“”"
print(fullmatch(r'a[^\u4e00-\u9fa5]c', 'a yes c')) # None
print(fullmatch(r'a[^a-zA-Z]c', 'aKc')) # None
print(fullmatch(r'a[^a-zA-Z]c', 'a1c'))
print(fullmatch(r'a[a-z^]c', 'a^c'))
Two 、 Detection class symbol
from re import fullmatch, findall
“”"
Check that the existence of class symbols does not affect the length of the matched string , Its function is to detect whether the position of the symbol is required by the symbol on the premise of successful matching .
Detect the usage of class symbols : First remove the detection symbols , See if it can match successfully , If it fails, the whole regular matching fails . If it succeeds, let's see whether the location of the detection symbol is required by the symbol .
- \b – Detect whether it is a word boundary
Word boundaries – Symbols that can distinguish two different words belong to the word boundary , for example : blank 、 Punctuation 、 Start of string 、 End of string
“”"
re_str = r'abc\b123'
print(fullmatch(re_str, 'abc123')) # None
print(fullmatch(re_str, 'abc 123')) # None
re_str = r'abc,\b123'
print(fullmatch(re_str, 'abc,123'))
print(fullmatch(r'abc\s\b123', 'abc 123'))
# findall( Regular expressions , character string ) - Get all substrings in the string that satisfy the regular expression
str1 = '12ksksj78ss 34 Antibiotic ,89 try 7 And proved 90 56 Jiangsu Province 23'
result1 = findall(r'\d\d', str1)
print(result1) # ['12', '78', '34', '89', '90', '56', '23']
result2 = findall(r'\d\d\b', str1)
print(result2) # ['89', '90', '56', '23']
result3 = findall(r'\b\d\d\b', str1)
print(result3) # ['89', '56']
“”"
2. \B – Detect whether it is a non word boundary
“”"
result3 = findall(r'\d\d\B', str1)
print(result3)
“”"
3. ^ – Check whether it starts with a string
“”"
re_str = r'^\d\d'
print(fullmatch(re_str, '12')) # <re.Match object; span=(0, 2), match='12'>
print(findall(r'^\d\d', str1)) # ['12']
“”"
4. $ – Check whether it is the end of the string
“”"
re_str = r'\d\d$'
print(fullmatch(re_str, '67')) # <re.Match object; span=(0, 2), match='67'>
3、 ... and 、 Number of matches
from re import fullmatch, match, findall
import requests
“”"
- – matching 0 Times or more ( Any number of times )
Match class symbol usage :
a – Match any number of a
\d* – Match any number of numeric characters
“”"
- – matching 0 Times or more ( Any number of times )
print(fullmatch(r'a*b', 'b'))
print(fullmatch(r'a*b', 'aab'))
print(fullmatch(r'a*b', 'aaaaaaab'))
print(fullmatch(r'\d*b', '245899b'))
print(fullmatch(r'[abc]*x', 'aabccax'))
“”"
2. + – matching 1 Times or times ( At least once )
“”"
print(fullmatch(r'a+b', 'b')) # None
print(fullmatch(r'a+b', 'ab'))
print(fullmatch(r'a+b', 'aaaaab'))
“”"
3. ? – matching 0 Time or 1 Time
“”"
re_str = r'[-+]?[1-9]\d'
print(fullmatch(re_str, '-12'))
print(fullmatch(r'a?b', 'b'))
print(fullmatch(r'a?b', 'ab'))
print(fullmatch(r'a?b', 'aab')) # None
“”"
4. {}
{N} – matching N Time
{M,N} – matching M To N Time
{,N} – Most matches N Time
{M,} – Match at least M Time
“”"
print(fullmatch(r'\d{3}', '786'))
print(fullmatch(r'\d{3,5}', '899'))
print(fullmatch(r'\d{3,5}', '8092'))
print(fullmatch(r'\d{3,5}', '89129'))
print(fullmatch(r'\d{3,5}', '8578990')) # None
print(fullmatch(r'\d{,5}', '802'))
print(fullmatch(r'\d{,5}', '0921'))
print(fullmatch(r'\d{,5}', '081898')) # None
print(fullmatch(r'\d{3,}', '0976'))
print(fullmatch(r'\d{3,}', '31221'))
print(fullmatch(r'\d{3,}', '82')) # None
# Be careful : The symbol corresponding to the matching times must be preceded by the matching class symbol
“”"
5. Greed and non greed
When the number of matches is uncertain , Matching patterns are divided into greedy and non greedy , The default is greedy mode
On the premise of successful matching , Greed is the number of matches chosen the most ; Non greedy is the least number of matches .
*、+、?、{M,N}、{M,}、{,N} – Greedy
*?、+?、??、{M,N}?、{M,}?、{,N}? – Not greedy
“”"
print(match(r'\d{3}', '234hj Regulatory body '))
print(match(r'a.*b', 'asmmdb Regulatory body ')) # asmmdb
print(match(r'a.*?b', 'asmmdb Regulatory body ')) # asmmdb
# 'asb'、'asbmmb'、'asbmmbdb' There are three situations that can succeed , Because of greed, the last matching times are the most
print(match(r'a.*b', 'asbmmbdb Regulatory body ')) # asbmmbdb
print(match(r'a.*?b', 'asbmmbdb Regulatory body ')) # asb
Four 、 Groups and branches
from re import fullmatch
“”"
- () – grouping
effect 1: take () As a whole , Carry out overall relevant operations , for example : Overall control times
effect 2: adopt ’\M’ Repeat the previous paragraph M The result of a group match ,M from 1 Start
effect 3: Capture ( stay findall Medium lecture )
“”"
str1 = '79gt34er23sd'
print(fullmatch(r'\d\d[a-z]{2}\d\d[a-z]{2}\d\d[a-z]{2}', str1))
print(fullmatch(r'(\d\d[a-z]{2}){3}', str1))
str1 = r'abababab'
print(fullmatch(r'(ab)+', str1))
print(fullmatch(r'(\d{2})abc\1', '89abc89'))
print(fullmatch(r'(\d{2})abc\1', '89abc34')) # None
"""print(fullmatch(r'\d{2}abc\1', '89abc89'))""" # re.error( error )
print(fullmatch(r'(\d{3})([a-z]{3})-\2', '234ams-ams'))
print(fullmatch(r'(\d{3})([a-z]{3})-\1', '234ams-234'))
print(fullmatch(r'(\d{3})([a-z]{3})-\2\1', '234ams-ams234'))
print(fullmatch(r'(\d{3})([a-z]{3})-\1{2}', '234ams-234234'))
"""print(fullmatch(r'(\d{3})-\2([a-z]{3})', '234ams-ams'))""" # re.error( error )
“”"
2. | – Branch
Regular 1| Regular 2 – Start with regular 1 Match , If successful, match success , If the match fails, use regular 2 Match
“”"
# It is required to match at the same time 'abc98' and 'abcMKP'
print(fullmatch(r'abc\d{2}|abc[A-Z]{3}', 'abcKMP'))
print(fullmatch(r'abc(\d{2}|[A-Z]{3})', 'abcMKP'))
“”"
3. Escape symbol
Add... Before the special symbol \, Let the function of symbols disappear , Become a common symbol
“”"
print(fullmatch(r'\+\d{3}', '+234'))
print(fullmatch(r'\[\d{3}\]', '[234]'))
print(fullmatch(r'\\dabc', '\dabc'))
# If it is an independent symbol with special functions , Put the symbol in [] Its function will also disappear automatically
print(fullmatch(r'[+*?|()^$.]abc', '$abc'))
print(fullmatch(r'[\^abc\-z\]]123', ']123'))
5、 ... and 、re modular
import re
“”"
- compile( Regular expressions ) – Compile regular expressions , Returns a regular expression object
fullmatch( Regular expressions , character string )
Regular expression objects .fullmatch( character string )
“”"
re_obj = re.compile(r'\d{3}')
print(re_obj.fullmatch('234'))
print(re.fullmatch(r'\d{3}', '234'))
“”"
2.
fullmatch( Regular expressions , character string ) – Let the regular expression or the whole string match ( perfect match ), Match failed return None, Matching success returns the matching object .
match( Regular expressions , character string ) – Match the beginning of a string ( Rules for judging whether the beginning of a string is regular ), Match failed return None, Matching success returns the matching object .
“”"
result = re.fullmatch(r'\d{3}', '345')
print(result) # <re.Match object; span=(0, 3), match='345'> -- A match object
“”"
- Get the matching string
A match object .group() / A match object .group(0) – Get the result of the whole regular match
A match object .group(N) - For the first N The result of a group match
“”"
“”"
2) Get the position information of the matching result in the original string
A match object .span() - A tuple is returned , The elements in a tuple are the start and end subscripts , The position corresponding to the end subscript cannot be obtained
A match object .span(N)
“”"
print(result.span())
“”"
3. search( Regular expressions , character string ) – Get the first string that satisfies the regular expression . The return is None Or match the object
“”"
result = re.search(r'\d{3}', 'djrbj324ok,378')
print(result) # <re.Match object; span=(5, 8), match='324'>
print(result.group()) # 324
“”"
2) Get the position information of the matching result in the original string
A match object .span() – A tuple is returned , The elements in a tuple are the start and end subscripts , The position corresponding to the end subscript cannot be obtained
A match object .span(N)
“”"
“”"
4. findall( Regular expressions , character string ) – Get all the regular substrings in the string , Back to the list , The elements in the list are substrings ( When there is no grouping )
If there is only one group in the regular : The elements in the returned list are the results that each group matches
If there are two or more groups in the regular : The elements in the returned list are tuples , Elements in tuples are the result of each group matching
“”"
result = re.findall(r'\d{2}', '34ssd908 On computer 23,udh89, try 89123')
print(result) # ['34', '90', '23', '89', '89', '12']
result = re.findall(r'(\d{2})\D', '34ssd908 On computer 23,udh89, try 89123')
print(result) # ['34', '08', '23', '89']
result = re.findall(r'((\d[a-z]){2})', '2m4m Driver 9k0o try 3k5l--')
print(result) # [('2m4m', '4m'), ('9k0o', '0o'), ('3k5l', '5l')]
result = re.findall(r'(\d{2})-([a-z]{3})', '23-msn The data is 98-kop Christmas delivery ')
print(result) # [('23', 'msn'), ('98', 'kop')]
“”"
5. finditer( Regular expressions , character string ) – Get all the regular substrings in the string , It returns an iterator , Iterators are matching objects
“”"
result = re.finditer(r'(\d{2})-([a-z]{3})', '23-msn The data is 98-kop Christmas delivery ')
print(result)
r1 = next(result)
print(r1, r1.group(), r1.group(1), r1.group(2))
“”"
6. split( Regular expressions , character string ) – Take all substrings of the string that satisfy the regular expression as the cutting point , Cut strings
re.split( Regular expressions , character string ,N) – Put the string before N A substring satisfying the regular expression is used as the cutting point , Cut strings
“”"
result = re.split(r'\d+', ' Yes 9564s Twin horizon 09 Century Oriental and 3d Cry cry 2 try ')
print(result)
“”"
7. sub( Regular expressions , character string 1, character string 2) – The string 2 Replace all substrings satisfying the regular expression in with strings 1
sub( Regular expressions , character string 1, character string 2,N) – The string 2 Middle front N Replace the substring satisfying the regular expression with a string 1
“”"
result = re.sub(r'\d+', '*', ' Yes 9564s Twin horizon 09 Century Oriental and 3d Cry cry 2 try ')
print(result)
message = 'f u c k you! It's a mess , you TM Don't you see ?SB'
re_str = open('badLanguage.txt', encoding='utf-8').read()
re_str = r'(?i)%s' % re_str
result = re.sub(re_str, '*', message)
print(result)
“”"
8. flags Parameters
Each of the above functions has a parameter flags, Used to set regular parameters
1) Single line matching and multi line matching parameters :re.S、re.M( default )
Single match :. Can match \n
Multi-line matching :. Unable to join \n matching
flags=re.S <==> r’(?s) Regular expressions ’
2) Ignore case :re.I
flags = re.I <==> r'(?i) Regular expressions '
flags=re.S|re.I <==> r'(?si) Regular expressions '
“”"
print(re.fullmatch(r'a.b', 'a\nb', flags=re.M)) # None
print(re.fullmatch(r'a.b', 'a\nb')) # None
print(re.fullmatch(r'a.b', 'a\nb', flags=re.S))
print(re.fullmatch(r'(?s)a.b', 'a\nb'))
print('---------------------------------------------------')
print(re.fullmatch(r'abc', 'abc'))
print(re.fullmatch(r'abc', 'Abc')) # None
print(re.fullmatch(r'abc', 'ABc', flags=re.I))
print(re.fullmatch(r'(?i)abc', 'ABc'))
print(re.fullmatch(r'a.b', 'A\nb', flags=re.S|re.I))
print(re.fullmatch(r'(?is)a.b', 'A\nb'))
Homework
Use regular expressions to complete the following operations :
One 、 Indefinite multiple choice questions
- Can exactly match the string
"(010)-62661617"
And string"01062661617"
Regular expressions for include (A 、B、D)
A.r"\(?\d{3}\)?-?\d{8}"
B. r"[0-9()-]+"
C.r"[0-9(-)]*\d*"
D.r"[(]?\d*[)-]*\d*"
Can exactly match the string “back” and “back-end” Regular expressions for include ( A、B、C 、D)
A. r“\w{4}-\w{3}|\w{4}”
B. r“\w{4}|\w{4}-\w{3}”
C.r “\S±\S+|\S+”
D. r“\w*\b-\b\w*|\w*”Can exactly match the string “go go” and “kitty kitty”, But not exactly “go kitty” Regular expressions for include (A,D)
A.r “\b(\w+)\b\s+\1\b”
B. r“\w{2,5}\s*\1”
C. r“(\S+) \s+\1”
D. r“(\S{2,5})\s{1,}\1”Can match in a string “aab”, It doesn't match “aaab” and “aaaab” Regular expressions for include (B,C )
A. r“a*?b”
B. r“a{,2}b”
C. r“aa??b”
D. r“aaa??b”
Two 、 Programming questions
1. The user name matches
requirement : 1. The user name can only contain numbers Letter Underline
2. Cannot start with a number
3.⻓ Duzai 6 To 16 Bit range
from re import fullmatch
username = 'ss3568ahh'
result = re.fullmatch(r'[a-zA-Z_][A-Za-z\d]{5,15}', username)
if result:
print(f' user name :{
username} legal ')
else:
print(f' user name :{
username} illegal ')
- Password matching
requirement : 1. Can not contain [email protected]#¥%^&* These special symbols
2. Must start with a letter
3.⻓ Duzai 6 To 12 Bit range
from re import fullmatch
pw = 'dka4i53'
result = fullmatch(r'[a-zA-Z][^[email protected]#¥%^&*]{5,11}', pw)
if result:
print(f' password {
pw} legal ')
else:
print(f' password {
pw} illegal ')
- ipv4 Format ip Address matching
Tips : IP The range of addresses is 0.0.0.0 - 255.255.255.255
""" 0 - 255 (0-9; 10-99; 100-199; 200-249; 250-255) 0-9: \d 10-99:[1-9]\d 100-199: 1\d[2] 200-249: 2[0-4]\d 250-255: 25[0-5] 0-255 Regular :(\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]) ip: ((\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}(\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]) """
ip = '255.5.255.255'
result = fullmatch(r'((\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}(\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])', ip)
if result:
print(f'{
ip} legal ')
else:
print(f'{
ip} illegal ')
- Extract values from user input data ( Values include positive and negative numbers It also includes integers and decimals ) And sum up
for example :“-3.14good87nice19bye” =====> -3.14 + 87 + 19 = 102.86
import re
str1 = '-3.14good87nice19bye'
result = re.findall(r'-?\d+\.?\d*', str1)
print(sum([float(x) for x in result])) # 102.86
Verify that the input content can only be Chinese characters
str1 = ' Gentle his return ' result = fullmatch(r'[\u4e00-\u9fa5]+', str1) print(result)
Match integers or decimals ( Including positive and negative numbers )
""" Integers :[-+]?([1-9]\d*|0) -0、2、9、+19、342、34523、-0、+0( legal ) 00、09、00087、--9、++23、-+23、 ( illegal ) decimal :[-+]?(0|[1-9]\d*)\.\d+ 0.23、23.98、2.00、0.00 ( legal ) 003.9、0.、34. ( illegal ) An integer or a decimal :[-+]?([1-9]\d*|0)(\.\d+)? """ str1 = input(' Please enter an integer or decimal :') result = fullmatch(r'[-+]?([1-9]\d*|0)(\.\d+)?', str1) print(result)
Verify the user name and password entered QQ Whether the number is valid and gives the corresponding prompt information
requirement :
The user name must consist of the letters 、 Numbers or underscores are formed and the length is within 6~20 Between charactersuser_name = input(' Please enter a user name :') user_QQ = input(' Please enter the user's QQ Number :') if not re.fullmatch(r'(?i)[a-z_]{6,20}', user_name): print(' The user name entered is illegal ') else: print(' The user name is legal ') if not re.fullmatch(r'[1-9][\d]{4,11}', user_QQ): print(' Input QQ Illegal number ') else: print('QQ The number is legal ')
Split long string : Take out each sentence of a poem separately
poem = ‘ The bright moon in front of the window , The frost on the ground . look at the bright moon , Bow your head and think of your hometown .’
poem = ' The bright moon in front of the window , The frost on the ground . look at the bright moon , Bow your head and think of your hometown .' result = re.split(r'[,.]\b', poem) print(result)
边栏推荐
- [machine] [server] Taishan 200
- The time difference between the past time and the present time of uniapp processing, such as just, a few minutes ago, a few hours ago, a few months ago
- If the uniapp is less than 1000, it will be displayed according to the original number. If the number exceeds 1000, it will be converted into 10w+ 1.3k+ display
- [Galaxy Kirin V10] [server] system partition expansion
- Doris / Clickhouse / Hudi, a phased summary in June
- Article publishing experiment
- /*The rewriter outputs the contents of the IA array. It is required that the type defined by typedef cannot be used in the outer loop*/
- Occasional pit compiled by idea
- Huge number multiplication (C language)
- Two way process republication + routing policy
猜你喜欢
Tables in the thesis of latex learning
Dynamic memory management
20 minutes to learn what XML is_ XML learning notes_ What is an XML file_ Basic grammatical rules_ How to parse
【Day1】 deep-learning-basics
[Galaxy Kirin V10] [server] NUMA Technology
[test theory] test process management
[Galaxy Kirin V10] [server] FTP introduction and common scenario construction
Using SA token to solve websocket handshake authentication
DCL statement of MySQL Foundation
Safety reinforcement learning based on linear function approximation safe RL with linear function approximation translation 2
随机推荐
[test theory] test phase analysis (unit, integration, system test)
Software sharing: the best PDF document conversion tool and PDF Suite Enterprise version sharing | with sharing
Online troubleshooting
How do microservices aggregate API documents? This wave of show~
Rhcsa operation
Communication layer of csframework
Write a thread pool by hand, and take you to learn the implementation principle of ThreadPoolExecutor thread pool
Latex learning insertion number - list of filled dots, bars, numbers
BGP advanced experiment
Rhcsa day 9
Jianzhi offer 04 (implemented in C language)
[Galaxy Kirin V10] [server] set time synchronization of intranet server
Static comprehensive experiment ---hcip1
[Galaxy Kirin V10] [server] FTP introduction and common scenario construction
Development guidance document of CMDB
[Galaxy Kirin V10] [server] grub default password
Introduction to extensible system architecture
Application and Optimization Practice of redis in vivo push platform
Huge number multiplication (C language)
leetcode1-3