
Regular expressions and the re library

2022-07-05 21:49:00 A.way30

Related websites

Regular expression tester
Notes on regular expressions
Bilibili online course

Regular expressions

Definition

A regular expression is a set of syntax rules for writing patterns that match strings.

When to use them

In the data-parsing stage of a Python crawler, the content you want is usually stored between tags or in the attributes of tags in the downloaded page text. Regular expressions are one way to extract that stored information.

Syntax

Patterns are built from metacharacters that match parts of a string.

Metacharacters

Special symbols with a fixed meaning. Unless a quantifier follows, each character-matching symbol matches exactly one character.

Metacharacter   Meaning
.               Matches any character except a newline
\w              Matches a letter, digit, or underscore
\s              Matches any whitespace character
\d              Matches a digit
\n              Matches a newline
\t              Matches a tab
^               Matches the beginning of the string
$               Matches the end of the string
\W              Matches anything that is not a letter, digit, or underscore
\D              Matches a non-digit
\S              Matches a non-whitespace character
a|b             Matches character a or character b
()              Matches the expression inside the parentheses and marks it as a group
[...]           Matches any character in the character set (a-z stands for all letters from a to z)
[^...]          Matches any character that is not in the character set
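To make the table concrete, here is a minimal sketch that runs a few of these metacharacters through Python's re.findall(). The sample string and the variable names are made up for illustration.

```python
import re

# Made-up sample: letters, digits, one space and one tab.
text = "a1 b2\tc3"

any_char = re.findall(r".", text)            # every character (there is no newline here)
digits = re.findall(r"\d", text)             # ['1', '2', '3']
whitespace = re.findall(r"\s", text)         # [' ', '\t']
in_set = re.findall(r"[ab]", text)           # ['a', 'b']
not_in_set = re.findall(r"[^ab\s\d]", text)  # ['c']
either = re.findall(r"a|c", text)            # ['a', 'c']
at_start = re.findall(r"^\w", text)          # ['a']
at_end = re.findall(r"\d$", text)            # ['3']

print(digits, whitespace, in_set, not_in_set, either, at_start, at_end)
```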

Example

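String:  a11111
Pattern: ^\d\d\d\d\d
Result:  no match (the string does not start with a digit)

String:  11111a
Pattern: \d\d\d\d\d$
Result:  no match (the string does not end with a digit)

String:  My phone number is:10010
Pattern: [M10]
Result:  M 1 0 0 1 0 (each character from the set is matched separately)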
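Anchor and character-set behavior of this kind can be checked directly in Python. The sketch below assumes re.search() for the anchored patterns and re.findall() for the character set; the set [M10] and the variable names are illustrative choices.

```python
import re

# ^ anchors at the start: the string begins with 'a', so there is no match.
starts = re.search(r"^\d\d\d\d\d", "a11111")

# $ anchors at the end: the string ends with 'a', so there is no match.
ends = re.search(r"\d\d\d\d\d$", "11111a")

# A character set matches one character at a time.
chars = re.findall(r"[M10]", "My phone number is:10010")

print(starts)  # None
print(ends)    # None
print(chars)   # ['M', '1', '0', '0', '1', '0']
```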

Quantifiers

A quantifier controls how many times the preceding metacharacter (or group) may occur.

Quantifier   Meaning
*            Repeat zero or more times
+            Repeat one or more times
?            Repeat zero or one time
{n}          Repeat exactly n times
{n,}         Repeat n or more times
{n,m}        Repeat n to m times
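As a small sketch of how the quantifiers differ, each pattern below is applied to the same made-up string, where the quantifier controls how many times the letter a may follow b.

```python
import re

# Made-up sample: 'b' followed by zero, one, two and three 'a' characters.
text = "b ba baa baaa"

star = re.findall(r"ba*", text)          # ['b', 'ba', 'baa', 'baaa']
plus = re.findall(r"ba+", text)          # ['ba', 'baa', 'baaa']
optional = re.findall(r"ba?", text)      # ['b', 'ba', 'ba', 'ba']
exactly_2 = re.findall(r"ba{2}", text)   # ['baa', 'baa']
at_least_2 = re.findall(r"ba{2,}", text) # ['baa', 'baaa']
one_to_2 = re.findall(r"ba{1,2}", text)  # ['ba', 'baa', 'baa']

print(star, plus, optional, exactly_2, at_least_2, one_to_2)
```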

Matching modes

.*    Greedy matching: matches as many characters as possible
.*?   Lazy matching: matches as few characters as possible
Lazy matching is the more commonly used of the two in Python crawlers.
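The difference shows up as soon as a closing delimiter appears more than once. In this sketch the HTML fragment is invented for illustration.

```python
import re

# Invented fragment with two sibling tags.
html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the LAST </b>, swallowing the tags in between.
greedy = re.findall(r"<b>(.*)</b>", html)

# Lazy: .*? stops at the FIRST </b> after each <b>.
lazy = re.findall(r"<b>(.*?)</b>", html)

print(greedy)  # ['one</b><b>two']
print(lazy)    # ['one', 'two']
```

This is why crawlers usually prefer the lazy form: each tag's content comes back as its own item.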

The re module

Functions

Parameters: pattern is the regular expression; string is the string to search; flags are optional modifier flags (re.S, re.M, etc.) that change how the pattern matches.

Function                                       Meaning
compile(pattern, flags=0)                      Compiles a regular expression and returns a pattern object (precompiles it for reuse)
match(pattern, string, flags=0)                Matches the pattern at the beginning of the string; returns a match object on success, otherwise None
search(pattern, string, flags=0)               Finds the first place in the string where the pattern matches; returns a match object on success, otherwise None
split(pattern, string, maxsplit=0, flags=0)    Splits the string wherever the pattern matches; returns a list
sub(pattern, repl, string, count=0, flags=0)   Replaces the parts of the string that match the pattern with repl; count limits the number of replacements
fullmatch(pattern, string, flags=0)            Stricter version of match(): the pattern must match the whole string, from beginning to end
findall(pattern, string, flags=0)              Finds all matches of the pattern in the string; returns a list of strings (of tuples when the pattern has more than one group)
finditer(pattern, string, flags=0)             Finds all matches of the pattern in the string; returns an iterator of match objects
purge()                                        Clears the cache of implicitly compiled regular expressions

Flag                                           Meaning
re.I / re.IGNORECASE                           Case-insensitive matching
re.M / re.MULTILINE                            Multiline matching: ^ and $ also match at the start and end of each line
re.S / re.DOTALL                               Single-line matching: . also matches a newline

Syntax                                         Meaning
(?P<group name>pattern)                        Named group: captures part of the match so it can be extracted by name (the P must be uppercase)
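A minimal sketch of split(), sub(), and fullmatch(), using made-up input strings and variable names:

```python
import re

# split(): cut the string at commas or semicolons, plus any spaces after them.
parts = re.split(r"[,;]\s*", "a, b;c,  d")

# sub(): replace digits with '*'; count=3 limits it to the first three digits.
masked = re.sub(r"\d", "*", "tel: 10086", count=3)

# fullmatch(): the whole string must match, unlike match(),
# which only requires a match at the beginning.
whole = re.fullmatch(r"\d{5}", "10086a")
prefix = re.match(r"\d{5}", "10086a")

print(parts)           # ['a', 'b', 'c', 'd']
print(masked)          # tel: ***86
print(whole)           # None
print(prefix.group())  # 10086
```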

Example

1. findall() returns a list

import re

ls = re.findall(r'\d+', "My phone number is:10086, his phone number is:10010.")
print(ls)

# ['10086', '10010']

2. finditer() returns an iterator

it = re.finditer(r'\d+', "My phone number is:10086, his phone number is:10010.")
for i in it:
    print(i.group())

# 10086
# 10010

3. search() returns a match object

ls = re.search(r'\d+', "My phone number is:10086, his phone number is:10010.")
print(ls.group())

# 10086
# Only the first match is returned

4. match() matches only from the beginning of the string; otherwise it is similar to search()

ls = re.match(r'\d+', "10086, his phone number is:10010.")
print(ls.group())
# 10086
ls = re.match(r'\d+', "My phone number is:10086, his phone number is:10010.")
print(ls.group())

# Raises AttributeError: the string does not start with a digit, so match() returns None
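One way to avoid the error when nothing matches is to check the result for None before calling group(). The helper name first_number below is hypothetical and exists only for this sketch.

```python
import re

def first_number(text):
    """Return the run of digits at the start of text, or None if there is none.

    first_number is a hypothetical helper, not part of the re module.
    """
    m = re.match(r"\d+", text)
    # re.match() returns None when the pattern does not match at the start.
    return m.group() if m else None

print(first_number("10086, his phone number is:10010."))  # 10086
print(first_number("My phone number is:10086"))           # None
```

The same check applies to search() and fullmatch(), which also return None when there is no match.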

5. compile() precompiles a regular expression

obj = re.compile(r'\d+')
ret = obj.finditer("My phone number is:10086, my girlfriend's phone number is:10010")
for it in ret:
    print(it.group())
ret = obj.findall("Population 1000000")
print(ret)

# 10086
# 10010
# ['1000000']

6. (?P<group name>pattern)

s = """<div class='a'><span id='1'>Zhang San</span></div>
<div class='b'><span id='2'>Li Si</span></div>
<div class='c'><span id='3'>Wang Wu</span></div>
<div class='d'><span id='4'>garage</span></div>
<div class='e'><span id='5'>Thompson</span></div>"""
obj = re.compile(r"<div class='.*?'><span id='(?P<id>\d+)'>(?P<name>.*?)</span></div>", re.S)
result = obj.finditer(s)
for it in result:
    print(it.group("name"), it.group("id"))

# Zhang San 1
# Li Si 2
# Wang Wu 3
# garage 4
# Thompson 5
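Named groups can also be read all at once. As a minimal sketch, groupdict() on a match object returns every named group as a dictionary; the single-row input below is a shortened, illustrative version of the HTML above.

```python
import re

# Illustrative single row of HTML.
row = "<div class='a'><span id='1'>Zhang San</span></div>"

obj = re.compile(r"<span id='(?P<id>\d+)'>(?P<name>.*?)</span>")
m = obj.search(row)

info = m.groupdict()      # all named groups as a dict
by_name = m.group("id")   # access by group name
by_index = m.group(1)     # named groups are still numbered, too

print(info)               # {'id': '1', 'name': 'Zhang San'}
print(by_name, by_index)  # 1 1
```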

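7. re.S lets . match newlines, so the string is matched as a whole

a = """hello123
world"""
b = re.findall('hello(.*?)world', a)
c = re.findall('hello(.*?)world', a, re.S)
print('b =', b)
print('c =', c)

# b = []  (an empty list: without re.S, . stops at the newline)
# c = ['123\n']  (the captured text includes the newline)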
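The other two flags in the table, re.I and re.M, can be sketched the same way. The three-line sample string is made up; flags are combined with the | operator.

```python
import re

# Made-up sample: the same word on three lines, in different cases.
text = "Hello\nhello\nHELLO"

case_sensitive = re.findall(r"hello", text)        # ['hello']
ignore_case = re.findall(r"hello", text, re.I)     # ['Hello', 'hello', 'HELLO']

# Without re.M, ^ only matches at the very start of the string.
first_only = re.findall(r"^\w+", text)             # ['Hello']
every_line = re.findall(r"^\w+", text, re.M)       # ['Hello', 'hello', 'HELLO']

# Combined flags: whole lines equal to "hello" in any case.
both = re.findall(r"^hello$", text, re.I | re.M)   # ['Hello', 'hello', 'HELLO']

print(case_sensitive, ignore_case, first_only, every_line, both)
```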

Copyright notice
This article was written by [A.way30]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202140506368554.html