
Regular expressions and the re library

2022-07-05 21:49:00 【A.way30】

Related websites

Regular expression testing
Regular expression part notes
Bilibili online course

Regular expressions

Definition

A set of syntax rules for writing expressions that match strings

When to use

In the data-parsing stage of a Python crawler, the text you have fetched is stored between tags or in the attributes of those tags. Regular expressions are one way to extract that information.

Syntax

Use metacharacters to match strings

Metacharacters

Special symbols with a fixed meaning (by default, each one matches a single character)

Metacharacter	Meaning
.	Matches any character except a newline
\w	Matches a letter, digit, or underscore
\s	Matches any whitespace character
\d	Matches a digit
\n	Matches a newline
\t	Matches a tab
^	Matches the beginning of the string
$	Matches the end of the string
\W	Matches any character that is not a letter, digit, or underscore
\D	Matches any non-digit character
\S	Matches any non-whitespace character
a|b	Matches a or b
()	Groups the expression in parentheses and captures what it matches
[…]	Matches any one character in the set (a-z means every letter from a to z)
[^…]	Matches any one character that is not in the set

Example

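eg: a11111
^\d\d\d\d\d    No match (the string does not start with five digits)

eg: 11111a
\d\d\d\d\d$    No match (the string does not end with five digits)

eg: My phone number is: 10010
[M10]    Matches M 1 0 0 1 0 (each character is a separate match)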
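The three cases above can be checked with a short script. This is only a sketch: the sample strings follow the examples, and the variable names are chosen here for illustration.

```python
import re

text = "My phone number is: 10010"

# \d matches exactly one digit, so every digit is a separate match.
digits = re.findall(r"\d", text)

# ^ anchors the pattern to the start of the string, $ to the end.
starts_with_digits = re.search(r"^\d\d\d\d\d", "a11111")
ends_with_digits = re.search(r"\d\d\d\d\d$", "11111a")

# A character set matches any single character listed inside it.
in_set = re.findall(r"[M10]", text)

# [^...] negates the set: here, anything that is neither a word
# character nor whitespace.
punctuation = re.findall(r"[^\w\s]", text)

print(digits)
print(starts_with_digits, ends_with_digits)
print(in_set)
print(punctuation)
```

Both anchored searches return None because the digits are not at the required end of the string.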

Quantifiers

Control how many times the preceding metacharacter may occur

Quantifier	Meaning
*	Repeat zero or more times
+	Repeat one or more times
?	Repeat zero or one time
{n}	Repeat exactly n times
{n,}	Repeat n or more times
{n,m}	Repeat n to m times
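To see how each quantifier behaves, the sketch below tests a few patterns against the same small set of strings. The helper matching() and the sample list are hypothetical names introduced for this illustration; re.fullmatch is used so that a pattern only counts if it covers the whole string.

```python
import re

samples = ["a", "ab", "abb", "abbb"]


def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]


star = matching(r"ab*")            # zero or more b
plus = matching(r"ab+")            # one or more b
optional = matching(r"ab?")        # zero or one b
exactly_two = matching(r"ab{2}")   # exactly two b
two_or_more = matching(r"ab{2,}")  # two or more b
one_to_two = matching(r"ab{1,2}")  # one or two b

for name, result in [("*", star), ("+", plus), ("?", optional),
                     ("{2}", exactly_two), ("{2,}", two_or_more),
                     ("{1,2}", one_to_two)]:
    print(name, result)
```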

Matching modes

.* Greedy matching (matches as much as possible)
.*? Lazy matching (matches as little as possible)
Lazy matching is the one commonly used in Python crawlers
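The difference between the two modes shows up as soon as a string contains more than one closing delimiter. A minimal sketch, assuming a made-up HTML fragment:

```python
import re

html = "<b>first</b> and <b>second</b>"

# Greedy: .* runs to the last </b> it can find.
greedy = re.findall(r"<b>(.*)</b>", html)

# Lazy: .*? stops at the first </b> that lets the pattern succeed.
lazy = re.findall(r"<b>(.*?)</b>", html)

print(greedy)
print(lazy)
```

The greedy pattern returns one long match spanning both tags, while the lazy pattern returns the content of each tag separately, which is usually what a crawler wants.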

re module

Functions

pattern: the regular expression; string: the string to search; flags: optional flags that change how the pattern matches (re.S, re.M, etc.)

Function or flag	Meaning
compile(pattern, flags=0)	Compiles a regular expression and returns a pattern object (precompiles the expression for reuse)
match(pattern, string, flags=0)	Matches the pattern at the beginning of the string; returns a match object on success, otherwise None
search(pattern, string, flags=0)	Scans the string for the first place where the pattern matches; returns a match object on success, otherwise None
split(pattern, string, maxsplit=0, flags=0)	Splits the string at each match of the pattern; returns a list
sub(pattern, repl, string, count=0, flags=0)	Replaces the matches of the pattern in the string with repl; count limits the number of replacements
fullmatch(pattern, string, flags=0)	Like match, but the pattern must match the whole string, from beginning to end
findall(pattern, string, flags=0)	Finds all matches of the pattern in the string; returns a list of strings (or of the captured groups, if the pattern contains groups)
finditer(pattern, string, flags=0)	Finds all matches of the pattern in the string; returns an iterator of match objects
purge()	Clears the cache of implicitly compiled regular expressions
re.I / re.IGNORECASE	Flag: ignore case when matching
re.M / re.MULTILINE	Flag: multiline matching (^ and $ also match at the start and end of each line)
re.S / re.DOTALL	Flag: single-line matching (. also matches newlines)
(?P<group name>regex)	Named group: lets you pull one part out of the overall match by name (the P must be uppercase)
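split(), sub() and fullmatch() appear in the table but not in the numbered examples, so here is a brief sketch of each. The sample strings are made up for illustration.

```python
import re

# split: cut the string wherever the pattern matches.
parts = re.split(r"[,;]\s*", "apple, banana;cherry")

# maxsplit limits how many cuts are made.
limited = re.split(r"[,;]\s*", "apple, banana;cherry", maxsplit=1)

# sub: replace every match with the replacement string.
masked = re.sub(r"\d", "*", "Call 10086 or 10010")

# count limits how many replacements are made.
masked_once = re.sub(r"\d+", "#", "Call 10086 or 10010", count=1)

# fullmatch succeeds only if the pattern covers the entire string,
# while match only needs the pattern to fit at the beginning.
whole = re.fullmatch(r"\d+", "10086")
partial = re.fullmatch(r"\d+", "10086a")
prefix = re.match(r"\d+", "10086a")

print(parts, limited)
print(masked, masked_once)
print(whole is not None, partial, prefix.group())
```

maxsplit and count are passed by keyword here, which keeps the calls readable and avoids confusing them with flags.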

Example

1. findall() returns a list

import re

ls = re.findall(r'\d+', "My phone number is: 10086, his phone number is: 10010.")
print(ls)

# ['10086', '10010']

2. finditer() returns an iterator of match objects

import re

it = re.finditer(r'\d+', "My phone number is: 10086, his phone number is: 10010.")
for i in it:
    print(i.group())

# 10086
# 10010

3. search() returns a match object

import re

ls = re.search(r'\d+', "My phone number is: 10086, his phone number is: 10010.")
print(ls.group())

# 10086
# Only the first match is returned

4. match() matches only at the beginning of the string; otherwise it behaves like search()

import re

ls = re.match(r'\d+', "10086, his phone number is: 10010.")
print(ls.group())
# 10086
ls = re.match(r'\d+', "My phone number is: 10086, his phone number is: 10010.")
print(ls.group())

# Raises AttributeError: the string does not start with a digit, so match() returns None
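The second call in example 4 fails because match() returns None when the string does not start with a match, and None has no group() method. One common way to avoid the error is to test the result first. The helper first_number() below is a hypothetical name used only for this sketch.

```python
import re


def first_number(text):
    """Return the run of digits at the start of text, or None if there is none."""
    m = re.match(r"\d+", text)
    # A match object is truthy; None is falsy.
    return m.group() if m else None


print(first_number("10086, his phone number is: 10010."))
print(first_number("My phone number is: 10086"))
```

The same check applies to search() and fullmatch(), which also return None on failure.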

5. compile() precompiles a regular expression

import re

# \d+ (one or more digits) is needed to get whole numbers; \d alone matches one digit at a time
obj = re.compile(r'\d+')
ret = obj.finditer("My phone number is: 10086, my girlfriend's phone number is: 10010")
for it in ret:
    print(it.group())
ret = obj.findall("Population 1000000")
print(ret)

# 10086
# 10010
# ['1000000']

6. (?P<group name>regex) named groups

import re

s = """<div class='a'><span id='1'>Zhang San</span></div>
<div class='b'><span id='2'>Li Si</span></div>
<div class='c'><span id='3'>Wang Wu</span></div>
<div class='d'><span id='4'>garage</span></div>
<div class='e'><span id='5'>Thompson</span></div>"""
obj = re.compile(r"<div class='.*?'><span id='(?P<id>\d+)'>(?P<name>.*?)</span></div>", re.S)
result = obj.finditer(s)
for it in result:
    # Read each named group by the name given in (?P<name>...)
    print(it.group("name"), it.group("id"))

# Zhang San 1
# Li Si 2
# Wang Wu 3
# garage 4
# Thompson 5

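7. re.S lets . match newlines, so the string is matched as a whole

import re

a = "hello123\nworld"
b = re.findall('hello(.*?)world', a)
c = re.findall('hello(.*?)world', a, re.S)
print('b =', b)
print('c =', c)

# b = []          (without re.S, . cannot cross the line break)
# c = ['123\n']   (the captured group includes the newline)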
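The other two flags from the table, re.I and re.M, can be tried the same way. A minimal sketch, assuming a made-up three-line string; flags are combined with the | operator.

```python
import re

text = "Hello\nhello\nHELLO"

# Without re.I the match is case-sensitive.
case_sensitive = re.findall(r"hello", text)
ignore_case = re.findall(r"hello", text, re.I)

# Without re.M, ^ matches only at the start of the whole string.
first_only = re.findall(r"^\w+", text)
each_line = re.findall(r"^\w+", text, re.M)

# Both flags together: every line that is "hello" in any case.
combined = re.findall(r"^hello$", text, re.I | re.M)

print(case_sensitive, ignore_case)
print(first_only, each_line)
print(combined)
```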

Original site

Copyright notice
This article was written by [A.way30]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/02/202202140506368554.html