Regular Expressions and the re Library
2022-07-05 21:49:00 【A.way30】
Related links
Regular expression tester
Regular expression notes
Bilibili online course
Regular expressions
Definition
A set of syntax rules for writing expressions that match strings
When to use
In the data-parsing stage of a Python crawler, the content you want from the downloaded page text usually sits between tags or in the attributes of tags. Regular expressions are one way to extract that stored information.
Syntax
Use metacharacters to match strings
Metacharacters
Special symbols with a fixed meaning (each matches a single character by default)
Metacharacter | Meaning |
---|---|
. | Matches any character except a newline |
\w | Matches a letter, digit, or underscore |
\s | Matches any whitespace character |
\d | Matches a digit |
\n | Matches a newline |
\t | Matches a tab |
^ | Matches the start of the string |
$ | Matches the end of the string |
\W | Matches any character that is not a letter, digit, or underscore |
\D | Matches any non-digit |
\S | Matches any non-whitespace character |
a\|b | Matches a or b |
() | Groups the expression inside the parentheses; the group is also captured |
[…] | Matches any one character in the set (a-z means all letters from a to z) |
[^…] | Matches any one character that is not in the set |
Examples
eg: a11111
^\d\d\d\d\d No match (the string does not start with a digit)
eg: 11111a
\d\d\d\d\d$ No match (the string does not end with a digit)
eg: My phone number is: 10010
[M10] Matches M, 1, 0, 0, 1, 0 (one character at a time)
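The anchor and character-set examples above can be checked with a short script. This is a minimal sketch; the sample strings and variable names are chosen for illustration only.

```python
import re

# ^ anchors the pattern at the start of the string, so the leading "a" blocks the match.
starts_with_digits = re.search(r"^\d\d\d\d\d", "a11111")

# $ anchors the pattern at the end of the string, so the trailing "a" blocks the match.
ends_with_digits = re.search(r"\d\d\d\d\d$", "11111a")

# A character set matches one character at a time from the listed characters.
class_hits = re.findall(r"[M10]", "My phone number is: 10010")

# A negated set matches every character that is NOT listed.
non_digits = re.findall(r"[^\d]", "a1b2")

print(starts_with_digits)  # None
print(ends_with_digits)    # None
print(class_hits)          # ['M', '1', '0', '0', '1', '0']
print(non_digits)          # ['a', 'b']
```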
Quantifiers
Control how many times the preceding metacharacter may occur
Quantifier | Meaning |
---|---|
* | Repeat zero or more times |
+ | Repeat one or more times |
? | Repeat zero or one time |
{n} | Repeat exactly n times |
{n,} | Repeat n or more times |
{n,m} | Repeat n to m times |
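To see how each quantifier in the table behaves, the sketch below applies all six to the same made-up string. The sample text and variable names are assumptions for illustration.

```python
import re

text = "a ab abb abbb"

star = re.findall(r"ab*", text)        # b repeated zero or more times
plus = re.findall(r"ab+", text)        # b repeated one or more times
optional = re.findall(r"ab?", text)    # b repeated zero or one time
exactly = re.findall(r"ab{2}", text)   # b repeated exactly twice
at_least = re.findall(r"ab{2,}", text) # b repeated two or more times
between = re.findall(r"ab{1,2}", text) # b repeated one to two times

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab', 'ab', 'ab']
print(exactly)   # ['abb', 'abb']
print(at_least)  # ['abb', 'abbb']
print(between)   # ['ab', 'abb', 'abb']
```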
Matching modes
.* Greedy matching (matches as much as possible)
.*? Lazy matching (matches as little as possible)
Lazy matching is the more common choice in Python crawlers
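The difference between the two modes shows up as soon as a string contains more than one closing delimiter. A minimal sketch, using a made-up HTML fragment:

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the LAST </b>, swallowing the tags in between.
greedy = re.findall(r"<b>(.*)</b>", html)

# Lazy: .*? stops at the FIRST </b> it can, so each tag is matched separately.
lazy = re.findall(r"<b>(.*?)</b>", html)

print(greedy)  # ['one</b><b>two']
print(lazy)    # ['one', 'two']
```

This is why lazy matching tends to suit crawlers: repeated tags on a page are usually meant to be extracted one by one.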
The re module
Functions
pattern: the regular expression; string: the string to search; flags: optional flags that change matching behavior (re.S, re.M, etc.)
Function | Meaning |
---|---|
compile(pattern, flags=0) | Compiles a regular expression and returns a pattern object (precompiles the expression for reuse) |
match(pattern, string, flags=0) | Matches the pattern at the start of the string; returns a match object on success, otherwise None |
search(pattern, string, flags=0) | Scans the string for the first location where the pattern matches; returns a match object on success, otherwise None |
split(pattern, string, maxsplit=0, flags=0) | Splits the string at every occurrence of the pattern; returns a list |
sub(pattern, repl, string, count=0, flags=0) | Replaces the parts of the string that match the pattern with repl; count limits the number of replacements |
fullmatch(pattern, string, flags=0) | Like match(), but the pattern must match the whole string from start to end |
findall(pattern, string, flags=0) | Finds all matches of the pattern in the string; returns a list of strings |
finditer(pattern, string, flags=0) | Finds all matches of the pattern in the string; returns an iterator of match objects |
purge() | Clears the cache of implicitly compiled regular expressions |
re.I / re.IGNORECASE | Flag: case-insensitive matching |
re.M / re.MULTILINE | Flag: multiline matching (^ and $ also match at the start and end of each line) |
re.S / re.DOTALL | Flag: "single line" matching (. also matches a newline) |
(?P<name>pattern) | Named group: lets you extract part of the matched content separately by name (the P must be uppercase) |
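The three flags in the table can be compared side by side. The sketch below uses an invented two-line string; the variable names are for illustration only.

```python
import re

text = "Hello\nhello world"

# re.I: letter case is ignored, so both spellings match.
ignore_case = re.findall(r"hello", text, re.I)

# re.M: ^ matches at the start of every line, not only at the start of the string.
first_words_multiline = re.findall(r"^\w+", text, re.M)
first_words_default = re.findall(r"^\w+", text)

# re.S: . also matches the newline between the two lines.
with_dotall = re.search(r"Hello.hello", text, re.S)
without_dotall = re.search(r"Hello.hello", text)

print(ignore_case)            # ['Hello', 'hello']
print(first_words_multiline)  # ['Hello', 'hello']
print(first_words_default)    # ['Hello']
print(with_dotall is not None)  # True
print(without_dotall)         # None
```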
Examples
1. findall() returns a list
import re
ls = re.findall(r'\d+', "My phone number is: 10086, his phone number is: 10010.")
print(ls)
# ['10086', '10010']
2. finditer() returns an iterator
it = re.finditer(r'\d+', "My phone number is: 10086, his phone number is: 10010.")
for i in it:
    print(i.group())
# 10086
# 10010
3. search() returns a match object
ls = re.search(r'\d+', "My phone number is: 10086, his phone number is: 10010.")
print(ls.group())
# 10086
# Only the first match is returned
4. match() is similar to search(), but only matches at the start of the string
ls = re.match(r'\d+', "10086, his phone number is: 10010.")
print(ls.group())
# 10086
ls = re.match(r'\d+', "My phone number is: 10086, his phone number is: 10010.")
print(ls.group())
# Raises AttributeError: the string does not start with a digit, so match() returns None
5. compile() precompiles a regular expression
obj = re.compile(r'\d+')
ret = obj.finditer("My phone number is: 10086, my girlfriend's phone number is: 10010")
for it in ret:
    print(it.group())
ret = obj.findall("Population 1000000")
print(ret)
# 10086
# 10010
# ['1000000']
6. (?P<name>pattern) named groups
s = """<div class='a'><span id='1'>Zhang San</span></div>
<div class='b'><span id='2'>Li Si</span></div>
<div class='c'><span id='3'>Wang Wu</span></div>
<div class='d'><span id='4'>garage</span></div>
<div class='e'><span id='5'>Thompson</span></div>"""
obj = re.compile(r"<div class='.*?'><span id='(?P<id>\d+)'>(?P<name>.*?)</span></div>", re.S)
result = obj.finditer(s)
for it in result:
    # Each named group is read back by its name
    print(it.group("name"), it.group("id"))
# Zhang San 1
# Li Si 2
# Wang Wu 3
# garage 4
# Thompson 5
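7. re.S lets . match newlines, so the string is matched as a whole
a = """hello123
world"""
b = re.findall('hello(.*?)world', a)
c = re.findall('hello(.*?)world', a, re.S)
print('b =', b)
print('c =', c)
# b = [] (an empty list, because . stops at the newline)
# c = ['123\n'] (the captured text includes the newline)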
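The function table also lists split(), sub(), and fullmatch(), which the examples above do not cover. A minimal sketch with invented sample strings might look like this:

```python
import re

# split(): cut the string wherever a run of commas, semicolons, or whitespace occurs.
parts = re.split(r"[,;\s]+", "a, b;c  d")
# maxsplit limits how many cuts are made.
first_cut_only = re.split(r"[,;\s]+", "a, b;c  d", maxsplit=1)

# sub(): replace every digit with "*"; count limits the number of replacements.
masked = re.sub(r"\d", "*", "tel: 10086")
masked_once = re.sub(r"\d", "*", "tel: 10086", count=1)

# fullmatch(): the whole string must match, unlike match(), which only checks the start.
whole = re.fullmatch(r"\d+", "10086")
whole_fails = re.fullmatch(r"\d+", "10086a")
prefix_only = re.match(r"\d+", "10086a")

print(parts)                # ['a', 'b', 'c', 'd']
print(first_cut_only)       # ['a', 'b;c  d']
print(masked)               # tel: *****
print(masked_once)          # tel: *0086
print(whole.group())        # 10086
print(whole_fails)          # None
print(prefix_only.group())  # 10086
```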