Regular expressions and the re library
2022-07-05 21:49:00 【A.way30】
Related links
Regular expression tester
Notes on regular expressions
Bilibili online course
Regular expressions
Definition
A syntax for writing patterns that match strings.
When to use
In the data-parsing stage of a Python crawler, the content you want is usually stored between tags or in tag attributes. Regular expressions are one way to extract that content.
Syntax
Patterns are built from metacharacters
Metacharacters
Special symbols with a fixed meaning (each matches a single character by default)
Metacharacter | Meaning |
---|---|
. | Matches any character except a newline |
\w | Matches a letter, digit, or underscore |
\s | Matches any whitespace character |
\d | Matches a digit |
\n | Matches a newline |
\t | Matches a tab |
^ | Matches the start of the string |
$ | Matches the end of the string |
\W | Matches any character that is not a letter, digit, or underscore |
\D | Matches any non-digit |
\S | Matches any non-whitespace character |
a|b | Matches either a or b |
() | Groups the enclosed expression and captures what it matches |
[…] | Matches any one character in the set (a-z means all letters from a to z) |
[^…] | Matches any one character that is not in the set |
Example
eg: a11111
^\d\d\d\d\d finds no match (the string starts with a letter)
eg: 11111a
\d\d\d\d\d$ finds no match (the string ends with a letter)
eg: My phone number is:10010
[M10] matches M, 1, 0, 0, 1, 0
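The anchor and character-set behaviour described above can be checked with a short sketch. The sample strings are illustrative, and the variable names are made up for this example.

```python
import re

# ^ anchors the match to the start of the string, $ to the end.
no_start = re.findall(r'^\d\d\d\d\d', 'a11111')   # string starts with 'a'
no_end = re.findall(r'\d\d\d\d\d$', '11111a')     # string ends with 'a'
anchored = re.findall(r'^\d\d\d\d\d', '11111a')   # five digits at the start

# A character set matches one character at a time.
in_set = re.findall(r'[10]', 'My phone number is:10010')
# A negated set matches every character that is NOT listed.
not_in_set = re.findall(r'[^a-z]', 'ab1-c')

print(no_start, no_end, anchored)   # [] [] ['11111']
print(in_set)                       # ['1', '0', '0', '1', '0']
print(not_in_set)                   # ['1', '-']
```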
Quantifiers
Control how many times the preceding element may repeat
Quantifier | Meaning |
---|---|
* | Repeat zero or more times |
+ | Repeat one or more times |
? | Repeat zero or one time |
{n} | Repeat exactly n times |
{n,} | Repeat n or more times |
{n,m} | Repeat between n and m times |
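A minimal sketch of each quantifier in the table, using re.fullmatch() so that a pattern only counts as a match when it covers the whole string. The helper name `matches` is hypothetical and exists only for this illustration.

```python
import re

def matches(pattern, s):
    """Return True if the pattern matches the entire string (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

star_empty = matches(r'\d*', '')         # * accepts zero repetitions
plus_empty = matches(r'\d+', '')         # + needs at least one
optional_one = matches(r'\d?', '7')      # ? accepts zero or one
optional_two = matches(r'\d?', '77')     # two digits is one too many
exact_ok = matches(r'\d{3}', '123')      # exactly three digits
exact_long = matches(r'\d{3}', '1234')   # four digits is too many
at_least = matches(r'\d{3,}', '1234')    # three or more
at_least_short = matches(r'\d{3,}', '12')
between = matches(r'\d{2,3}', '12')      # two to three
between_long = matches(r'\d{2,3}', '1234')

print(star_empty, plus_empty)        # True False
print(optional_one, optional_two)    # True False
print(exact_ok, exact_long)          # True False
print(at_least, at_least_short)      # True False
print(between, between_long)         # True False
```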
Matching mode
.* is greedy: it matches as much as possible
.*? is lazy: it matches as little as possible
Lazy matching is the more common choice in Python crawlers
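The difference is easiest to see on tag-like text. The following is a small sketch with a made-up HTML fragment; it also suggests why lazy matching suits crawlers, since each tag pair is captured separately.

```python
import re

html = "<b>first</b> and <b>second</b>"

# Greedy: .* runs to the LAST </b>, swallowing everything in between.
greedy = re.findall(r'<b>(.*)</b>', html)
# Lazy: .*? stops at the FIRST </b> it can, so each pair is found on its own.
lazy = re.findall(r'<b>(.*?)</b>', html)

print(greedy)   # ['first</b> and <b>second']
print(lazy)     # ['first', 'second']
```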
The re module
Functions
pattern: the regular expression; string: the string to search; flags: optional flags that modify matching (re.S, re.M, etc.)
Function | Meaning |
---|---|
compile(pattern, flags=0) | Compiles a regular expression and returns a pattern object that can be reused |
match(pattern, string, flags=0) | Matches the pattern at the start of the string; returns a match object on success, otherwise None |
search(pattern, string, flags=0) | Scans the string for the first place where the pattern matches; returns a match object on success, otherwise None |
split(pattern, string, maxsplit=0, flags=0) | Splits the string wherever the pattern matches and returns a list |
sub(pattern, repl, string, count=0, flags=0) | Replaces matches of the pattern with repl; count limits the number of replacements |
fullmatch(pattern, string, flags=0) | Like match(), but the pattern must match the whole string from start to end |
findall(pattern, string, flags=0) | Finds all matches of the pattern and returns them as a list of strings |
finditer(pattern, string, flags=0) | Finds all matches of the pattern and returns an iterator of match objects |
purge() | Clears the cache of implicitly compiled regular expressions |
re.I / re.IGNORECASE | Flag: case-insensitive matching |
re.M / re.MULTILINE | Flag: ^ and $ also match at the start and end of each line |
re.S / re.DOTALL | Flag: . also matches a newline, so the string is treated as a single line |
(?P<name>regex) | Named group: captures part of a match under a name so it can be extracted separately (the P is uppercase) |
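As a rough illustration of the three flags in the table, the sketch below runs the same patterns with and without each flag. The two-line sample string is made up for this example.

```python
import re

text = "Hello\nhello world"

# re.I: ignore case.
case_sensitive = re.findall(r'hello', text)
ignore_case = re.findall(r'hello', text, re.I)

# re.M: ^ matches at the start of every line, not only the start of the string.
single_line = re.findall(r'^\w+', text)
multi_line = re.findall(r'^\w+', text, re.M)

# re.S: . also matches the newline character.
no_dotall = re.findall(r'Hello.hello', text)
dotall = re.findall(r'Hello.hello', text, re.S)

# Flags can be combined with the | operator.
combined = re.findall(r'^hello', text, re.I | re.M)

print(case_sensitive, ignore_case)   # ['hello'] ['Hello', 'hello']
print(single_line, multi_line)       # ['Hello'] ['Hello', 'hello']
print(no_dotall, dotall)             # [] ['Hello\nhello']
print(combined)                      # ['Hello', 'hello']
```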
Example
1. findall() returns a list
import re
ls = re.findall(r'\d+', "My phone number is:10086, his phone number is:10010.")
print(ls)
# ['10086', '10010']
2. finditer() returns an iterator
it = re.finditer(r'\d+', "My phone number is:10086, his phone number is:10010.")
for i in it:
    print(i.group())
# 10086
# 10010
3. search() returns a match object
ls = re.search(r'\d+', "My phone number is:10086, his phone number is:10010.")
print(ls.group())
# 10086
Only the first match is returned
4. match() matches only at the start of the string; otherwise it behaves like search()
ls = re.match(r'\d+', "10086, his phone number is:10010.")
print(ls.group())
# 10086
ls = re.match(r'\d+', "My phone number is:10086, his phone number is:10010.")
print(ls.group())
# Raises AttributeError: match() returned None because the string does not start with a digit
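Because match() and search() return None when nothing matches, calling .group() directly can fail as shown above. One common way to guard against that is sketched below; the function name `first_number` is hypothetical.

```python
import re

def first_number(text):
    """Return the digits at the start of text, or None if there are none."""
    m = re.match(r'\d+', text)
    if m is None:          # no match: avoid calling .group() on None
        return None
    return m.group()

found = first_number("10086, his phone number is:10010.")
missing = first_number("My phone number is:10086")

print(found)     # 10086
print(missing)   # None
```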
5. compile() precompiles a regular expression for reuse
obj = re.compile(r'\d+')
ret = obj.finditer("My phone number is:10086, my girlfriend's phone number is:10010")
for it in ret:
    print(it.group())
# 10086
# 10010
ret = obj.findall("Population 1000000")
print(ret)
# ['1000000']
6. (?P<name>regex) named groups
s = """<div class='a'><span id='1'>Zhang San</span></div>
<div class='b'><span id='2'>Li Si</span></div>
<div class='c'><span id='3'>Wang Wu</span></div>
<div class='d'><span id='4'>garage</span></div>
<div class='e'><span id='5'>Thompson</span></div>"""
obj = re.compile(r"<div class='.*?'><span id='(?P<id>\d+)'>(?P<name>.*?)</span></div>", re.S)
result = obj.finditer(s)
for it in result:
    # Each named group is read back by its name
    print(it.group("name"), it.group("id"))
# Zhang San 1
# Li Si 2
# Wang Wu 3
# garage 4
# Thompson 5
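7. re.S lets . match newlines, so the string is matched as a whole
a = "hello123\nworld"
b = re.findall('hello(.*?)world', a)
c = re.findall('hello(.*?)world', a, re.S)
print('b = ', b)
print('c = ', c)
# b =  [] (an empty list: without re.S, . cannot cross the newline)
# c =  ['123\n'] (the captured text includes the newline)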
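The examples above do not cover split(), sub(), or fullmatch() from the function table. A brief sketch of those three follows, with made-up input strings.

```python
import re

# split(): cut the string wherever the pattern matches.
parts = re.split(r'[,;\s]+', 'a, b;c  d')
limited = re.split(r',', 'a,b,c', maxsplit=1)   # at most one split

# sub(): replace every match, or only the first `count` matches.
masked = re.sub(r'\d', '*', 'tel: 10086')
first_only = re.sub(r'\d', '*', 'tel: 10086', count=1)

# fullmatch(): the pattern must cover the whole string, unlike match().
prefix_ok = re.match(r'\d+', '10086abc') is not None
full_ok = re.fullmatch(r'\d+', '10086') is not None
full_bad = re.fullmatch(r'\d+', '10086abc') is not None

print(parts)                          # ['a', 'b', 'c', 'd']
print(limited)                        # ['a', 'b,c']
print(masked, '|', first_only)        # tel: ***** | tel: *0086
print(prefix_ok, full_ok, full_bad)   # True True False
```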