Notes on Python Cookbook 3rd (2.4): string matching and searching
2020-11-10 10:46:00 【Giant ship】
String matching and searching
Problem
You want to match or search text for a specific pattern.
Solution
If the text you want to match is a simple literal, you can usually just call the basic string methods, such as str.find(), str.endswith(), and str.startswith():
>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> # Exact match
>>> text == 'yeah'
False
>>> # Match at start or end
>>> text.startswith('yeah')
True
>>> text.endswith('no')
False
>>> # Search for the location of the first occurrence
>>> text.find('no')
10
>>>
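A few related string methods are worth knowing alongside find(). The sketch below is illustrative only; the variable names are made up for this example. It shows that find() signals "not found" with -1 rather than an exception, and that the in operator is usually the clearer membership test.

```python
text = 'yeah, but no, but yeah, but no, but yeah'

# find() returns -1 (it does not raise) when the substring is absent
missing = text.find('maybe')

# The `in` operator is the idiomatic way to test for membership
has_no = 'no' in text

# rfind() searches from the right; count() tallies non-overlapping hits
last_no = text.rfind('no')
no_count = text.count('no')

print(missing, has_no, last_no, no_count)
```

Because -1 is also a valid (negative) index, it is safer to compare the result of find() against -1 explicitly than to use it directly for slicing.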
For more complicated matching, use regular expressions and the re module. To illustrate the basic mechanics of regular expressions, suppose you want to match dates written as digits, such as 11/27/2012. You can do it like this:
>>> text1 = '11/27/2012'
>>> text2 = 'Nov 27, 2012'
>>>
>>> import re
>>> # Simple matching: \d+ means match one or more digits
>>> if re.match(r'\d+/\d+/\d+', text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if re.match(r'\d+/\d+/\d+', text2):
...     print('yes')
... else:
...     print('no')
...
no
>>>
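Note that \d+ accepts any run of digits, so the pattern above also accepts strings that are not plausible dates. As a hedged sketch (the stricter pattern and the sample strings are assumptions for illustration, not part of the original recipe), counted repetition can narrow the match:

```python
import re

loose = re.compile(r'\d+/\d+/\d+')
# Assumed stricter form: 1-2 digit month and day, 4 digit year
strict = re.compile(r'\d{1,2}/\d{1,2}/\d{4}')

# Hypothetical sample inputs
candidates = ['11/27/2012', '1234/5/6', '3/13/2013']

loose_hits = [s for s in candidates if loose.match(s)]
strict_hits = [s for s in candidates if strict.match(s)]

print(loose_hits)
print(strict_hits)
```

Even the stricter pattern only checks the shape of the text; it does not verify that the month or day is in a valid range.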
If you are going to perform many matches with the same pattern, it usually pays to precompile the pattern string into a pattern object first. For example:
>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> if datepat.match(text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if datepat.match(text2):
...     print('yes')
... else:
...     print('no')
...
no
>>>
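A compiled pattern is most useful when it is applied to many inputs. The following is a minimal sketch, assuming a small list of sample lines invented for this example, that compiles once and reuses the pattern object in a loop:

```python
import re

# Compile once, outside the loop
datepat = re.compile(r'\d+/\d+/\d+')

# Hypothetical sample lines, not from the original text
lines = ['11/27/2012', 'Nov 27, 2012', '3/13/2013 keynote', 'see 3/13/2013']

# match() anchors at the start, so 'see 3/13/2013' is rejected
starts_with_date = [line for line in lines if datepat.match(line)]

print(starts_with_date)
```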
match() always tries to find the match at the start of a string. If you want to search the text for every occurrence of a pattern, use the findall() method instead. For example:
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
['11/27/2012', '3/13/2013']
>>>
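When only the first occurrence is needed, the pattern object's search() method scans the whole string and returns a single match object (or None). A short sketch contrasting it with match(), using the same text as above:

```python
import re

datepat = re.compile(r'\d+/\d+/\d+')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# match() only looks at the start of the string, so this is None
at_start = datepat.match(text)

# search() scans forward and stops at the first occurrence
first = datepat.search(text)

print(at_start)
print(first.group(), first.start())
```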
When defining a regular expression, it is common to introduce capture groups by enclosing parts of the pattern in parentheses. For example:
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>>
Capture groups often simplify later processing of the matched text, because the contents of each group can be extracted individually. For example:
>>> m = datepat.match('11/27/2012')
>>> m
<_sre.SRE_Match object at 0x1005d2750>
>>> # Extract the contents of each group
>>> m.group(0)
'11/27/2012'
>>> m.group(1)
'11'
>>> m.group(2)
'27'
>>> m.group(3)
'2012'
>>> m.groups()
('11', '27', '2012')
>>> month, day, year = m.groups()
>>>
>>> # Find all matches (notice splitting into tuples)
>>> text
'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
[('11', '27', '2012'), ('3', '13', '2013')]
>>> for month, day, year in datepat.findall(text):
...     print('{}-{}-{}'.format(year, month, day))
...
2012-11-27
2013-3-13
>>>
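The captured groups are plain strings, which is why the second date above prints as 2013-3-13 without zero padding. One possible follow-up, sketched here with the standard datetime module (this step is an addition, not part of the original recipe), is to convert the groups to integers and build date objects:

```python
import re
from datetime import date

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# Groups come back as strings, so convert them before building a date
dates = [date(int(year), int(month), int(day))
         for month, day, year in datepat.findall(text)]

# isoformat() zero-pads the month and day
iso_dates = [d.isoformat() for d in dates]

print(iso_dates)
```

date() raises ValueError for impossible values such as month 13, so this conversion also acts as a validity check that the regular expression alone does not provide.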
The findall() method searches the text and returns all matches as a list. If you want to find the matches iteratively, use the finditer() method instead. For example:
>>> for m in datepat.finditer(text):
...     print(m.groups())
...
('11', '27', '2012')
('3', '13', '2013')
>>>
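Because finditer() yields full match objects rather than tuples, each result also carries its position in the text. The sketch below additionally uses named groups, (?P<name>...), which are an assumption layered on top of the original pattern for readability:

```python
import re

# Same date pattern as above, with names attached to the groups
datepat = re.compile(r'(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

spans = []
years = []
for m in datepat.finditer(text):
    spans.append(m.span())          # (start, end) offsets of the match
    years.append(m.group('year'))   # look up a group by name

print(spans)
print(years)
```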
Discussion
This recipe covers the most basic ways of using the re module to match and search text. The essential steps are to compile the regular expression string with re.compile() and then call methods such as match(), findall(), or finditer().
When writing pattern strings, it is relatively common to use raw strings such as r'(\d+)/(\d+)/(\d+)'. Such strings leave the backslash character uninterpreted, which is useful in regular expressions. Otherwise, you have to double each backslash, as in '(\\d+)/(\\d+)/(\\d+)'.
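The equivalence can be checked directly. In this small sketch (the \b example and its sample text are additions for illustration), the raw and doubled-backslash spellings compare equal, while an escape that Python itself interprets, such as \b, shows why raw strings are the safer habit:

```python
import re

raw = r'(\d+)/(\d+)/(\d+)'
escaped = '(\\d+)/(\\d+)/(\\d+)'

# Both spellings produce the same characters
same = raw == escaped

# '\b' in a normal string is a backspace character;
# in a raw string it reaches re as the word-boundary escape
plain_hits = re.findall('\bno\b', 'yeah, but no')
raw_hits = re.findall(r'\bno\b', 'yeah, but no')

print(same, plain_hits, raw_hits)
```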
>>> m = datepat.match('11/27/2012abcdef')
>>> m
<_sre.SRE_Match object at 0x1005d27e8>
>>> m.group()
'11/27/2012'
>>>
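As the session above shows, match() only checks the beginning of a string, so it can match text you did not expect. If you want an exact match, end the pattern with the end-of-string marker ($), like this: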
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)$')
>>> datepat.match('11/27/2012abcdef')
>>> datepat.match('11/27/2012')
<_sre.SRE_Match object at 0x1005d2750>
>>>
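An alternative to appending $ is the fullmatch() method, available since Python 3.4, which succeeds only if the whole string matches. The sketch below also shows one difference worth knowing: $ still matches just before a trailing newline, whereas fullmatch() does not.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
anchored = re.compile(r'(\d+)/(\d+)/(\d+)$')

# fullmatch() requires the entire string to match the pattern
full_ok = datepat.fullmatch('11/27/2012')
full_bad = datepat.fullmatch('11/27/2012abcdef')

# '$' also matches right before a final newline; fullmatch() does not
dollar_newline = anchored.match('11/27/2012\n')
full_newline = datepat.fullmatch('11/27/2012\n')

print(full_ok is not None, full_bad, dollar_newline is not None, full_newline)
```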
Last, if you are only doing a simple text matching or searching operation, you can often skip the compilation step and use the module-level functions in the re module directly. For example:
>>> re.findall(r'(\d+)/(\d+)/(\d+)', text)
[('11', '27', '2012'), ('3', '13', '2013')]
>>>
Be aware, though, that if you are going to do a lot of matching or searching, it usually pays to compile the pattern first and reuse it. The module-level functions keep a cache of recently compiled patterns, so the performance penalty is not large, but using your own compiled pattern saves a few lookups and some extra processing.