Notes on Python Cookbook, 3rd Edition (2.4): String Matching and Searching
2020-11-10 10:46:00 【Giant ship】
String matching and searching
Problem
You want to match or search text for a specific pattern.
Solution
If the text you want to match is a simple literal string, the basic string methods such as str.find(), str.endswith(), and str.startswith() are usually all you need:
>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> # Exact match
>>> text == 'yeah'
False
>>> # Match at start or end
>>> text.startswith('yeah')
True
>>> text.endswith('no')
False
>>> # Search for the location of the first occurrence
>>> text.find('no')
10
>>>
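As a supplementary sketch (the helper name describe_literal is made up for illustration and is not from the book), the same string methods can be gathered in one place. It assumes you only care about literal substrings, and it relies on the facts that str.find() and str.rfind() return -1 when the substring is absent and that the in operator is enough when only presence matters:

```python
def describe_literal(text, needle):
    """Summarize how a literal substring relates to text (hypothetical helper)."""
    return {
        'exact': text == needle,
        'starts': text.startswith(needle),
        'ends': text.endswith(needle),
        'contains': needle in text,   # presence only
        'first': text.find(needle),   # index of first occurrence, -1 if absent
        'last': text.rfind(needle),   # index of last occurrence, -1 if absent
    }

text = 'yeah, but no, but yeah, but no, but yeah'
info = describe_literal(text, 'no')
missing = describe_literal(text, 'maybe')
```

Because find() signals "not found" with -1 rather than None or an exception, compare its result against -1 explicitly instead of testing it for truthiness.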
For more complicated matching, use regular expressions and the re module. To illustrate the basic mechanics of regular expressions, suppose you want to match dates written as digits, such as 11/27/2012. You could do it like this:
>>> text1 = '11/27/2012'
>>> text2 = 'Nov 27, 2012'
>>>
>>> import re
>>> # Simple matching: \d+ means match one or more digits
>>> if re.match(r'\d+/\d+/\d+', text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if re.match(r'\d+/\d+/\d+', text2):
...     print('yes')
... else:
...     print('no')
...
no
>>>
If you are going to perform many matches with the same pattern, it usually pays to precompile the pattern string into a pattern object first. For example:
>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> if datepat.match(text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if datepat.match(text2):
...     print('yes')
... else:
...     print('no')
...
no
>>>
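A minimal sketch of that reuse, assuming the candidate strings arrive as a list (the names DATE_PAT, starts_with_date, and samples are illustrative, not from the book): the pattern is compiled once at module level and then applied to every string.

```python
import re

# Compile once, reuse for every candidate string.
DATE_PAT = re.compile(r'\d+/\d+/\d+')

def starts_with_date(lines):
    """Return the lines that begin with a digits/digits/digits date."""
    return [line for line in lines if DATE_PAT.match(line)]

samples = ['11/27/2012', 'Nov 27, 2012', '3/13/2013 PyCon', 'on 3/13/2013']
matched = starts_with_date(samples)
```

The last sample is rejected even though it contains a date, because match() only looks at the beginning of each string.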
match() always tries to find the match at the start of a string. If you want to search the text for all occurrences of a pattern, wherever they appear, use the findall() method instead. For example:
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
['11/27/2012', '3/13/2013']
>>>
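The difference can be seen side by side in the sketch below. It also uses search(), which the text above does not cover: it is the standard re method that returns the first match found anywhere in the string, and it is included here only for comparison.

```python
import re

datepat = re.compile(r'\d+/\d+/\d+')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

at_start = datepat.match(text)   # None: the text does not begin with a date
first = datepat.search(text)     # first occurrence anywhere in the text
every = datepat.findall(text)    # all non-overlapping occurrences, as a list
```

Here at_start is None, first is a match object covering '11/27/2012', and every holds both dates.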
When defining regular expressions, it is common to introduce capture groups by enclosing parts of the pattern in parentheses. For example:
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>>
Capture groups often simplify later processing of the matched text, because the contents of each group can be extracted individually. For example:
>>> m = datepat.match('11/27/2012')
>>> m
<_sre.SRE_Match object at 0x1005d2750>
>>> # Extract the contents of each group
>>> m.group(0)
'11/27/2012'
>>> m.group(1)
'11'
>>> m.group(2)
'27'
>>> m.group(3)
'2012'
>>> m.groups()
('11', '27', '2012')
>>> month, day, year = m.groups()
>>>
>>> # Find all matches (notice splitting into tuples)
>>> text
'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
[('11', '27', '2012'), ('3', '13', '2013')]
>>> for month, day, year in datepat.findall(text):
...     print('{}-{}-{}'.format(year, month, day))
...
...
2012-11-27
2013-3-13
>>>
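The captured groups are plain strings, so the rearranged output is not zero-padded. One possible refinement, sketched here with a hypothetical helper named to_iso, converts each group to an integer and pads it. It assumes the dates are in month/day/year order and does not check that they are valid calendar dates.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

def to_iso(text):
    """Rewrite each month/day/year match as a zero-padded year-month-day string."""
    return ['{:04d}-{:02d}-{:02d}'.format(int(year), int(month), int(day))
            for month, day, year in datepat.findall(text)]

iso_dates = to_iso('Today is 11/27/2012. PyCon starts 3/13/2013.')
```

With the sample text this yields '2012-11-27' and '2013-03-13'.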
The findall() method searches the text and returns all matches as a list. If you want to obtain the matches one at a time instead, use finditer(). For example:
>>> for m in datepat.finditer(text):
...     print(m.groups())
...
('11', '27', '2012')
('3', '13', '2013')
>>>
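Since finditer() yields full match objects rather than tuples of strings, it is also a convenient way to find out where each match sits in the text. The following sketch records the position with the standard start() and end() methods of match objects (the variable name positions is illustrative):

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# Each item is (start index, end index, matched text).
positions = [(m.start(), m.end(), m.group(0)) for m in datepat.finditer(text)]
```

Collecting the results into a list is only for demonstration; when the text is large, iterating over finditer() directly avoids building the whole list in memory.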
Discussion
This recipe covers the most basic ways of using the re module to match and search text. The core steps are to compile the regular expression string with re.compile() and then use methods such as match(), findall(), or finditer().
When writing pattern strings, it is relatively common to use raw strings such as r'(\d+)/(\d+)/(\d+)'. Such strings leave the backslash character uninterpreted, which is useful for regular expressions. Otherwise, you have to use double backslashes, as in '(\\d+)/(\\d+)/(\\d+)'.
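The two spellings of the pattern can be compared directly. This short check (variable names are illustrative) shows that the raw string and the version with doubled backslashes are the same string, so either one compiles to the same regular expression:

```python
import re

raw = r'(\d+)/(\d+)/(\d+)'
escaped = '(\\d+)/(\\d+)/(\\d+)'

same_pattern = (raw == escaped)   # both contain a single backslash before each d
groups = re.match(escaped, '11/27/2012').groups()
```

The raw form is simply easier to read and less error-prone, because every backslash you type is a backslash the regular expression engine sees.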
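Be aware that the match() method only checks the beginning of a string, so it may match things you aren't expecting. For example:
>>> m = datepat.match('11/27/2012abcdef')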
>>> m
<_sre.SRE_Match object at 0x1005d27e8>
>>> m.group()
'11/27/2012'
>>>
If you want an exact match, make sure the pattern ends with the end-of-string marker ($), like this:
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)$')
>>> datepat.match('11/27/2012abcdef')
>>> datepat.match('11/27/2012')
<_sre.SRE_Match object at 0x1005d2750>
>>>
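One caveat, offered as a side note rather than as part of the recipe: by default $ also matches just before a newline at the very end of the string. If that matters, the fullmatch() method (available on compiled patterns and as re.fullmatch() since Python 3.4) requires the whole string to match without needing $ at all. A small sketch:

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

# $ still matches here, because it tolerates one trailing newline.
with_dollar = re.match(r'(\d+)/(\d+)/(\d+)$', '11/27/2012\n')

strict = datepat.fullmatch('11/27/2012\n')        # None: newline is not in the pattern
exact = datepat.fullmatch('11/27/2012')           # match object
trailing = datepat.fullmatch('11/27/2012abcdef')  # None: extra text after the date
```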
Finally, if you are only doing a simple text matching or searching operation, you can skip the compilation step and use the module-level functions in the re module directly. For example:
>>> re.findall(r'(\d+)/(\d+)/(\d+)', text)
[('11', '27', '2012'), ('3', '13', '2013')]
>>>
Be aware, though, that if you are going to perform a lot of matching or searching, it usually pays to compile the pattern first and reuse it. The module-level functions keep a cache of recently compiled patterns, so there isn't a huge performance hit, but using your own compiled pattern saves a few lookups and some extra processing.
Copyright notice
This article was written by [Giant ship]. Please include a link to the original when reposting. Thank you.