Notes on Python cookbook 3rd (2.4): string matching and searching
2020-11-10 10:46:00 【Giant ship】
String matching and searching
Problem
You want to match or search text for a specific pattern.
Solution
If the text you want to match is a simple literal, the basic string methods are usually enough, such as str.find(), str.endswith(), and str.startswith(). For example:
>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> # Exact match
>>> text == 'yeah'
False
>>> # Match at start or end
>>> text.startswith('yeah')
True
>>> text.endswith('no')
False
>>> # Search for the location of the first occurrence
>>> text.find('no')
10
>>>
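A few details of these string methods are easy to trip over, so here is a minimal sketch of them. It relies only on documented str behavior: find() returns -1 instead of raising when nothing is found, startswith() accepts a tuple of alternatives, and rfind() searches from the right. The variable names are chosen here for illustration.

```python
text = 'yeah, but no, but yeah, but no, but yeah'

# find() returns -1 when the substring is absent, so compare against -1
# (or use the `in` operator when only presence matters).
missing = text.find('maybe')
present = 'no' in text

# startswith() and endswith() also accept a tuple of alternatives.
starts_ok = text.startswith(('yes', 'yeah'))

# rfind() searches from the right and reports the last occurrence.
last_no = text.rfind('no')

print(missing, present, starts_ok, last_no)
```

Because a valid index of 0 is falsy, writing `if text.find(...)` is a common bug; testing `!= -1` or using `in` avoids it.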
For more complicated matching, use regular expressions and the re module. To illustrate the basics of regular expressions, suppose you want to match dates written as digits, such as 11/27/2012. You could do it like this:
>>> text1 = '11/27/2012'
>>> text2 = 'Nov 27, 2012'
>>>
>>> import re
>>> # Simple matching: \d+ means match one or more digits
>>> if re.match(r'\d+/\d+/\d+', text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if re.match(r'\d+/\d+/\d+', text2):
...     print('yes')
... else:
...     print('no')
...
no
>>>
If you are going to perform many matches with the same pattern, it usually pays to precompile the pattern string into a pattern object first. For example:
>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> if datepat.match(text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if datepat.match(text2):
...     print('yes')
... else:
...     print('no')
...
no
>>>
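As a rough sketch of what reusing a compiled pattern can look like in practice, the pattern object below is created once at module level and then applied to each string in a list. The helper name starts_with_date and the sample data are made up for illustration.

```python
import re

# Compile once, then reuse the pattern object for every candidate string.
datepat = re.compile(r'\d+/\d+/\d+')

def starts_with_date(lines):
    """Return the lines that begin with a numeric date such as 11/27/2012."""
    return [line for line in lines if datepat.match(line)]

# Hypothetical sample data, not taken from the original text.
samples = ['11/27/2012', 'Nov 27, 2012', '3/13/2013 PyCon']
matching = starts_with_date(samples)
print(matching)
```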
match() always tries to find the match at the start of a string. If you want to find every occurrence of a pattern anywhere in the text, use the findall() method instead. For example:
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
['11/27/2012', '3/13/2013']
>>>
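When only the first occurrence is needed, the re module also provides search(), which this recipe does not cover. The following sketch contrasts it with match() on the same text:

```python
import re

datepat = re.compile(r'\d+/\d+/\d+')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# match() is anchored at position 0, so it finds nothing in this text.
at_start = datepat.match(text)

# search() scans forward and stops at the first occurrence.
first = datepat.search(text)

print(at_start)
print(first.group(), first.start())
```

Both methods return None on failure, so the result should be checked before calling group() on it.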
When defining a regular expression, it is common to introduce capture groups by enclosing parts of the pattern in parentheses. For example:
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>>
Capture groups often simplify later processing of the matched text, because the contents of each group can be extracted individually. For example:
>>> m = datepat.match('11/27/2012')
>>> m
<_sre.SRE_Match object at 0x1005d2750>
>>> # Extract the contents of each group
>>> m.group(0)
'11/27/2012'
>>> m.group(1)
'11'
>>> m.group(2)
'27'
>>> m.group(3)
'2012'
>>> m.groups()
('11', '27', '2012')
>>> month, day, year = m.groups()
>>>
>>> # Find all matches (notice splitting into tuples)
>>> text
'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
[('11', '27', '2012'), ('3', '13', '2013')]
>>> for month, day, year in datepat.findall(text):
...     print('{}-{}-{}'.format(year, month, day))
...
...
2012-11-27
2013-3-13
>>>
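The groups come back as strings, which is why the second date above prints as 2013-3-13 rather than with a zero-padded month. If fixed-width output is wanted, one possible approach (a sketch, not part of the original recipe) is to convert each group to int and use format specifiers:

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

iso_dates = []
for month, day, year in datepat.findall(text):
    # Groups are strings; convert to int so they can be zero-padded.
    iso_dates.append('{:04d}-{:02d}-{:02d}'.format(int(year), int(month), int(day)))

print(iso_dates)
```

This only reformats the digits. It does not check that the result is a real calendar date.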
The findall() method searches the text and returns all matches as a list. If you want to find matches iteratively instead, use the finditer() method. For example:
>>> for m in datepat.finditer(text):
...     print(m.groups())
...
('11', '27', '2012')
('3', '13', '2013')
>>>
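Because finditer() yields match objects rather than plain tuples, position information is available as well. A small sketch that collects each matched date together with its span in the text:

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# Each item is a match object, so group(0) and span() can both be read.
spans = [(m.group(0), m.span()) for m in datepat.finditer(text)]

for matched, (start, end) in spans:
    # Slicing the text with the span gives back the matched substring.
    print(matched, start, end, text[start:end] == matched)
```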
Discussion
This recipe covers the most basic uses of the re module for matching and searching text. The essential steps are to compile the pattern with re.compile() and then call methods such as match(), findall(), or finditer().
When writing a pattern, it is common to use a raw string such as r'(\d+)/(\d+)/(\d+)'. A raw string leaves backslashes uninterpreted, which is useful in regular expressions. Otherwise you would have to double each backslash, as in '(\\d+)/(\\d+)/(\\d+)'.
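A short check of the raw-string point, assuming nothing beyond standard Python string literal rules: the raw and the doubled-backslash spellings produce the same pattern text, while an escape such as \b means something quite different in a normal string.

```python
import re

raw = r'(\d+)/(\d+)/(\d+)'
escaped = '(\\d+)/(\\d+)/(\\d+)'

# Both literals produce exactly the same pattern text.
same = raw == escaped

# In a normal string '\b' is a single backspace character, whereas
# r'\b' is two characters that the regex engine reads as a word boundary.
plain_b = len('\b')
raw_b = len(r'\b')

groups = re.match(escaped, '11/27/2012').groups()
print(same, plain_b, raw_b, groups)
```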
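Be aware that match() only checks the beginning of a string, so it can match things you do not expect. For example:
>>> m = datepat.match('11/27/2012abcdef')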
>>> m
<_sre.SRE_Match object at 0x1005d27e8>
>>> m.group()
'11/27/2012'
>>>
If you want an exact match, make sure your pattern ends with the end-marker ($), like this:
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)$')
>>> datepat.match('11/27/2012abcdef')
>>> datepat.match('11/27/2012')
<_sre.SRE_Match object at 0x1005d2750>
>>>
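An alternative not mentioned in the recipe is the fullmatch() method, available since Python 3.4, which requires the whole string to match without adding $ to the pattern. The sketch below also shows one difference: $ still matches just before a trailing newline, whereas fullmatch() does not.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

# fullmatch() succeeds only if the entire string matches the pattern.
trailing_junk = datepat.fullmatch('11/27/2012abcdef')
exact = datepat.fullmatch('11/27/2012')

# `$` also matches just before a final newline; fullmatch() does not.
dollar_newline = re.match(r'(\d+)/(\d+)/(\d+)$', '11/27/2012\n')
full_newline = datepat.fullmatch('11/27/2012\n')

print(trailing_junk, exact.groups(), dollar_newline is not None, full_newline)
```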
Finally, if you are only doing a simple text matching or searching operation, you can often skip the compilation step and use the module-level functions in the re module directly. For example:
>>> re.findall(r'(\d+)/(\d+)/(\d+)', text)
[('11', '27', '2012'), ('3', '13', '2013')]
>>>
Be aware, though, that if you are going to do a lot of matching or searching, it usually pays to compile the pattern first and reuse it. The module-level functions keep a cache of recently compiled patterns, so the performance penalty is small, but a precompiled pattern saves the cache lookup and some extra processing.