Make Your Regular Expressions a Hundred Times More Readable
2022-07-29 04:48:00 【Python数据之道】
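Author: kingname
Source: 未闻 Code
Regular expressions are powerful, but once written down they look like a string of emoticons. Come back a month later to an expression you wrote yourself and you will not remember what it means. Take this one, for example: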
pattern = r"((?:\(\s*)?[A-Z]*H\d+[a-z]*(?:\s*\+\s*[A-Z]*H\d+[a-z]*)*(?:\s*[\):+])?)(.*?)(?=(?:\(\s*)?[A-Z]*H\d+[a-z]*(?:\s*\+\s*[A-Z]*H\d+[a-z]*)*(?:\s*[\):+])?(?![^\w\s])|$)"
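Is there a way to make regular expressions more readable? One well-known way to make code more readable is to write comments, so can a regular expression carry comments too?
Take this sentence, for example: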
msg = '我叫青南,我的密码是:123kingname456,请注意保密。'
I want to extract the password 123kingname456 from it, so my regular expression might look like this:
pattern = ':(.*?),'
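Could I write it like this instead:
pattern = '''
: # start marker
(.*?) # any characters after the start marker
, # stop at the first ASCII comma
'''
Written this way it is much clearer: what each part does is spelled out.
But used as-is it obviously extracts nothing, because every newline, space, and comment character is treated as part of the pattern to match.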
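To see the failure concretely, here is a minimal sketch that runs the commented, multi-line pattern from above against the sample sentence without any flag. The variable name `plain_result` is only for illustration.

```python
import re

msg = '我叫青南,我的密码是:123kingname456,请注意保密。'

# The commented, multi-line pattern, used WITHOUT any flag.
pattern = '''
: # start marker
(.*?) # any characters after the start marker
, # stop at the first ASCII comma
'''

# Without a flag, the newlines, spaces, and "#" comments are literal
# characters that msg would have to contain, so nothing matches.
plain_result = re.findall(pattern, msg)
print(plain_result)  # []
```

The pattern compiles without error, which is what makes this mistake easy to miss: it simply never matches.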

But while browsing the Python regular expression documentation[1] today, I found something useful: the re.VERBOSE flag.

With it, your regular expressions can carry comments.
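A minimal sketch of the same extraction with `re.VERBOSE` passed as the flag, reusing the sample sentence from earlier in the article. The variable name `verbose_result` is only for illustration.

```python
import re

msg = '我叫青南,我的密码是:123kingname456,请注意保密。'

pattern = r'''
:      # start marker
(.*?)  # any characters after the start marker
,      # stop at the first ASCII comma
'''

# With re.VERBOSE, whitespace and "#" comments in the pattern are ignored,
# so this behaves like the one-line pattern ':(.*?),'.
verbose_result = re.findall(pattern, msg, re.VERBOSE)
print(verbose_result)  # ['123kingname456']
```

One thing to keep in mind: in verbose mode a literal space or `#` in the pattern has to be escaped or placed inside a character class, otherwise it is ignored along with the rest of the layout.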

re.VERBOSE can also be abbreviated as re.X.
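The short form can be sketched the same way. Here the pattern is precompiled with `re.X`; the names `password_re`, `same_flag`, and `password` are illustrative and not from the original article.

```python
import re

msg = '我叫青南,我的密码是:123kingname456,请注意保密。'

# re.X is simply the short name for re.VERBOSE.
same_flag = (re.X == re.VERBOSE)

password_re = re.compile(r'''
    :      # start marker
    (.*?)  # the password itself
    ,      # stop at the first ASCII comma
''', re.X)

match = password_re.search(msg)
password = match.group(1)
print(same_flag, password)  # True 123kingname456
```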

With comments added, the complicated regular expression from the beginning of this article becomes much more readable:
pattern = r"""
(                                   # code (capture)
    # BEGIN multicode
    (?: \( \s* )?                   # maybe open paren and maybe space
    # code
    [A-Z]*H                         # prefix
    \d+                             # digits
    [a-z]*                          # suffix
    (?:                             # maybe followed by other codes,
        \s* \+ \s*                  # ... plus-separated
        # code
        [A-Z]*H                     # prefix
        \d+                         # digits
        [a-z]*                      # suffix
    )*
    (?: \s* [\):+] )?               # maybe space and maybe close paren or colon or plus
    # END multicode
)
( .*? )                             # message (capture): everything ...
(?=                                 # ... up to (but excluding) ...
    # ... the next code
    # BEGIN multicode
    (?: \( \s* )?                   # maybe open paren and maybe space
    # code
    [A-Z]*H                         # prefix
    \d+                             # digits
    [a-z]*                          # suffix
    (?:                             # maybe followed by other codes,
        \s* \+ \s*                  # ... plus-separated
        # code
        [A-Z]*H                     # prefix
        \d+                         # digits
        [a-z]*                      # suffix
    )*
    (?: \s* [\):+] )?               # maybe space and maybe close paren or colon or plus
    # END multicode
    # (but not when followed by punctuation)
    (?! [^\w\s] )
    # ... or the end
    | $
)
"""
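A quick way to gain confidence that a commented pattern still means the same thing as the one-liner is to run both on the same input and compare. The sketch below does that under a few assumptions of my own: the sample string is made up, and the repeated "multicode" fragment is written once in a hypothetical `multicode` variable and reused, which is a shortcut for this sketch rather than how the article writes it.

```python
import re

# One-line pattern from the beginning of the article.
compact = r"((?:\(\s*)?[A-Z]*H\d+[a-z]*(?:\s*\+\s*[A-Z]*H\d+[a-z]*)*(?:\s*[\):+])?)(.*?)(?=(?:\(\s*)?[A-Z]*H\d+[a-z]*(?:\s*\+\s*[A-Z]*H\d+[a-z]*)*(?:\s*[\):+])?(?![^\w\s])|$)"

# The "multicode" fragment occurs twice, so it is written once and reused.
# It starts and ends with a newline so its comments cannot swallow
# whatever is concatenated after it.
multicode = r"""
    (?: \( \s* )?           # maybe open paren and maybe space
    [A-Z]*H \d+ [a-z]*      # code: prefix, digits, suffix
    (?:                     # maybe followed by other codes,
        \s* \+ \s*          # ... plus-separated
        [A-Z]*H \d+ [a-z]*  # code: prefix, digits, suffix
    )*
    (?: \s* [\):+] )?       # maybe space and maybe close paren or colon or plus
"""

verbose = (
    "(" + multicode + ")"   # code (capture)
    + r"( .*? )"            # message (capture)
    + "(?=" + multicode     # ... up to (but excluding) the next code
    + r"(?! [^\w\s] )"      # (but not when followed by punctuation)
    + r"| $ )"              # ... or the end
)

# Made-up sample input: two codes, each followed by a message.
sample = "H12: first message H34a: second message"

compact_result = re.findall(compact, sample)
verbose_result = re.findall(verbose, sample, re.VERBOSE)
print(compact_result == verbose_result)
```

A comparison on one sample is not a proof of equivalence, but it catches the common slip of losing a significant space or `#` when a pattern is spread over several lines.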
References
[1] Python regular expression documentation: https://docs.python.org/3/library/re.html#re.VERBOSE