Self-taught programming series - 1: Regular expressions
2022-06-26 09:08:00 【ML_ python_ get√】
Regular expressions
1.1 Without regular expressions
def isPhoneNumber(text):
    if len(text) != 12:
        return False
    for i in range(0, 3):
        # isdecimal() checks whether the character is a decimal digit
        if not text[i].isdecimal():
            return False
    if text[3] != '-':
        return False
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True

# print("191-666-1234 is a phone number: ")
# print(isPhoneNumber('191-666-1234'))
# print("bilibili is a phone number :")
# print(isPhoneNumber('bilibili'))
# # Find phone numbers in a longer string
# message = "call me at 415-555-1011 tomorrow. 415-555-9999 is my office."
# for i in range(len(message)):
#     chunk = message[i:i+12]
#     if isPhoneNumber(chunk):
#         print('phone number found: ' + chunk)
# print('Done')
1.2 Regular expressions
- \d stands for any digit 0-9, so the phone number above can be written as \d\d\d-\d\d\d-\d\d\d\d
- In \d{3}-\d{3}-\d{4}, {3} means "match three times". A regular expression describes a pattern to match; the compiled Regex object has its own attributes and methods
- The search() method returns a Match object; a Match object has a group() method that returns the text actually matched
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# The r prefix marks a raw string; without it, every backslash would have to be escaped as \\, which is more cumbersome
mo = phoneNumRegex.search('My number is 415-555-4242.')
print("phone number found: " + mo.group())
Summary: steps for using a regular expression
- Import the re module
- Call re.compile() to create a Regex object
- Call the Regex object's search() method with the string you want to search; it returns a Match object
- Call the Match object's group() method to get the text actually matched
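The four steps above can be sketched as one small function. This is only an illustration: the names PHONE_REGEX and find_phone are hypothetical, and the None check is added because search() returns None when nothing matches.

```python
import re

# Hypothetical names; the pattern is the phone-number pattern from this section.
PHONE_REGEX = re.compile(r'\d{3}-\d{3}-\d{4}')

def find_phone(text):
    """Return the first phone number found in text, or None."""
    mo = PHONE_REGEX.search(text)   # a Match object, or None if nothing matched
    if mo is None:
        return None
    return mo.group()               # the text actually matched

first = find_phone('My number is 415-555-4242.')
missing = find_phone('No number here.')
print(first)
print(missing)
```

Checking for None before calling group() avoids the AttributeError that appears when a pattern does not match.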
1.3 Searching with groups
- Simple grouping
- Matching alternatives with the pipe |
- Quantifier symbols such as ?, *, + and {}
# groups() returns all groups at once, as a tuple
Regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = Regex.search('my number is 123-456-8888.')
print(mo.group(0))
print(mo.group(1))
print(mo.groups())
a,b = mo.groups()
print(a)
print(b)
# If the text itself contains parentheses, escape them as \( and \) so they lose their grouping meaning and match literally
Regex1 = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = Regex1.search('my number is (123) 456-8888.')
print(mo.group(1))
# The pipe | matches one of several alternatives; search() returns whichever occurs first in the text
Regex_hero = re.compile(r'Ironman|Batman')
mo = Regex_hero.search('Ironman and Batman!')
print(mo.group())
mo = Regex_hero.search('Batman and Ironman!')
print(mo.group())
# Use findall() to find every match instead of only the first
# A pipe inside a group matches any one of several words that share the same prefix
Regex_a = re.compile(r'Bat(man|mobile|copter|bat)')
mo = Regex_a.search('Batbat and Batmobile are best!')
print(mo.group())
# (group)? makes the group optional (zero or one occurrence)
Regex_choice = re.compile(r'Bat(wo)?man')
mo = Regex_choice.search(' I am Batman')
mo1 = Regex_choice.search('you are Batwoman!')
print(mo.group())
print(mo1.group())
# (group)* matches the group zero or more times
Regex_new = re.compile(r'Bat(wo)*man')
mo = Regex_new.search('Batman is my lover!')
print(mo.group())
mo1 = Regex_new.search('my name is Batwowowowowoman!')
print(mo1.group())
mo2 = Regex_new.search('my name is Batman')
print(mo2.group())
# (group)+ matches the group one or more times; with no "wo" in the text, search() returns None
# Regex_add = re.compile(r'Bat(wo)+man')
# mo3 = Regex_add.search('my name is Batman')
# print(mo3.group())
# AttributeError: 'NoneType' object has no attribute 'group'
# {} specifies the number of repetitions: (group){3} exactly 3 times, {3,} 3 or more times, {,5} at most 5 times
Regex_ha = re.compile(r'(ha){3}')
mo = Regex_ha.search('hahaha')
print(mo.group())
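The quantifiers above can be compared side by side. The sketch below uses a hypothetical helper, match_text, that returns None instead of raising when a pattern does not match; the patterns themselves are the ones discussed in this section.

```python
import re

def match_text(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    mo = re.search(pattern, text)
    return mo.group() if mo else None

# + needs at least one "wo", so plain "Batman" does not match
plus_miss = match_text(r'Bat(wo)+man', 'my name is Batman')
plus_hit = match_text(r'Bat(wo)+man', 'my name is Batwoman')

# {3,} means three or more repetitions; all four are taken
at_least_3 = match_text(r'(ha){3,}', 'hahahaha')

# {,2} means at most two repetitions; anchored, so three is too many
at_most_2 = match_text(r'^(ha){,2}$', 'hahaha')

print(plus_miss, plus_hit, at_least_3, at_most_2)
```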
1.4 Greedy and non-greedy matching
- Python regular expressions are greedy by default, that is, they match the longest possible string; adding ? after the braces, as in (group){3,5}?, makes the match non-greedy so that it takes the shortest possible string
Regex_nogreedy = re.compile(r'(ha){3,5}?')
mo = Regex_nogreedy.search('hahahahahaha')
print(mo.group())
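To see the difference directly, the same text can be searched with both forms. A minimal sketch; the variable names are only illustrative.

```python
import re

text = 'hahahahahaha'   # six repetitions of "ha"

greedy = re.search(r'(ha){3,5}', text).group()    # takes as many as allowed: five
lazy = re.search(r'(ha){3,5}?', text).group()     # takes as few as allowed: three

print(greedy)   # hahahahaha
print(lazy)     # hahaha
```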
1.5 findall and search
- findall() finds every matching string; search() finds only the first string that matches
- findall() returns a list; search() returns a Match object. findall() can also be combined with groups, in which case it returns a list of tuples
Regex_phone = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = Regex_phone.search('Cell:123-456-8888 Work:123-567-9999')
mo1 = Regex_phone.findall('Cell:123-456-8888 Work:123-567-9999')
print(mo.group())
print(mo1)
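How findall() behaves with groups is easiest to see by running the same text through a pattern with and without parentheses. A small sketch using the sample text from above:

```python
import re

text = 'Cell:123-456-8888 Work:123-567-9999'

# No groups: findall() returns a list of matched strings
no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)

# With groups: findall() returns a list of tuples, one item per group
with_groups = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)

print(no_groups)
print(with_groups)
```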
1.6 Character classes
- \d matches a digit 0-9; \D matches any character that is not a digit
- \w matches a word character (letter, digit or underscore); \W matches any character that is not a word character
- \s matches a whitespace character; \S matches any non-whitespace character
Regex_str = re.compile(r'\d+\s*\w+')
# + means one or more: one or more digits, optional whitespace, then one or more word characters
mo = Regex_str.findall('12 drummers, 11 pipers, 10 lords, 9 ladies, 8maids, 7swans, 6 geese, 5 rings, 4 birds, 3 hens, 2 doves, 1 partridge')
print(mo)
# Define your own character class
Regex_own = re.compile(r'[AEIOUaeiou]')
mo = Regex_own.findall('RoboCop eats baby food. BABY FOOD!')
print(mo)
# [chars] defines a custom class; [^chars] matches any character not in the class; a hyphen can express a range such as a-z
Regex_own1 = re.compile(r'[^AEIOUaeiou]')
mo1 = Regex_own1.findall('RoboCop eats baby food. BABY FOOD!')
print(mo1)
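The hyphen ranges mentioned above are not shown in the examples, so here is a brief sketch. The sample strings are made up for illustration.

```python
import re

# A range inside a class: any lowercase letter
lowercase = re.findall(r'[a-z]', 'RoboCop')

# Several ranges combined: runs of letters and digits (no underscore, unlike \w)
alnum_runs = re.findall(r'[a-zA-Z0-9]+', 'baby_food 4 U!')

# Negated class: everything that is neither a lowercase letter nor a space
others = re.findall(r'[^a-z ]', 'RoboCop eats!')

print(lowercase)
print(alnum_runs)
print(others)
```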
1.7 Exact matching
- The caret ^ (start of string) and the dollar sign $ (end of string)
- The wildcard .
Regex1 = re.compile(r'^Hello')
mo = Regex1.search('Hello world!')
mo1 = Regex1.search('he said hello!')
print(mo,'\n',mo1)
Regex2 = re.compile(r'^\d+$')
mo2 = Regex2.search('111111111x23333333')
mo3 = Regex2.search('222213232131')
print(mo2,'\n',mo3)
Regex3 = re.compile(r'\d+$')
mo4 = Regex3.search('my age is 26')
print(mo4)
# The wildcard . matches any character except a newline, but only a single character
Regex_at = re.compile(r'.at')
mo = Regex_at.findall('The cat in the hat sat on the flat mat.')
print(mo)
# .* matches any sequence of characters, for example to capture whatever was entered in a name or password field
Regex_name = re.compile(r'First name: (.*) Last name: (.*)')
mo = Regex_name.search('First name: AI Last name: Sweigart')
print(mo.group(1))
print(mo.group(2))
# Greedy versus non-greedy matching with .* and .*?
Regex_greed = re.compile(r'<.*>')
mo = Regex_greed.search('<To serve man> for dinner>')
print(mo.group())
Regex_nogreed = re.compile(r'<.*?>')
mo1 = Regex_nogreed.search('<To serve man> for dinner>')
print(mo1.group())
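Using ^ and $ together is a common way to validate a whole string. The sketch below wraps the ^\d+$ pattern from above in a hypothetical helper and compares it with re.fullmatch, which is part of the standard re module and is slightly stricter about a trailing newline.

```python
import re

def is_all_digits(text):
    """True if text consists only of digits, using the ^ and $ anchors."""
    return re.search(r'^\d+$', text) is not None

def is_all_digits_strict(text):
    """Same check with re.fullmatch, which must consume the entire string."""
    return re.fullmatch(r'\d+', text) is not None

print(is_all_digits('222213232131'))         # True
print(is_all_digits('111111111x23333333'))   # False
# $ also matches just before a newline at the end of the string
print(is_all_digits('123\n'))                # True
print(is_all_digits_strict('123\n'))         # False
```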
1.8 The second argument to compile
- Ignore whitespace and comments inside the pattern: re.VERBOSE
- Ignore case: re.I
- Let the wildcard also match newlines: re.DOTALL
# For the wildcard . to match the newline character as well, pass re.DOTALL
Regex_nonewline = re.compile(r'.*')
mo = Regex_nonewline.findall('Serve the public trust. \nProtect the innocent\nUphold the law')
print(mo)
Regex_newline = re.compile(r'.*',re.DOTALL)
mo1 = Regex_newline.findall('Serve the public trust. \nProtect the innocent\nUphold the law')
print(mo1)
# Ignore case
Regex_cop = re.compile(r'robocop', re.I)
mo=Regex_cop.search('RoboCop is part man, part machine,all cop.')
print(mo.group())
# re.VERBOSE makes the regex ignore whitespace and comments inside the pattern,
# so a literal space in the pattern has to be escaped
Regex_group = re.compile(r'Agent\ (\w)\w*', re.VERBOSE)
mo1 = Regex_group.sub(r'\1****', 'Agent Alice gave the secret documents to Agent Bob')
print(mo1)
Regex_group = re.compile(r'Agent\ (\w)\w*', re.VERBOSE | re.I | re.DOTALL)
# To pass several flags in the same argument, combine them with the bitwise OR operator |
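Because re.VERBOSE ignores whitespace and # comments in the pattern, a pattern can be spread over several lines. A minimal sketch, assuming a hypothetical agent_regex; the literal space is written as [ ] because whitespace inside a character class is kept even in verbose mode.

```python
import re

agent_regex = re.compile(r'''
    agent    # the word "agent" (any case, because of re.I)
    [ ]      # a literal space, kept because it is inside a character class
    (\w+)    # the agent's name
''', re.VERBOSE | re.I)

names = agent_regex.findall('Agent Alice met AGENT Bob')
print(names)
```

With a single group in the pattern, findall() returns just the captured names.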
1.9 Replacing text
- The sub() method
# sub() replaces every piece of matching text
Regex_sub = re.compile(r'Agent \w+')
mo = Regex_sub.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob')
print(mo)
# To reuse part of the matched text in the replacement, such as the initial of a name, put it in a group and refer to it as \1, \2, \3 in the replacement string
Regex_group = re.compile(r'Agent (\w)\w*')
mo1 = Regex_group.sub(r'\1****', 'Agent Alice gave the secret documents to Agent Bob')
print(mo1)
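The same idea works with several groups. The following sketch masks the middle part of a phone number while keeping the first and third groups; the pattern and sample text are chosen only for illustration.

```python
import re

phone_regex = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

# \1 and \3 reinsert the first and third groups; the middle group is masked
masked = phone_regex.sub(r'\1-***-\3', 'Cell: 415-555-1011, work: 415-555-9999')
print(masked)
```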
1.10 Phone number and email address extractor
- Copy the text to the clipboard, either manually or from a program
- Get the text from the clipboard: use the pyperclip module to copy and paste strings, and create two regular expressions, one for phone numbers and one for email addresses
- Find all phone numbers and email addresses in the text: collect every match, not just the first
- Paste them back to the clipboard: format the matched strings neatly and join them into a single string for pasting
- If no match is found, display a message
import pyperclip, re
# Define two regular expressions
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # optional area code, with or without parentheses
    (\s|-|\.)?                        # optional separator: whitespace, - or .
    (\d{3})                           # first three digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last four digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # optional extension; index 8 holds the extension digits
    )''', re.VERBOSE)
# findall() returns one tuple of 9 elements per match: index 0 is the outermost
# group (the whole match), followed by the inner groups
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # user name
    @
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # .com, .cn, etc.
    )''', re.VERBOSE)
# Find all matches in the clipboard text
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    print(groups)
    # Join only the non-empty parts so a missing area code leaves no leading '-'
    phoneNum = '-'.join(part for part in (groups[1], groups[3], groups[5]) if part)
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])
# Join the results into one string and copy it to the clipboard
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard: ')
    print('\n'.join(matches))
else:
    print('No phone numbers or email address found')
For example, running the script on text copied from the CSDN home page gives the following result:
Copied to clipboard:
400-660-0108
999-2021
472464
1900
658
1101
[email protected].net
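To try the extraction logic without pyperclip or the clipboard, the two patterns can be wrapped in a function that takes the text as an argument. This is only a sketch: extract_contacts, PHONE_REGEX, EMAIL_REGEX and the sample text (including the address at example.com) are hypothetical, and the patterns assume the multi-line verbose layout with \. for a literal dot.

```python
import re

PHONE_REGEX = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # optional area code
    (\s|-|\.)?                        # optional separator
    (\d{3})                           # first three digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last four digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # optional extension
)''', re.VERBOSE)

EMAIL_REGEX = re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # user name
    @
    [a-zA-Z0-9.-]+       # domain name
    (\.[a-zA-Z]{2,4})    # top-level domain
)''', re.VERBOSE)

def extract_contacts(text):
    """Return (phones, emails) found in text; index 0 of each tuple is the whole match."""
    # strip() removes a leading separator that can be captured when there is no area code
    phones = [groups[0].strip() for groups in PHONE_REGEX.findall(text)]
    emails = [groups[0] for groups in EMAIL_REGEX.findall(text)]
    return phones, emails

sample = 'Call 415-555-1011 or 415.555.9999 ext 123, or write to info@example.com.'
phones, emails = extract_contacts(sample)
print(phones)
print(emails)
```

Keeping the matching in a plain function makes it easy to test on fixed strings before wiring it to the clipboard.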