Crawler learning 5: recognizing image CAPTCHAs to get past anti-crawling checks (measured results for ddddocr and pytesseract)
2022-06-27 06:06:00 【lufei0920】
Crawler learning 5: recognizing image CAPTCHAs
| Name | Environment and version | Notes |
|---|---|---|
| ddddocr | Installed on Linux; Python 3 version: 3.6.8; install command: python3 -m pip install ddddocr; installed version: ddddocr-1.4.3 | See the project notes in /usr/local/lib/python3.6/site-packages/ddddocr-1.4.3-py3.6.egg/ddddocr/__init__.py. Recognition quality is good. |
| pytesseract | Installed on Linux; Python 3 version: 3.6.8; requires tesseract to be installed | Recognition quality is mediocre; not recommended |
I. Example: recognizing an image CAPTCHA with ddddocr
First, install the ddddocr module: python3 -m pip install ddddocr
The installation was not smooth: pip kept reporting errors. I eventually got it installed by separately installing each dependency named in the error messages.
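When pip keeps failing on dependencies, it can help to see which modules are still missing before retrying. The sketch below is only a diagnostic aid: `missing_modules` is a hypothetical helper, and the module list (`onnxruntime`, `PIL`, `numpy`) is an assumption about what ddddocr needs, not something confirmed in this post.

```python
import importlib.util


def missing_modules(names):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed dependency list; replace it with the modules named in your pip errors.
    assumed_deps = ["ddddocr", "onnxruntime", "PIL", "numpy"]
    print("Missing modules:", missing_modules(assumed_deps))
```

Each name it reports can then be installed on its own with python3 -m pip install, which mirrors the one-by-one approach described above.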
1. Sample code
from selenium import webdriver
from PIL import Image, ImageEnhance
import ddddocr

ocr = ddddocr.DdddOcr()
url = "page to visit"  # placeholder: put the target URL here
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run without a visible browser window
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("--disable-dev-shm-usage")  # the four options above are needed on Linux
# Selenium 3 style calls, matching the Python 3.6.8 environment used here.
# Selenium 4 replaces executable_path with a Service object and
# find_element_by_xpath with find_element(By.XPATH, ...).
driver = webdriver.Chrome(options=options, executable_path='/usr/bin/chromedriver')
driver.get(url)  # request the URL
driver.maximize_window()  # maximize the browser window
driver.save_screenshot('m3.png')  # screenshot the whole page and save it as an image
element = driver.find_element_by_xpath('//*[@id="login"]/div[5]/span')  # element that holds the CAPTCHA
location = element.location  # top-left corner of the element
size = element.size  # width and height of the element
# Crop box (left, top, right, bottom) of the CAPTCHA inside the screenshot
rangle = (int(location['x']), int(location['y']),
          int(location['x'] + size['width']), int(location['y'] + size['height']))
i = Image.open('m3.png')  # open the saved screenshot
imgry = i.crop(rangle)  # cut out the CAPTCHA area
imgry.save('getVerifyCode1.png')  # save the CAPTCHA image
im = Image.open('getVerifyCode1.png')  # reopen the cropped CAPTCHA image
sharpness = ImageEnhance.Contrast(im)  # raise the contrast so the characters are easier to recognize
sharp_img = sharpness.enhance(2.0)
sharp_img.save("newVerifyCode1.png")  # save the enhanced CAPTCHA image
with open('newVerifyCode1.png', 'rb') as f:
    img_bytes = f.read()  # read the image as bytes
res = ocr.classification(img_bytes)  # recognize the characters in the image
print(res)
driver.quit()  # close the browser
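
The crop rectangle in the code above is built from the element's location and size. As a minimal sketch, that arithmetic can be isolated in a small helper so it is easy to check by hand. `crop_box` and its `scale` parameter are hypothetical names that do not appear in the original script; `scale` is an assumption covering the case where the screenshot is taken at a device pixel ratio other than 1, in which case an unscaled box would cut out the wrong area.

```python
def crop_box(location, size, scale=1.0):
    """Convert Selenium-style location/size dicts into a PIL crop box.

    `scale` is the assumed ratio of screenshot pixels to CSS pixels
    (1.0 on an ordinary display, often 2.0 on a high-DPI one).
    """
    left = int(location["x"] * scale)
    top = int(location["y"] * scale)
    right = int((location["x"] + size["width"]) * scale)
    bottom = int((location["y"] + size["height"]) * scale)
    return (left, top, right, bottom)


if __name__ == "__main__":
    # Made-up values standing in for element.location and element.size.
    print(crop_box({"x": 100, "y": 200}, {"width": 80, "height": 30}))
    # (100, 200, 180, 230)
    print(crop_box({"x": 100, "y": 200}, {"width": 80, "height": 30}, scale=2.0))
    # (200, 400, 360, 460)
```

The returned tuple can be passed straight to Image.crop. If the cropped file shows the wrong part of the page, comparing the screenshot's pixel width with the browser window width is one way to estimate `scale`.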
2. Results of running the code


The recognized text matches the CAPTCHA shown in the image.
II. Recognizing the CAPTCHA with pytesseract
1. Install pytesseract
python3 -m pip install pytesseract
2. Install tesseract
For installation details, see: https://blog.csdn.net/weixin_44575268/article/details/117258508
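pytesseract is only a wrapper, so it needs to know where the tesseract binary lives. The sketch below shows one way to check that before running the OCR code. `find_executable` is a hypothetical helper, and the fallback path is simply the one used in the code example in this post; your installation may place the binary elsewhere.

```python
import os
import shutil


def find_executable(name, fallback_paths=()):
    """Return the path of `name` found on PATH, else the first usable fallback path."""
    found = shutil.which(name)
    if found:
        return found
    for path in fallback_paths:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    # /usr/local/bin/tesseract is the location assumed by the example in this post.
    print(find_executable("tesseract", ["/usr/local/bin/tesseract"]))
```

If this prints None, the tesseract installation is probably incomplete or lives in a directory that is neither on PATH nor in the fallback list.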
3. Code example
from selenium import webdriver
from PIL import Image, ImageEnhance
import pytesseract

# Tell pytesseract where the tesseract binary is installed
tesseract_cmd = r'/usr/local/bin/tesseract'
pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
url = "page to visit"  # placeholder: put the target URL here
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run without a visible browser window
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("--disable-dev-shm-usage")  # the four options above are needed on Linux
# Selenium 3 style calls, matching the Python 3.6.8 environment used here.
# Selenium 4 replaces executable_path with a Service object and
# find_element_by_xpath with find_element(By.XPATH, ...).
driver = webdriver.Chrome(options=options, executable_path='/usr/bin/chromedriver')
driver.get(url)  # request the URL
driver.maximize_window()  # maximize the browser window
driver.save_screenshot('m3.png')  # screenshot the whole page and save it as an image
element = driver.find_element_by_xpath('//*[@id="login"]/div[5]/span')  # element that holds the CAPTCHA
location = element.location  # top-left corner of the element
size = element.size  # width and height of the element
# Crop box (left, top, right, bottom) of the CAPTCHA inside the screenshot
rangle = (int(location['x']), int(location['y']),
          int(location['x'] + size['width']), int(location['y'] + size['height']))
i = Image.open('m3.png')  # open the saved screenshot
imgry = i.crop(rangle)  # cut out the CAPTCHA area
imgry.save('getVerifyCode1.png')  # save the CAPTCHA image
im = Image.open('getVerifyCode1.png')  # reopen the cropped CAPTCHA image
sharpness = ImageEnhance.Contrast(im)  # raise the contrast so the characters are easier to recognize
sharp_img = sharpness.enhance(2.0)
sharp_img.save("newVerifyCode1.png")  # save the enhanced CAPTCHA image
newVerify = Image.open('newVerifyCode1.png')
mm = pytesseract.image_to_string(newVerify, 'eng')  # OCR with the English language data
print(mm)
driver.quit()  # close the browser
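
The call above runs Tesseract with its default settings, which are aimed at pages of text rather than a single short string. One thing that may be worth trying is passing a config string that selects single-line page segmentation and restricts the character set. This is a sketch that was not tested against the CAPTCHA in this post: `build_tesseract_config` is a hypothetical helper, `--psm 7` and `tessedit_char_whitelist` are standard Tesseract options, and whether the whitelist takes effect depends on the Tesseract version and engine in use.

```python
def build_tesseract_config(psm=7, whitelist=None):
    """Build a config string for pytesseract.image_to_string(..., config=...)."""
    parts = ["--psm {}".format(psm)]  # 7 = treat the image as a single text line
    if whitelist:
        parts.append("-c tessedit_char_whitelist={}".format(whitelist))
    return " ".join(parts)


if __name__ == "__main__":
    import string

    config = build_tesseract_config(psm=7, whitelist=string.digits + string.ascii_letters)
    print(config)
    # Possible use, where `image` is the enhanced CAPTCHA opened with PIL:
    # text = pytesseract.image_to_string(image, lang='eng', config=config)
```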
4. Sample results
Result: pytesseract did not recognize the CAPTCHA in the picture.
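OCR output often comes back with surrounding whitespace, a trailing newline, or stray punctuation, so it can be worth normalizing the text before deciding whether recognition failed. A minimal sketch, assuming the CAPTCHA contains only ASCII letters and digits; `clean_ocr_text` is a hypothetical helper and is not part of pytesseract or ddddocr.

```python
import re


def clean_ocr_text(raw, allowed=r"0-9A-Za-z"):
    """Drop every character outside `allowed` (the body of a regex character class)."""
    return re.sub("[^{}]".format(allowed), "", raw)


if __name__ == "__main__":
    print(repr(clean_ocr_text(" a B1 2\n\x0c")))  # 'aB12'
```

An empty string after cleaning can then be treated as "not recognized", for example as a signal to refresh the CAPTCHA and try again.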