Crawler learning 5: recognizing image verification codes (CAPTCHAs) as an anti-crawling countermeasure (ddddocr and pytesseract tested in practice)

2022-06-27 06:06:00 lufei0920

Crawler learning 5: anti-crawling, recognizing image verification codes

Name | Environment | Version | Notes
ddddocr | Installed on Linux; Python 3.6.8; install command: python3 -m pip install ddddocr | ddddocr-1.4.3 | Note the project description in /usr/local/lib/python3.6/site-packages/ddddocr-1.4.3-py3.6.egg/ddddocr/__init__.py. Recognition works well; see the picture: [image]
pytesseract | Installed on Linux; Python 3.6.8; tesseract must also be installed | | Recognition is mediocre; not recommended.

I. Example: recognizing an image verification code with ddddocr

First install the ddddocr module: python3 -m pip install ddddocr
	The installation was rather bumpy and kept reporting errors. I ended up installing the dependencies named in the error messages one by one, after which the installation completed.
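When an install fails module by module like this, it can help to list which modules are still missing before retrying. The sketch below uses only the standard library; the function name missing_modules and the candidate list are assumptions for illustration, so substitute the modules named in your own error messages.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this interpreter."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: replace it with the modules named in your own errors.
    candidates = ["ddddocr", "onnxruntime", "PIL", "numpy", "cv2"]
    print("Missing modules:", missing_modules(candidates))
```

Each name it reports can then be installed separately with python3 -m pip install, using the package name that provides that module.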

1. Sample code

from selenium import webdriver
from selenium.webdriver.common.by import By
from PIL import Image, ImageEnhance

import ddddocr

ocr = ddddocr.DdddOcr()

url = "Page to visit"  # placeholder: replace with the URL of the page to visit
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run without a visible browser window
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("--disable-dev-shm-usage")  # the four options above are needed on Linux
# executable_path is the Selenium 3 style argument; Selenium 4.10+ expects a Service object instead
driver = webdriver.Chrome(options=options, executable_path='/usr/bin/chromedriver')

driver.get(url)  # request the URL
driver.maximize_window()  # maximize the browser window
driver.save_screenshot('m3.png')  # screenshot the whole page and save it as an image
element = driver.find_element(By.XPATH, '//*[@id="login"]/div[5]/span')  # the verification code element
location = element.location  # top-left corner of the element: {'x': ..., 'y': ...}
size = element.size  # element size: {'width': ..., 'height': ...}
driver.quit()  # the browser is no longer needed

# (left, upper, right, lower) box of the verification code inside the screenshot
box = (int(location['x']), int(location['y']),
       int(location['x'] + size['width']), int(location['y'] + size['height']))
page = Image.open('m3.png')  # open the saved page screenshot
captcha = page.crop(box)  # cut out the verification code area
captcha.save('getVerifyCode1.png')  # save the verification code image

im = Image.open('getVerifyCode1.png')  # reopen the cropped verification code image
enhancer = ImageEnhance.Contrast(im)  # higher contrast makes the characters easier to recognize
sharp_img = enhancer.enhance(2.0)
sharp_img.save("newVerifyCode1.png")  # save the enhanced verification code image

with open('newVerifyCode1.png', 'rb') as f:
    img_bytes = f.read()  # read the image as bytes
res = ocr.classification(img_bytes)  # recognize the characters in the image
print(res)
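The crop step in the sample code is plain coordinate arithmetic: the element's top-left corner plus its width and height give the box passed to Image.crop. A minimal sketch of that calculation follows. The function name crop_box and the scale parameter are assumptions added for illustration; scale=1.0 reproduces the sample code, while a larger value might be needed if the screenshot has more pixels than the page's CSS layout (for example on a high-DPI display).

```python
def crop_box(location, size, scale=1.0):
    """Return the (left, upper, right, lower) box for PIL's Image.crop.

    location: dict with 'x' and 'y', as in the element's location
    size:     dict with 'width' and 'height', as in the element's size
    scale:    screenshot pixels per CSS pixel (hypothetical parameter)
    """
    left = int(location["x"] * scale)
    upper = int(location["y"] * scale)
    right = int((location["x"] + size["width"]) * scale)
    lower = int((location["y"] + size["height"]) * scale)
    return (left, upper, right, lower)


# An element at (100, 200) that is 80 wide and 30 high:
print(crop_box({"x": 100, "y": 200}, {"width": 80, "height": 30}))
# prints (100, 200, 180, 230)
```

If the cropped image shows the wrong part of the page, comparing the screenshot's pixel size with the browser window size is one way to estimate a suitable scale.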

2. Results

[image]
[image]
The recognized text matches the verification code shown in the picture.
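As a possible variation on the sample code, the page screenshot and crop can be skipped by asking the element itself for a screenshot. The sketch below assumes Selenium's WebElement property screenshot_as_png, which returns PNG bytes, and reuses the classification call from the sample code. The helper name recognize_element is hypothetical, and this version leaves out the contrast enhancement, so results may differ.

```python
def recognize_element(element, ocr):
    """Run OCR on a screenshot of a single page element.

    element: object exposing PNG bytes as `screenshot_as_png`
             (Selenium's WebElement provides this property)
    ocr:     object with a `classification(bytes)` method,
             such as the ddddocr.DdddOcr() instance in the sample code
    """
    png_bytes = element.screenshot_as_png
    return ocr.classification(png_bytes)


# Possible use, with `element` being the verification code element
# located via XPath in the sample code:
# print(recognize_element(element, ocr))
```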

II. Recognizing the verification code with pytesseract

1. Install pytesseract

python3 -m pip install pytesseract

2. Install tesseract

 See... For installation details :https://blog.csdn.net/weixin_44575268/article/details/117258508

3. Code example

from selenium import webdriver
from selenium.webdriver.common.by import By
from PIL import Image, ImageEnhance

import pytesseract

# tell pytesseract where the tesseract binary is installed
tesseract_cmd = r'/usr/local/bin/tesseract'
pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

url = "Page to visit"  # placeholder: replace with the URL of the page to visit
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run without a visible browser window
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("--disable-dev-shm-usage")  # the four options above are needed on Linux
# executable_path is the Selenium 3 style argument; Selenium 4.10+ expects a Service object instead
driver = webdriver.Chrome(options=options, executable_path='/usr/bin/chromedriver')

driver.get(url)  # request the URL
driver.maximize_window()  # maximize the browser window
driver.save_screenshot('m3.png')  # screenshot the whole page and save it as an image
element = driver.find_element(By.XPATH, '//*[@id="login"]/div[5]/span')  # the verification code element
location = element.location  # top-left corner of the element: {'x': ..., 'y': ...}
size = element.size  # element size: {'width': ..., 'height': ...}
driver.quit()  # the browser is no longer needed

# (left, upper, right, lower) box of the verification code inside the screenshot
box = (int(location['x']), int(location['y']),
       int(location['x'] + size['width']), int(location['y'] + size['height']))
page = Image.open('m3.png')  # open the saved page screenshot
captcha = page.crop(box)  # cut out the verification code area
captcha.save('getVerifyCode1.png')  # save the verification code image

im = Image.open('getVerifyCode1.png')  # reopen the cropped verification code image
enhancer = ImageEnhance.Contrast(im)  # higher contrast makes the characters easier to recognize
sharp_img = enhancer.enhance(2.0)
sharp_img.save("newVerifyCode1.png")  # save the enhanced verification code image

newVerify = Image.open('newVerifyCode1.png')  # open the enhanced image
mm = pytesseract.image_to_string(newVerify, lang='eng')  # recognize the text using the English model
print(mm)

4. Results

Picture:
[image]
Result:
[image]
The verification code was not recognized.

Copyright notice
This article was written by [lufei0920]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/178/202206270548360526.html