Online Recognition of Website CAPTCHAs with Baidu OCR
2022-08-01 06:40:00 【Anuttarasamyasambodh】
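0. Problem:
Recognize a website's CAPTCHA dynamically so that the following steps can proceed automatically.
1. Approach:
1.1. Get the CAPTCHA image
1.2. Recognize the CAPTCHA online with the Baidu OCR API
2. Implementation:
2.1. Get the CAPTCHA image
2.1.1 Use webdriver to simulate a browser and load the page
2.1.2 Crop the CAPTCHA out of a page screenshot using the position attributes of the CAPTCHA image element, and save it
The code is as follows: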
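from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By


def verifycode():
    driver = webdriver.Chrome()
    driver.set_page_load_timeout(5)
    driver.set_script_timeout(5)
    try:
        driver.get("https://query.ruankao.org.cn/certificate/main")
    except Exception as e:
        print('time out in search page')
    # 1. Save a screenshot of the page; the file name must end in .png,
    #    other image formats trigger a warning
    driver.save_screenshot("scr_img.png")
    # 2. Locate the CAPTCHA image element
    #    (find_element_by_id was removed in Selenium 4.3; use find_element(By.ID, ...))
    # code_ele = driver.find_element(By.ID, "imgVerifyCode")
    code_ele = driver.find_element(By.ID, "pic")
    # 3. Position of the element, e.g. {'y': 478, 'x': 565}: the top-left corner of the image
    print(code_ele.location)
    # 4. Size of the element, e.g. {'height': 37, 'width': 135}
    print(code_ele.size)
    # 5. Compute the bounding box of the element
    x0 = code_ele.location["x"]  # 565
    y0 = code_ele.location["y"]  # 478
    x1 = code_ele.size["width"] + x0
    y1 = code_ele.size["height"] + y0
    img = Image.open("scr_img.png")
    image = img.crop((x0, y0, x1, y1))  # left, upper, right, lower
    image.save("code_img.png")  # save the CAPTCHA image as code_img.png
    driver.quit()  # close the browser once the image has been saved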
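One caveat about the cropping step: location and size are reported in CSS pixels, while the screenshot may be larger on a high-DPI display, in which case the crop lands on the wrong region. The sketch below is not part of the original code; it assumes a scale factor (for example, the value of window.devicePixelRatio read through driver.execute_script) and converts the element geometry into screenshot pixels. The helper name crop_box is hypothetical.

```python
def crop_box(location, size, scale=1.0):
    """Convert an element's location/size (CSS pixels) into a crop box
    (left, upper, right, lower) in screenshot pixels.

    `scale` is an assumed device pixel ratio; 1.0 reproduces the
    arithmetic used in the code above.
    """
    left = location["x"] * scale
    upper = location["y"] * scale
    right = (location["x"] + size["width"]) * scale
    lower = (location["y"] + size["height"]) * scale
    return tuple(int(round(v)) for v in (left, upper, right, lower))


# Values taken from the comments in the code above
box = crop_box({"x": 565, "y": 478}, {"width": 135, "height": 37})
print(box)  # (565, 478, 700, 515)
```

The resulting tuple can be passed straight to img.crop(). Selenium's WebElement also has a screenshot(filename) method that saves only the element, which may avoid the cropping arithmetic altogether.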
Alternatively, use XPath to locate the CAPTCHA URL and download the image directly:
import requests
from lxml import etree


def verifycode():
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Referer': 'https://query.ruankao.org.cn/certificate/main',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36 Edg/88.0.705.74',
        'Cookie': 'PHPSESSID=trq1o40; acw_tc=784e8288dgh67f; SERVERID=f7154867dcfa|1618889640|1618887987'
    }
    # Request the CAPTCHA with headers that carry the Cookie first: the server then
    # stores the cookie -> verification code mapping and returns the CAPTCHA image
    xpath_str = '//img[@name="pic"]/@src'
    base_url = "https://query.ruankao.org.cn/certificate/main"
    html_res = requests.get(base_url, headers=headers).text
    dom = etree.HTML(html_res)
    items = dom.xpath(xpath_str)
    if len(items) > 0:
        cap_url = items[0]
        print(cap_url)
        cap = requests.get(cap_url, headers=headers)
        # the with block closes the file, so no explicit f.close() is needed
        with open("cap.png", "wb") as f:
            f.write(cap.content)
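The download above passes the src attribute to requests.get unchanged, which only works if the page emits an absolute URL. If the attribute turns out to be relative, it has to be resolved against the page URL first. A minimal sketch using the standard library; the helper name resolve_captcha_url and the example paths are hypothetical, not taken from the site:

```python
from urllib.parse import urljoin

PAGE_URL = "https://query.ruankao.org.cn/certificate/main"


def resolve_captcha_url(src, page_url=PAGE_URL):
    """Turn the value of an <img src> attribute into an absolute URL.

    Absolute URLs are returned unchanged; relative ones are resolved
    against the page they were found on.
    """
    return urljoin(page_url, src)


# Hypothetical src values, for illustration only
print(resolve_captcha_url("/captcha.png"))
print(resolve_captcha_url("captcha.png"))
print(resolve_captcha_url("https://example.com/captcha.png"))
```

Calling resolve_captcha_url(items[0]) before the download makes the function tolerant of both forms.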
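2.2 Recognize the CAPTCHA online with the Baidu OCR API
2.2.1 Log in to Baidu AI Cloud, create an OCR application instance, and obtain the APP_ID and APP_KEY
https://cloud.baidu.com/product/ocr_general
Following the documentation step by step should work. At the time of writing, the free quota is 1,000 calls/month for individually verified accounts and 2,000 calls/month for enterprise-verified accounts; once the free test quota is used up, calls are billed at the listed prices.
Once you have the APP_ID and APP_KEY, you can call the API for online recognition; see the OCR technical documentation (文字识别OCR, baidu.com).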
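The recognition call needs an access_token, which the post does not show how to obtain. As far as I understand Baidu's authentication documentation, the token comes from the OAuth endpoint using the application's API Key and Secret Key (shown in the console next to the APP_ID). The sketch below rests on that assumption; the function names are hypothetical and the network call has not been run against a real account.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed token endpoint, per Baidu's authentication documentation
TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Build the token request URL with the credentials as query parameters."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,         # the application's API Key
        "client_secret": secret_key,  # the application's Secret Key
    })
    return TOKEN_URL + "?" + query


def fetch_access_token(api_key, secret_key, timeout=10):
    """POST to the token endpoint and return the access_token field."""
    req = Request(build_token_url(api_key, secret_key), data=b"", method="POST")
    with urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["access_token"]
```

The returned string is what goes into the access_token placeholder of the recognition request.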
# encoding:utf-8
import base64

import requests

'''
General text recognition (high-accuracy version)
'''
request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic"
# Open the image file in binary mode
with open('[local file]', 'rb') as f:
    img = base64.b64encode(f.read())
params = {"image": img}
access_token = '[token obtained from the authentication API]'
request_url = request_url + "?access_token=" + access_token
headers = {'content-type': 'application/x-www-form-urlencoded'}
response = requests.post(request_url, data=params, headers=headers)
if response:
    print(response.json())
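The request prints the raw JSON, but the following steps need the CAPTCHA as a plain string. The sketch below assumes the response layout I recall from Baidu's OCR documentation: a words_result list whose items each carry a words field, and error_code/error_msg on failure. The helper name and the sample payload are made up for illustration.

```python
def extract_captcha_text(result):
    """Join the recognized words of an OCR response into a single string.

    Whitespace is removed because CAPTCHAs are normally typed without spaces.
    Raises RuntimeError if the response reports an error.
    """
    if "error_code" in result:
        raise RuntimeError(
            "OCR failed: %s %s" % (result["error_code"], result.get("error_msg", ""))
        )
    words = [item["words"] for item in result.get("words_result", [])]
    return "".join("".join(words).split())


# Made-up response in the assumed format
sample = {"log_id": 1, "words_result_num": 1, "words_result": [{"words": "a B 3 d"}]}
print(extract_captcha_text(sample))  # aB3d
```

In the script above, extract_captcha_text(response.json()) would take the place of the final print.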