Using YunDaMa (云打码) to get past the captcha when logging in
2022-07-29 05:23:00 【赵颂@】
YunDaMa site:
The target site for this crawl is a classical Chinese poetry site: https://so.gushiwen.org/user/login.aspx?from=http://so.gushiwen.org/user/collect.aspx
YunDaMa is a captcha-solving platform.
YunDaMa workflow
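- Register: there are two account types, ordinary user and developer user
- Log in:
1. Ordinary user login: check whether the account still has points left
2. Developer user login:
- Create a software entry: My Documents → Add new software → enter the software name → submit (this gives you the appid and appkey)
- Download the sample code: Developer Docs → click download: YunDaMa interface DLL → PythonHTTP sample download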
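In the code below, the email and pwd fields inside the custom login function are the login account and password; put in your own account to run the crawl. Note: every run that sends a captcha for recognition deducts some points, and how many is set by the platform's rules. The price table is at http://www.yundama.com/price.html if you want to check it.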
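Rather than pasting the account and password into the source, one option is to read them from environment variables when the form data is built. This is only a sketch: the helper name `build_login_form` and the variable names `GUSHIWEN_EMAIL` and `GUSHIWEN_PWD` are made up for illustration, while the form keys are copied from the `login` function in this post.

```python
import os


def build_login_form(code, viewstate, viewstategenerator, env=None):
    """Assemble the login form data, taking credentials from the environment.

    GUSHIWEN_EMAIL / GUSHIWEN_PWD are hypothetical variable names chosen for
    this sketch; the form keys are the ones used in the post's login().
    """
    env = os.environ if env is None else env
    try:
        email = env["GUSHIWEN_EMAIL"]
        pwd = env["GUSHIWEN_PWD"]
    except KeyError as missing:
        # Fail early with a readable message instead of posting empty fields.
        raise RuntimeError(f"Set {missing.args[0]} before running") from None
    return {
        "__VIEWSTATE": viewstate,
        "__VIEWSTATEGENERATOR": viewstategenerator,
        "from": "",
        "email": email,
        "pwd": pwd,
        "wasd": "",
        "code": code,
        "denglu": "登录",
    }
```

With this in place, `login` could pass `data=build_login_form(code, viewstate, viewstategenerator)` to `s.post` and the script itself would no longer contain the password.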
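Oh, and one more thing: you must create a session before logging in. Otherwise, no matter how "No Problem" the code looks, it is all wasted effort. The captcha image and the login POST have to be sent with the same cookie, because the server checks the submitted code against the captcha it issued for that cookie.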
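To see why the session matters, here is a small self-contained sketch. It does not touch the real site: it starts a throwaway local server (the `/page` and `/submit` paths and the `sid` cookie are invented for the demo) and uses the standard library's cookie jar in place of `requests.Session`, which plays the same role in the script in this post. One client keeps cookies between requests, the other does not.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a login site: /page sets a cookie, /submit needs it."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/page":
            body = b"login page"
            self.send_header("Set-Cookie", "sid=abc123; Path=/")
        else:
            has_cookie = "sid=abc123" in (self.headers.get("Cookie") or "")
            body = b"ok" if has_cookie else b"no session"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(opener, url):
    with opener.open(url, timeout=5) as resp:
        return resp.read().decode()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    no_proxy = urllib.request.ProxyHandler({})  # ignore any proxy settings
    try:
        # Client that remembers cookies, like requests.Session does.
        jar = CookieJar()
        with_session = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar)
        )
        fetch(with_session, base + "/page")
        kept = fetch(with_session, base + "/submit")

        # Client that forgets cookies between requests.
        without_session = urllib.request.build_opener(no_proxy)
        fetch(without_session, base + "/page")
        lost = fetch(without_session, base + "/submit")
    finally:
        server.shutdown()
        server.server_close()
    return kept, lost


if __name__ == "__main__":
    print(run_demo())
```

The first client gets through because the cookie set by `/page` is sent back on `/submit`; the second is treated as a stranger. The same thing happens on a real site when the captcha is downloaded with one cookie and the login form is posted with another, or with none.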
# -*- coding: utf-8 -*-
# @time :2020/5/18 13:39
# Author :Song
# @file 古诗文网验证码识别.py
# @Software: PyCharm
# Download the captcha image locally, get it recognized, then log in
import requests
from fake_useragent import UserAgent
from lxml import etree
# get_code comes from the author's own local module, which is not shown in this post
from webspider.day18.verification_code import get_code


def indexHTML(s):
    # Fetch the login page through the shared session
    url = "https://so.gushiwen.org/user/login.aspx?from="
    r = s.get(url=url, headers={"User-Agent": UserAgent().chrome})
    return r.text


def download_image(html, s):
    tree = etree.HTML(html)
    # Get the captcha image path
    image_src = tree.xpath('//*[@id="imgCode"]/@src')[0]
    # Build the full image URL
    image_url = "https://so.gushiwen.org" + image_src
    r = s.get(url=image_url, headers={"User-Agent": UserAgent().random})
    with open("yzm.png", "wb") as fp:
        fp.write(r.content)
    # Send the saved image off for recognition
    code = get_code("yzm.png", 1004)
    print(code)
    # Parse the hidden form fields that have to be posted back with the login
    viewstate = tree.xpath('//*[@id="aspnetForm"]/div[1]/input/@value')[0]
    viewstategenerator = tree.xpath('//*[@id="aspnetForm"]/div[2]/input/@value')[0]
    return code, viewstate, viewstategenerator


def login(code, viewstate, viewstategenerator, s):
    post_url = "https://so.gushiwen.org/user/login.aspx?from="
    formdata = {
        "__VIEWSTATE": viewstate,
        "__VIEWSTATEGENERATOR": viewstategenerator,
        "from": "",
        "email": "**********",
        "pwd": "**********",
        "wasd": "",
        "code": code,
        "denglu": "登录",
    }
    r = s.post(url=post_url, headers={"User-Agent": UserAgent().chrome}, data=formdata)
    # Save the response so the result of the login can be inspected
    with open("gs.html", "w", encoding="utf8") as fp:
        fp.write(r.text)


def main():
    # Create a session to log in with, so every request carries the same cookie
    s = requests.Session()
    # Fetch the page as it looks before login; it holds the captcha image and form fields
    html = indexHTML(s)
    # Download and recognize the captcha
    code, viewstate, viewstategenerator = download_image(html, s)
    # Log in
    login(code, viewstate, viewstategenerator, s)


if __name__ == '__main__':
    main()
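The two XPaths in `download_image` pick the hidden inputs by position (`div[1]`, `div[2]`), which stops working if the page layout shifts. A more tolerant option is to look the fields up by their `name` attribute. The sketch below does that with the standard library's `html.parser`; the class and function names are made up, and the HTML sample is invented for the demo, so the real page's markup may differ. With lxml, an XPath such as `//input[@name="__VIEWSTATE"]/@value` would serve the same purpose.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return all hidden form fields of an HTML page as a dict."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Invented sample markup, only here to exercise the parser.
SAMPLE = """
<form id="aspnetForm">
  <div><input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" /></div>
  <div><input type="hidden" name="__VIEWSTATEGENERATOR" value="C93BE1AE" /></div>
  <input type="text" name="email" value="" />
</form>
"""

if __name__ == "__main__":
    fields = hidden_fields(SAMPLE)
    print(fields["__VIEWSTATE"], fields["__VIEWSTATEGENERATOR"])
```

Because the lookup is by name, the result can be merged straight into the form data, and any extra hidden field the page adds later is picked up without changing the code.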