一些爬虫相关的签名、验证码破解

Overview

cracking4crawling

一些爬虫相关的签名、验证码破解,目前已有脚本:

说明:

脚本按目标网站、App命名归档,每个脚本一般都是可以单独导入使用(除非调用了额外的用于加解密的js文件),使用方法可阅读文档或参考其中的test函数。

使用方法:

小红书

小红书App接口签名(shield)

shield是小红书App接口主要的签名,由path、params、xy_common_params、xy_platform_info、data拼接并加密生成。原始加密在libshield.so中,已用python复现。

from urllib import parse

from xiaohongshu.shield import get_sign

# 对接口路径、url参数、header中的xy-common-params、xy-platform-info、请求的data进行签名
path = '/api/sns/v4/note/user/posted'

params = parse.urlencode({'user_id': '5eeb209d000000000101d84a'})

xy_common_params = parse.urlencode({})
    
xy_platform_info = parse.urlencode({})

data = parse.urlencode({})

# 生成签名
sign = get_sign(path=path, 
                params=params, 
                xy_common_params=xy_common_params, 
                xy_platform_info=xy_platform_info,
                data=data)
print(sign)

小红书滑块(数美)验证破解

小红书使用数美滑块验证码,验证过程(获取验证码配置>获取验证码>提交验证)在数美的服务器(数美使用organization来识别被验证的网站、App)上进行,完成后将通过的rid提交到小红书的接口。

具体实现细节:

  • 协议更新:数美会定期自动更新js和接口参数字段(接口里所有两个字母组成的字段名都会在更新修改),通过"/ca/v1/conf"接口返回的js路径可以判断协议版本(如"/pr/auto-build/v1.0.1-33/captcha-sdk.min.js",表示协议版本号为33),脚本会加载js,并通过匹配确认字段名,用于后续的接口请求。
  • 验证参数:验证主要需要三个参数:位移比率、时间、轨迹,使用opencv中的matchTemplate函数计算距离,并随机生成相应的轨迹。
  • 调用加密:提交验证的主要参数都需要加密,使用DES加密。
  • 加密过程:"/ca/v1/register"接口会返回一个参数k,使用"sshummei"作为key对它解密,结果为加密参数所需的key,再对参数进行加密。

注:当前的验证参数全部按照小红书App调整,用于其他验证(如小红书Web或其他网站、App),可能需要调整其中参数。

from xiaohongshu.shumei_slide_captcha import get_verify

# 表示小红书
organization = 'eR46sBuqF0fdw7KWFLYa'

# rid是验证过程中响应的标示,r是最后提交验证返回的响应
rid, r = get_verify(organization)

print(rid, r)

# riskLevel为PASS说明验证通过
if r['riskLevel'] == 'PASS':
    # 这里需要向小红书提交rid
    # 具体可抓包查看,接口:/api/sns/v1/system_service/slide_captcha_check
    pass

海南航空

海南航空App接口签名(hnairSign)

签名对象主要是请求的data,取common、data下的全部参数,按字典序排序进行拼接(list、dict类型不参与拼接),结尾加上slat,进行HMAC_SHA1加密生成。

注:"/user/"下的接口加签时,会在拼接的内容前加上token,同时HMAC_SHA1加密会使用服务器返回的secret

from hnair.hna_signature

# 对请求的data进行签名
data = {
    'common': {
        # common的内容
    },
    'data': {
        'adultCount': 1,
        'cabins': ['*'],
        'childCount': 0,
        'depDate': '2020-12-09',
        'dstCode': 'PEK',
        'infantCount': 0,
        'orgCode': 'YYZ',
        'tripType': 1,
        'type': 3
    }
}

# /user/ 路径下的接口需要登录,同时加签要传入token、secret(都由服务器返回)
# token = ''
# secret = ''

# 生成签名
sign = get_sign(data=data)
print(sign)
Owner
XNFA
XNFA
A simple django-rest-framework api using web scraping

Apicell You can use this api to search in google, bing, pypi and subscene and get results Method : POST Parameter : query Example import request url =

Hesam N 1 Dec 19, 2021
IGLS - Instagram Like Scraper CLI tool

IGLS - Instagram Like Scraper It's a web scraping command line tool based on python and selenium. Description This is a trial tool for learning purpos

Shreshth Goyal 5 Oct 29, 2021
Web Scraping COVID 19 Meta Portal with Python

Web-Scraping-COVID-19-Meta-Portal-with-Python - Requests API and Beautiful Soup to scrape real-time COVID statistics from worldometer website and perform data cleaning and visual analysis in Jupyter

Aarif Munwar Jahan 1 Jan 04, 2022
Goblyn is a Python tool focused to enumeration and capture of website files metadata.

Goblyn Metadata Enumeration What's Goblyn? Goblyn is a tool focused to enumeration and capture of website files metadata. How it works? Goblyn will se

Gustavo 46 Nov 22, 2022
茅台抢购最新优化版本,茅台秒杀,优化了抢购协程队列

茅台抢购最新优化版本,茅台秒杀,优化了抢购协程队列

MaoTai 33 Sep 03, 2022
a high-performance, lightweight and human friendly serving engine for scrapy

a high-performance, lightweight and human friendly serving engine for scrapy

Speakol Ads 30 Mar 01, 2022
This is python to scrape overview and reviews of companies from Glassdoor.

Data Scraping for Glassdoor This is python to scrape overview and reviews of companies from Glassdoor. Please use it carefully and follow the Terms of

Houping 5 Jun 23, 2022
Python script that reads Aliexpress offers urls from a Excel filename (.csv) and post then in a Telegram channel using a bot

Aliexpress to telegram post Python script that reads Aliexpress offers urls from a Excel filename (.csv) and post then in a Telegram channel using a b

Fernando 6 Dec 06, 2022
京东秒杀商品抢购Python脚本

Jd_Seckill 非常感谢原作者 https://github.com/zhou-xiaojun/jd_mask 提供的代码 也非常感谢 https://github.com/wlwwu/jd_maotai 进行的优化 主要功能 登陆京东商城(www.jd.com) cookies登录 (需要自

Andy Zou 1.5k Jan 03, 2023
Discord webhook spammer with proxy support and proxy scraper

Discord webhook spammer with proxy support and proxy scraper

3 Feb 27, 2022
Tool to scan for secret files on HTTP servers

snallygaster Finds file leaks and other security problems on HTTP servers. what? snallygaster is a tool that looks for files accessible on web servers

Hanno Böck 2k Dec 28, 2022
Scraping web pages to get data

Scraping Data Get public data and save in database This is project use Python How to run a project 1 - Clone the repository 2 - Install beautifulsoup4

Soccer Project 2 Nov 01, 2021
This is a module that I had created along with my friend. It's a basic web scraping module

QuickInfo PYPI link : https://pypi.org/project/quickinfo/ This is the library that you've all been searching for, it's built for developers and allows

OneBit 2 Dec 13, 2021
Find papers by keywords and venues. Then download it automatically

paper finder Find papers by keywords and venues. Then download it automatically. How to use this? Search CLI python search.py -k "knowledge tracing,kn

Jiahao Chen (TabChen) 2 Dec 15, 2022
薅薅乐 - JD 测试脚本

薅薅乐 安裝 使用docker docker一键安装: docker run -d --name jd classmatelin/hhl:latest. 使用 进入容器: docker exec -it jd bash 获取JD_COOKIES: python get_jd_cookies.py,

ClassmateLin 575 Dec 28, 2022
A Web Scraping Program.

Web Scraping AUTHOR: Saurabh G. MTech Information Security, IIT Jammu. If you find this repository useful. I would appreciate if you Star it and Fork

Saurabh G. 2 Dec 14, 2022
Displays market info for the LUNI token on the Terra Blockchain

LuniBot for Discord Displays market info for the LUNI/LUNA token on the Terra Blockchain (Webscrape method currently scraping CoinMarketCap). Will evo

0 Jan 22, 2022
A python script to extract answers to any question on Quora (Quora+ included)

quora-plus-bypass A python script to extract answers to any question on Quora (Quora+ included) Requirements Python 3.x

Nitin Narayanan 10 Aug 18, 2022
Rottentomatoes, Goodreads and IMDB sites crawler. Semantic Web final project.

Crawler Rottentomatoes, Goodreads and IMDB sites crawler. Crawler written by beautifulsoup, selenium and lxml to gather books and films information an

Faeze Ghorbanpour 1 Dec 30, 2021
Scraping Top Repositories for Topics on GitHub,

0.-Webscrapping-using-python Scraping Top Repositories for Topics on GitHub, Web scraping is the process of extracting and parsing data from websites

Dev Aravind D Satprem 2 Mar 18, 2022