A Hands-On Guide to Handling Image Camouflage in JS Reverse Engineering
2022-07-05 19:04:00 【VIP_ CQCRE】
This is the 655th technical share from 「Attacking Coder」
Author: Xinganguo
Source: AirPython
Reading time: about 6 minutes.
I have recently been adding to my anti-scraping series. This installment covers the simplest technique: 「image camouflage」.
Image camouflage means mixing text and images together inside the page elements when the page is displayed, so that a crawler cannot read the page content directly from the source.
Target (Base64-encoded URL):
aHR0cHM6Ly93d3cuZ3hyYy5jb20vam9iRGV0YWlsL2Q2NmExNjQxNzc2MjRlNzA4MzU5NWIzMjI1ZWJjMTBi
1 - Analysis
Open the page. Analyzing it shows that the phone number in the page source is hidden and protected by default.
To view the phone number, you must first log in with an account.
After logging in, clicking the view button on the page calls an interface, and the full phone number is then displayed:
https://**/getentcontacts/b2147f6a-6ec7-403e-a836-62978992841b
PS: The b2147f6a-6ec7-403e-a836-62978992841b part of the URL can be obtained from the page source; each value corresponds to exactly one enterprise.
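As a minimal sketch of pulling that identifier out of the page source: the code below simply looks for UUID-shaped strings. The function name `find_enterprise_ids` and the sample HTML are hypothetical; the real page may contain several UUIDs, so you would still need to inspect the markup to decide which one belongs to the enterprise.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID layout, as seen in the example URL.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_enterprise_ids(page_source):
    """Return distinct UUID-shaped strings in order of first appearance."""
    seen = []
    for match in UUID_PATTERN.findall(page_source):
        lowered = match.lower()
        if lowered not in seen:
            seen.append(lowered)
    return seen


# Hypothetical HTML fragment, made up purely for illustration.
sample_html = '<a data-ent="b2147f6a-6ec7-403e-a836-62978992841b">View phone</a>'
print(find_enterprise_ids(sample_html))
```

Returning a list rather than a single value keeps the choice of the right identifier explicit when the page holds more than one.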
Inspecting the response of the interface above shows that its 「tel」 field can be spliced into an image URL, and the content of that image matches the phone number.
So we only need to download this image and run OCR on it to recognize the number.
2 - Implementation
Because the text and background of the images on this site are very clean, no extra training is needed to improve the character recognition rate.
First, we call the interface to get the tel field, which corresponds one-to-one with the phone number.
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36',
    'Cookie': '***'
}


# Get the tel field id that corresponds one-to-one with the phone number
def get_tel_id():
    # b2147f6a-6ec7-403e-a836-62978992841b identifies the enterprise (one-to-one);
    # it can be found in the page source
    url = "https://**/getentcontacts/b2147f6a-6ec7-403e-a836-62978992841b"
    # A plain GET is enough; no request body is needed
    resp = requests.get(url, headers=headers).json()
    tel_id = resp.get("tel")
    return tel_id
Next, use the tel field obtained above to build the image URL.
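A small sketch of that splicing step, assuming the `/home/Phone/{tel_id}` path used by the later snippets in this article. The helper name `build_phone_image_url`, the `example.com` host, and the sample tel value are placeholders. Percent-encoding the value is a defensive assumption; drop it if the site expects the raw string.

```python
from urllib.parse import quote


def build_phone_image_url(site_root, tel_id):
    """Join the site root and the tel value into the image URL."""
    if not tel_id:
        # An empty value usually means the request was not authenticated.
        raise ValueError("the interface returned no tel value; check the login cookie")
    # Percent-encode the identifier in case it contains characters unsafe in a path.
    return f"{site_root.rstrip('/')}/home/Phone/{quote(str(tel_id), safe='')}"


# example.com and "abc123" are placeholders, not real values.
print(build_phone_image_url("https://www.example.com/", "abc123"))
```

Failing early on an empty tel value makes an expired cookie easier to spot than a confusing OCR error further down the line.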
Finally, recognize the characters in the image.
Two approaches are covered here:
Baidu OCR
pytesseract
2-1 Baidu OCR
First, install the dependency package:
# Install the dependency package
pip3 install baidu-aip
Next, create a text recognition application and obtain its APP_ID, API_KEY and SECRET_KEY.
Finally, following the official documentation, call the method below to recognize the image and get the phone number.
Official documentation:
https://cloud.baidu.com/doc/OCR/s/wkibizyjk
from aip import AipOcr


def get_phone(tel_id):
    """
    Recognize the image with Baidu OCR and get its text content
    :param tel_id: the tel field returned by the contacts interface
    :return: the raw OCR response (a dict)
    """
    url = f'https://www.**.com/home/Phone/{tel_id}'
    APP_ID = '262**'
    API_KEY = '1btP8uUSzfDbji**'
    SECRET_KEY = 'NGm6NgAM5ajHcksKs0**'
    client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
    result = client.basicGeneralUrl(url)
    # {'words_result': [{'words': '0771-672**'}], 'words_result_num': 1, 'log_id': 1527210***}
    print('The recognized phone number is:', result)
    return result
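The response shown in the comment above is a dictionary, so the number still has to be pulled out of `words_result`. Here is a minimal sketch of that step. `extract_phone` is a hypothetical helper, and the `error_code` / `error_msg` check is an assumption about how failed calls are reported; verify it against the official documentation.

```python
def extract_phone(result):
    """Join the recognized text lines from a basicGeneralUrl-style response."""
    # Assumption: failed calls carry error_code / error_msg instead of words_result.
    if "error_code" in result:
        raise RuntimeError(f"OCR failed: {result.get('error_msg')}")
    lines = [item.get("words", "") for item in result.get("words_result", [])]
    return "".join(lines).strip()


# Sample shaped like the response shown in the article (number masked as in the original).
sample = {"words_result": [{"words": "0771-672**"}], "words_result_num": 1}
print(extract_phone(sample))
```

Joining every entry instead of taking only the first one guards against the engine splitting a single number into several fragments.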
2-2 pytesseract
As before, we first need to install the dependency packages for character recognition and image processing. Note that pytesseract is a wrapper around the Tesseract OCR engine, which must be installed on the system separately.
# Install the dependency packages
pip3 install pillow
pip3 install pytesseract
Then fetch the image byte stream from the image URL and use pytesseract to recognize the text in the image.
import io

import pytesseract
import requests
from PIL import Image

# Note: headers and get_tel_id() are defined in the first snippet of this section
if __name__ == '__main__':
    # Build the URL of the phone number image
    image_url = f'https://www.**.com/home/Phone/{get_tel_id()}'
    resp = requests.get(image_url, headers=headers)
    # resp.content: the image as a binary byte stream
    # io.BytesIO(): wraps the bytes in an in-memory file-like object
    # Image.open(): opens the byte stream and returns an image object
    images_c = Image.open(io.BytesIO(resp.content))
    # Use pytesseract to recognize the string in the image, i.e. the phone number
    phone = pytesseract.image_to_string(images_c).strip()
    print(f'Contact information: {phone}')
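Raw Tesseract output often carries trailing newlines, form-feed characters, or stray spaces. The sketch below shows one possible clean-up step, assuming the number consists only of digits and hyphens; `clean_phone_text` is a hypothetical helper and the sample string is made up.

```python
import re


def clean_phone_text(raw_text):
    """Keep only digits and hyphens from raw OCR output."""
    return re.sub(r"[^0-9-]", "", raw_text)


# Made-up OCR output with typical whitespace noise around the number.
print(clean_phone_text(" 0771-1234567\n\x0c"))
```

If the numbers on the site can contain other characters such as `+` or parentheses, the pattern needs widening. Passing `config='--psm 7'` to `pytesseract.image_to_string` (treat the image as a single text line) may also help with one-line images like these.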
That is the usual way to deal with image camouflage: work out the rule by which the image is generated, use OCR to turn it into text, and finally assemble the recognized text with the rest of the content.
End