A Step-by-Step Guide to Handling Image Camouflage (JS Reverse Engineering)
2022-07-05 19:04:00 【VIP_CQCRE】
This is the 655th technical article from 「Attacking Coder」.
Author: Xinganguo
Source: AirPython
Estimated reading time: 6 minutes.
I have recently been writing a series on anti-scraping techniques; this article covers the simplest one, 「image camouflage」.
Image camouflage mixes text and images together when a page renders its elements, which prevents a crawler from reading the content directly from the page source.
Target site (Base64-encoded):
aHR0cHM6Ly93d3cuZ3hyYy5jb20vam9iRGV0YWlsL2Q2NmExNjQxNzc2MjRlNzA4MzU5NWIzMjI1ZWJjMTBi
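The target URL above is Base64-encoded. A minimal way to decode it with Python's standard library is sketched below; the function name `decode_target` is only for illustration.

```python
import base64

# Base64-encoded target URL, copied from the line above
ENCODED_TARGET = "aHR0cHM6Ly93d3cuZ3hyYy5jb20vam9iRGV0YWlsL2Q2NmExNjQxNzc2MjRlNzA4MzU5NWIzMjI1ZWJjMTBi"


def decode_target(encoded: str) -> str:
    """Hypothetical helper: decode a Base64 string into a UTF-8 URL."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    print(decode_target(ENCODED_TARGET))
```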
1 - Analysis
Opening and inspecting the page shows that the phone number is masked by default in the page source.
Viewing the full number requires logging in to an account first.
After logging in, clicking the "View" button on the page calls an interface, and the complete phone number is then displayed:
https://**/getentcontacts/b2147f6a-6ec7-403e-a836-62978992841b
PS: The segment b2147f6a-6ec7-403e-a836-62978992841b in this URL can be found in the page source; it corresponds one-to-one with the enterprise.
The figure below shows that the 「tel」 field in this interface's response can be spliced into an image URL, and the content of that image matches the phone number.
So we only need to download this image and recognize it with OCR.
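Because the enterprise ID has the standard UUID layout, one way to pull it out of the downloaded page source is a regular expression. This is a sketch under assumptions: that the first UUID-shaped string in the HTML is the enterprise ID (the real page may contain other UUIDs, so check against the actual source), and that the helper names below are hypothetical. The interface host stays elided as `**`, as in the original.

```python
import re
from typing import Optional

# Standard 8-4-4-4-12 hexadecimal UUID layout
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
# Interface prefix from the article; the host is elided as in the original
CONTACTS_API = "https://**/getentcontacts/"


def find_enterprise_id(html: str) -> Optional[str]:
    """Hypothetical helper: return the first UUID in the page source (assumed to be the enterprise ID)."""
    match = UUID_RE.search(html)
    return match.group(0) if match else None


def contacts_api_url(enterprise_id: str) -> str:
    """Hypothetical helper: build the getentcontacts interface URL for one enterprise."""
    return CONTACTS_API + enterprise_id
```

For example, given made-up markup such as `<a data-id="b2147f6a-6ec7-403e-a836-62978992841b">View</a>`, `find_enterprise_id` returns the ID and `contacts_api_url` turns it into the interface URL shown above.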
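The chain described above (enterprise ID, then the `tel` field, then the image URL, then the image, then OCR) can be sketched as one function whose network and OCR steps are passed in as callables. That way each step can be swapped, for example Baidu OCR versus pytesseract, or tested with fakes. `recover_phone` and its parameter names are hypothetical and not part of the original code.

```python
from typing import Callable


def recover_phone(
    enterprise_id: str,
    fetch_tel: Callable[[str], str],
    image_url_for: Callable[[str], str],
    fetch_image: Callable[[str], bytes],
    ocr: Callable[[bytes], str],
) -> str:
    """Hypothetical pipeline: enterprise ID -> tel field -> image URL -> image bytes -> text."""
    tel = fetch_tel(enterprise_id)        # call the getentcontacts interface and read "tel"
    image_url = image_url_for(tel)        # splice the tel value into the image URL
    image_bytes = fetch_image(image_url)  # download the image with the logged-in session
    return ocr(image_bytes).strip()       # recognize the text, dropping surrounding whitespace
```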
2 - Implementation
Because the text and background of the images on this site are very clean, no extra training is needed to improve the character recognition rate.
First, call the interface to get the tel field that corresponds one-to-one to the phone number:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36',
    # Cookie of a logged-in session: the number is only available after login
    'Cookie': '***'
}

# Get the tel field id that maps one-to-one to the phone number
def get_tel_id():
    # b2147f6a-6ec7-403e-a836-62978992841b identifies the enterprise (also one-to-one, found in the page source)
    url = "https://**/getentcontacts/b2147f6a-6ec7-403e-a836-62978992841b"
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    tel_id = resp.json().get("tel")
    return tel_id
Then, use the tel field to build the image URL.
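A small helper for this step, using the URL pattern that appears in the code later in this article (`https://www.**.com/home/Phone/{tel_id}`, host elided as in the original). The helper name and the empty-value check are additions for illustration: `get_tel_id()` returns `None` when the response has no `tel` key, and failing early is clearer than requesting `/home/Phone/None`.

```python
from typing import Optional

# Image endpoint from the article; the host stays elided as "**"
PHONE_IMAGE_BASE = "https://www.**.com/home/Phone/"


def build_phone_image_url(tel_id: Optional[str]) -> str:
    """Hypothetical helper: splice the tel field into the phone-number image URL."""
    if not tel_id or not tel_id.strip():
        raise ValueError("empty tel field in the interface response")
    return PHONE_IMAGE_BASE + tel_id.strip()
```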
Finally, recognize the characters in the image.
Two approaches are covered here:
Baidu OCR
pytesseract
2-1 Baidu OCR
First, install the dependency package:
# Install dependency packages
pip3 install baidu-aip
Next, create a text-recognition application in the Baidu AI Cloud console and note its APP_ID, API_KEY and SECRET_KEY.
Finally, following the official documentation, call the method below to recognize the image and extract the phone number.
Official documentation:
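Rather than hard-coding the three values in the script (as the code below does, with its values partly masked), one option is to read them from environment variables. The variable names `BAIDU_OCR_APP_ID`, `BAIDU_OCR_API_KEY` and `BAIDU_OCR_SECRET_KEY` are hypothetical; the returned tuple is in the order `AipOcr(APP_ID, API_KEY, SECRET_KEY)` takes.

```python
import os
from typing import Tuple

# Hypothetical environment variable names; choose your own
CREDENTIAL_VARS = ("BAIDU_OCR_APP_ID", "BAIDU_OCR_API_KEY", "BAIDU_OCR_SECRET_KEY")


def load_baidu_ocr_credentials() -> Tuple[str, str, str]:
    """Read APP_ID, API_KEY and SECRET_KEY from the environment, failing if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    app_id, api_key, secret_key = (os.environ[name] for name in CREDENTIAL_VARS)
    return app_id, api_key, secret_key
```

Usage would then be `client = AipOcr(*load_baidu_ocr_credentials())`, which keeps the keys out of source control.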
https://cloud.baidu.com/doc/OCR/s/wkibizyjk
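from aip import AipOcr

def get_phone(tel_id):
    """
    Recognize the image with Baidu OCR and return its text content
    :param tel_id: the tel field returned by the interface
    :return: the recognized phone number string
    """
    url = f'https://www.**.com/home/Phone/{tel_id}'
    APP_ID = '262**'
    API_KEY = '1btP8uUSzfDbji**'
    SECRET_KEY = 'NGm6NgAM5ajHcksKs0**'
    client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
    # Baidu downloads the image from the URL itself and runs general text recognition
    result = client.basicGeneralUrl(url)
    # {'words_result': [{'words': '0771-672**'}], 'words_result_num': 1, 'log_id': 1527210***}
    phone = ''.join(item['words'] for item in result.get('words_result', []))
    print('The recognized phone number is:', phone)
    return phone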
2-2 pytesseract
Likewise, first install the packages for image processing and text recognition. Note that pytesseract is a wrapper around the Tesseract OCR engine, which has to be installed separately.
# Install dependency packages
pip3 install pillow
pip3 install pytesseract
Then, download the image bytes from the image URL and let pytesseract recognize the text in them:
import io
import pytesseract
import requests
from PIL import Image

if __name__ == '__main__':
    # Image URL of the phone number (headers and get_tel_id() come from the first snippet)
    image_url = f'https://www.**.com/home/Phone/{get_tel_id()}'
    resp = requests.get(image_url, headers=headers, timeout=10)
    resp.raise_for_status()
    # resp.content: the raw bytes of the image
    # io.BytesIO(): wraps the bytes in a file-like object
    # Image.open(): opens the byte stream as an image object
    images_c = Image.open(io.BytesIO(resp.content))
    # pytesseract recognizes the string in the image, which is the phone number
    phone = pytesseract.image_to_string(images_c)
    print(f'Contact information: {phone.strip()}')
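`image_to_string` returns raw text, often with a trailing newline and form-feed character, and OCR may insert stray spaces. Before assembling the results, one option is to pull out just the number with a regular expression. The pattern and the `normalize_phone` name are assumptions for illustration: it accepts a landline with an area code (like the `0771-...` number in the Baidu example) or an 11-digit mobile number, and should be adapted to the real data.

```python
import re
from typing import Optional

# Assumed formats: area code starting with 0 plus 7-8 digits, or an 11-digit mobile number
PHONE_RE = re.compile(r"0\d{2,3}-?\d{7,8}|1\d{10}")


def normalize_phone(ocr_text: str) -> Optional[str]:
    """Hypothetical helper: remove whitespace (including form feeds) and extract the phone number."""
    compact = re.sub(r"\s+", "", ocr_text)  # \s also matches the trailing \x0c
    match = PHONE_RE.search(compact)
    return match.group(0) if match else None
```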
That is the standard way to deal with image camouflage: work out the rule by which the images are generated, use OCR to turn them into text, and finally assemble the results.
End