A hands-on guide to JS reverse engineering: image camouflage
2022-07-05 19:04:00 【VIP_ CQCRE】
This is the 655th technical share from 「Attacking Coder」
Author: Xinganguo
Source: AirPython
Reading this article takes about 6 minutes.
Recently I have been updating my anti-crawling series. This installment covers the simplest technique: 「image camouflage」
Image camouflage mixes text and images together within web page elements, so that a crawler cannot read the page content directly from the source
Target (Base64-encoded):
aHR0cHM6Ly93d3cuZ3hyYy5jb20vam9iRGV0YWlsL2Q2NmExNjQxNzc2MjRlNzA4MzU5NWIzMjI1ZWJjMTBi
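The target above is given as a Base64 string rather than as a plain URL. A minimal sketch of turning such a string back into text follows. The helper name `decode_target` and the sample URL are illustrative assumptions, not part of the original article.

```python
import base64


def decode_target(encoded: str) -> str:
    """Decode a Base64-encoded string back into plain text."""
    return base64.b64decode(encoded).decode("utf-8")


# Illustrative input only (hypothetical URL, not the article's target).
sample = base64.b64encode(b"https://example.com/job/123").decode("ascii")
print(decode_target(sample))
```

Passing the string from the article to `decode_target` works the same way.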
1 - Analysis
Open the page. Inspecting it shows that the phone number in the page source is hidden and protected by default
To view the phone number, you must first log in with an account
After logging in, clicking the view button on the page calls an interface, and the full phone number is then displayed
https://**/getentcontacts/b2147f6a-6ec7-403e-a836-62978992841b
PS: The b2147f6a-6ec7-403e-a836-62978992841b in the URL can be obtained from the page source, and it corresponds one-to-one with an enterprise
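The article does not show where in the page source this id sits, so the following is only a sketch: it assumes the id appears somewhere in the HTML in the standard 8-4-4-4-12 UUID layout and pulls out every such string with a regular expression. The function name `find_enterprise_ids` and the sample markup are hypothetical.

```python
import re
from typing import List

# Standard 8-4-4-4-12 hexadecimal UUID layout, as seen in the interface URL.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_enterprise_ids(html: str) -> List[str]:
    """Return every UUID-shaped string in the page source, in order, without duplicates."""
    seen: List[str] = []
    for match in UUID_RE.findall(html):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical fragment; the real page's markup is not shown in the article.
sample_html = '<a data-ent="b2147f6a-6ec7-403e-a836-62978992841b">View phone</a>'
print(find_enterprise_ids(sample_html))
```

If the page contains several UUIDs, you would still need to check which one belongs to the enterprise, for example by looking at the surrounding attribute or script variable.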
Looking at the response of the interface above, we find that the 「tel」 field can be spliced into an image URL, and the content of that image matches the phone number
So we only need to download this image and recognize it with OCR
2 - Implementation
Because the text and background of the images on this site are very clean, no extra training is needed to improve the character recognition rate
First, we call the interface to get the tel field, which corresponds one-to-one with a phone number
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36',
    # Cookie of the logged-in session (the phone number is only visible after login)
    'Cookie': '***'
}


# Get the id in the tel field that corresponds to the phone number (one-to-one)
def get_tel_id():
    # b2147f6a-6ec7-403e-a836-62978992841b identifies the enterprise (one-to-one); it comes from the page source
    url = "https://**/getentcontacts/b2147f6a-6ec7-403e-a836-62978992841b"
    resp = requests.get(url, headers=headers, timeout=10)
    # Fail early on HTTP errors instead of parsing an error page as JSON
    resp.raise_for_status()
    return resp.json().get("tel")
Then, use the tel field obtained above to build the image URL
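A small sketch of that splicing step, assuming the `/home/Phone/{tel}` path that appears in the article's later snippets. The host is elided in the article, so `https://www.example.com` below is a placeholder, and `build_phone_image_url` is a hypothetical helper. URL-escaping the value is a defensive choice on my part; if the site expects the raw value, drop the `quote` call.

```python
from urllib.parse import quote


def build_phone_image_url(base_url: str, tel_id: str) -> str:
    """Join the site root, the /home/Phone/ path, and the URL-escaped tel value."""
    return f"{base_url.rstrip('/')}/home/Phone/{quote(str(tel_id), safe='')}"


# Placeholder host and tel value, for illustration only.
print(build_phone_image_url("https://www.example.com", "abc123"))
```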
Finally, run character recognition on the image
Here are two ways to do it:
Baidu OCR
pytesseract
2-1 Baidu OCR
First, install the dependency package
# Install the dependency package
pip3 install baidu-aip
Then, create a character recognition application and obtain its APP_ID, API_KEY and SECRET_KEY
Finally, following the official documentation, call the method below to recognize the image and get the phone number
Official documentation:
https://cloud.baidu.com/doc/OCR/s/wkibizyjk
from aip import AipOcr


def get_phone(tel_id):
    """
    Recognize the image with Baidu OCR and get its text content
    :param tel_id: value of the tel field returned by the contact interface
    :return: the raw recognition result (dict)
    """
    url = f'https://www.**.com/home/Phone/{tel_id}'
    APP_ID = '262**'
    API_KEY = '1btP8uUSzfDbji**'
    SECRET_KEY = 'NGm6NgAM5ajHcksKs0**'
    client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
    # Recognize the image directly from its URL
    result = client.basicGeneralUrl(url)
    # {'words_result': [{'words': '0771-672**'}], 'words_result_num': 1, 'log_id': 1527210***}
    print('Recognition result:', result)
    return result
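The result printed above is a dictionary, not the phone number itself. A minimal sketch of pulling the text out of it follows, based on the response shape shown in the comment above. The `error_code` check is an assumption about how a failed request is reported, and `extract_words` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def extract_words(result: Dict[str, Any]) -> List[str]:
    """Collect the recognized text lines from an OCR response dict."""
    # Assumption: a failed request carries 'error_code' (and usually 'error_msg').
    if "error_code" in result:
        raise RuntimeError(
            f"OCR request failed: {result.get('error_msg', result['error_code'])}"
        )
    return [item["words"] for item in result.get("words_result", []) if "words" in item]


# Response shape copied from the article's comment (number masked as in the source).
sample_result = {"words_result": [{"words": "0771-672**"}], "words_result_num": 1}
print(extract_words(sample_result))
```

For this site the image holds a single line, so the first element of the returned list would be the phone number.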
2-2 pytesseract
As before, we first install the dependency packages for character recognition and image processing
# Install dependency packages
# (pytesseract is a wrapper, so the Tesseract OCR engine itself must also be installed on the system)
pip3 install pillow
pip3 install pytesseract
Then, fetch the image's byte stream from its URL and let pytesseract recognize the text in the image
import io

import pytesseract
import requests
from PIL import Image

# headers and get_tel_id() come from the first snippet above
if __name__ == '__main__':
    # Build the URL of the phone number image
    image_url = f'https://www.**.com/home/Phone/{get_tel_id()}'
    resp = requests.get(image_url, headers=headers, timeout=10)
    # resp.content: the image's binary byte stream
    # io.BytesIO(): wraps the bytes in an in-memory binary file object
    # Image.open(): opens the byte stream and returns an image object
    image = Image.open(io.BytesIO(resp.content))
    # Use pytesseract to recognize the string in the image, i.e. the phone number
    phone = pytesseract.image_to_string(image).strip()
    print(f'Contact information: {phone}')
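Raw OCR output often carries stray whitespace or control characters around the recognized text. As a hedged sketch, assuming the numbers on this site consist only of digits and hyphens (as in the masked example `0771-672**`), the output can be normalized like this. `clean_phone` is a hypothetical helper and the number below is made up.

```python
import re


def clean_phone(raw: str) -> str:
    """Keep only digits and hyphens from raw OCR output."""
    return re.sub(r"[^0-9-]", "", raw)


# Made-up number with the kind of padding OCR output can contain.
print(clean_phone(" 0771-1234567\n\x0c"))
```

This also removes spaces inside the number, which may or may not be what you want for other number formats.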
The above is the conventional way to handle image camouflage: work out the rule by which the images are generated, use OCR to recognize them as text, and finally assemble the pieces together
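The three steps can be chained into one function. The sketch below is one possible arrangement, not the author's code: each step is passed in as a callable, so the network and OCR parts can be swapped (Baidu OCR or pytesseract) or replaced with fakes for testing. All names here are hypothetical.

```python
from typing import Callable


def recover_phone(
    enterprise_id: str,
    fetch_tel_id: Callable[[str], str],
    fetch_image: Callable[[str], bytes],
    ocr: Callable[[bytes], str],
) -> str:
    """Chain the steps: contact interface -> image bytes -> OCR text."""
    tel_id = fetch_tel_id(enterprise_id)   # step 1: call the contact interface
    image_bytes = fetch_image(tel_id)      # step 2: download the spliced image
    return ocr(image_bytes).strip()        # step 3: recognize the text


# Fake steps standing in for the real network and OCR calls.
demo = recover_phone(
    "ent-1",
    fetch_tel_id=lambda ent: "tel-9",
    fetch_image=lambda tel: tel.encode("utf-8"),
    ocr=lambda data: data.decode("utf-8") + "\n",
)
print(demo)
```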
End