A Hands-On Guide to Dealing with Image Camouflage in JS Reverse Engineering
2022-07-05 19:04:00 【VIP_ CQCRE】
This is the 655th technical post from 「Attacking Coder」
Author: Xinganguo
Source: AirPython
“Reading this article takes about 6 minutes.”
Recently I have been updating my anti-crawling series. This post covers the simplest technique: 「image camouflage」.
Image camouflage mixes text and images together within page elements when the page is rendered, which prevents a crawler from reading the page content directly.
Target (Base64-encoded):
aHR0cHM6Ly93d3cuZ3hyYy5jb20vam9iRGV0YWlsL2Q2NmExNjQxNzc2MjRlNzA4MzU5NWIzMjI1ZWJjMTBi
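A minimal sketch for decoding the target string above with the standard library. The function and variable names are hypothetical, and the content is assumed to be UTF-8 text:

```python
import base64

# The encoded target string, copied from the article
ENCODED_TARGET = "aHR0cHM6Ly93d3cuZ3hyYy5jb20vam9iRGV0YWlsL2Q2NmExNjQxNzc2MjRlNzA4MzU5NWIzMjI1ZWJjMTBi"


def decode_target(encoded: str) -> str:
    """Decode a Base64 string into text (assumes UTF-8 content)."""
    return base64.b64decode(encoded).decode("utf-8")


target_url = decode_target(ENCODED_TARGET)
```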
1 - Analysis
Open the page. Inspecting it shows that the phone number in the page source is masked by default.
To view the full phone number, you must first log in with an account.

After logging in, clicking the view button on the page calls an interface, and the full phone number is then displayed.
https://**/getentcontacts/b2147f6a-6ec7-403e-a836-62978992841b
PS: The b2147f6a-6ec7-403e-a836-62978992841b segment of the URL can be found in the page source. It corresponds one-to-one with the enterprise.
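Pulling that ID out of the page source can be sketched with a regular expression. The article does not show the surrounding markup, so the sample HTML below (including the data-ent-id attribute) and the function name are purely hypothetical; only the UUID-shaped pattern is taken from the URL above.

```python
import re
from typing import List

# Matches the 8-4-4-4-12 hexadecimal shape of the ID seen in the interface URL
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_enterprise_ids(html: str) -> List[str]:
    """Return the distinct UUID-shaped strings in html, in order of appearance."""
    ids: List[str] = []
    for match in UUID_RE.findall(html):
        candidate = match.lower()
        if candidate not in ids:
            ids.append(candidate)
    return ids


# Hypothetical page fragment; the real markup is not shown in the article
sample_html = '<a data-ent-id="b2147f6a-6ec7-403e-a836-62978992841b">View phone</a>'
ids = find_enterprise_ids(sample_html)
```

If a page contains several UUID-shaped strings, the match would need to be narrowed to the element that actually holds the enterprise ID.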

We found that the 「tel」 field in the response of the interface above can be spliced into an image URL, and the content of that image matches the phone number.
Therefore, we only need to download this image and run OCR on it to recognize the number.

2 - Implementation
Because the text and background of the images on this site are very clean, no additional training is needed to improve the character recognition rate.
First, we call the interface to obtain the tel field, which corresponds one-to-one with the phone number.
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36',
    'Cookie': '***'
}


# Get the tel field value that corresponds one-to-one with the phone number
def get_tel_id():
    # b2147f6a-6ec7-403e-a836-62978992841b identifies the enterprise (one-to-one; taken from the page source)
    url = "https://**/getentcontacts/b2147f6a-6ec7-403e-a836-62978992841b"
    payload = {}
    resp = requests.request("GET", url, headers=headers, data=payload).json()
    tel_id = resp.get("tel")
    return tel_id

Then, use the tel field above to build the image URL.
Finally, run character recognition on the image.
There are two ways to do this:
Baidu OCR
pytesseract
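Whichever option is used, the image URL has to be assembled from the tel value first, and it is worth checking that the response really is an image: with an expired Cookie, a server may return an HTML page instead (an assumption about this site, not something the article states). A minimal sketch, where the www.example.com host stands in for the masked www.**.com and both function names are hypothetical:

```python
# Placeholder host: the article masks the real one as www.**.com
IMAGE_URL_TEMPLATE = "https://www.example.com/home/Phone/{tel_id}"

# Leading bytes (file signatures) of common image formats
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF
    b"GIF89a",             # GIF
)


def build_image_url(tel_id: str) -> str:
    """Insert the tel field value into the image URL template."""
    if not tel_id:
        raise ValueError("empty tel value; check the login Cookie")
    return IMAGE_URL_TEMPLATE.format(tel_id=tel_id)


def looks_like_image(data: bytes) -> bool:
    """Return True if data starts with a known image file signature."""
    return data.startswith(IMAGE_SIGNATURES)
```

With the pytesseract route, looks_like_image(resp.content) could be called before the bytes are handed to Image.open().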
2-1 Baidu OCR
First, install the dependency package.
# Install the dependency package
pip3 install baidu-aip
Then, create a character recognition application and obtain its APP_ID, API_KEY, and SECRET_KEY.
Finally, following the official documentation, call the method below to recognize the image and obtain the phone number.
Official documentation:
https://cloud.baidu.com/doc/OCR/s/wkibizyjk
from aip import AipOcr


def get_phone(tel_id):
    """
    Recognize the image with Baidu OCR and get its text content
    :param tel_id: value of the tel field returned by the interface
    :return: the raw response dict from Baidu OCR
    """
    url = f'https://www.**.com/home/Phone/{tel_id}'
    APP_ID = '262**'
    API_KEY = '1btP8uUSzfDbji**'
    SECRET_KEY = 'NGm6NgAM5ajHcksKs0**'
    client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
    result = client.basicGeneralUrl(url)
    # {'words_result': [{'words': '0771-672**'}], 'words_result_num': 1, 'log_id': 1527210***}
    print('The recognized phone number is:', result)
    return result

2-2 pytesseract
Likewise, we first need to install the dependency packages for character recognition and image processing.
# Install dependency packages
pip3 install pillow
pip3 install pytesseract
Note that pytesseract is only a Python wrapper, so the Tesseract OCR engine itself must also be installed on the system.
Then, fetch the image byte stream from the image URL and use pytesseract to recognize the text in it.
import io
import pytesseract
import requests
from PIL import Image

if __name__ == '__main__':
    # headers and get_tel_id() are the ones defined in the first snippet above
    # Build the URL of the phone number image
    image_url = f'https://www.**.com/home/Phone/{get_tel_id()}'
    resp = requests.get(image_url, headers=headers)
    # resp.content: the binary byte stream of the image
    # io.BytesIO(): wraps the bytes in a file-like object
    # Image.open(): opens the byte stream and returns an image object
    images_c = Image.open(io.BytesIO(resp.content))
    # Use pytesseract to recognize the string in the image, which is the phone number
    phone = pytesseract.image_to_string(images_c)
    print(f'Contact information: {phone}')
The above is the conventional way to handle image camouflage: find the rule by which the image is generated, use OCR to recognize it as text, and finally assemble the pieces.
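Both routes return raw OCR output rather than a clean number: Baidu returns a dictionary shaped like the sample in the comment above, and pytesseract returns a string that may carry stray whitespace. A minimal cleanup sketch, assuming phone numbers consist only of digits and hyphens (the function names are hypothetical, and the sample number is a placeholder):

```python
import re
from typing import Optional


def clean_phone(text: str) -> str:
    """Keep only digits and hyphens, dropping whitespace and OCR noise."""
    return re.sub(r"[^0-9-]", "", text)


def phone_from_baidu(result: dict) -> Optional[str]:
    """Extract the first recognized line from a Baidu OCR response dict."""
    words = result.get("words_result") or []
    if not words:
        return None
    return clean_phone(words[0].get("words", ""))


# Response shape taken from the article's comment; the number is a placeholder
sample = {"words_result": [{"words": "0771-0000000"}], "words_result_num": 1}
baidu_phone = phone_from_baidu(sample)
tesseract_phone = clean_phone(" 0771-0000000\n")
```

Filtering this aggressively is a design choice: it would also strip a "+" country prefix or an extension marker, so the character set should be widened if the site shows numbers in those forms.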
