Crawler and anti-crawler attack and defense
2022-08-02 10:11:00 【InfoQ】
1. Background
2. Current situation
- Regularity
- High frequency
- Frontal attack: a large volume of crudely disguised requests, concentrated on a single interface and relying on sheer quantity and variety to win.
- Espionage operation: fewer but more regular requests. The crawler tries to pass as a real user and mimics user behavior in order to obtain more of the core information behind key interfaces.
- Data security: on an e-commerce platform, once key product information and user information are crawled, the likely results are lost merchandise, leaked user information, and even telecom fraud and a series of other security problems.
- Big-data anomalies: statistics such as DAU, PV, and UV depend on the daily interface request logs. Once crawler traffic is recorded in those logs alongside real users' data, the statistics lose their meaning.
- Service stability: because of the first attack mode above, crawlers send a large number of requests, and some approach a flood attack. This greatly increases server load. If a new promotion launches at the same time, the resulting traffic surge can paralyze the system.
- Personalization failure: the app derives the search keywords it recommends to ordinary users from all users' search content. If crawlers hit the search interface with a large number of malicious keyword searches, the keywords pushed to normal users become less accurate, which hurts the user experience.
3. Prior work
- Login restriction: require users to log in before they can request an interface. This greatly raises the cost for crawlers, but it is heavy-handed and easily hurts the user experience at key points in the flow.
- Cookie verification: the cookie on a user request can carry data that identifies the user. This data follows its own generation rules and supports validity checks, so verifying it can tell whether the request comes from a real user. However, once a crawler cracks the generation rules, or sends requests with a real user's cookie, this method no longer defends effectively.
- Frequency check: count request frequency on a feature common to every request (usually the IP, because the IP carries the highest cost) and use the count to decide whether to ban that feature. Because crawlers are regular and high-frequency, this blocks a large share of crawler requests. It also brings a problem: one IP is sometimes shared by several users, so real users may be banned by mistake. Choosing a dimension other than IP avoids that, but such features are cheap to forge.
- CAPTCHA verification: when the same IP or user exceeds a request threshold, require a CAPTCHA. There are many kinds, such as text, image, and slider. Slider image CAPTCHAs work best because image recognition is costly, but they need front-end cooperation and also affect the user experience.
- Data encryption: the front end computes an encrypted value from the request data and sends it to the server as a parameter. The server runs the same encryption logic, generates a code, and compares it with the request parameter; data is returned only on a match. This method still requires client cooperation, and because the encryption algorithm is written in plain text in the JS, a crawler can still reverse-engineer it.
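The frequency check in the list above can be sketched as a per-key sliding window. This is only a minimal illustration under stated assumptions, not the implementation used by the system in this article: the class name `SlidingWindowLimiter`, the limit of 3 requests per second, and the sample IP are all made up for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Count requests per key (for example an IP) inside a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str, now: float) -> bool:
        """Record one request at time `now`; return False once the key is over the limit."""
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        # Rejected requests are recorded too, so a client that keeps
        # hammering the interface stays over the limit.
        hits.append(now)
        return len(hits) <= self.max_requests


# Hypothetical threshold: at most 3 requests per second per IP.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)
results = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 0.1, 0.2, 0.3, 1.5)]
print(results)  # [True, True, True, False, True]
```

Keying on the IP reproduces the trade-off described above: every user behind a shared IP draws from the same counter, so a threshold that is too low bans real users along with the crawler.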
4. Cleaner: the crawler cleaner
- Accuracy: be able to catch crawlers while avoiding bans on real users.
- Real-time: respond within seconds. If the ban lands only after the crawler has taken all the data and left fully loaded, it is meaningless.
- Frontal attack: legitimacy check, frequency control.
- Espionage operation: user behavior analysis.
4.1 System model

4.1.1 Data processing center

4.1.2 Ban center

- Legitimacy of the unique user identifier: the user's unique identifier is generated according to certain rules, and this holds on every client, so we can use those rules to decide whether a request comes from a real user. A large number of illegitimate requests need no further judgment and can be blocked by the generation rules alone. This strategy mainly stops frontal attacks.
- Frequency: this strategy also mainly stops frontal attacks, and it makes up for the weakness of the identifier-legitimacy strategy. When a crawler sends a large number of requests under a real user's identity, we exploit its high frequency: set a frequency threshold for a particular interface, and when requests exceed it, ban the specified user feature.
- EL expressions: the two strategies above greatly weaken frontal attacks but are almost powerless against espionage operations. After repeated setbacks, a cunning crawler works out the anti-crawler strategy and the frequency threshold. It then imitates real users' request behavior, giving up the fantasy of obtaining a lot of information in a short time and accepting a longer run in exchange for the complete data. This is where EL expressions come in. Their core input is a user's request list: Cleaner analyzes the user's behavior across those requests and looks for any deviation from normal users' requests. At this stage the crawler no longer shows high frequency, but periodicity and regularity are a crawler's original sin, and it can never escape them. Features such as the order in which different interfaces are requested and the ratio of their request frequencies are enough to infer whether the user is illegitimate.
- Ban records: an auxiliary strategy. The user dimensions that hit a ban are written to the database and serve as an indicator for future ban decisions.
- Blacklist and whitelist: specify features to skip or to force-ban. Entries can be added manually, as a safeguard against system failure or disorder.
- Access to other ban libraries: an auxiliary strategy. Combine ban information from other business lines to improve the judgment.
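A few of the strategies above can be sketched together as a small decision chain. Everything here is an assumption made for illustration: the article does not describe how identifiers are generated, so an HMAC-tagged format stands in for the real generation rule; the EL expression is replaced by a plain Python function that checks how regular the gaps between requests are; and the names `sign_user_id`, `is_legit_id`, `looks_periodic`, and `decide`, the check order, and the thresholds are all hypothetical.

```python
import hashlib
import hmac
from statistics import mean, pstdev
from typing import Sequence

SECRET = b"demo-secret"  # hypothetical server-side key, for illustration only


def _tag(raw_id: str) -> str:
    return hmac.new(SECRET, raw_id.encode(), hashlib.sha256).hexdigest()[:16]


def sign_user_id(raw_id: str) -> str:
    """Issue an identifier of the assumed form '<raw_id>.<HMAC tag>'."""
    return f"{raw_id}.{_tag(raw_id)}"


def is_legit_id(user_id: str) -> bool:
    """Legitimacy strategy: the identifier must match its generation rule."""
    raw_id, sep, tag = user_id.rpartition(".")
    if not sep or not raw_id:
        return False
    return hmac.compare_digest(tag.encode(), _tag(raw_id).encode())


def looks_periodic(timestamps: Sequence[float], max_cv: float = 0.1,
                   min_requests: int = 5) -> bool:
    """Behavior strategy: flag sorted request times whose gaps are almost constant."""
    if len(timestamps) < min_requests:
        return False  # too little history to judge
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return True  # every request arrived at the same instant
    # Coefficient of variation: near 0 for timer-driven crawlers,
    # much larger for people browsing at irregular intervals.
    return pstdev(gaps) / avg <= max_cv


def decide(user_id: str, timestamps: Sequence[float],
           whitelist=frozenset(), blacklist=frozenset()) -> str:
    """Run the strategies in order; the lists override everything else."""
    if user_id in whitelist:
        return "pass"
    if user_id in blacklist:
        return "block"
    if not is_legit_id(user_id):
        return "block"
    if looks_periodic(timestamps):
        return "block"
    return "pass"


real = sign_user_id("u1001")
print(decide("forged-id", [0.0, 3.1]))                     # block
print(decide(real, [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]))       # block
print(decide(real, [0.0, 1.2, 9.5, 11.0, 30.4, 31.0]))     # pass
```

The second call shows the espionage case: the identifier is valid and the rate is low, but the perfectly even two-second gaps give the crawler away. A real deployment would look at more features than request timing, such as the interface order and frequency ratios mentioned above.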
4.1.3 Ban library
4.2 Results

5. Summary
- Basic properties: real-time, accuracy.
- Basic functions: legitimacy check, frequency control, user behavior analysis.
- Basic modules: big-data processing center, ban strategy center.
- Basic strategies: legitimacy check strategy, frequency strategy, EL expression strategy, blacklist and whitelist strategy.
- Two dynamics: refining anti-crawler strategies and plugging them in or out; adjusting the scoring standard.
- One balance: the contest between crawlers and anti-crawlers is a long process, and the two sides eventually reach a state of balance. Faced with crawlers that keep coming back, what we can do is keep monitoring and suppress them quickly.