Zhuanzhuan's anti-crawler attack and defense
2022-08-02 10:11:00 【InfoQ】
1. Background
2. Current situation
- Regularity
- High frequency
- Frontal attack: characterized by a large volume of requests with crude disguise, concentrated on a particular interface and relying on sheer number and variety to win.
- Espionage operation: fewer but more regular requests. These crawlers try to disguise themselves as real users and mimic user behavior in order to obtain the more valuable information behind key interfaces.
- Data security: on an e-commerce platform, once key product information and user information are crawled, the likely results include loss of goods, leaked user information, and even telecom fraud and other security problems.
- Big-data anomalies: statistics such as DAU, PV, and UV all depend on each day's interface request logs. Once those logs record crawler traffic as if it were real-user data, the statistics lose their value.
- Service stability: in the first attack mode described above, crawlers send a large number of requests, some approaching a flood attack. This greatly increases server load, and if a new promotion drives a traffic surge at the same time, the system can be brought down.
- Personalization failure: the app recommends search keywords to ordinary users based on what all users search for. If crawlers invade the search interface and search for large numbers of malicious keywords, the keywords recommended to normal users become less accurate, which hurts the user experience.
3. Prior approaches
- Login restriction: require users to log in before they can request an interface. This raises the crawler's cost considerably, but it is heavy-handed and easily hurts the user experience at key points in the flow.
- Cookie validation: a user's request cookie can carry data that identifies the user. That data follows its own generation rules and supports validity checks, so validating it can tell whether the requester is a real user. Once a crawler cracks the generation rules, or sends requests with a real user's cookie, this method no longer defends effectively.
- Frequency check: count request frequency on a feature common to every request (IP is the usual choice because it is the most expensive to forge) and use the count to decide whether to ban that feature. Because crawlers are regular and high-frequency, this blocks a large share of crawler requests, but it introduces another problem: several users sometimes share one IP, so real users may be banned by mistake. Other dimensions can be chosen instead of IP, but they are cheap to forge.
- CAPTCHA check: when the same IP or user exceeds a request threshold, require the user to enter a CAPTCHA. There are many kinds, such as text, image, and slider. Slider image CAPTCHAs work best because image recognition is costly, but they need front-end support and also affect the user experience.
- Data encryption: the front end computes an encrypted value from the request data and sends it to the server as a parameter. The server runs the same encryption logic to generate a code and compares it with the request parameter, returning data only when the two match. This method also needs the client's cooperation, and because the encryption algorithm sits in plain text in the JS, a crawler can still reverse-engineer it.
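The data-encryption approach in the last item can be sketched as a signature over the request parameters that the server recomputes and compares. The article does not name an algorithm, so HMAC-SHA256 is used here purely as an assumption, and `SECRET`, `sign`, `verify`, and the parameter names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical shared secret. In the scheme described above it would ship
# inside client-side JS, which is exactly why a crawler can recover it.
SECRET = b"demo-secret"


def sign(params, secret=SECRET):
    """Client side: hex HMAC-SHA256 over the parameters in sorted key order."""
    canonical = urlencode(sorted(params.items()))
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(params, signature, secret=SECRET):
    """Server side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(params, secret), signature)


request = {"itemId": "1001", "ts": "1659406260"}
token = sign(request)

genuine_ok = verify(request, token)                          # untouched request
tampered_ok = verify({**request, "itemId": "1002"}, token)   # parameter changed
print(genuine_ok, tampered_ok)
```

The check only proves that the caller knows the signing logic. As the item notes, anyone who reads the client code can produce valid signatures, so this raises the cost of crawling without preventing it.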
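4. Cleaner: the crawler cleaner
- Accuracy: it must be able to catch crawlers while avoiding harm to real users.
- Real-time response: it must respond within seconds. If a ban lands only after the crawler has collected all the data and left fully loaded, the ban is pointless.
- Frontal attack: validity check, frequency control.
- Espionage operation: user behavior analysis.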
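Frequency control with a response time of seconds can be illustrated with a sliding-window counter. The sketch below is a minimal in-memory version and not Cleaner's actual implementation; `FrequencyLimiter`, `limit`, `window`, and the key `user-42` are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class FrequencyLimiter:
    """Ban a key once it makes more than `limit` requests within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps inside the window
        self.banned = set()

    def allow(self, key, now):
        if key in self.banned:
            return False
        timestamps = self.hits[key]
        timestamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) > self.limit:
            self.banned.add(key)
            return False
        return True


limiter = FrequencyLimiter(limit=3, window=1.0)
results = [limiter.allow("user-42", t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)]
print(results)  # [True, True, True, False, False]
```

The fourth request exceeds the limit inside one second, so the key is banned and stays banned for the later request. A production system would also need ban expiry and shared state across servers, which this sketch leaves out.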
4.1 System model

4.1.1 Data processing center

4.1.2 Ban center

- Validity of the unique user ID: a user's unique identifier is generated according to certain rules, and ours is no exception, so we can use those rules to judge whether a request comes from a real user. A large share of illegitimate requests need no further judgment and can be blocked by the generation rules alone. This strategy mainly stops frontal attacks.
- Frequency: this strategy also mainly stops frontal attacks, and it covers the gap left by the identity validity strategy. When crawlers send large numbers of requests under real users' identities, we can exploit their high frequency: set a frequency threshold for a specific interface and, when requests exceed the limit, ban the specified user feature.
- EL expression: the two strategies above greatly weaken frontal attacks but are almost powerless against espionage operations. After repeated setbacks, some cunning crawlers work out the anti-crawling strategy and the frequency threshold. They then imitate the behavior of real users, giving up the fantasy of obtaining a lot of information in a short time and settling for collecting the complete data over a longer period. This is where EL expressions come in. At their core, they operate on a list of a user's requests: Cleaner analyzes the user's behavior within that list and looks for deviations from normal user requests. At this point the crawler no longer shows high frequency, but periodicity and regularity are its original sin and can never be avoided. Features such as the order in which different interfaces are requested and the frequency ratio between them can be used to infer whether a user is illegitimate.
- Ban records: an auxiliary strategy. Store the user dimensions that triggered a ban in the database as one indicator for deciding future bans.
- Blacklist and whitelist: specify features to skip or to ban outright. Entries can be added manually in case the system fails or behaves abnormally.
- Access to other ban libraries: an auxiliary strategy. Combine ban information from other business lines to improve the judgment.
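The behavior analysis described under the EL expression strategy can be approximated with two simple features over a user's request list: how regular the timing is, and how skewed the mix of interfaces is. The sketch below uses plain Python functions instead of EL expressions, and the thresholds, the interface names, and the function names are all assumptions for illustration, not values from the article.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive requests."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    average = mean(gaps)
    if average == 0:
        return 0.0  # all requests at the same instant: maximally regular
    return pstdev(gaps) / average


def looks_like_crawler(requests, max_cv=0.1, max_detail_share=0.9):
    """Flag a request list whose timing is too regular or whose mix is skewed.

    `requests` is a list of (timestamp, interface) pairs in time order.
    """
    if len(requests) < 5:
        return False  # too little evidence to judge
    timestamps = [t for t, _ in requests]
    detail_share = sum(1 for _, api in requests if api == "detail") / len(requests)
    return interval_cv(timestamps) < max_cv or detail_share > max_detail_share


# A crawler fetching one detail page exactly every 10 seconds.
bot = [(10.0 * i, "detail") for i in range(8)]
# A person searching, browsing lists, and opening details at uneven intervals.
human = [(0.0, "search"), (2.5, "list"), (9.0, "detail"),
         (11.0, "list"), (30.0, "detail"), (31.5, "search")]

print(looks_like_crawler(bot), looks_like_crawler(human))  # True False
```

Low variation in request gaps captures periodicity even when the overall rate stays under any frequency threshold, which is the case the frequency strategy cannot catch.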
4.1.3 Ban library
4.2 Results

5. Summary
- Basic properties: real-time response, accuracy.
- Basic functions: validity check, frequency control, user behavior analysis.
- Basic modules: big-data processing center, ban strategy center.
- Basic strategies: validity check strategy, frequency strategy, EL expression strategy, blacklist and whitelist strategy.
- Two dynamics: anti-crawling strategies can be refined and plugged in or out; the scoring standard can be adjusted.
- One balance: the contest between crawlers and anti-crawling is a long one, and the two sides eventually reach an equilibrium. Faced with crawlers that keep rebounding, what we can do is monitor continuously and suppress them quickly.