Key points of anti-crawling: identifying crawlers
2022-07-07 19:15:00 【Hua Weiyun】
When we run a website, the biggest problem is this: content that we spend hours or even days painstakingly creating can be scraped by a crawler in about one second. To protect our work, and to keep the website running stably, we need to say "No" to crawlers. The most important part of any anti-crawling effort is how to identify crawlers.
The following methods are commonly used to identify crawlers:
Human detection
Human detection means showing a verification code (CAPTCHA) and asking the visitor to enter its content. These verification codes are easy for humans to recognize but hard for machines, so a crawler usually fails where a person passes.
Slider verification code
This kind of verification code is also very friendly to humans: we just need to drag the slider to the shaded position. It is easy for people and hard for crawlers, although it can still be cracked.
These measures really can stop crawlers from scraping your content in some situations. But when you successfully block malicious crawlers, you also successfully block search engine crawlers from fetching your content.
Why should we allow search engine crawlers to crawl our content?
Mainly because search engines bring us traffic, and with traffic we can find ways to monetize. Can you smell the money?
For example, someone searches Baidu for "crawler identification" and clicks through to my website. That visitor is traffic for the website.
How does Baidu know what content is on our website?
The Baidu search engine runs thousands of crawlers that fetch content from the Internet every day. The content captured by Baidu's crawlers is stored in Baidu's own index, and each page is ranked according to an algorithm. When a user then searches for matching keywords, they may reach your website, which brings you traffic.
This is why we can't block search engine crawlers. If you block them the same way as other crawlers, they will not be able to crawl your content, your website will not appear in search results, and you will get no traffic from search.
So now we have a problem: we need to block malicious crawlers without blocking search engine crawlers. That's really hard!
To solve this problem, we can use the crawler identification website.
First, we filter out some malicious crawlers based on the User-agent. The User-agent strings used by search engines can be looked up on its "Search engine crawler" page.
There we have collected and organized the User-agent strings and IP addresses of most search engines on the market, including the User-agent of Baidu spider.
By comparing the User-agent, we can make a preliminary judgment about whether a request comes from a search engine crawler. However, the User-agent is easy to forge, so we also need to check the IP address to confirm that the crawler is genuine.
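As a rough illustration of this preliminary User-agent check, here is a minimal Python sketch. The token list and the function name `claimed_search_engine` are assumptions made for this example and are not the full list collected on the website; a match only means the request claims to be a search engine crawler, since the header can be forged.

```python
# Substrings that some well-known search engine crawlers put in their User-agent.
# Illustrative assumption: a real deployment would use a fuller, maintained list.
SEARCH_ENGINE_TOKENS = ("baiduspider", "googlebot", "bingbot")


def claimed_search_engine(user_agent):
    """Return the matched token if the User-agent claims to be a search
    engine crawler, otherwise None. This is only a claim, not proof."""
    ua = (user_agent or "").lower()
    for token in SEARCH_ENGINE_TOKENS:
        if token in ua:
            return token
    return None
```

For a hypothetical header such as `ExampleClient/1.0 (compatible; Baiduspider)`, the function returns `"baiduspider"`; for an ordinary browser header, or a missing header, it returns `None`.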
We just need to go to the crawler IP query page and enter the IP address to find out whether it belongs to a fake crawler.
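If you would rather check an IP in your own code than through a lookup page, one commonly used alternative (not the method of the website above) is a reverse DNS lookup followed by a forward lookup that must return the same IP. The sketch below assumes IPv4 and uses an illustrative list of hostname suffixes; the suffix list and the function names are assumptions, so confirm the real values against each search engine's own documentation.

```python
import socket

# Hostname suffixes assumed for illustration only; verify them against
# the documentation published by each search engine.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp", ".googlebot.com", ".google.com")


def _reverse_lookup(ip):
    # IP address -> primary hostname (PTR record)
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    # Hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(hostname)[2]


def is_genuine_crawler_ip(ip, reverse=_reverse_lookup, forward=_forward_lookup):
    """Return True only if the IP reverse-resolves to a trusted hostname
    and that hostname resolves back to the same IP."""
    try:
        hostname = reverse(ip)
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname.lower().rstrip(".").endswith(TRUSTED_SUFFIXES):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

The two resolver arguments exist so the logic can be exercised without network access. DNS lookups are slow, so in practice the result for each IP would normally be cached.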
Summary
This article started with verification codes as a way to stop crawlers from crawling our website, then explained why we can't block every crawler, and finally showed how to combine the User-agent and the IP address to decide whether a visitor is a search engine crawler, so that we can let it crawl our website.
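The combined decision can be sketched in a few lines of Python. The two checks are passed in as functions, and the three outcomes ("allow", "block", "challenge") are names chosen for this example; sending every other visitor to a verification code is an assumption, not a rule stated in the article.

```python
def decide(user_agent, ip, claims_crawler, ip_is_genuine):
    """Combine the User-agent check and the IP check into one decision.

    claims_crawler(user_agent) -> truthy if the header looks like a search engine crawler
    ip_is_genuine(ip)          -> True if the IP really belongs to that search engine
    """
    if claims_crawler(user_agent):
        # The header claims to be a search engine: trust it only if the IP agrees.
        return "allow" if ip_is_genuine(ip) else "block"
    # Everyone else may be a human or an unknown crawler: show a verification code.
    return "challenge"
```

A request that claims to be a search engine crawler but fails the IP check is treated as a fake crawler and blocked, which is the case that a User-agent check alone cannot catch.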