
Anti-crawler essentials: identifying crawlers

2022-07-07 19:15:00 Hua Weiyun

When we run a website, the biggest frustration is this: content that takes us hours or even days to create can be scraped in a second. To protect our work, and to keep the site running smoothly, we need to say "No" to crawlers. The most important step in any anti-crawler effort is identifying crawlers.

The following methods are commonly used to identify crawlers:

Human verification

So-called human verification means presenting a CAPTCHA and asking the visitor to type in its content. These codes are easy for humans to recognize but hard for machines, for example:

Image CAPTCHA example

A human can read this kind of CAPTCHA easily, but a crawler struggles.

Slider CAPTCHAs

This kind of CAPTCHA is also very human-friendly: you just drag the slider into the shaded gap, for example:

Slider CAPTCHA example

This is easy for people and hard for crawlers, although it can still be cracked.

In certain scenarios these measures really can stop crawlers from scraping your original content. But in successfully blocking malicious crawlers, you have also blocked search engine crawlers from fetching your content.

Why should we let search engine crawlers crawl our content?

Mainly because search engines bring traffic, and with traffic we can find ways to monetize. Smell the money?

For example, someone searches Baidu for "crawler identification" and clicks through to my website; that visit brings the site traffic.

Baidu search results for "crawler identification"

How does Baidu know what content is on our website?

Every day Baidu's search engine runs thousands of crawlers that fetch content across the Internet. What the Baidu crawler captures is stored in Baidu's own index, and each page is ranked by a ranking algorithm. When a user later searches the corresponding keywords, they may land on your website, which brings you traffic.

This is why we can't block search engine crawlers. If you block them the way you block other crawlers, they can't fetch your content, your website won't appear in search results, and you get no traffic at all.

Now we have a dilemma: we want to block malicious crawlers, but we must not block search engine crawlers. Tricky!

To solve this problem, we can use the Crawler Identification website.

First, we filter requests by User-Agent. The User-Agents of the major search engines can be found here: search engine crawlers.

The site collects and organizes the User-Agents and IP addresses of most mainstream search engines. For example, here is Baiduspider's User-Agent:

Baiduspider's User-Agent

By comparing the User-Agent we can make a preliminary judgment about whether a request comes from a search engine crawler. But a User-Agent is trivial to forge, so we still need the IP address to verify that the crawler is genuine.
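The User-Agent pre-filter described above can be sketched as a simple substring match. This is a minimal illustration, not the Crawler Identification site's actual implementation; the pattern list here is a small hand-picked sample, and a real deployment would use a maintained database:

```python
import re

# Substrings that appear in major search engine crawler User-Agents.
# Illustrative sample only -- not an exhaustive or authoritative list.
SEARCH_ENGINE_UA_PATTERNS = [
    r"Baiduspider",
    r"Googlebot",
    r"bingbot",
    r"Sogou web spider",
]

def looks_like_search_engine(user_agent: str) -> bool:
    """Preliminary check: does the User-Agent *claim* to be a search engine?

    A positive result is only a first pass -- the header is easy to forge,
    so it must still be confirmed with an IP check.
    """
    return any(re.search(p, user_agent, re.IGNORECASE)
               for p in SEARCH_ENGINE_UA_PATTERNS)

print(looks_like_search_engine(
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)"))  # True
```

A request that fails this check can be treated as an ordinary (possibly malicious) crawler; a request that passes it moves on to IP verification.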

We just need to enter the crawler's IP in the crawler IP lookup tool to find out whether it is a fake crawler.
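Behind such a lookup is a standard reverse-plus-forward DNS check, which the major engines themselves recommend: reverse-resolve the IP, confirm the hostname belongs to the engine's official domain, then forward-resolve that hostname and make sure it maps back to the same IP. A minimal sketch (network-dependent; the suffixes shown, such as `.baidu.com` for Baiduspider, are examples rather than a complete list):

```python
import socket

def verify_crawler_ip(ip: str, expected_suffixes: tuple) -> bool:
    """Verify a claimed search engine crawler IP via reverse + forward DNS.

    1. Reverse-resolve the IP to a hostname.
    2. Check the hostname ends with one of the engine's official suffixes.
    3. Forward-resolve the hostname and confirm it maps back to the IP
       (otherwise anyone could fake the reverse record).
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no reverse record -- not a genuine engine crawler
    if not hostname.endswith(expected_suffixes):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addrs

# Example: an IP from the reserved TEST-NET-3 range has no valid
# reverse record, so it fails verification.
print(verify_crawler_ip("203.0.113.1", (".baidu.com",)))  # False
```

Only requests that pass both the User-Agent check and this IP check should be trusted as real search engine crawlers.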

Summary

This article started with CAPTCHAs and how to stop crawlers from scraping our website, noted that we can't simply block every crawler, and then showed how to combine User-Agent and IP checks to decide whether a visitor is a search engine crawler, so that we can let it crawl our site.


Copyright notice
This article was created by Hua Weiyun. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/188/202207071702162636.html