Key points of anti-crawling: identifying crawlers
2022-07-07 19:15:00 【Hua Weiyun】
When running a website, the biggest problem is that content we spend hours or even days painstakingly creating can be scraped by a crawler in about 1 s. To protect our work and keep the website running stably, we need to say no to crawlers, and the most important part of anti-crawling is how to identify them.
To identify crawlers, the following methods are commonly used:
Human detection
So-called human detection presents a verification code (CAPTCHA) and asks the visitor to type in its content. These codes are easy for humans to recognize but difficult for machines, and therefore difficult for crawlers.
Slider verification code
This kind of verification code is also very friendly to humans: we just need to drag the slider to the shadowed position. It is easy for people and difficult for crawlers, but it can still be cracked.
These measures can indeed stop crawlers from scraping your content in some situations. But while you successfully block malicious crawlers, you also block search engine crawlers from fetching your content.
Why should we allow search engine crawlers to crawl our content ?
Mainly because search engines bring us traffic, and with traffic we can find ways to monetize it. Can you smell the money?
For example, a user searches Baidu for "crawler identification" and clicks through to my website. That visit is traffic for the site.
How does Baidu know what content is on our website ?
Baidu runs thousands of crawlers that fetch content from the Internet every day. The content captured by the Baidu crawler is stored in Baidu's own index, and each page is ranked by an algorithm. When a user then searches for a matching keyword, the results may lead to your website and bring you traffic.
This is why we can't block search engine crawlers. If you block them the same way as other crawlers, they cannot fetch your content, your website will not appear in search results, and search will bring you no traffic.
So now we have a problem: we need to block malicious crawlers without blocking search engine crawlers, which is really hard!
To solve this problem, we can use the Crawler Identification website.
First, we filter out some malicious crawlers based on the User-agent. The User-agent strings used by search engines are listed on the site's Search engine crawler page.
The site collects and organizes the User-agent strings and IP addresses of most search engines on the market, including the User-agent of the Baidu spider.
By comparing the User-agent, we can make a preliminary judgment about whether a visitor is a search engine crawler. However, the User-agent is easy to forge, so we also need to check the IP address to confirm that the crawler is genuine.
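The preliminary User-agent check described above can be sketched as follows. This is a minimal illustration, not the website's own implementation: the token list and the function name `claimed_search_engine` are assumptions made for the example, and a real deployment would load its tokens from a maintained source. A match only means the visitor claims to be a search engine crawler.

```python
# Assumed, illustrative tokens that appear in search engine User-agent
# strings. This list is not taken from the article and is not exhaustive.
SEARCH_ENGINE_UA_TOKENS = ("Baiduspider", "Googlebot", "bingbot")


def claimed_search_engine(user_agent):
    """Return the crawler token the User-agent claims to be, or None.

    This is only a first-pass filter: the header is trivially forged,
    so a match still has to be confirmed by checking the IP address.
    """
    ua = (user_agent or "").lower()
    for token in SEARCH_ENGINE_UA_TOKENS:
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    baidu_ua = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
                "+http://www.baidu.com/search/spider.html)")
    print(claimed_search_engine(baidu_ua))
    print(claimed_search_engine("Mozilla/5.0 (Windows NT 10.0) Chrome/100"))
```

Requests that match no token can go through the normal anti-crawling rules, while requests that do match move on to the IP check.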
We just need to go to the site's crawler IP lookup page and enter the IP address to find out whether the crawler is a fake.
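If you want to run a similar IP check on your own server, one commonly used approach is a reverse DNS lookup followed by a forward confirmation. Note that this is a different technique from the website lookup described above, and the sketch makes several assumptions: the hostname suffixes in `CRAWLER_HOST_SUFFIXES` are illustrative and should be checked against each search engine's own documentation, and the function name `is_genuine_crawler` is hypothetical. The resolver functions are passed in as parameters so the logic can be exercised without network access.

```python
import socket

# Assumed hostname suffixes for genuine crawlers; verify them against each
# search engine's documentation before relying on this table.
CRAWLER_HOST_SUFFIXES = {
    "Baiduspider": (".baidu.com", ".baidu.jp"),
    "Googlebot": (".googlebot.com", ".google.com"),
}


def is_genuine_crawler(ip, crawler,
                       reverse_lookup=socket.gethostbyaddr,
                       forward_lookup=socket.gethostbyname_ex):
    """Check that `ip` really belongs to the crawler it claims to be.

    Step 1: reverse-resolve the IP and require an expected hostname suffix.
    Step 2: forward-resolve that hostname and require it to map back to
    the same IP, so a forged reverse record alone is not enough.
    """
    suffixes = CRAWLER_HOST_SUFFIXES.get(crawler)
    if not suffixes:
        return False
    try:
        hostname = reverse_lookup(ip)[0].lower().rstrip(".")
        if not hostname.endswith(suffixes):
            return False
        return ip in forward_lookup(hostname)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
```

DNS lookups are slow compared with a header comparison, so in practice the result for each IP would usually be cached rather than resolved on every request.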
Summary
This article started with verification codes as a way to stop crawlers from scraping our website. Since we cannot block every crawler, it then showed how to combine the User-agent and the IP address to determine whether a visitor is a search engine crawler, and let it crawl the site.