The Key to Anti-Crawling: Identifying Crawlers
2022-07-07 19:15:00 [Huawei Cloud]
When running a website, the biggest problem is this: content that took us hours or even days of painstaking work can be scraped by a crawler in a single second. To protect what we have created, and to keep the website running stably, we need to say "No" to crawlers. The most important part of anti-crawling is identifying crawlers.
The following methods are commonly used to identify crawlers:
Human verification (CAPTCHA)
Human verification means presenting a CAPTCHA and asking the visitor to type in what it shows. These CAPTCHAs are easy for humans to read but hard for machines, and therefore for crawlers, to recognize.
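The server side of a text CAPTCHA can be sketched as below. This is a minimal illustration under our own assumptions, not the method of any particular CAPTCHA product: the `CaptchaStore` class, the character set, and the in-memory dictionary are hypothetical, and drawing the code as a distorted image is left out. The key idea is that the expected answer stays on the server and each challenge can be answered only once.

```python
import secrets

# Characters that are hard to confuse visually (no 0/O, 1/I/L): an illustrative choice.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


class CaptchaStore:
    """Hypothetical in-memory store: challenge token -> expected answer.

    A real site would render `code` as a distorted image and send the client
    only the image and the token, never the code itself.
    """

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self._answers = {}  # token -> expected code

    def new_challenge(self):
        """Create a challenge and return (token, code)."""
        code = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        token = secrets.token_urlsafe(16)
        self._answers[token] = code
        return token, code

    def verify(self, token: str, answer: str) -> bool:
        """Compare the visitor's answer case-insensitively; each token works once."""
        expected = self._answers.pop(token, None)  # pop() makes the token single-use
        if expected is None:
            return False
        given = answer.strip().upper()
        # Compare as bytes so non-ASCII input cannot raise a TypeError.
        return secrets.compare_digest(expected.encode(), given.encode("utf-8"))
```

In practice the answers would more likely live in a shared store with an expiry time, so that any server can verify them and unanswered challenges disappear.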
Slider CAPTCHA
This type of CAPTCHA is also very friendly to humans: we just drag the slider to the shaded position. It is easy for people but harder for crawlers, although it can still be cracked.
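A slider check can be sketched the same way. In this minimal sketch the server remembers where the gap is and accepts a submission only if the piece lands close enough to it. The names `SliderChallenge`, `new_slider_challenge` and `verify_slider`, the 5 px tolerance, and the minimum drag time are all hypothetical choices for illustration. A check this simple is easy for a script to imitate, which fits the point above that slider CAPTCHAs can be cracked.

```python
import secrets
from dataclasses import dataclass


@dataclass
class SliderChallenge:
    """Hypothetical slider challenge: the puzzle gap sits at `gap_x` pixels."""
    gap_x: int


def new_slider_challenge(track_width: int = 300, piece_width: int = 40) -> SliderChallenge:
    # Keep the gap away from both ends so the piece must actually be dragged.
    low = piece_width
    high = track_width - piece_width
    return SliderChallenge(gap_x=low + secrets.randbelow(high - low + 1))


def verify_slider(challenge: SliderChallenge, submitted_x: int, drag_ms: int,
                  tolerance: int = 5, min_drag_ms: int = 200) -> bool:
    """Accept if the piece lands within `tolerance` px of the gap.

    The minimum drag time is an illustrative heuristic (an assumption): an
    instant drop is more typical of a script than of a human hand.
    """
    close_enough = abs(submitted_x - challenge.gap_x) <= tolerance
    human_speed = drag_ms >= min_drag_ms
    return close_enough and human_speed
```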
In some situations these measures really can stop crawlers from scraping your content. But while you succeed in blocking malicious crawlers, you also block search engine crawlers from fetching your content.
Why should we let search engine crawlers crawl our content?
Mainly because search engines bring us traffic, and traffic can be turned into money. Can you smell the money?
For example, someone searches Baidu for "crawler identification" and clicks through to my website; that visit brings the website traffic.
How does Baidu know what content is on our website?
Baidu's search engine has thousands of crawlers fetching content across the Internet every day. The content fetched by Baidu's crawler is stored in Baidu's own index, and each page is ranked by an algorithm. When a user searches for the relevant keywords, they may land on your website, bringing you traffic.
This is why we must not block search engine crawlers. If you block them the way you block other crawlers, they cannot crawl your website, your pages will not appear in search results, and search will bring you no traffic.
Now we have a dilemma: we want to block malicious crawlers without blocking search engine crawlers. Tough!
To solve this problem, we can use the "Crawler Identification" website.
First, we use the User-agent to filter out some malicious crawlers. The User-agents of the search engines are listed on the site's "Search Engine Crawlers" page.
That page collects the User-agents and IP addresses of most search engines on the market, including Baidu's spider, whose User-agent contains the token Baiduspider.
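Matching the User-agent can be sketched as a case-insensitive substring check. The token lists below are illustrative and incomplete assumptions, not the data published by the Crawler Identification site, and `classify_user_agent` is a hypothetical helper. A match only means that the request claims to be a search engine crawler.

```python
from typing import Optional, Tuple

# Illustrative, incomplete token lists (assumptions, not the site's data).
SEARCH_ENGINE_TOKENS = {
    "baiduspider": "baidu",
    "googlebot": "google",
    "bingbot": "bing",
    "sogou web spider": "sogou",
    "360spider": "360",
}
# Common HTTP client / scraping tool User-agents, treated here as suspicious.
TOOL_TOKENS = ("python-requests", "curl/", "wget/", "scrapy", "go-http-client")


def classify_user_agent(user_agent: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (verdict, engine); verdict is one of
    "claims_search_engine", "suspicious" or "unknown"."""
    ua = (user_agent or "").strip().lower()
    if not ua:
        return "suspicious", None  # an empty User-agent is unusual for browsers
    for token, engine in SEARCH_ENGINE_TOKENS.items():
        if token in ua:
            return "claims_search_engine", engine  # still must be verified by IP
    if any(tool in ua for tool in TOOL_TOKENS):
        return "suspicious", None
    return "unknown", None
```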
Comparing the User-agent gives a preliminary indication of whether a request comes from a search engine crawler. However, the User-agent is easy to forge, so we also need to check the IP address to confirm that the crawler is genuine.
We just enter the IP on the site's "Crawler IP Lookup" page to find out whether it is a fake crawler.
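The article relies on the site's IP lookup page. A check you can run yourself, swapped in here as an alternative technique, is reverse DNS followed by forward DNS confirmation; Google, for example, documents this procedure for verifying Googlebot. The sketch looks up the hostname for the IP, checks that it ends in one of the engine's domains, then resolves that hostname and confirms it maps back to the same IP. The suffix table and the `is_genuine_crawler` helper are assumptions to verify against each search engine's own documentation; the resolver functions are injectable so the logic can be tested without network access.

```python
import socket
from typing import Callable, Iterable

# Hostname suffixes per engine: assumptions, check each engine's own docs.
# The leading dot stops look-alikes such as "evilbaidu.com" from matching.
ENGINE_HOST_SUFFIXES = {
    "baidu": (".baidu.com", ".baidu.jp"),
    "google": (".googlebot.com", ".google.com"),
    "bing": (".search.msn.com",),
}


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> Iterable[str]:
    """Forward lookup: hostname -> all of its IPv4/IPv6 addresses."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_genuine_crawler(ip: str, engine: str,
                       reverse: Callable[[str], str] = reverse_lookup,
                       forward: Callable[[str], Iterable[str]] = forward_lookup) -> bool:
    suffixes = ENGINE_HOST_SUFFIXES.get(engine)
    if not suffixes:
        return False
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not host.endswith(suffixes):
        return False  # PTR record does not belong to the claimed engine
    try:
        # The forward lookup must point back to the same IP; a PTR record
        # alone can be set to anything by whoever controls the IP range.
        return ip in forward(host)
    except OSError:
        return False
```

For a request whose User-agent claims to be Baiduspider, one would call `is_genuine_crawler(client_ip, "baidu")`; a False result suggests a forged User-agent. DNS lookups are slow, so caching the result per IP is worth considering.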
Summary
This article started from CAPTCHAs and how they stop crawlers from scraping our website. Because we cannot block every crawler, it then showed how to combine the User-agent and the IP address to determine whether a visitor is a genuine search engine crawler, so that we can let it crawl our site.