当前位置:网站首页>Key points of anti reptile: identifying reptiles
Key points of anti reptile: identifying reptiles
2022-07-07 19:15:00 【Hua Weiyun】
When we run the website , The biggest problem is : We spend hours or even days painstakingly creating the content , Being crawled only needs 1s I caught it . In order to protect the achievements of our creation , Also for the stable operation of the website , We need to say to reptiles :No, The most important thing in our anti reptile process is How to identify reptiles .
To identify reptiles , The following methods are commonly used :
Human detection
The so-called human detection is the presence of a verification code , Let you enter the content of the verification code , These verification codes are easy for humans to recognize , But it is difficult for machines to recognize , For example, this verification code :

This kind of verification code is easily recognized by humans , Reptiles But it's hard to recognize .
Slider class verification code
The verification code here is also very friendly to humans , We just need to move the slider to a shadow position , For example, the following :

This kind of verification code is easy for people , But it is difficult for reptiles , But it can also be cracked .
These can really prevent crawlers from crawling your creative content on some special occasions , But you succeeded in preventing Malicious reptiles , Also successfully prevented Search engine crawler To grab your content .
Why should we allow search engine crawlers to crawl our content ?
This is mainly because search engines can bring me traffic , With the flow, we can find ways to cash , Do you smell money .
For example, we search in Baidu : Reptile recognition , And click on my website , Visitors visit the website , It brings traffic to the website .

How does Baidu know what content is on our website ?
Baidu search engine has thousands of crawlers grabbing content on the Internet every day , And will Baidu crawler The captured content is stored in your own index , Rank each page according to a certain algorithm , Then the user searches the corresponding keywords , It is possible to reach your website , It will bring you traffic .
This is why we can't block search engine crawlers , If you shield search engine crawlers like other crawlers , Then search engine crawlers will not be able to crawl the content of your website , Your website will not be displayed in search results , It won't bring you any traffic .
Now there is a problem , We should shield some malicious crawlers , It can't shield search engine crawlers , I'm really hard !
To solve this problem , We can use Reptile recognition This website solves the above problems .
First of all, we need to be based on User-agent First filter out some malicious crawlers , Search engines User-agent We can see it here : Search engine crawler
Here we collect and sort out the information of most search engines on the market User-agent And IP Address , For example, the following is Baidu spider User-agent:

By comparison User-agent We can preliminarily judge whether it is a crawler of search engine , however User-agent It can be easily forged , So we still need to cooperate IP To identify whether the reptile is real .
We just need to get to Reptiles IP Inquire about Input IP You can know whether this is a fake reptile .
summary
This article starts with the verification code and how to prevent crawlers from crawling our website , But we can't block all crawlers from crawling our website , How do we get through User-agent And IP Combined way to judge is search engine crawler , And let it crawl our website .
边栏推荐
- Micro service remote debug, nocalhost + rainbow micro service development second bullet
- Mathematical analysis_ Notes_ Chapter 11: Fourier series
- [sword finger offer] 59 - I. maximum value of sliding window
- Redis cluster and expansion
- DeSci:去中心化科学是Web3.0的新趋势?
- [Base64 notes] [suggestions collection]
- 5billion, another master fund was born in Fujian
- GSAP animation library
- 学习open62541 --- [67] 添加自定义Enum并显示名字
- PTA 1102 教超冠军卷
猜你喜欢

Policy mode - unity

Command mode - unity

Do you know all four common cache modes?

The top of slashdata developer tool is up to you!!!

Seize Jay Chou

Version 2.0 of tapdata, the open source live data platform, has been released

I feel cheated. Wechat tests the function of "size number" internally, and two wechat can be registered with the same mobile number

Pasqal首席技术官:模拟量子计算率先为工业带来量子优势

2022.07.04

【软件测试】从企业版BOSS直聘,看求职简历,你没被面上是有原因的
随机推荐
虚拟数字人里的生意经
Cadre de validation des données Apache bval réutilisé
For friends who are not fat at all, nature tells you the reason: it is a genetic mutation
Where does brain hole come from? New research from the University of California: creative people's neural connections will "take shortcuts"
Standard ACL and extended ACL
Embedded interview questions (algorithm part)
前首富,沉迷种田
SD_ DATA_ RECEIVE_ SHIFT_ REGISTER
Realize payment function in applet
Borui data was selected in the 2022 love analysis - Panoramic report of it operation and maintenance manufacturers
SlashData开发者工具榜首等你而定!!!
數據驗證框架 Apache BVal 再使用
Continuous test (CT) practical experience sharing
鸿蒙智能家居【1.0】
【软件测试】从企业版BOSS直聘,看求职简历,你没被面上是有原因的
面试唯品会实习测试岗、抖音实习测试岗【真实投稿】
Big Ben (Lua)
6.关于jwt
【牛客网刷题系列 之 Verilog进阶挑战】~ 多bit MUX同步器
Golang client server login