Robots protocol
2022-07-03 07:36:00 【start field】
When we crawl data with a crawler, some websites do nothing to stop us, but others will not hand over their data so easily. That is why anti-crawling mechanisms exist. And because we still want to know how to handle data protected by such mechanisms, anti-anti-crawling strategies emerged in response.
Anti-crawling mechanism
A portal website can adopt strategies and technical measures to prevent crawlers from crawling its data.
Anti-anti-crawling strategy
A crawler program can adopt its own strategies and technical measures to defeat a portal website's anti-crawling mechanism and obtain the site's data.
The two are related like spear and shield. Later we will encounter many anti-crawling mechanisms and learn many anti-anti-crawling strategies. For now, let's look at the simplest anti-crawling mechanism: the robots protocol, also called the gentleman's agreement.
robots.txt protocol
It stipulates which data on a website may be crawled and which may not. It enforces nothing, however: whether you comply is entirely up to you, which is why it is also called a gentleman's agreement.
Take Baidu as an example: open https://www.baidu.com/robots.txt to see Baidu's robots protocol.
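As a rough illustration of how a well-behaved crawler might honor these rules, here is a minimal sketch using `urllib.robotparser` from the Python standard library. The robots.txt content, the `example.com` URLs, and the crawler names (`my-crawler`, `BadBot`) are all hypothetical and invented for this example; they are not taken from Baidu's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A generic crawler falls under the "*" group: only /private/ is off limits.
print(parser.can_fetch("my-crawler", "https://example.com/index.html"))
print(parser.can_fetch("my-crawler", "https://example.com/private/data.html"))

# "BadBot" has its own group, which disallows the whole site.
print(parser.can_fetch("BadBot", "https://example.com/index.html"))
```

To check a live site instead, `RobotFileParser.set_url()` followed by `read()` fetches the file over the network. Note that `can_fetch` only reports what the file asks for; nothing stops a crawler from ignoring the answer, which is exactly why the protocol is a gentleman's agreement.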
