Robots protocol
2022-07-03 07:36:00 [start field]
When we use a crawler to collect data, some websites do not block it, but others will not hand their data over so easily. That is why anti-crawling mechanisms exist. Crawler authors, in turn, need ways to get data from sites protected by such mechanisms, and that is where anti-anti-crawling strategies come in.
Anti-crawling mechanism
A website adopts policies and technical measures that prevent crawlers from scraping its data.
Anti-anti-crawling strategy
The crawler uses its own strategies and techniques to get around a website's anti-crawling mechanisms and so obtain the site's data.
The two relate like spear and shield. Later we will run into many anti-crawling mechanisms and learn many strategies for dealing with them. For now, let's start with the simplest anti-crawling mechanism: the robots protocol, also known as the gentleman's agreement.
robots.txt protocol
It specifies which parts of a website may be crawled and which may not. It enforces nothing, though: whether a crawler complies is entirely up to the crawler, which is why it is called a gentleman's agreement.
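To make the rules concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt text, the `MyCrawler` user agent, the example.com URLs and the helper `is_allowed` are all made up for illustration; the sample file is not Baidu's real robots.txt. It also shows the "gentleman" part: the parser only answers whether a URL is allowed, and nothing stops a crawler from ignoring that answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (NOT Baidu's real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /baidu
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text offline and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # no network access needed
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Baiduspider", "https://www.example.com/index.html"),   # no rule matches
        ("Baiduspider", "https://www.example.com/baidu?wd=x"),   # matches /baidu
        ("MyCrawler", "https://www.example.com/index.html"),     # falls to "*": Disallow /
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

Keep in mind that `urllib.robotparser` does plain prefix matching on paths and applies the first matching rule in file order; it does not understand `*` or `$` wildcards inside paths, nor the longest-match rule from RFC 9309, so treat its answers as an approximation for files that rely on those features.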
Taking Baidu as an example, open https://www.baidu.com/robots.txt to see Baidu's robots protocol.
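Because robots.txt always sits at the root of a host, a crawler can work out its location from any page URL and download it before crawling. Below is a minimal sketch, again with Python's `urllib.robotparser`; `robots_url_for` and `fetch_robots` are hypothetical helper names, and `fetch_robots` needs network access, so it is defined but not called here.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL for the host serving page_url (scheme + host + /robots.txt)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(page_url: str) -> RobotFileParser:
    """Download and parse the live robots.txt for page_url's host (requires network)."""
    parser = RobotFileParser(robots_url_for(page_url))
    parser.read()  # fetches the file with urllib.request and parses it
    return parser


if __name__ == "__main__":
    # Offline demo: only compute the location, do not download anything.
    print(robots_url_for("https://www.baidu.com/s?wd=python"))
```

With network access, `fetch_robots("https://www.baidu.com/s?wd=python").can_fetch("MyCrawler", "https://www.baidu.com/s?wd=python")` would check that URL against Baidu's live file. If the server answers 401 or 403, `read()` treats the whole site as disallowed; other 4xx responses are treated as "allow everything".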
