Robots protocol
2022-07-03 07:36:00 【start field】
When we crawl data, some websites do nothing to stop us, but others do not hand over their data so easily. That is where anti-crawling mechanisms come from. Crawler authors, in turn, want to know how to get data that is protected by such a mechanism, which gives rise to anti-anti-crawling strategies.
Anti-crawling mechanism
A website can adopt strategies and technical measures to prevent crawlers from crawling its data.
Anti-anti-crawling strategy
A crawler can adopt its own strategies and technical measures to get around a website's anti-crawling mechanism and obtain the site's data.
The two are like spear and shield. Later we will meet many anti-crawling mechanisms and learn many strategies for dealing with them. For now, let's look at the simplest anti-crawling mechanism: the robots protocol, also known as the gentleman's agreement.
robots.txt protocol
It specifies which data on a website may be crawled and which may not. It does not enforce anything, though: whether a crawler complies is up to the crawler's author, which is why it is also called a gentleman's agreement.
Take Baidu as an example: open https://www.baidu.com/robots.txt to see Baidu's robots rules.
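Before requesting a page, a well-behaved crawler checks the site's robots.txt rules. A minimal sketch of that check follows, using Python's standard-library urllib.robotparser. The rules, the crawler names ExampleBot and OtherBot, and the example.com URLs are all made up for illustration. They are not Baidu's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not copied from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# ExampleBot is only barred from /private/, so other paths are allowed.
allowed_public = parser.can_fetch("ExampleBot", "https://example.com/public/page.html")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/data.html")

# Any other crawler falls under the "*" group, which disallows everything.
allowed_other = parser.can_fetch("OtherBot", "https://example.com/public/page.html")

print(allowed_public, allowed_private, allowed_other)
```

To check a live site instead, the same parser can be pointed at the file with set_url(...) followed by read(). Either way, the parser only reports what the rules say. Nothing stops a crawler from ignoring the answer, which is exactly why this is a gentleman's agreement.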