Yandex robots.txt
2022-07-28 05:23:00 [oHuangBing]
robots.txt is a text file containing indexing parameters for a website, intended for search engine robots.
Yandex supports the Robots Exclusion Protocol with advanced features.
When crawling a website, the Yandex robot loads the robots.txt file. If the most recent request for this file shows that a site page or section is prohibited, the robot will not index it.
Yandex robots.txt file requirements
Yandex robots can process robots.txt as long as the following requirements are met:
The file size does not exceed 500 KB.
It is a TXT file named "robots", i.e. robots.txt.
The file is located in the root directory of the website.
The file is available to robots: the server hosting the website responds with an HTTP status code of 200 OK. You can check the server's response to verify this.
If the file does not meet these requirements, the website is considered open for indexing, that is, the Yandex search engine can access the site's content without restriction.
Yandex supports redirecting from the robots.txt file to a file located on another website. In this case, the directives in the target file are taken into account. This kind of redirect can be useful when moving a website.
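To verify the availability requirement yourself, you can request the file and check the status code. Below is a minimal sketch using Python's standard library (not an official Yandex tool); the site URL is a placeholder for your own domain:

import urllib.error
import urllib.request

def robots_txt_available(site: str) -> bool:
    """Return True if <site>/robots.txt answers with HTTP 200 OK."""
    try:
        # Redirects are followed automatically, matching the robots.txt
        # redirect support described above.
        with urllib.request.urlopen(f"{site}/robots.txt", timeout=10) as resp:
            return resp.status == 200
    except urllib.error.URLError:  # covers HTTP errors and network failures
        return False

print(robots_txt_available("https://example.com"))  # placeholder domain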
Rules Yandex follows when accessing robots.txt
In the robots.txt file, the robot checks the records starting with User-agent: and looks for the string Yandex (case does not matter) or *. If the string User-agent: Yandex is detected, the User-agent: * string is ignored. If neither User-agent: Yandex nor User-agent: * is found, the robot is considered to have unlimited access.
You can enter separate directives for specific Yandex robots.
For example:
User-agent: YandexBot # will be used by the main indexing robot
Disallow: /*id=

User-agent: Yandex # will be used by all Yandex robots
Disallow: /*sid= # except the main indexing robot

User-agent: * # won't be used by Yandex robots
Disallow: /cgi-bin
According to the standard, you should insert a blank line before each User-agent directive. The # character marks a comment: everything after it, up to the first line break, is ignored.
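As an illustration of this selection logic, here is a simplified Python sketch (not Yandex's actual implementation) that picks the applicable record group the way described above:

# Simplified sketch of User-agent group selection, not Yandex's actual code.
def select_group(groups, robot_name="Yandex"):
    """groups maps a User-agent token to its list of directive lines."""
    by_agent = {agent.lower(): rules for agent, rules in groups.items()}
    if robot_name.lower() in by_agent:   # 'User-agent: Yandex' takes priority
        return by_agent[robot_name.lower()]
    if "*" in by_agent:                  # otherwise fall back to 'User-agent: *'
        return by_agent["*"]
    return []                            # no matching record: unlimited access

print(select_group({"Yandex": ["Disallow: /*sid="], "*": ["Disallow: /cgi-bin"]}))
# ['Disallow: /*sid=']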
robots.txt Disallow and Allow directives
Disallow: use this directive to prohibit indexing of site sections or individual pages, for example:
Pages containing confidential data.
Pages with site search results.
Site traffic statistics.
Duplicate pages.
Various logs.
Database service pages.
Here are examples of the Disallow directive:
User-agent: Yandex
Disallow: / # prohibits crawling the entire website

User-agent: Yandex
Disallow: /catalogue # prohibits crawling pages starting with /catalogue

User-agent: Yandex
Disallow: /page? # prohibits crawling pages whose URLs start with '/page?'
robots.txt Allow directive
This directive allows indexing of site sections or individual pages. Examples:
User-agent: Yandex
Allow: /cgi-bin
Disallow: /
# prohibits indexing any pages except those starting with '/cgi-bin'

User-agent: Yandex
Allow: /file.xml
# allows indexing of the file.xml file
Combining robots.txt directives
The Allow and Disallow directives from the corresponding User-agent block are sorted by URL prefix length (from shortest to longest) and applied in that order. If several directives match a particular site page, the robot selects the last one in the sorted list. The order of directives in the robots.txt file therefore does not affect how the robot uses them.
# Source robots.txt:
User-agent: Yandex
Allow: /
Allow: /catalog/auto
Disallow: /catalog

# Sorted robots.txt (as the robot applies it):
User-agent: Yandex
Allow: /
Disallow: /catalog
Allow: /catalog/auto

# prohibits indexing pages starting with '/catalog'
# but allows indexing pages starting with '/catalog/auto'
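To make the sorting rule concrete, here is a minimal Python sketch of the resolution described above, assuming plain prefix matching and ignoring wildcards:

# Simplified resolution of combined Allow/Disallow directives:
# matching prefixes are sorted by length and the longest match wins.
def is_allowed(path, rules):
    """rules is a list of (directive, prefix) pairs."""
    matches = [r for r in rules if path.startswith(r[1])]
    if not matches:
        return True                            # nothing matches: allowed
    matches.sort(key=lambda r: len(r[1]))      # shortest to longest
    return matches[-1][0] == "Allow"           # last (longest) rule decides

rules = [("Allow", "/"), ("Disallow", "/catalog"), ("Allow", "/catalog/auto")]
print(is_allowed("/catalog/auto/bmw", rules))  # True
print(is_allowed("/catalog/moto", rules))      # False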
Summary
Those are the basic rules for writing robots.txt for Yandex crawlers. With this configuration you can allow or prohibit Yandex crawlers from indexing specific pages.