yandex robots txt
2022-07-28 05:23:00 【oHuangBing】
robots.txt is a text file that contains indexing parameters for a website and is intended for search engine robots.
Yandex supports the Robots Exclusion Protocol with advanced features.
When crawling a website, the Yandex robot loads the robots.txt file. If the latest request for this file shows that certain site pages or sections are prohibited, the robot will not index them.
Yandex robots.txt file requirements
Yandex robots can process robots.txt, provided the following requirements are met:
The file size does not exceed 500 KB.
It is a TXT file named "robots", i.e. robots.txt.
The file is located in the root directory of the website.
The file is available to robots: the server hosting the website responds with an HTTP status code of 200 OK. Check the server's response; a small check script is sketched below.
If the file does not meet these requirements, the website is considered open for indexing, meaning the Yandex search engine can access the site's content without restriction.
Yandex supports redirects from the robots.txt file to a file located on another website. In that case, the directives in the target file are taken into account. Such a redirect can be useful when moving a website.
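As a quick illustration of these requirements, here is a minimal Python sketch (standard library only, not any Yandex tooling) that fetches a site's robots.txt and checks the status code and the 500 KB size limit; the example.com address is only a placeholder:

import urllib.error
import urllib.request

def check_robots_txt(site):
    # Fetch <site>/robots.txt and report whether it meets the basic requirements.
    url = site.rstrip("/") + "/robots.txt"
    try:
        # urlopen follows redirects automatically, which mirrors the
        # redirect behaviour described above.
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read()
            ok = resp.status == 200 and len(body) <= 500 * 1024
            print(f"{resp.geturl()}: status={resp.status}, size={len(body)} bytes, ok={ok}")
            return ok
    except urllib.error.URLError as err:
        print(f"{url}: not reachable ({err})")
        return False

check_robots_txt("https://example.com")  # placeholder site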
Rules for how Yandex accesses robots.txt
In the robots.txt file, the robot checks the records that start with User-agent: and looks for the string Yandex (case does not matter) or *. If the string User-agent: Yandex is detected, the User-agent: * group is ignored. If neither the User-agent: Yandex string nor the User-agent: * string is found, the robot is considered to have unrestricted access.
You can specify separate directives for individual Yandex robots.
For example:
User-agent: YandexBot # directives for the main indexing robot
Disallow: /*id=

User-agent: Yandex # directives for all Yandex robots
Disallow: /*sid= # except the main indexing robot

User-agent: * # not used by Yandex robots
Disallow: /cgi-bin
According to the standard, a blank line should be inserted before each User-agent directive. The # character marks a comment: everything after this character, up to the first line break, is ignored.
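To see this group selection in practice, here is a minimal sketch using Python's standard urllib.robotparser (whose matching is close to, but not identical to, Yandex's own implementation); the rules and URLs below are made up for the illustration:

from urllib import robotparser

# A made-up robots.txt with a Yandex-specific group and a catch-all group.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /private

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The Yandex group applies to Yandex, so only /private is blocked for it.
print(rp.can_fetch("Yandex", "https://example.com/page"))        # True
print(rp.can_fetch("Yandex", "https://example.com/private/x"))   # False

# Other robots fall back to the * group, which blocks everything.
print(rp.can_fetch("SomeOtherBot", "https://example.com/page"))  # False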
robots.txt Disallow and Allow directives
The Disallow directive prohibits indexing of site sections or individual pages, for example:
Pages containing confidential data.
Pages with site search results.
Site traffic statistics pages.
Duplicate pages.
Various kinds of logs.
Database service pages.
Here are examples of the Disallow directive:
User-agent: Yandex
Disallow: / # prohibits crawling the entire website

User-agent: Yandex
Disallow: /catalogue # prohibits crawling pages whose URLs start with /catalogue

User-agent: Yandex
Disallow: /page? # prohibits crawling pages whose URLs start with /page?
robots.txt Allow directive
This directive allows indexing of site sections or individual pages. Examples:

User-agent: Yandex
Allow: /cgi-bin
Disallow: /
# prohibits indexing any pages except those whose URLs start with '/cgi-bin'

User-agent: Yandex
Allow: /file.xml
# allows indexing the file.xml file
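As a quick check of the first example, the sketch below runs it through Python's standard urllib.robotparser (its rule resolution differs slightly from the longest-prefix rule Yandex describes in the next section, but it gives the same answer here); the URLs are placeholders:

from urllib import robotparser

RULES = """\
User-agent: Yandex
Allow: /cgi-bin
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

print(rp.can_fetch("Yandex", "https://example.com/cgi-bin/search"))  # True: '/cgi-bin' pages stay indexable
print(rp.can_fetch("Yandex", "https://example.com/about"))           # False: everything else is blocked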
robots.txt Combining directives
The Allow and Disallow directives from the matching user-agent block are sorted by URL prefix length (from shortest to longest) and applied in that order. If several directives match a particular page of the site, the robot selects the last one in the sorted list. This way, the order of directives in the robots.txt file does not affect how the robot uses them; a small sketch of this merging rule follows the example below.
# robots.txt file example:
User-agent: Yandex
Allow: /
Allow: /catalog/auto
Disallow: /catalog

# The same rules sorted by prefix length:
User-agent: Yandex
Allow: /
Disallow: /catalog
Allow: /catalog/auto
# prohibits indexing pages whose URLs start with '/catalog'
# but allows indexing pages whose URLs start with '/catalog/auto'
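Here is a minimal Python sketch of the merging rule described above, applied to this example block; it illustrates the documented behaviour rather than Yandex's actual code, and the sample paths are made up:

# (directive, prefix) pairs from the example above
RULES = [
    ("Allow", "/"),
    ("Allow", "/catalog/auto"),
    ("Disallow", "/catalog"),
]

def is_allowed(path, rules=RULES):
    # Sort the rules by prefix length, shortest to longest, as described above.
    ordered = sorted(rules, key=lambda rule: len(rule[1]))
    decision = True  # if no rule matches, the page is open for indexing
    for directive, prefix in ordered:
        if path.startswith(prefix):
            # The last matching rule in the sorted list wins.
            decision = (directive == "Allow")
    return decision

print(is_allowed("/catalog/auto/lada"))  # True: '/catalog/auto' is the longest match
print(is_allowed("/catalog/shoes"))      # False: '/catalog' is the longest match
print(is_allowed("/about"))              # True: only '/' matches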
Summary
These are the basic rules for writing robots.txt for the Yandex crawler. With such a configuration you can allow or prohibit the Yandex crawler from crawling specific pages of your site.