How to create a robots.txt file?
2022-06-29 04:16:00 [IDC industry watcher]
If you use a website hosting service such as Wix or Blogger, you may not need to (or be able to) edit the robots.txt file directly. Instead, your hosting provider may expose a search settings page or some other mechanism for telling search engines whether they should crawl your pages.
If you want to hide one of your pages from search engines (or unhide it), search for instructions on changing page visibility in search engines on your hosting service, for example "Wix hide page from search engines".
You can use a robots.txt file to control which files on your site crawlers may access. The robots.txt file must live in the root directory of the site: for the site www.example.com, the file's path must be www.example.com/robots.txt. robots.txt is a plain text file that follows the Robots Exclusion Standard and consists of one or more rules. Each rule allows or blocks a given crawler from accessing a specified file path on that site. Unless you specify otherwise in the robots.txt file, all files are implicitly allowed to be crawled.

Here is a simple robots.txt file with two rules:
User-agent: Googlebot
Disallow: /nogooglebot/
User-agent: *
Allow: /
Sitemap: http://www.example.com/sitemap.xml
Here is what that robots.txt file means:
The user agent named Googlebot may not crawl any URL that starts with http://example.com/nogooglebot/.
All other user agents may crawl the entire site. Omitting this rule would make no difference; the result would be the same, because the default behavior is that user agents may crawl the entire site.
The site's sitemap file is located at http://www.example.com/sitemap.xml.
For more examples, see the syntax section.
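To see these rules in action, here is a minimal sketch that parses the sample above with Python's standard urllib.robotparser module and checks what each user agent may fetch. Note that robotparser implements the classic Robots Exclusion Standard, so its matching can differ from Google's in edge cases, and the user agent name Otherbot is made up for illustration.

from urllib.robotparser import RobotFileParser

# Parse the two-rule sample file from above.
rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /nogooglebot/",
    "",
    "User-agent: *",
    "Allow: /",
])

# Googlebot may not crawl URLs starting with /nogooglebot/ ...
print(rp.can_fetch("Googlebot", "http://example.com/nogooglebot/page.html"))  # False
# ... but may crawl everything else on the site.
print(rp.can_fetch("Googlebot", "http://example.com/index.html"))  # True
# Any other user agent may crawl the entire site.
print(rp.can_fetch("Otherbot", "http://example.com/nogooglebot/page.html"))  # True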

Basic guidelines for creating a robots.txt file
Creating a robots.txt file and making it generally accessible and useful involves four steps:
Create a file named robots.txt.
Add rules to the robots.txt file.
Upload the robots.txt file to your site.
Test the robots.txt file.
Create a robots.txt file
You can use almost any text editor to create a robots.txt file. For example, Notepad, TextEdit, vi, and emacs can all create valid robots.txt files. Do not use a word processor: word processors often save files in a proprietary format and may add unexpected characters (such as curly quotes), which can cause problems for crawlers. If prompted when saving the file, make sure to save it with UTF-8 encoding.
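As a sketch of that last point, the file can also be written programmatically. The snippet below (Python, with a hypothetical allow-everything policy) saves the file with explicit UTF-8 encoding, so no word processor can introduce a proprietary format or curly quotes:

# Write a robots.txt file with explicit UTF-8 encoding.
rules = "User-agent: *\nAllow: /\n"
with open("robots.txt", "w", encoding="utf-8", newline="\n") as f:
    f.write(rules)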
Format and location rules:
The file must be named robots.txt.
A site can have only one robots.txt file.
The robots.txt file must be located in the root of the website host it applies to. For example, to control crawling of all URLs under https://www.example.com/, the robots.txt file must live at https://www.example.com/robots.txt; it must not be placed in a subdirectory (for example, at https://example.com/pages/robots.txt). If you are unsure how to access your site root, or you need permission to do so, contact your web hosting provider. If you cannot access the site root, use an alternative blocking method such as meta tags.

A robots.txt file can apply to a subdomain (for example, https://website.example.com/robots.txt) or a non-standard port (for example, http://example.com:8181/robots.txt).
A robots.txt file must be a UTF-8 encoded text file (which includes ASCII). Google may ignore characters outside the UTF-8 range, which could render robots.txt rules invalid.
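A quick way to confirm that the file is reachable at the root is to request it over HTTP. Here is a minimal sketch using Python's urllib.request against the hypothetical host www.example.com; a 200 status means the file answers at the expected path:

from urllib.request import urlopen

# Fetch the file from the site root (hypothetical host).
with urlopen("https://www.example.com/robots.txt") as resp:
    print(resp.status)  # 200 if the file exists at the root
    print(resp.read().decode("utf-8"))  # the file must be UTF-8 encoded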
Add rules to the robots.txt file
Rules are instructions that tell crawlers which parts of a website they may crawl. Follow these guidelines when adding rules to your robots.txt file:
A robots.txt file consists of one or more groups.
Each group consists of multiple rules or directives (instructions), one directive per line. Each group begins with a User-agent line that specifies which crawlers the group applies to.
Each group contains the following information:
Who the group applies to (the user agent).
Which directories or files that agent may access.
Which directories or files that agent may not access.
Crawlers process groups from top to bottom. A user agent can match only one rule set, namely the first, most specific group that matches that user agent.
The default assumption is that a user agent may crawl any page or directory that is not blocked by a disallow rule.
Rules are case-sensitive. For example, disallow: /file.asp applies to https://www.example.com/file.asp but not to https://www.example.com/FILE.asp (see the sketch after this list).
The # character marks the beginning of a comment.
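The sketch below, again using Python's standard urllib.robotparser (whose matching follows the classic exclusion standard and may differ from Google's in edge cases), illustrates the last two points: path matching is case-sensitive, and lines starting with # are ignored as comments. The user agent name Anybot is made up for illustration.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "# This comment line is ignored by the parser.",
    "User-agent: *",
    "Disallow: /file.asp",
])

# The rule matches the lowercase path only.
print(rp.can_fetch("Anybot", "https://www.example.com/file.asp"))  # False
print(rp.can_fetch("Anybot", "https://www.example.com/FILE.asp"))  # True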
Source: https://cn.bluehost.com/blog/