What Is It?

Before indexing a site, search engine robots check a special plain-text file called robots.txt in the root directory of the server. Robots.txt implements the Robots Exclusion Protocol, which allows you, as a web manager, to define which parts of your site are off-limits to search engine crawlers. For example, web managers can disallow access to Common Gateway Interface (CGI) scripts, or to private and temporary directories, because they don't want pages in those areas indexed.
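As a rough illustration, a minimal robots.txt covering that example might look like the following; the /cgi-bin/ and /tmp/ directory names are just placeholders for whatever paths a site actually wants kept out of the index:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/

The asterisk means the rules apply to every crawler, and each Disallow line names one path prefix that crawlers should skip.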

Here is some general information about robots.txt files.

Robots.txt File

The robots exclusion standard, also known as the robots.txt protocol, is a convention for preventing cooperating web spiders and other web robots from accessing all or part of a website. The parts that should not be accessed are listed in a file called robots.txt in the top-level directory of the website. Each record in the file is made up of two parts: a User-agent line, which specifies which robots the rule applies to, and one or more Disallow lines, which specify the directories those robots may not crawl. Keep in mind that robots.txt is a gentleman's agreement: well-behaved crawlers such as Googlebot honor it, but compliance is entirely voluntary, and ill-behaved robots can ignore the file even when it disallows all crawling.
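As a sketch of how the two parts work together, the hypothetical file below blocks one crawler (given the made-up name BadBot here) from the entire site while leaving everything open to all other robots:

    # Block one robot from everything
    User-agent: BadBot
    Disallow: /

    # An empty Disallow means no restrictions for everyone else
    User-agent: *
    Disallow:

Disallow: / matches every path on the site, which is how a file that "disallows all crawling" for a given robot is written; an empty Disallow value matches nothing, so that robot may crawl freely.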

posted by sarah @ 6:20 PM