All About Robots.txt?

A robots.txt file is a simple text file containing a few lines of text. It tells search engines such as Google, Yahoo and MSN whether they may crawl your website at all and, if so, which parts of it they may visit.

For example:

Allow all spiders to index everything

User-agent: *
Disallow:
OR
Leave the robots.txt file blank, without any commands.

 

Allow no spiders to index any part of your site

User-agent: *
Disallow: /
This asks every spider not to crawl any part of your site. Well-behaved spiders will comply.
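
The two cases above can be checked with a short script. This is only a minimal sketch using Python's standard urllib.robotparser module; the helper name can_crawl, the agent name "ExampleBot" and the sample path are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical file contents matching the two examples above.
ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_ALL = "User-agent: *\nDisallow: /"

print(can_crawl(ALLOW_ALL, "/any/page.html"))  # True
print(can_crawl(BLOCK_ALL, "/any/page.html"))  # False
print(can_crawl("", "/any/page.html"))         # True: a blank file allows everything
```

The parser only reports what the rules ask for. A robot that chooses to ignore robots.txt is not stopped by it.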

 

To better understand what robots.txt files are, one must first understand what a web robot is. A robot is a program that search engines such as Google, Yahoo, MSN, Altavista, Ask.com and others send out across the internet to discover new websites, index them and gather information about them. Robots are sometimes called spiders, crawlers or bots.

Robots.txt, simply stated, is a text file on a site that tells search robots which pages they should not visit. By defining a few rules in this little text file, you can instruct robots not to crawl and index certain files and directories within your site. Reputable search engines generally honor these instructions.

Importance of Robots.txt file to the webmaster:

From a webmaster's point of view, robots.txt is important because it helps search engines index the website well: robots spend their visit on the pages that matter, which can help the site rank better. With a well-written robots.txt file, the webmaster can decide how the website should be crawled and indexed by the search engines. The function of the robots.txt file is to give commands to visiting robots and steer them toward the relevant information on the website. The commands in the robots.txt file are completely configurable by the webmaster.

Functions of Robots

  • Site indexing: the robot takes a copy of each new website it finds and stores it on the search engine's servers.

  • Validating the site code: the robot compares the website's code with W3C standards and grades it according to its accuracy.

  • Link checks: the robot traces all possible links, both incoming and outgoing, from indexed websites and calculates the site's grading factors.

  • Some advanced search engine robots supposedly perform more complex tasks, such as categorizing websites and analyzing their search engine metrics, popularity ratio and so on.

At the same time, it is important to know that robots.txt is not a foolproof way to bar search engines from crawling your site, as it is not a firewall or any kind of password protection.

It is welcome when search engines visit your site frequently and index your content, but at times they may index parts of your online content that you do not want them to. Robots.txt lets you tell web robots to access your website only in the areas you approve.

Professionals advise, however, that if you have really sensitive data, you should not depend on robots.txt alone to keep it from being indexed or displayed in search results. Further, if you are keen to save some bandwidth by excluding images, style sheets and JavaScript from indexing, you must explicitly tell spiders to keep away from these items.

Spotting the Robots.txt file:

Another common way to instruct search engines which files and folders on your website they should avoid is the robots meta tag. The problem is that not all search engines can read meta tags, which is why the robots.txt file is needed.

The robots.txt file must be placed in the main directory, or else search engines will not be able to find it. It is not the job of a search engine to search the whole site for robots.txt. Search engines look for it only in the main directory, and if it is not there, they simply conclude that the site does not have a robots.txt file.

First, create a regular text file and make sure it is named exactly robots.txt, in lowercase. It is also important that this file is uploaded to the root accessible directory of your site, not to a subdirectory. These two steps are necessary for search engines to find and understand the instructions contained in the file.
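
Because robots look for the file only at the root of the site, its address can be derived from any page address on that site. The following is a minimal sketch of that derivation using Python's standard urllib.parse module; the function name robots_txt_url and the example.com address are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; drop the page path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address; any page on the site maps to the same file.
print(robots_txt_url("https://www.example.com/blog/2020/post.html?x=1"))
# https://www.example.com/robots.txt
```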

How to set up a Robots.txt file?

The next obvious question is how to set up a robots.txt file. It is better to study the basics well before setting one up.

  • Open a new text document on your machine.

  • In it, type these two lines exactly, each on its own line:

    User-agent: *
    Disallow:

  • Save it as "robots.txt" (all lowercase).

  • Connect to your server through the file manager or FTP, and go to the root folder.

  • Upload the "robots.txt" file to the root folder.
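
For those who prefer a script to a text editor, the file from the steps above can also be generated. This is only a sketch: the function name write_robots_txt is hypothetical, a temporary folder stands in for your site's root folder, and the upload step is left out. The file name is written in lowercase because robots request /robots.txt in lowercase.

```python
import tempfile
from pathlib import Path

# The two lines from the steps above: every robot may crawl everything.
ALLOW_ALL_RULES = "User-agent: *\nDisallow:\n"


def write_robots_txt(site_root: Path, rules: str = ALLOW_ALL_RULES) -> Path:
    """Write the rules to robots.txt directly inside site_root."""
    target = site_root / "robots.txt"  # must be lowercase and in the root folder
    target.write_text(rules, encoding="utf-8")
    return target


# A temporary directory stands in for the local copy of the site root.
with tempfile.TemporaryDirectory() as tmp:
    path = write_robots_txt(Path(tmp))
    print(path.name)
    print(path.read_text(encoding="utf-8"))
```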

The robots.txt file is now set up. Note that these commands allow all search engine robots to crawl the entire site without restriction. If you wish to block selected files or folders from being crawled, use the commands shown below:

Exclude a file from an individual search engine

User-agent: Googlebot
Disallow: /thepathtoyourfile.html
Replace "Googlebot" with the user-agent name of the search engine robot you want to address, and replace "thepathtoyourfile.html" with the actual path to your file. If you would like to block more than one file, repeat the Disallow line once for each file name.
Ex: Disallow: /file1.html
Disallow: /file2.html
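
A rule addressed to one robot leaves the others unaffected. The sketch below checks this with Python's standard urllib.robotparser module. It assumes the rule is addressed to "Googlebot", the user-agent name of Google's crawler; "OtherBot" and the file names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: two files are blocked for Googlebot only.
RULES = """\
User-agent: Googlebot
Disallow: /file1.html
Disallow: /file2.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Only the named robot is kept away from the listed files.
for agent in ("Googlebot", "OtherBot"):
    for path in ("/file1.html", "/index.html"):
        print(agent, path, parser.can_fetch(agent, path))
```

Only the combination of Googlebot and /file1.html is reported as blocked. Every other combination is allowed.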

 

Exclude a section of your site from all spiders and bots

User-agent: *
Disallow: /1/2/dir-to-be-blocked/
Replace "/1/2/dir-to-be-blocked/" with the actual path to the directory that is to be blocked.
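
A Disallow line for a directory works as a path prefix, so everything beneath that directory is blocked as well. The following sketch, again based on Python's standard urllib.robotparser module, lists which of a few sample paths the rule covers; the helper name blocked_paths, the agent name "ExampleBot" and the sample paths are assumptions.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, agent="ExampleBot"):
    """Return the paths that `agent` is asked not to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


RULES = "User-agent: *\nDisallow: /1/2/dir-to-be-blocked/"

# Hypothetical paths: two inside the blocked directory, two outside it.
SAMPLE_PATHS = [
    "/1/2/dir-to-be-blocked/",
    "/1/2/dir-to-be-blocked/page.html",
    "/1/2/other/page.html",
    "/index.html",
]

print(blocked_paths(RULES, SAMPLE_PATHS))
# ['/1/2/dir-to-be-blocked/', '/1/2/dir-to-be-blocked/page.html']
```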

 


 

NOTE: Some crawlers, Google in particular, support an additional field called ‘Allow:’. As the name suggests, ‘Allow:’ lets you state explicitly which files and folders can be crawled. A word of caution, though: this field was not part of the original "robots.txt" protocol (it was standardized only later, in RFC 9309), so it is better to use it only if absolutely needed, as it may confuse some less capable crawlers.
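
How an ‘Allow:’ line interacts with a ‘Disallow:’ line can be sketched with Python's standard urllib.robotparser module, which understands both fields. The paths and the agent name "ExampleBot" are made up for illustration. The Allow line is deliberately placed first: Python's parser applies the first rule that matches, while other crawlers may resolve conflicts differently (for example by the most specific path), and this ordering gives the same answer either way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: open one page inside an otherwise blocked folder.
RULES = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

print(parser.can_fetch("ExampleBot", "/private/public-page.html"))  # True
print(parser.can_fetch("ExampleBot", "/private/secret.html"))       # False
```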

In conclusion, robots.txt is an extremely useful tool for controlling how search engines scan a website and gather information from it. The more carefully a site is planned, the better its search engine positions are likely to be. If there is folder content that search engines do not need to see, use the robots.txt file to keep them away from it.


Links to More Information and Resources

robotstxt - The Web Robots Pages

 
