Googlebot, Yahoo Slurp, MSNBot and other spiders, bots and crawlers
are, shorn of their technical names, merely programs that gather
information for search engines.
These search engine bots, particularly those of Google, Yahoo and MSN,
gather key information about your pages for use in their respective
search engines. These spiders are not wholly undesirable: it is because
of them that your pages are indexed and can be displayed in the
search engine results pages.
Let us understand that a spider is only a computer program that follows
links on the web, picking up information as it goes. Because these
crawlers are merely computer programs, they are not very intelligent
and can get caught in endless loops, especially on dynamic web
pages.
Of course, there are times when you wish to keep certain pages or
images from being indexed. This is possible with a robots.txt file,
a plain-text document placed at the root of your site that tells
spiders which parts of the site they may or may not crawl.
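As a rough illustration of how a well-behaved spider reads such a file,
here is a minimal sketch using Python's standard urllib.robotparser
module. The robots.txt rules, the paths and the example.com URLs are
made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /images/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/private/report.html",
        "http://www.example.com/images/logo.png",
    ):
        verdict = "may crawl" if is_allowed("Googlebot", url) else "must skip"
        print(url, "->", verdict)
```

A polite spider performs a check of this kind before requesting a page.
Nothing forces a spider to do so, which is why robots.txt only works
against well-behaved bots.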
Should you choose to minimize the impact these spiders have, you can
instruct them not to follow a specific link by adding rel="nofollow"
to its anchor tag. This reduces the number of outgoing links that
spiders follow and may help you maintain your page rank.
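To see the attribute from the spider's side, the sketch below uses
Python's standard html.parser module to separate the links a compliant
crawler would follow from those marked nofollow. The class name, the
function name and the sample HTML are hypothetical, and real spiders
are of course far more elaborate.

```python
from html.parser import HTMLParser

# Sample page fragment, invented for illustration.
SAMPLE_HTML = (
    '<a href="/about.html">About us</a> '
    '<a href="http://www.example.org/" rel="nofollow">Partner site</a>'
)


class LinkCollector(HTMLParser):
    """Collect anchor hrefs, split by whether rel contains 'nofollow'."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated values, e.g. "external nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.skipped.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, skipped) lists of hrefs found in html."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.followed, collector.skipped


if __name__ == "__main__":
    followed, skipped = split_links(SAMPLE_HTML)
    print("would follow:", followed)
    print("would skip:  ", skipped)
```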
Things are not always that simple, however. There are also bad bots
that disobey your robots.txt and indiscriminately harvest your email
addresses. To thwart these bad bots, some people use JavaScript to
"hide" their email addresses.
There was a time when predicting the behavior of common search engine
spiders was not difficult. Today, with a rapidly growing number of
spiders and search databases to reckon with, predicting what search
spiders will do is no longer easy.
Until recently, GoogleBot was the only meaningful search spider in
operation, and Google fed search results to most of its competitors.
A couple of years is a long time for search engines to improve
and develop. Today, there are four major general search engines and
numerous vertical search tools, each with its own algorithm and
spider schedule.
Today's spiders have become far more sophisticated, and there is no
saying when and where a spider will crawl. Most spiders identify and
visit an active website very frequently. It is found that spiders
from Ask Jeeves visit at least once a day, while MSN and Yahoo spiders
visit the index page several times a day. Google visits the index
page only about twice a week.
A lot of developments have taken place, and search engine spiders are
now able to contextualize content within a domain and schedule visits
accordingly. Although the timing and frequency of spider visits have
changed radically, the behavior of the spiders remains the same.
It is found that, of all the spiders, the most active is MSNBot. Visiting
each document in its index daily, MSNBot sometimes does not seem to know
when to quit. Next to MSNBot, Ask Jeeves and Yahoo appear to be the
most active of the major bots. Strangely, the least active is GoogleBot,
which visits each document in a site roughly once a month and does not
follow any set pattern.
There is a way to guide spiders through your site: create a basic,
text-based sitemap page for your website.
The sitemap must list every document in your website.
Add a link to the sitemap in the footer of each page. For Google, you
must also create an XML-based sitemap.
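An XML sitemap can be generated with a few lines of code. The following
is a minimal sketch using Python's standard xml.etree.ElementTree module
and the urlset/url/loc elements of the sitemaps.org format. The function
name and the example.com page list are assumptions made for this
example, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org XML sitemap format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical list of pages; a real site would list every document.
EXAMPLE_URLS = [
    "http://www.example.com/",
    "http://www.example.com/about.html",
    "http://www.example.com/contact.html",
]


def build_sitemap(urls):
    """Return an XML sitemap (as a string) with one <url><loc> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Write the result where spiders can fetch it, e.g. the site root.
    with open("sitemap.xml", "w", encoding="utf-8") as handle:
        handle.write(build_sitemap(EXAMPLE_URLS))
```

The same list of URLs can also drive the plain HTML sitemap page, so
that both versions stay in step when documents are added or removed.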
A fortnight after implementing the HTML sitemap and uploading your
XML sitemap to Google, observe where the spiders are visiting and
which documents receive the maximum visits. You will realize spiders
can be very friendly and helpful.
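One way to make this observation is to count spider requests in your
web server's access log. The sketch below assumes the log is in the
common "combined" format and identifies spiders by substrings of their
user-agent strings. The tokens chosen, the sample log lines and the
function name are assumptions for illustration, so adjust them to what
your own logs actually contain.

```python
import re
from collections import Counter

# Assumed user-agent substrings (lowercase) for the spiders discussed here.
BOT_TOKENS = {
    "Googlebot": "googlebot",
    "Yahoo Slurp": "slurp",
    "MSNBot": "msnbot",
    "Ask Jeeves": "teoma",
}

# Matches the request, status, size, referrer and user-agent fields
# of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Invented sample lines; the addresses are documentation-only IPs.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2006:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '192.0.2.2 - - [10/Oct/2006:14:01:02 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [10/Oct/2006:14:01:09 +0000] "GET /about.html HTTP/1.1" '
    '200 1840 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.9 - - [10/Oct/2006:14:05:00 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (Windows; U) Firefox/1.5"',
]


def count_spider_visits(lines):
    """Count requests per (spider name, path); non-spider lines are ignored."""
    visits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        for name, token in BOT_TOKENS.items():
            if token in agent:
                visits[(name, match.group("path"))] += 1
                break
    return visits


if __name__ == "__main__":
    for (name, path), hits in count_spider_visits(SAMPLE_LOG).most_common():
        print(f"{name:12} {path:20} {hits}")
```

To analyse a real log, pass an open file object instead of SAMPLE_LOG,
since the function accepts any iterable of lines.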