ID #1074

Is Yahoo Slurp misbehaving?

Complaints
Webmasters complain that Yahoo Slurp does not respect all of the restrictions given in the following robots.txt:

User-agent: msnbot
Disallow: /bloop/
Disallow: /blop/

User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/

User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/

Yahoo Slurp obeys its agent-specific rule and therefore does not crawl the directories /bloop/ and /blop/, whereas it did crawl the directories listed in the generic rule (/shop/, /forum/ and /cgi-bin/).
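
The behavior described above can be reproduced offline with Python's standard urllib.robotparser module. This is only a sketch of how one robots.txt parser reads the file, not a statement about how Yahoo's crawler is implemented. The agent name OtherBot and the sample page paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the complaint, copied as-is.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /bloop/
Disallow: /blop/

User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/

User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Slurp has its own record, so only that record is consulted.
print(parser.can_fetch("Slurp", "/bloop/page.html"))     # False
print(parser.can_fetch("Slurp", "/shop/item.html"))      # True

# OtherBot (hypothetical) has no record and falls back to "*".
print(parser.can_fetch("OtherBot", "/shop/item.html"))   # False
print(parser.can_fetch("OtherBot", "/bloop/page.html"))  # True
```

In this parser, Slurp is allowed into /shop/ because its own record never mentions that path, which matches what the webmasters observed.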


Discussions and Suggestions

Webmasters found that not only Yahoo but other search engines as well behave in the same manner.

A search engine bot that has its own User-agent record does not obey the generic (User-agent: *) rule.

Suggestion 1: Put the wildcard record first, which holds the generic rule for the less specific bots, and then place the agent-specific records.

Suggestion 2: The main idea is that each major bot goes to the record for its own user agent, while the rest carry on with the wildcard user agent.

Conclusion

All bots do trip the filter and follow the specifications. What they trip on depends on the server configuration.
The fact is that the major bots, Slurp included, go to the record for their own user agent and ignore the generic one, while the rest continue with the generic rule.

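Because a bot with its own record ignores the generic one, the generic paths have to be repeated in every agent-specific record. The exact rules for the above case would be: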

User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: msnbot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

This can be shortened by listing several User-agent lines in one record:

User-agent: Slurp
User-agent: msnbot
User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/
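
The shortened file can be checked in the same way. The sketch below again uses Python's standard urllib.robotparser, which accepts several User-agent lines in one record; other parsers and real crawlers may differ, so treat the result as an indication only. The helper blocked(), the agent name OtherBot and the page paths are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# The shortened robots.txt: one shared record for the three major bots.
SHORT_ROBOTS_TXT = """\
User-agent: Slurp
User-agent: msnbot
User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/
"""

short_parser = RobotFileParser()
short_parser.parse(SHORT_ROBOTS_TXT.splitlines())


def blocked(agent, path):
    """Return True if this parser would keep `agent` away from `path`."""
    return not short_parser.can_fetch(agent, path)


# Each major bot is now kept out of both its own and the generic paths.
for agent in ("Slurp", "msnbot", "googlebot"):
    print(agent, blocked(agent, "/bloop/a.html"), blocked(agent, "/shop/a.html"))

# A bot without its own record (hypothetical name) only gets the generic paths.
print("OtherBot", blocked("OtherBot", "/bloop/a.html"), blocked("OtherBot", "/shop/a.html"))
```

With this parser, the three named bots report True for both paths, while OtherBot is blocked only from /shop/.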
