Is Yahoo Slurp misbehaving?

Complaints
Webmasters complain that Yahoo Slurp is not obeying the rules set out for it in the following robots.txt:

User-agent: msnbot
Disallow: /bloop/
Disallow: /blop/

User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/

User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/

Yahoo Slurp obeys its agent-specific rules and hence does not crawl /bloop/ and /blop/, whereas it does crawl the directories listed under the generic rule (/shop/, /forum/ and /cgi-bin/).

Discussions and Suggestions

Webmasters found that not only Yahoo but other search engines behave in the same manner.

None of the search engine bots obeys the generic rule once there is a group naming its own user agent.

Suggestion 1: Put the wildcard group with the generic rules first, then place the agent-specific groups.

Suggestion 2: Repeat the generic rules inside each agent-specific group. The idea is that the major bots go to their corresponding user-agent groups while the rest carry on with the wildcard group.

Conclusion

The bots are not misbehaving; they follow the robots.txt specification as written. Each bot reads only the group that matches its user agent: the major bots and Slurp go to their respective user-agent groups, and only bots with no matching group follow the generic rule.

The correct robots.txt for the case above would be:

User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: msnbot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

This can be shortened to:

User-agent: Slurp
User-agent: msnbot
User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/
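The behaviour described above can be checked directly with Python's urllib.robotparser, which implements the same group-matching rule: a crawler uses only the group that names its user agent and ignores the wildcard group. A minimal sketch, with trimmed robots.txt files and a hypothetical example.com URL rather than the actual complaint:

```python
# Demonstrates that a bot reads ONLY the group naming it; the generic
# "*" group is ignored once a specific match exists.
from urllib import robotparser

# Trimmed version of the problem file: Slurp has its own group.
ORIGINAL = """\
User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/

User-agent: *
Disallow: /shop/
"""

# Corrected: the generic rules are repeated inside Slurp's group.
CORRECTED = """\
User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/

User-agent: *
Disallow: /shop/
"""

def can_fetch(robots_txt, agent, path):
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, "http://example.com" + path)

# Slurp matches its own group and never sees the generic rules,
# so under the original file /shop/ is crawlable by Slurp:
print(can_fetch(ORIGINAL, "Slurp", "/shop/page.html"))        # True
# Under the corrected file Slurp's own group disallows /shop/:
print(can_fetch(CORRECTED, "Slurp", "/shop/page.html"))       # False
# A bot with no named group falls back to the wildcard group:
print(can_fetch(ORIGINAL, "SomeOtherBot", "/shop/page.html")) # False
```

Repeating the generic rules inside each named group, as in the corrected file, is the only way to make them apply to those bots.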

Details on sites reviewed – Pubcon 2006

Various sites were reviewed by a six-member site review panel that included Matt Cutts. Each site was thoroughly analyzed; the panel then explained the issues (the site's drawbacks, where it lags behind, and the areas neglected by the owner) and suggested how to fix them and improve the site's performance and ranking. The details of each site review and the suggestions given are as follows.

1. The promotional gift company

The promotional gift company ran 20+ other sites simultaneously with largely the same content, pages overlapping from one site to another, so several sites competed with identical content for the few keywords chosen for discussion.

Suggestion: Include user feedback such as comments, suggestions and forums, rather than adding extra sentences, shuffling the existing words or changing the presentation to evade duplicate-content detection.

2. The Dollar Stretcher site

Issue: Doing well in Google but banned in Yahoo.

Reason: The site, started in 1996, had a sitemap with hundreds of good articles all linked from a single page. The site also showed a case mismatch in titles, displaying upper-case URL titles to users and lower-case ones to the Yahoo crawler, and was banned on suspicion of cloaking. In fact the site was not cloaking at all; the mismatch was an artifact of the site explorer.

Suggestion: Break up the sitemap chronologically, alphabetically or by topic, and link those pages from the site.

3. Real estate site

The real estate site had hundreds of pages, all in the format of "about us" or "contact us" pages.

Suggestion: Include plenty of articles, surveys and construction news for the state, province or locality.

4. Chiropractor site

The chiropractor site was good but had no pages with keywords. They wanted to rank for a particular keyword which did not appear anywhere on the home page of the site.

Suggestion: Think of the keywords a user would search for and place those key phrases appropriately on the home page you want to rank. Rewrite every page with the key phrases you wish to rank for.

5. Real Estate licensing

The real estate licensing site was one of hundreds of varied-topic domains run by a single person.

Comment: Running numerous sites means none gets the attention a single site would receive, and hence none shows well.

6. Computer peripheral site

Issue: The computer peripheral site was a genuinely good site with good content and unique backlinks, but had long URLs of about 14 to 15 words of description along with a session ID.

Suggestion: Shorten the URLs to 4 to 5 words, and drop the session ID if at all possible.

Issue: The site's category URLs took the form “/s-subcat-NETWORK~.html”.

Suggestion: The category URLs would be better in folder format, such as /network/…
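One common way to expose folder-style category URLs without moving the underlying pages is an internal rewrite. A hypothetical Apache mod_rewrite sketch, assuming the existing category page really is served at /s-subcat-NETWORK~.html:

```apache
# Hypothetical sketch: serve the old script-style category page
# under a clean folder URL, so /network/ is what visitors and
# crawlers see while the existing page keeps doing the work.
RewriteEngine On
# Map /network (with or without trailing slash) to the old page
RewriteRule ^network/?$ /s-subcat-NETWORK~.html [L]
```

A rule like this would be repeated per category; the old URLs should then be redirected or dropped from internal links so only the folder form gets indexed.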

7. HiFi store

The HiFi store owner, like the real estate licensing site's owner, also ran 40+ varied sites. As discussed earlier, no single site will obtain the attention required to do well in the search engines.

8. Spa site

Issue: The spa site's owner had problems getting the site's pages accessed, so she loaded her home page with masses of content. Despite the problems the site was doing well in Google and Yahoo, as it was genuinely good. She mentioned she had been searching for a good SEO for a long time.

Many people willingly agreed to do SEO for the site for free, and the owner chose the one best suited to her site.

A topic that came up throughout the discussion was link exchange and its impact. Many sites had done excessive reciprocal link exchange that was of no use.

Suggestion: Instead of wasting time exchanging links, concentrate on making the site more attractive and better marketed; that will truly help the site.

Do virtually hosted sites get banned by Google?

There have been discussions on virtual hosting versus dedicated IP addresses, claiming that multiple sites sharing the same IP address have their PageRank affected and that Google penalizes virtually hosted sites.

It is a myth that Google penalizes virtually hosted sites, and a false intuition that sites' PageRank suffers when they are hosted on the same IP address.

Google handles virtually hosted sites and domains with dedicated IP addresses equally.

The statement was confirmed on Matt Cutts's blog (http://www.mattcutts.com/blog/) in response to a discussion on Slashdot:

“Google handles virtually hosted domains and their links just the same as domains on unique IP addresses. If your ISP does virtual hosting correctly, you’ll never see a difference between the two cases.”

A joint venture by search engines for sitemap submission

Google, Yahoo and MSN together support one sitemap protocol (http://www.sitemaps.org/), so you can submit your site's pages to all the major search engines in one submission.
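A minimal sitemap in the sitemaps.org 0.9 format looks like the following; the URL and date are placeholders, and only the loc element is required:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

The file is placed at the site root (e.g. /sitemap.xml) and its location submitted to each engine.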

MSN Search and Win – is it true?


When I searched for a keyword, the MSN Search and Win page said I had won a prize. When I clicked on it, I got the following URL:

http://www.msnsearchandwin.com/_v6/9326-059-713/g=92@13/98775785

The page said "sorry, search again to win". So what was the problem?

We love Big Daddy Update


We love Big Daddy.
Google's new datacenter results are spectacular, completely free from spam, and we are pleased that our ethical methods earned good rankings for all our client sites.
We LOVE BIG DADDY.

Toolbar PageRank updated 19th February 2006 – Google PageRank update

Google has updated its toolbar PageRank. Only a couple of Google datacenters are showing the new PageRank as of now.

You can check your PageRank with our future PageRank checker.

Some DCs where you might see the new PageRank:

216.239.53.104
216.239.57.99

Matt Cutts on a webspam hunt – a cartoon on Matt's spam hunt spree

We have a simple cartoon on Matt's activities. Nothing personal here; the cartoon is just a joke, inspired by Matt's spam hunt. We hope you all enjoy it.

Matt Cutts has been on a search engine spam hunting spree in recent days. From day one Matt's blog has been a great hit, and his contribution to the SEO community is tremendous; he guides a lot of webmasters in the right direction. As head of the webspam team at Google, Matt's responsibilities are huge: he has to keep Google's massive search index clean of corporate multi-language spam, hidden text, cloaking, sneaky JavaScript redirects, unethical SEO companies, keyword stuffing and so on.

Recently Matt Cutts announced the banning of the BMW.de website, which was using sneaky JavaScript redirects to show search engines one page while redirecting users to another.

Matt was bold enough to announce the ban of traffic-power.com from Google's search index. Traffic Power's lawsuit against Trafficpowersucks and Aaron Wall of SEOBook was undercut by this important announcement.

His frequent updates on the inner workings of Google are the most valuable of all for the SEO community. The Big Daddy update, one of the most important Google updates of all time, was first announced officially by Matt Cutts and later picked up by top news sites like the BBC and Newstoday.

The term "link baiting" was rarely used before Matt gave it a kick start. Now the term has become very important, and every SEO company has included it as part of its search engine optimization strategy.

Matt also had an hour-long discussion on WebmasterRadio covering paid links, the Big Daddy update and more. Matt's effort to get feedback on Google's services is great both for webmasters and for the improvement of Google's own search engine. He asked for feedback on webmaster-related issues, webspam, miscellaneous services, goodwill communications, search quality and so on.

Please DO ADD COMMENTS

SEO BLOG TEAM,

Interesting articles on web spam and search engine spam

Everyone will be interested to know how search engines combat spam. Here are two papers from Stanford that explain it well:

http://www.searchenginegenie.com/spam.pdf

http://www.searchenginegenie.com/spam1.pdf

One article says everything done deliberately to boost ranking is spam; we disagree with that.

The article says:

"What is web spam? Spamming = any deliberate action taken solely in order to boost a web page's position in search engine results, incommensurate with the page's real value. Spam = web pages that are the result of spamming. This is a very broad definition; the SEO industry might disagree! (SEO = search engine optimization.) Approximately 10-15% of web pages are spam."

Matt Cutts confirms penalty on Traffic Power SEO company

Matt Cutts of Google has confirmed that traffic-power.com was banned from Google for using unethical SEO tactics to rank its client sites. There had been a lawsuit by Traffic Power against Aaron Wall of SEOBook and trafficpowersucks.com over their statements about the company.

Now their complaints stand confirmed: Matt Cutts, lead of the webspam team, confirmed on his blog at http://www.mattcutts.com/blog/confirming-a-penalty/ that Traffic Power was indeed banned from the index.

Some previous articles on Traffic Power (here) and here.
