
Market share of the top 500 websites versus the rest of the internet

I have seen people discuss how market share splits between the top 500 websites and the remaining millions of websites on the internet. So where do we get credible data to understand this? Alexa has a list of the top 500 websites, but if you look at Alexa's history, their results are easily skewed, since they come from the millions of Alexa toolbar users around the world. How accurate can toolbar data be? Who are these millions of Alexa toolbar users, and do they represent real internet users?
In my opinion, no. Alexa users are mostly site owners, webmasters, techies, and so on. It is largely webmaster-biased traffic and never a reliable way to gauge the real traffic to a website. Still, even with all that bias, Alexa is a good place to start, and some of the sites listed in Alexa's top 500 are the best on the internet, so I wouldn't complain too much about Alexa.

If you do a Google search for top internet websites (http://www.google.com/search?hl=en&q=top+internet+websites), you will come across a bunch of good lists of the top sites on the internet. But it is very difficult to understand and analyze what kind of traffic a real site gets unless those sites care to share their log data, which I feel will never happen.

Search Engine Genie

Googlebot now digs deeper into your forms – a great new feature from Google's smart guys

Google's crawling team has taken a major step forward, a step everyone thought a search engine crawler would never take. According to the official Webmaster Central blog, Googlebot now has the capability to crawl through HTML forms and find the information behind them. This is a huge step forward.
Remember, forms have always been a user-only feature: when we see a form, we add a query and search for products, catalogs, or other relevant information. For example, when we go to a product site we may see only a search form; some product sites have nothing but a form as the way to reach the products on their website. There is no other way to access the inner product pages, which might hold valuable information for the crawlers, so good product descriptions that are unique and useful for users stay hidden from the search engines. Similarly, imagine a .edu website: I personally know a lot of .edu websites that don't provide proper access to their huge inventories of research papers, PowerPoint presentations, and so on.
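
As a sketch, here is the kind of search form, with a hypothetical URL and field names, that Googlebot can now try queries against:

<form action="http://www.example.com/products/search" method="get">
  <input type="text" name="q">
  <select name="category">
    <option value="all">All products</option>
    <option value="books">Books</option>
  </select>
  <input type="submit" value="Search">
</form>

Because the form uses the GET method, a query such as q=widgets resolves to an ordinary URL like http://www.example.com/products/search?q=widgets&category=all, which the crawler can then fetch and index like any other page.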

The only way to access those papers is through a search button on the Stanford website. Look at this query: you can see at least 6,000 useful articles about Google that sit on the Stanford site. If you scan through the Stanford website you will not find this useful information linked anywhere; it is rendered directly from a database. Now, thanks to Googlebot's new capability to crawl forms, Google can try queries like "Google research" on sites like Stanford and crawl all the PDFs, PowerPoint files, and other material listed there. This is just amazing and a greatly valuable addition.

I am personally enjoying this addition by Google. When I go to some great websites, they are nowhere near optimized; most of their high-quality product pages or research papers are hidden from regular crawlers. I always thought of just emailing them to ask for a search engine friendly sitemap, or for some pages with a hierarchical structure leading to the inner pages. Most sites don't do this, nor do they care that they don't have it. At last Google has a way to crawl the great hidden web that is out there. When they roll out this option, I am sure it will be a huge hit and will add a few billion more pages to the useful Google index.

The Webmaster Central blog also reports that Googlebot can toggle radio buttons, drop-down menus, checkboxes, and so on. Wow, that is so cool; I wish I were part of the Google team that did this research. It is so interesting to make an automated crawler do all this magic on a website, in what has always been user-only territory.

A good thing I noticed is that they mention doing this only for a select few quality sites; though there is some high-quality information out there, we can also find a lot of junk. I am sure the sites they crawl with this feature are mostly hand-picked, or, if the selection is automated, subject to rigorous quality and authority scoring.

Another thing that is news to me is Googlebot's capability to scan JavaScript and Flash for inner links. I was aware that Googlebot can crawl Flash, but not sure how far they had gotten with JavaScript. A couple of years ago, search engines stayed away from JavaScript to make sure they didn't get caught in some sort of loop that might end up crashing the server they were trying to crawl. Now it is great to hear they are scanning and crawling links in JavaScript and Flash without disturbing the well-being of the site in any way.

Along with the positive side there is a negative side too: some people don't want the pages hidden inside their forms to be crawled by search engines. For that, of course, Google's crawling and indexing team has a solution. They obey robots.txt, nofollow, and noindex directives, so if you don't want your pages crawled you can block Googlebot from accessing your forms.

A simple set of rules like

User-agent: Googlebot
Disallow: /search.asp

will stop Googlebot from submitting your search form if the form handler is search.asp. (Robots.txt rules match by URL prefix, so this single Disallow line also covers query-string URLs such as /search.asp?q=.)

Also, Googlebot crawls only GET-method forms, not POST-method forms. This is very good, since many POST forms ask users to enter sensitive information; for example, many sites ask for email IDs, usernames, passwords, and so on. It is great that Googlebot is designed to stay away from sensitive areas like this: if it crawled all those forms and a vulnerable form were out there, hackers and password thieves would start using Google to find unprotected sites. Nice to know that Google is already aware of this and steers clear of sensitive areas.
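
As a contrast, here is a hypothetical login form; because it uses the POST method, Googlebot leaves it alone:

<form action="/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>

A POST submission does not correspond to a simple crawlable URL, so there is nothing here for the crawler to fetch.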

I like this particular statement, where they say none of the currently indexed pages will be affected, so the current PageRank distribution is not disturbed:

“The web pages we discover in our enhanced crawl do not come at the expense of regular web pages that are already part of the crawl, so this change doesn’t reduce PageRank for your other pages. As such it should only increase the exposure of your site in Google. This change also does not affect the crawling, ranking, or selection of other web pages in any significant way.”

So what is next from Google? They are already reaching new heights with their search algorithms, indexing capabilities, and so on. I am sure that for the next 25 years there won't be any stiff competition for Google. I sincerely appreciate Jayant Madhavan, Alon Halevy, and the crawling and indexing team for this wonderful news.

What do I expect next from Googlebot?

1. Currently I don't see them crawl large PDFs. In the future I expect great crawling of the huge but useful PDFs out there, and I would expect them to provide a cache for those PDFs.

2. The capability to crawl ZIP or RAR files and find the information inside. I know some great sites that provide downloadable research papers in .zip format. Search engines could probably read what is inside a zipped file and, if it is useful for users, provide a snippet and make it available in the search index.

3. Special capabilities for crawling through complicated DHTML menus and Flash menus. I am sure search engines are nowhere near doing that yet. I have seen plenty of sites using DHTML menus to reach their inner pages, and plenty of sites using Flash menus; I am sure Google will overcome these hurdles, understand the DHTML, and crawl the quality pages on these sites.

Good luck to Google! – Search Engine Genie Team

Client conversion report from Search Engine Genie

For about a year we have been monitoring conversion rates for our clients. For some of the sites we did SEO for, it was a huge success: we had clients ranking for some very competitive keywords, and some industries gave our clients an ROI in the range of 1000 to 1500%.
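
To make those ROI figures concrete with a hypothetical example: if a client paid us $2,000 for a year of SEO and the resulting organic traffic brought in $26,000 of additional profit, the return on investment would be ($26,000 - $2,000) / $2,000 = 1200%.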

Here is a small list of conversion rates from the sites we monitor:

Industry                Leads from          Conversion rate from    ROI on client investment    Success rate in
                        click-throughs      SERP click-throughs     (average per year)          organic rankings
Automobile              12%                 8%                      800%                        98.7%
Jewellery               8%                  6%                      1200%                       94%
Real estate             11%                 4%                      400%                        85%
Vacation and tours      12%                 9%                      600%                        99%
Painting                7%                  2%                      1400%                       85%
Loans and settlements   12%                 1.5%                    700%                        75%
Motorbikes              15%                 3%                      1800%                       99%
Web development         9%                  7%                      900%                        90%
We will continue to monitor these numbers and will come up with a bigger list in a future blog post.

Nofollow in image links and image maps – do search engines follow it?

I am sure everyone knows what nofollow tags are by now. A nofollow tag is used in a hyperlink to tell a search engine crawler that the link is not vouched for and should not be given the usual link credit. It is widely used in forums, blogs, message boards, etc. to prevent spam; comment spam is a very common kind of spam that is spoiling many quality blogs. To stop it, the search engines jointly released the rel="nofollow" attribute to tell crawlers not to give any credit for a link.
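
For example, a nofollowed text link looks like this (the URL is a placeholder):

<a href="http://www.example.com/" rel="nofollow">example site</a>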

OK, so we know that rel="nofollow" on text links is honored. What I don't think many of you know is that search engines can also crawl image maps and links on images very easily, and that a nofollow tag there is honored just the same: search engines have no trouble following image-map links and image links, and they obey any nofollow on them without hindrance.

Here is an example, sketched with placeholder URLs:
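
<!-- hypothetical URLs: a nofollowed image link -->
<a href="http://www.example.com/products.html" rel="nofollow"><img src="product.jpg" alt="product"></a>

<!-- hypothetical URLs: a nofollowed image-map area -->
<img src="navigation.gif" usemap="#navmap" alt="navigation">
<map name="navmap">
  <area shape="rect" coords="0,0,100,40" href="http://www.example.com/products.html" rel="nofollow" alt="product">
</map>

In both cases the crawler can discover http://www.example.com/products.html from the image link or the image-map area, but the rel="nofollow" tells it to pass no credit to that page.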

Search Engine Friendly headers – Headers that help search engine rankings.

There are a few HTTP response headers that we call search engine friendly:

1. 200 OK: This is the primary header that needs to be returned by all working sites. Search engines understand 200 OK to mean a page is ready for crawling and indexing.

2. 301 Permanent Redirect: This is again a search engine friendly header. A 301 redirect is used when you want to permanently move a particular page to another page or URL. It is search engine friendly because the redirect won't confuse crawlers; they will understand that your page has moved to a new location.

3. 503 Service Unavailable: A 503 is best for temporarily keeping search engine crawlers away from your website. If your site is hacked or attacked by a virus, it is best to stop the search engines from crawling it until it is clean; Google's Webmaster Central blog has some useful information on that.
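
As a minimal sketch (with placeholder URLs), the raw responses for these three statuses look like this on the wire:

HTTP/1.1 200 OK
Content-Type: text/html

HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/new-page.html

HTTP/1.1 503 Service Unavailable
Retry-After: 86400

On an Apache server, for example, the 301 above can be produced with a single mod_alias line in .htaccess: Redirect 301 /old-page.html http://www.example.com/new-page.html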

Search Engine Genie

Search engines love fresh content – new content is a gift for search engines.

Search engines love new content being added to a site every day. We have tested this on about 25 different sites. Pages that are updated regularly are crawled on a regular basis, while a page that rarely changes, even the homepage, may only get re-indexed once a week or at times once a month.
I recommend updating the site regularly: add a blog, article, or news section and constantly refresh that area, so that whenever the search engines come to your site they find fresh information. To get the best results, make sure what you write is unique; syndicating content from other blogs or news sites does not work nearly as well. Unique information is a hit with search engines; they love to see great, fresh information on your website for their users. Our own blog posts get crawled within about 2 hours, no more, and this is because of the quality fresh information we provide for the crawlers.

SEO Blog Team.

No shortcut to Dmoz heaven – patience and determination pay.

Dmoz submission tips
Dmoz, the mother of all directories, is the only directory on the whole internet regarded as a favorite by all the top search engines. In Google, sites listed in Dmoz get a small positive score, since all the sites accepted into Dmoz are hand-picked and reviewed under strict conditions by Dmoz editors.

I saw this back when Dmoz had an official forum named Resource-zone: people used to whine and complain that their submissions were not accepted. People also bash Dmoz in various forums and message boards for giving credit to some sites, and have even accused Dmoz editors of taking bribes to list sites in the corresponding categories.

From my past experience getting sites listed in Dmoz, I can tell you there is no shortcut to a Dmoz listing. Dmoz is also the most genuine directory out there: the corruption rate, from my analysis and experience, is somewhere around 0.01%, that is, about one out of every 10,000 Dmoz editors. Per the Dmoz blog, at any given time there are around 7,000 active editors, so at a rate of 1 in 10,000 that works out to less than one corrupt editor at any given time; practically speaking, there are no corrupt or cheating editors at Dmoz. If Dmoz has not accepted your submission, stop whining about it and look at what problems your own website has. Remember, Dmoz is not a place where you arrive on the internet today and get listed tomorrow; there are more than 1,000 cheap, zero-value directories out there that will list you for nothing (in fact, they are good-for-nothing directories). Dmoz likes good quality sites, and no Dmoz editor is against listing any site as long as it satisfies their quality guidelines.

Dmoz remains the most comprehensive directory on the web; the only directory anywhere near it is the Yahoo directory. But the Yahoo directory is still not the best directory you can come across, since it is paid: you pay $299 for a review of your site, and most sites pass that review. With Dmoz this is not the case; you can never buy a listing.

Dmoz is not run to list every commercial website; Dmoz is here to provide the best value to its visitors. Dmoz editors don't rely on site submissions alone for entries. Most of the editors are web savvy, and they scour the internet in search of quality websites and quality articles to list in their categories. In fact, some of my previous discussions with meta editors revealed that at least 30% of the articles, news, websites, and other material listed in Dmoz got there through editors scanning the internet.

So let me give you a checklist of what it takes to get into Dmoz.

  1. First and most important: Dmoz's quality guidelines. Make sure your site doesn't violate them; if it does, there is no point in even submitting, since the site will never be accepted. Some of the guidelines:
     Duplicate sites – a site that has mirrors or duplicates, or is an exact copy of a site that is currently running, cannot get into Dmoz.
     Affiliate sites – a site filled with affiliate links and affiliate information will not be accepted into Dmoz. There are no exceptions; make sure your site is not an MFA (Made for AdSense) or affiliate site.
     Status of the website – make sure your site is current and fully completed. An incomplete site will mostly be rejected, or put into the backlog, when an editor reviews it. It is very important to impress the editor at first glance; if the site has pages that say "under construction", broken links, or incomplete information, I am sure it will never be listed.
     Prohibited sites – multi-level marketing websites, affiliate reseller sites, sites that use cloaking, and sites that promote illegal material such as child pornography, bootlegs, warez, or pirated software are not included in Dmoz.
  2. Contact information: most editors see this as an important factor for site credibility, and most of the time a site without proper, verifiable contact information is rejected. Make sure your contact information is current and legitimately visible: it can be on a contact page or any other page you prefer, as long as it is easily accessible when an editor reviews your site.
  3. Value for users: consider how much value your site provides. Dmoz, as I said, exists to provide the best value to its visitors; they are not there to list your site just because it has 50,000 products actively selling online. If that same site has community areas like forums, reviews, and so on, it has a much better chance of getting listed.
  4. Find the right category: this is very important; most of the time a listing is delayed or rejected because it was submitted to the wrong category. Remember that Dmoz editors are not paid; almost all of them, except a top few, are volunteers, so don't trouble them too much. Take time to research the best category. Suppose you are a car dealer in California. Go to Dmoz and search for car dealers California (http://search.dmoz.org/cgi-bin/search?search=car+dealers+california), and you will see a lot of categories related to the query. Take your time and go through some of them. This particular search is complicated, since the categories that come up are quite vague. Here are some of the topics that came up:
     Regional: North America: United States: California: Regions: Northern California: Business and Economy
     Regional: North America: United States: California: Counties: San Joaquin: Business and Economy
     Business: Automotive: Recreational Vehicles: Retailers: North America: United States: California

    Above I picked the categories that looked relevant. By inspection we can eliminate the San Joaquin county category, since it is specific to one county and you sell to the whole of California; for the same reason we can eliminate Northern California. So the most likely topic is Business: Automotive: Recreational Vehicles: Retailers: North America: United States: California. Convinced that you can submit there? STOP: don't do it yet, because there may be other relevant categories that never showed up in the search. I recommend now going the regular way: click through the categories, browse, and see whether any other category fits your site. Once you are convinced there is no better category, proceed to submission.

  5. Submission guidelines: make sure you write a good title and a good, descriptive description. Forget the idea of an "optimized" title; there is no such thing for Dmoz. Most of the time the best title is simply your company name. For example, if your company is California Auto Dealers Inc, then that is the best title for your listing. When writing the description, make sure it doesn't read like an advertisement: don't stuff keywords into it, and don't repeat your title, prices, and so on. The best description is one that avoids everything pointed out in the Dmoz describing guidelines: http://www.dmoz.org/guidelines/describing.html
  6. Multiple listings: sometimes multiple listings are allowed; for example, you can submit your site to the main category as well as your regional subcategory if you are eligible for both. This is rare, so research before you do it.
  7. Patience: after you submit your site, please wait. Wait at least a year before any sort of panic; there can be a thousand behind-the-scenes reasons why your site is not yet listed. If it is not listed, check whether anything on it violates the Dmoz guidelines; if you find something, fix it and keep waiting. Don't resubmit: every resubmission pushes your existing submission to the bottom of the category's backlog. If the category you submitted to has hundreds of websites waiting, your wait will be longer; the best option is to be patient.
  8. Finally, stop talking badly about Dmoz or bashing them in forums; this is one of the most important things they notice. Dmoz editors are some of the most active members of the internet, and they will find you regardless of which forum or message board you bash them in. Stop the false accusations against Dmoz; they are the best out there. And don't ever try anything like shoemoney did here (shoemoney.com/2007/08/26/dmoz-extortion/) just for the sake of backlinks; your site will not get into Dmoz for at least the next 100 years.
  9. P.S.: ask whether your site really qualifies to be listed in Dmoz, and whether it would add at least a small amount of value to Dmoz. I would personally check that before submitting. If you are new to the internet and have just built a site, give it time to grow: make it the best it can be for your users and work on it for at least 6 months. Once you feel it is good enough, submit it; this increases the chance of acceptance.

    Let's all make Dmoz a better place to be.
    Some good resources: http://blog.dmoz.org
    http://www.resource-zone.com/forum/

    Search Engine Genie.

Catching a bee in a forest full of bees – unpredictable search engines.

Search engines are becoming more and more unpredictable these days. Search engine optimization and ranking a website is like catching one particular bee in a forest full of bees. There are lots of factors in search engine rankings, and the more aggressively people target the search engines, the more the search engineers complicate the algorithm. It works like this.

Say some six years back, search engines had far less spam to fight. People were not aware of innovative ways to spam the search engines; all they knew were FFA (free-for-all) links, keyword stuffing, hidden keywords and links, HTML keyword stuffing (abusing loopholes in HTML to stuff keywords), automated link exchanges, comment spamming, cloaking and content delivery, and the like. At that time, since the search engine algorithms were not so complicated, all these tactics worked. None of them come anywhere near working anymore; those loopholes were closed. But people still spam the search engines.

Let's look at some techniques from about three years back that are against search engine guidelines:

  1. Blog spamming: spamming blogs with comment spam. The most targeted sites were big university sites like Stanford.edu that allowed people to post comments on their articles or news sections.
  2. Aggressive/automated link exchanges: aggressive link exchanges still worked, but not at the level they did around 2002 or 2003. Automated link exchange became a huge industry, and people used very aggressive exchanges to gain the upper hand in backlinks.
  3. Forum spamming: spamming forums through signature links, links in posts, automated forum spam, and so on. This was a very popular tactic; people visited forums just to place links in their signatures or posts for search engine benefit.
  4. Cross-linking: a major form of search engine spam in which a spammer starts hundreds of sites and cross-links them all for link benefit.
  5. Dmoz clones: a huge number of Dmoz clones appeared and became popular, since a clone can instantly generate hundreds of thousands of pages for the search engines.
  6. Directory spam: a huge number of directories with zero value to visitors, built purely for search engine benefit.
  7. Links from lots of blogs: this came into existence around 2004 and still exists to a certain extent.
  8. Links and content hidden on the page: behind images, in noscript tags, in hidden divs, and so on.
  9. Spamming Wikipedia by inserting links (nofollow was introduced largely because of Wikipedia).
  10. Text link ads: buying your way to the top of the organic rankings. Buying text links gained search engine benefit because of the engines' dependence on backlinks and anchor text to rank a website.

If I went on, I could keep listing things people used to spam the search engines, but today most of the above tactics no longer work. Search engines like Google have closed their algorithms to these sorts of loopholes. I can tell you, with some background in search algorithms, that just to close the ten tactics above they would need to implement something like 30 different factors. It is not easy to detect spam without hurting the millions of websites out there that have something similar on them but are not spamming.

To combat these problems, the search engines roll out algorithmic changes very carefully, after a lot of testing, so that they don't affect the rankings of innocent webmasters who never did anything against the guidelines. At this point I can imagine at least 150 factors playing into the ranking of competitive keywords in Google.

Let's look at the 2007/2008 search engine spam tactics.

Social bookmarking, which has picked up so much these days, has been a target of search engine spamming for some time now. People should bookmark only interesting pages, but now everyone bookmarks everything, and this is done mostly for search engine benefit. Search engines love sites like del.icio.us and Digg and tend to crawl links from these sites readily, so people target them.

Search engine spamming in the name of link baiting: link baiting, a very commonly used term these days, has been subject to abuse. People imagining themselves to be creative use some really aggressive methods on their sites to gain natural backlinks. Sometimes it works, sometimes it doesn't. Bad PR (public relations) is the main problem here: people write rubbish about others or about other sites so that they can draw attention, and links, from the other party. This is definitely not a healthy way to gain backlinks.

Pay per post, or paid blogging: another tactic subject to spamming. People are now buying posts on blogs: they get good bloggers to write about their site and provide a backlink. Most of the time this is just a paid text link advertisement.

Another tactic: adding a page about your site, with anchor-text backlinks, on an established site.

Reviews on established blogs are similar to pay per post: you pay a blogger to review your site and link back to you in return for money.

$1 articles: this kind of spam is very difficult to combat. Advertisers pay $1 for very low-quality articles and stuff their sites with this junk to show the search engines that they have content.

This is just a small list of the new ways people spam search engines. I could go on, so we cannot blame the search engines in any way for cracking down on it.

One wonderful thing is the death of anchor-text link advertising in 2007/2008, and I am personally happy about it. It let the rich and famous dominate the search results, and that is no longer the case. Search engines don't see text link advertising as search engine friendly and are ready to impose strong penalties on sites that buy or sell links. Though search engines like Google prefer to fight spam algorithmically, I am sure some manual review, especially of paid or rented links, would bring more success and quality to the index.

Considering all these new tactics, I am sure search engine optimization is going to become one of the hardest industries to work in. That said, if there were no innovative ideas for getting sites ranked, the search engine quality engineers would go jobless, so let us keep them working harder to fight search engine spamming algorithmically.

Search Engine Genie.

Are you in Wikipedia's spam blacklist?

Is your site in Wikipedia's spam blacklist? If so, I recommend getting it removed immediately. Personally, we don't know of any of our sites or client sites being on the Wikipedia blacklist:
meta.wikimedia.org/wiki/Spam_blacklist
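
For reference, that blacklist page is a plain-text list of regular expressions, one per line, matched against URLs. A hypothetical entry would look something like this:

\bexample-spam-site\.com\b

If a pattern on that page matches your domain, links to your site can no longer be added anywhere on Wikipedia until the entry is removed.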

Wikipedia has a strong relationship with the search engines; Wikipedia shares this information with them, and being on the list can result in your site losing credibility with the search engines.

Matt Cutts of Google denies any automated penalty:

“If you do a search for [wikipedia spam blacklist], the first result is helpful. It gives pointers to various strings and urls that Wikipedia has blacklisted on their site. I’d characterize that list as much like a spam report: the data can be useful, but at least in Google it wouldn’t automatically result in a penalty (for the reason that site A might be trying to hurt site B). That could be one of the things jehochman was referring to.”

http://www.searchenginejournal.com/wikipedia-spam-resulting-in-google-yahoo-penalties/5854/

Even though Matt denies an automated penalty, he does say the list is much like a spam report, so I recommend staying off it at all times.

Search engines love webmasters and site owners

I am seriously surprised at how the trend has changed: from completely ignoring what people shout about their sites to full customer care by all the search engines. I belong to one of the older member groups of WebmasterWorld, and until about three years ago the search engines rarely cared about a webmaster's or site owner's cries. People would shout, cry, and express their pain in forums like WebmasterWorld and IHelpYou, but no search engine employee lent an ear to their problems.

But now it is a totally different world, I would say. The prompt responses people get from search quality engineers about their site problems seriously amaze me.

For Google we have Matt Cutts' blog, the Webmaster Central blog, an AdWords rep in the WebmasterWorld forum, and John Mueller participating in forums. Matt Cutts goes around webmaster and SEO blogs and responds to specific problems or concerns (and of course his blog is great). We have Google Groups, where many employees such as John Mueller, Jonathan Simon, Adam Lasnik, Susan, Mariya, and others hang out. Most of the Google experts are around whenever a webmaster is serious about a problem.

Back in 2003 we had just one ghost-like representative from Google, named GoogleGuy, on WebmasterWorld, who came around asking for feedback. If there was an update to the algorithm or the search index, he would occasionally respond, with a lot of ambiguity. His answers were treated like answers from the Google god itself, and his posts were followed closely; people came to WebmasterWorld just to search for them. To this day, 95% or more of active WebmasterWorld members don't know who the real person behind the nickname GoogleGuy is. Many suspect GoogleGuy is Matt Cutts, but he has denied being the original GoogleGuy. Today things have changed: we now have Google employees answering questions all over the place. They recently hosted a live webmaster chat session, the first of its kind, which was a success attended by more than 250 people.

For Yahoo it used to be Tim Mayer, nicknamed Yahoo Tim; he would just come around and give a weather update (Yahoo's index update) or answer the occasional question. But now they have a wonderful place for webmasters to seek help:

http://suggestions.yahoo.com/?prop=SiteExplorer

This place gives much better support than what Google does. I rarely see a question there that is not reviewed by a Yahoo Site Explorer employee. That is a very positive sign and a huge step forward in bringing webmasters, white hat SEOs, and site owners closer. It is great that Yahoo, a search engine that used to be very reluctant to help webmasters, now has a place where they provide prompt solutions to site owners' problems.

For MSN it used to be MSNdude, who started around the day MSN separated from Yahoo to become an independent search engine; he came around for feedback but was never a regular. Now even the big Microsoft has a forum where webmasters can talk about problems with their sites. When I visited this forum I saw people like Brett Hunt, a Live Search employee, actively answering site owners' questions. This is great.

Seeing the world change this much, I can say one thing: the search engines have finally understood the importance of good relationships and communication with webmasters and site owners in maintaining the quality of their indexes. Good communication, together with strong published guidelines, helps webmasters go in the right direction when ranking their sites in the search engines.

I would like to wholeheartedly thank all the top search engine engineers for taking this bold step of helping us webmasters.

Search Engine Genie
