Archive for June, 2008

Google uses Search Logs effectively to combat Webspam

Fighting web spam using effective tracking of logs and click-through data.

Matt Cutts, Senior Software Engineer and lead of the web spam team at Google, recently made an interesting post on how Google fights spam effectively using data collection.

Web spam is the most annoying part of the Internet today. Since more than 85% of people use search engines to reach a site for the first time, search engine spam should be kept out of the user's search experience entirely. Search engines have faced the daunting task of fighting web spam since the day they came into existence. Google is one of the search engines that has used effective anti-spam methods to combat search engine spam, and this is one reason it keeps its position at the top of all search engines.

This is the first time I have seen Google openly acknowledge that it uses log data in its algorithms to combat spam. According to the Official Google Blog:

“Data from search logs is one tool we use to fight web spam and return cleaner and more relevant results. Logs data such as IP address and cookie information make it possible to create and use metrics that measure the different aspects of our search quality (such as index size and coverage, results “freshness,” and spam). Whenever we create a new metric, it’s essential to be able to go over our logs data and compute new spam metrics using previous queries or results. We use our search logs to go “back in time” and see how well Google did on queries from months before. When we create a metric that measures a new type of spam more accurately, we not only start tracking our spam success going forward, but we also use logs data to see how we were doing on that type of spam in previous months and years.

The IP and cookie information is important for helping us apply this method only to searches that are from legitimate users as opposed to those that were generated by bots and other false searches. For example, if a bot sends the same queries to Google over and over again, those queries should really be discarded before we measure how much spam our users see. All of this–log data, IP addresses, and cookie information–makes your search results cleaner and more relevant.”
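The quote describes discarding queries that a bot sends "over and over again" before measuring how much spam users see. Google has not published how it does this, so the following is only a minimal Python sketch of the idea under our own assumptions: each log entry is a hypothetical (ip, cookie, query) record, a triple seen more than `max_repeats` times is treated as automated, and `spammy_queries` stands in for whatever spam judgement a real metric would use.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Set


class LogEntry(NamedTuple):
    """Hypothetical search-log record; real logs hold far more fields."""
    ip: str
    cookie: str
    query: str


def drop_repeated_queries(entries: Iterable[LogEntry], max_repeats: int = 3) -> List[LogEntry]:
    """Discard every entry whose (ip, cookie, query) triple occurs more than
    max_repeats times; such bursts are assumed to come from bots."""
    entries = list(entries)
    counts = Counter(entries)  # NamedTuples compare and hash by (ip, cookie, query)
    return [e for e in entries if counts[e] <= max_repeats]


def spam_share(entries: Iterable[LogEntry], spammy_queries: Set[str]) -> float:
    """Fraction of the remaining (presumed human) searches whose results were
    judged spammy. `spammy_queries` is a stand-in for a real spam metric."""
    kept = drop_repeated_queries(entries)
    if not kept:
        return 0.0
    return sum(e.query in spammy_queries for e in kept) / len(kept)


# Made-up data: one bot repeats a query 10 times, three users search once each.
log = [LogEntry("198.51.100.7", "", "cheap pills")] * 10 + [
    LogEntry("203.0.113.5", "c1", "yoga mats"),
    LogEntry("203.0.113.9", "c2", "cheap pills"),
    LogEntry("192.0.2.4", "c3", "mesothelioma"),
]
print(spam_share(log, {"cheap pills"}))  # 1 of 3 human searches; the bot is ignored
```

Without the filter, the bot alone would push the measured spam share to 11 of 13 searches, which is exactly the distortion the quote warns about.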

As per Matt Cutts, IP addresses and search logs do play a role in judging the quality of results delivered to users. I personally feel this is a good approach. I know some IPs spam the search engines far more than regular IPs; if Google can monitor IPs well, it can effectively block automated queries, and this data can be used in the search algorithm. Some keywords will always be spammed more than others, and Google, as they say, can use this kind of tracking to impose more filters on those phrases. With more than 5 years of experience with search engines, I can see that Google imposes stronger filters on certain phrases than on others. For keywords like cancer or mesothelioma, more .gov and .org authority sites rank, while for keywords like auto transport or real estate, commercial sites do a better job. I really enjoy the way these results are displayed, since I don't like to see a commercial site when I search for medical information. Most of the time, commercial sites provide much less value to users than non-commercial sites. There are areas where we need to see more commercial sites and areas where information sites need to dominate. The only way to get this right is to go through the historical data and search logs and see which keywords are searched more, from where, what the user did after clicking a result, and so on. User tracking can be done effectively using strong filters and effective methods.

Search engines face problems every day. Apart from web spam and search engine spam, they see DDoS attacks, excessive bot activity, scrapers and more. To stay on top, they need to keep working on stronger methods to combat spam.

Official Google Blog post here

Use Custom Search And Adsense For Search Live better – Google Says

Adsense For Search

Here at Google we are passionate about search: we are passionate about users and about what they are looking for on Google.com. As a search user you love Google's powerful search platform, but what about putting a custom Google search engine on your site? Our custom search now offers more options than ever before. Not only do you control the look and feel of your search results, you now have full control over what's being searched. And with relevant search results, Google will deliver what your users are looking for, so you can earn revenue while providing a positive search experience on your own site. To tell you about some of our new features, I would like to introduce Matt, one of our search engineers, who has been actively working on AdSense for Search.

Matt: With the new custom search engine, you can customize AdSense for Search to search just your site, a collection of sites you specify (say, all travel blogs) or the entire web. We are also working on improving the colors of site search so that you can display results in a better format that looks good on your site.

That's great, Matt. Tell us about getting relevant ads and results on your site. Of course. With the new keyword feature, you can tune your search engine to your site's topic. Say you have a website about yoga classes. When someone searches for "mat" in the search box, instead of getting results just for yoga mats, they would also get results for commercial mats and floor mats. I'm not sure you'd want the yellow color for a floor mat.

So you define your search engine with keywords such as yoga, exercise and meditation. The next time someone searches for "mat", they will get results related to yoga. That's perfect.

Of course, users will still get the quality and relevance of Google search results. More customization means more control for you and a better experience for your users. And it's quick and easy to set up: just click the Setup tab and, under your AdSense account, click AdSense for Search. If you don't have an account today, sign up at www.google.com/adsense or www.Google.com/afs.

Vijay

Google Leads the table in helping out webmasters

Google has stepped up its efforts to help webmasters with their website-related problems. It now guides webmasters on how to maintain a website effectively so that it gets the rankings it deserves. Google gives support through Google Groups and various other forums, employee blogs (Matt Cutts' blog), official Google blogs (especially the Webmaster Central blog) and Google webmaster live chats (2 already held), and now we have the Google Trifecta: Webmaster Tools, Analytics and Website Optimizer coming up in July. We are very excited about this and thought we could create some cartoons representing Google's good work.

1. Matt Cutts, Adam Lasnik and other Googlers are desperate to impress webmasters with their active participation in various forums and blogs.

Google Matt cutts Adam lasnik chat

2. Matt Cutts speaks at various conferences to help webmasters understand the way Google works.

Google Matt Cutts Search Marketing Expo Conference speech

3. Google is becoming increasingly popular among webmasters and site owners for its efforts to get closer to them. Two successful Google webmaster live chats prove the strategy works.

Google Webmaster Live chat important

4. Matt Cutts answering Q&A at a webmaster conference

Matt Cutts Answering Questions and Answers

5. Google has been actively promoting the Google Webmaster Central Blog and wants webmasters and site owners to read it. Its Webmaster Tools are also becoming increasingly popular.

Webmaster Central Blog Google Webmaster Tools

6. Calling all webmasters to participate and visit Google Webmaster help Groups.

Google webmaster help groups calling webmasters

Vijay
Search Engine Genie.

Two new site tools launched – More good tools to our inventory

We have one of the best lists of useful SEO and website tools on the Internet. All of them are free to use without any limitations, and we have a strict policy never to run ads in our tools. Please see our full list of tools here: http://www.searchenginegenie.com/seo-tools.htm

We recently added 2 new tools to our ever-growing collection.

1. The first is an outbound link checker. This tool helps you track the links going out of your website. Currently we show only 10 links, but in future we will increase that and show more of the links that go out of your website. Please check this tool here: http://www.searchenginegenie.com/tools/chkOutboundLinks.html

2. Next is the ping tool, which pings your site and provides its IP address within seconds. Though this kind of tool is pretty common, most people who use the Internet never know the IP address where their website is hosted. It may be a shared IP or a dedicated IP, but it's important to know where your site is hosted. You can use our new tool to instantly check the IP address of any website, including your own site or your competitors' sites. Enjoy the tools, and please link to them: the only way you can support our tools is by providing backlinks. That is all we ask for all the free services we provide. We love the world of free stuff and will continue to provide many more free tools in future. We have 5 more tools in development, so feel free to visit us regularly to enjoy them.
http://www.searchenginegenie.com/tools/whatismyip.html
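Both tools boil down to simple operations. The sketch below is not the code behind these tools; it is a minimal Python illustration using only the standard library, and the function names are hypothetical. `outbound_links` parses a page's HTML and keeps links whose host differs from the page's host, capped at 10 to mirror the current limit. `site_ip` resolves a hostname to an IPv4 address through DNS, which is the lookup a ping performs before sending packets (it does not send ICMP packets itself).

```python
import socket
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_links(html: str, page_url: str, limit: int = 10) -> List[str]:
    """Return up to `limit` unique http(s) links that point to another host.
    Note: www.example.com and example.com count as different hosts here."""
    site_host = urlparse(page_url).hostname
    collector = _AnchorCollector()
    collector.feed(html)
    found: List[str] = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links like "/about"
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname != site_host:
            if absolute not in found:
                found.append(absolute)
                if len(found) == limit:
                    break
    return found


def site_ip(url_or_host: str) -> str:
    """Resolve a URL or bare hostname to one IPv4 address via DNS."""
    host = urlparse(url_or_host).hostname or url_or_host
    return socket.gethostbyname(host)


page = ('<a href="/about">About</a> <a href="https://other.org/x">Other</a> '
        '<a href="mailto:me@example.com">Mail</a>')
print(outbound_links(page, "http://example.com/index.html"))  # ['https://other.org/x']
print(site_ip("http://127.0.0.1/"))  # 127.0.0.1 (an IP literal needs no DNS lookup)
```

Relative links and `mailto:` links are skipped because they never leave the site over HTTP; call `site_ip("www.example.com")` on a connected machine to look up a real domain.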

Remember, you don't need any premium login or Gold membership to use our tools. We don't believe in stuff like that.

Search Engine Genie Team.

Chat with Semantic Web Expert Ben Adida by Yahoo

RDFa launched!

Yahoo recently published a chat with web expert Ben Adida. Yahoo has announced its intention to support semantic markup and has continued to work with the best of the semantic markup community. Ben is a faculty member at Harvard Medical School and at the Children's Hospital Informatics Program, as well as a research fellow with the Center for Research on Computation and Society at the Harvard School of Engineering and Applied Sciences. He is also the Creative Commons representative to the W3C and chair of the RDF-in-HTML task force, which focuses on bridging the semantic and clickable webs.

Ben was asked why RDFa had been so long in the making. He replied that the delay was for a good reason: they wanted enough flexibility in data management to be useful for current as well as future uses.

Y!: What can I do with RDFa?
BA: You can tell the world what various components on your web page mean by marking up things like:

* The title of a photo
* Your name and contact information
* The license under which you’re distributing your latest MP3
* The ingredients of a cooking recipe
* The price of an item
* A gene on which you recently wrote a paper
* … Anything that you want to make more machine-readable

With RDFa, you can reuse existing concepts, e.g. the title and price of an item, no matter what that item is. If there’s a field you need that doesn’t exist, you can create it.

This level of granularity encourages you to mark up your content as fully as possible, while letting applications consume only as much of the data as it needs.
Microformats, eRDF, AB meta and RDFa all serve the same goal.
The advantage RDFa has over microformats, eRDF and AB meta is that microformats can suffer from field conflicts, while RDFa does not: field titles can be reused. eRDF, for its part, can express much less data than RDFa.
To critics, Ben says it is a matter of finding the right compromise, and he considers RDFa and eRDF to have the same level of complexity as far as authors are concerned. RDFa is more difficult to write than microformats, but that is because microformats are limited in scope, and microformats are quite costly to use. Over the next few months, they are looking to help publishers produce RDFa and tool builders parse it correctly.
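To make "letting applications consume only as much of the data as it needs" concrete, here is a toy Python consumer that reads only the `property` and `content` attributes of RDFa-annotated HTML and ignores everything else. It is an illustration under heavy simplifying assumptions, not an RDFa processor: it does not resolve prefixes such as `dc:` to full URIs (the sample declares the prefix with `xmlns:dc`, as RDFa 1.0 did), and it ignores `about`, `typeof`, `rel` and the other attributes the RDFa specification defines. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "wbr"}


class PropertyExtractor(HTMLParser):
    """Toy RDFa reader: records (property, value) pairs. The value is the
    element's `content` attribute if present, otherwise its text."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: List[Tuple[str, str]] = []
        self._stack: List[list] = []  # one [property_or_None, text_parts] per open element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop and attributes.get("content") is not None:
            self.pairs.append((prop, attributes["content"]))
            prop = None  # value already known; don't also collect the text
        if tag not in VOID_TAGS:
            self._stack.append([prop, []])

    def handle_data(self, data):
        if self._stack:
            self._stack[-1][1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        prop, parts = self._stack.pop()
        text = "".join(parts)
        if prop:
            self.pairs.append((prop, text.strip()))
        if self._stack:  # nested text also belongs to the parent element
            self._stack[-1][1].append(text)


def extract_properties(html: str) -> List[Tuple[str, str]]:
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


photo = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<img src="sunset.jpg" alt="">'
    '<h2 property="dc:title">Sunset over <em>Boston</em></h2>'
    '<span property="dc:creator" content="Ben Adida"></span>'
    '</div>'
)
print(extract_properties(photo))
# [('dc:title', 'Sunset over Boston'), ('dc:creator', 'Ben Adida')]
```

A consumer that only cares about titles could simply filter the pairs for `dc:title`; the rest of the markup costs it nothing.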


More information here

MSN Live Search, Google Search and Yahoo Join Hands for a good reason

Google, Yahoo and MSN now share a common protocol for sitemaps, and they also follow the same rules for interpreting the robots.txt file.
According to the Live Search blog:

“We at Live Search are pleased to announce another collaboration with Yahoo and Google aimed at making webmasters’ lives easier. Webmasters have long used the Robots Exclusion Protocol (REP) to control how search engines access and display their content. The REP offers an easy and efficient way to communicate with search engines, and is currently used by millions of publishers worldwide.
Over the past few years, we have been working with Yahoo and Google to agree on common ways for webmasters to communicate with search engines. Our previous efforts include support for the Sitemaps protocol. While most search engines already comply with the REP, this is the first time the three major search engines have come together to detail how we actually implement the protocol. This effort makes it easier for webmasters to know how REP directives will be handled by search providers.
You can view the details of how we implement the REP at Documentation for the Robots Exclusion Protocol.”
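Python's standard library ships a robots.txt parser, which makes it easy to check how a simple file is read. The sketch below uses `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or later) with a made-up file for www.example.com that also advertises a sitemap through the `Sitemap:` line. One hedge: Python's parser applies rules in file order, first match wins, which is not necessarily how each search engine resolves overlapping Allow and Disallow lines, so the example keeps its rules non-overlapping.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the file as read

print(parser.can_fetch("Googlebot", "https://www.example.com/private/report.html"))  # False
print(parser.can_fetch("Googlebot", "https://www.example.com/index.html"))           # True
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```

Because there is no Googlebot-specific group, Googlebot falls back to the `User-agent: *` rules, which is the standard REP behavior all three engines agreed on.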

Selling links is not wrong when you are Matt Cutts’ friend.

Barry of Seroundtable.com has been selling links on his site for a long time. If you go to www.seroundtable.com you can see a bunch of text links without nofollow. He even has the TPR penalty: his toolbar PageRank was reduced from 6 to 4, which indicates that he has lost trust with Google. But it seems Matt Cutts doesn't care about any of this, since Barry is his friend.

Barry is a long-time friend of Matt Cutts, and in Matt's recent post http://www.mattcutts.com/blog/google-webmaster-chat-tons-of-fun/ he gave Barry a bunch of links for posting the transcript of the 2nd webmaster live chat. He does this because Barry is his friend; if anyone else posts the same transcript, he just ignores them.

When the first Google webmaster live chat happened, we were the first to post the transcript here. Matt Cutts knew we had posted it, but he ignored it as if it never existed. I now understand how important it is to attend search conferences and make Matt your friend.

You get 2 advantages:

1. You can sell as many text links as you want, and Matt Cutts will still ignore it and be happy to be your friend.
2. He will give you a bunch of links from his blog, since he likes you.

Let's follow the new rule of text link advertising: be friends with Google and you will be ignored even if you sell 1000 links.

Vijay,

Maile Ohye of Google defines IP delivery, geolocation, and cloaking

Hi, my name is Maile Ohye and I am a Support Engineer with Google Webmaster Central. Today I want to discuss a fairly advanced topic, IP delivery. We'll talk about the background and some considerations if you choose to use IP delivery on your site.

Today's topic will consist of 4 areas. First, we will discuss the background of IP delivery, that is, why a webmaster would choose to implement a technique such as IP delivery. Next, I'll talk about how Google.com serves our users, so you can see some of the ways we serve users based on their IP addresses. After you see some of the techniques used at Google.com, I'll show you some you shouldn't use. These are examples we've seen on major websites that use IP delivery, but with suboptimal methods; these are things to shy away from. Last, we'll recap design considerations. So when you're designing your site for IP delivery, the first question is: why would a webmaster choose IP delivery?

One major reason is that IP delivery helps target information to users. Let's say you have a .com URL, your business is all in English, and you are doing very well in the United States. If you want to broaden your marketplace and perhaps serve users in Europe, you realize that a potential customer in Germany will have different needs than an American user. For example, they might speak a different language and have different regional concerns, such as "what's your shipping tax when you ship a product to my country?" That's where IP delivery comes into play. IP delivery is the process of delivering specific content to users based on their IP address. If you can detect a user's IP address when a request comes in and understand what region they are coming from, you can target specific content, such as ads, that is more pertinent to their region. Say you have a user coming from California based on their IP address: you might say "we have low shipping costs to California". Or if they are detected to be from Germany, you might say "we don't charge handling taxes for users in Germany". Let's make this more concrete by seeing what Google does. Take the scenario where you have a user in Switzerland: Google detects their Switzerland-based IP address, and their browser is set to German, the language of the region. If they then visit www.google.com, rather than being shown the content of www.google.com, they will likely be redirected to www.google.ch, Switzerland's top-level domain. There the content will be in German as well, so in this instance Google utilizes not only the IP address but also the language settings. Now a slight variation on this scenario: let's say you are in Switzerland, but you're actually American and just there on vacation. You still have a Switzerland IP address, but your browser might be set to English.
So now let's say this user visits www.google.com. Instead of being redirected like the former user, their URL will usually remain www.google.com.
They see content similar to what most of us see in the United States, but the page will be updated because they have a Switzerland IP address: it will have a link saying "Go to Google Switzerland". So here is a case where Google uses an IP address to serve this user better information. Now you've seen some of the things Google does. What are some of the how-not-to's? These are some of the mishaps we see on the web. One is that a website might choose to broaden its market by translating all of its existing content, but serve all this modified content on the same URLs without modifying its site structure. This is going to be problematic because URLs should be unique: each URL should serve largely the same content. Several issues arise when this happens. For example, users cannot share URLs with people who are not in the same IP range. If I see a great product on your URL while I'm in America and I share it with my friend in Japan, they may see something totally different, with the whole page in Japanese, which might not be what we want when sharing; we might want to see the same content. Another side effect of using the same URLs for different content, such as different languages, is that search engine crawlers can come from all over the world.
They can have a number of IP addresses. So let's say your .com site serves 90% of your users in English and you are trying to reach the 5% who are in Germany, so you would rather have search results show your English content. But if a search engine crawls you from a German IP and you give it all German content for those URLs, it's very possible that the search engine will overwrite your English data with that German content, which can result in titles and descriptions in a different language than you desire. Another how-not-to is to serve Googlebot specifically different content than you serve users. This is called cloaking, and it's a violation of our webmaster guidelines. So remember, if you are implementing IP delivery, you want to serve Googlebot the same content you serve to users with a similar IP address. Now that you've seen some things Google does and some how-not-to's, let's consider some design considerations. As we discussed earlier, keep each URL consistent and serve largely the same content on each URL. This means that if you have dynamic portions, you should contain them or limit them to small areas.
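The design consideration above (the same core content on every URL, with the dynamic part limited to a small area) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the IP-to-region table is hypothetical and uses the RFC 5737 documentation ranges instead of a real geolocation database, and the product page and messages are made up. Note that the page is chosen from the IP address alone; the User-Agent is never consulted, so Googlebot crawling from a given IP gets exactly what a user on that IP gets, which is the opposite of cloaking.

```python
import ipaddress
from typing import Optional

# Hypothetical IP-to-region table (RFC 5737 documentation ranges, not real geodata).
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "US-CA",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
}

# Small regional messages, echoing the talk's shipping and handling examples.
REGIONAL_NOTES = {
    "US-CA": "Low shipping cost to California!",
    "DE": "No handling tax for customers in Germany.",
}

PRODUCT_BODY = "<h1>Blue Yoga Mat</h1><p>Non-slip, 6 mm thick.</p>"


def region_for(ip: str) -> Optional[str]:
    """Look up the region of an IPv4/IPv6 address string, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for network, region in REGION_BY_NETWORK.items():
        if address in network:  # False for IPv6 addresses against IPv4 networks
            return region
    return None


def render_product_page(client_ip: str) -> str:
    """Same URL, same core content for everyone; only a small banner varies.
    The User-Agent is deliberately not a parameter."""
    note = REGIONAL_NOTES.get(region_for(client_ip))
    banner = f'<div class="regional-note">{note}</div>' if note else ""
    return banner + PRODUCT_BODY


print(render_product_page("198.51.100.23"))
```

Because the product text never changes, a crawler visiting from any country indexes the same title and description; only the banner differs.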

So for Google, we have a link that says "Go to Google Switzerland". Another thing you can use on the same page that everyone sees is regional coupons, such as one saying "low shipping cost to Germany". In tandem with that idea, you can create separate URLs with more varied content. If you've translated your content into different languages, remember that you also want to create subdomains or subdirectories, or even obtain a top-level domain, for that content. If you choose to do that for, say, German content, you might put it at example.com/de or on the top-level domain example.de. And if you use subdomains or subdirectories, remember that once they are verified you can use Webmaster Tools for geolocation: you can take example.com/de and target it to Germany. Last, keep in mind that a user's IP address, and utilizing IP delivery, doesn't solve all your problems. You need to understand users and their browser settings, because you might have an English-speaking user on vacation in Germany. So keep in mind that you can use the Accept-Language header that comes with the request to give your users the most optimized results. Thanks so much for watching this session on IP delivery. For more webmaster information, please check out Webmaster Central at google.com/webmasters.

Maile Ohye
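The Switzerland examples in the talk combine two signals: the country from the IP address and the language from the browser's Accept-Language header. A minimal Python sketch of that decision follows. It is not Google's logic; the `LOCAL_SITES` table, the example.ch domain, and the rule "redirect only when the browser's top language is spoken in the detected country, otherwise just suggest the local site" are assumptions chosen to mirror the two scenarios described (German browser in Switzerland is redirected, English browser in Switzerland gets a link). The header parser handles q-values but is deliberately simplified.

```python
from typing import List, Optional, Tuple

# Hypothetical local versions of a site, keyed by country code.
LOCAL_SITES = {
    "CH": ("https://www.example.ch/", {"de", "fr", "it"}),
    "DE": ("https://www.example.com/de/", {"de"}),
}


def parse_accept_language(header: str) -> List[str]:
    """Return primary language subtags ordered by q-value, highest first,
    e.g. "de-CH, en;q=0.5" -> ["de", "en"]. Simplified: ignores other params."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        if not tag.strip():
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, position, tag.strip().split("-")[0].lower()))
    weighted.sort(key=lambda entry: (-entry[0], entry[1]))  # ties keep header order
    return [lang for quality, _, lang in weighted if quality > 0]


def choose_response(country: Optional[str], accept_language: str) -> Tuple[str, Optional[str]]:
    """Return ("redirect", url), ("suggest", url) or ("stay", None)."""
    site = LOCAL_SITES.get(country) if country else None
    if site is None:
        return ("stay", None)
    url, local_languages = site
    preferred = parse_accept_language(accept_language)
    if preferred and preferred[0] in local_languages:
        return ("redirect", url)
    return ("suggest", url)  # e.g. show a "Go to example.ch" link on the same page


print(choose_response("CH", "de-CH,de;q=0.9"))  # ('redirect', 'https://www.example.ch/')
print(choose_response("CH", "en-US,en;q=0.9"))  # ('suggest', 'https://www.example.ch/')
print(choose_response("US", "en-US"))           # ('stay', None)
```

With an empty header, the function only suggests the local site rather than redirecting, so a request without language preferences is never pushed onto a different language.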

How to prevent Google Bowling – Interesting discussion in WMW

I came across an interesting discussion on WebmasterWorld about how to prevent damage to a site from links on other sites. You can read the discussion here: http://www.webmasterworld.com/google/3677877.htm

Tedster, a long-time member and WebmasterWorld administrator, answers it very well. I am very impressed with his answer and agree 100%.

Even if you block IP addresses or redirect pages, the links that point to your website are still there, and they're on other sites that you can't control. Those links will have whatever effect they have with Google's algo. There is one thing that protects a website against Google Bowling – a solid backlink profile of its own. The more your "real" quality backlinks grow, the less anyone else's malicious actions can affect it.

Regardless of spam backlinks, as Tedster says, a good backlink profile will automatically offset any bad PR received from bad links.

PageRank craze: white bar, grey bar or green bar

A picture says it all. People like to see more green in their Google Toolbar PageRank bar.
