Google uses Search Logs effectively to combat Webspam

Fighting web spam through effective tracking of search logs and click-through data.

Matt Cutts, Senior Software Engineer and lead of the web spam team at Google, recently made an interesting post on how Google effectively fights spam using data collection.



Web spam is one of the most annoying parts of the Internet today. Since more than 85% of people use a search engine to land on a site for the first time, search engine spam badly damages the user search experience and should be fought aggressively. Search engines have faced the daunting task of fighting web spam from the day they came into existence. Google is one of the search engines that has used effective anti-spam methods to combat search engine spam, and this is one reason it keeps its position on top of all search engines.

This is the first time I have seen Google really acknowledge that it uses log data in its algorithm to combat spam. According to the official Google blog:

"Data from search logs is one tool we use to fight web spam and return cleaner and more relevant results. Logs data such as IP address and cookie information make it possible to create and use metrics that measure the different aspects of our search quality (such as index size and coverage, results "freshness," and spam). Whenever we create a new metric, it's essential to be able to go over our logs data and compute new spam metrics using previous queries or results. We use our search logs to go "back in time" and see how well Google did on queries from months before. When we create a metric that measures a new type of spam more accurately, we not only start tracking our spam success going forward, but we also use logs data to see how we were doing on that type of spam in previous months and years.



"The IP and cookie information is important for helping us apply this method only to searches that are from legitimate users as opposed to those that were generated by bots and other false searches. For example, if a bot sends the same queries to Google over and over again, those queries should really be discarded before we measure how much spam our users see. All of this--log data, IP addresses, and cookie information--makes your search results cleaner and more relevant."
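To make the quoted idea concrete, here is a minimal Python sketch of "going back in time" over old search logs: discard bot-like repeated queries first, then compute a spam metric on what remains. The log record format, the repeat threshold, and the tiny spam-host list are all my own assumptions for illustration; this is not Google's actual pipeline or data.

    from collections import Counter

    # Hypothetical log record: (timestamp, ip, cookie, query, clicked_url).
    # Field layout, threshold, and spam list are illustrative assumptions only.
    SAMPLE_LOG = [
        ("2007-03-01T10:00:00", "1.2.3.4", "c1", "cheap pills", "spam-site.example"),
        ("2007-03-01T10:00:01", "1.2.3.4", "c1", "cheap pills", "spam-site.example"),
        ("2007-03-01T10:00:02", "1.2.3.4", "c1", "cheap pills", "spam-site.example"),
        ("2007-03-01T11:15:00", "5.6.7.8", "c2", "mesothelioma", "cancer.gov"),
        ("2007-03-01T12:30:00", "9.9.9.9", "c3", "auto transport", "spam-site.example"),
    ]

    BOT_REPEAT_THRESHOLD = 3  # assumed cutoff: same (ip, query) this often looks automated
    KNOWN_SPAM_HOSTS = {"spam-site.example"}  # stand-in for a real spam classifier

    def drop_bot_queries(log):
        """Discard queries repeated verbatim from one IP, as the quoted post
        says bot-generated searches should be removed before measuring spam."""
        counts = Counter((ip, query) for _, ip, _, query, _ in log)
        return [rec for rec in log if counts[(rec[1], rec[3])] < BOT_REPEAT_THRESHOLD]

    def spam_rate(log):
        """Fraction of remaining (human) clicks that landed on known spam hosts."""
        clean = drop_bot_queries(log)
        if not clean:
            return 0.0
        spam_clicks = sum(1 for rec in clean if rec[4] in KNOWN_SPAM_HOSTS)
        return spam_clicks / len(clean)

    # "Going back in time": rerun a newly defined metric over an old month's log.
    print(f"spam rate on March 2007 sample: {spam_rate(SAMPLE_LOG):.0%}")

The point of keeping raw logs around is exactly this replay step: once you define a new metric, you can score past months with it instead of only measuring from today forward.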

As per Matt Cutts, IP addresses and search logs do play a role in judging the quality of results delivered to users. I personally feel this is a good approach. I know there are some IPs that spam the search engines far more than regular IPs, and if Google can monitor IPs well, it can effectively block automated queries and feed that signal into the search algorithm.

Some keywords will always be spammed more than others, and Google, as they say, can use this type of tracking to impose stronger filters on those phrases. As a person with more than five years of experience with search engines, I can see that Google imposes stronger filters for certain phrases than for others. For keywords like cancer and mesothelioma, more .gov and .org authority sites rank, while for keywords like auto transport and real estate, commercial sites do a better job. I really enjoy the way the results are displayed, since I don't like to see a commercial site when I search for medicine-related information. Most of the time commercial sites provide much less value to users than non-commercial ones. There are areas where we need to see more commercial sites, and there are areas where information sites need to be dominant. The only way to get this right is to dig through the historical data and search logs and see which keywords are searched most and from where, what the user did after clicking a result, and so on. User tracking can be done effectively using strong filters and effective methods. A rough sketch of the keyword-level idea follows.
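Here is a minimal sketch of what per-phrase filtering could look like, assuming a "spam pressure" score computed for each query from historical logs. The scores, thresholds, and tiers are invented for illustration; Google has not published how (or whether) it weights individual phrases this way.

    # Assumed per-query spam pressure: fraction of past results flagged as spam.
    # All numbers below are made up for the example.
    HISTORICAL_SPAM_PRESSURE = {
        "mesothelioma": 0.40,
        "auto transport": 0.35,
        "cancer": 0.25,
        "local weather": 0.02,
    }

    def filter_strength(query: str) -> str:
        """Map a query's historical spam pressure to a filter tier."""
        pressure = HISTORICAL_SPAM_PRESSURE.get(query, 0.05)
        if pressure >= 0.30:
            return "strict"    # e.g. favor authority sources, demote aggressively
        if pressure >= 0.10:
            return "moderate"
        return "default"

    for q in ("mesothelioma", "auto transport", "local weather"):
        print(q, "->", filter_strength(q))

The design choice here is simply that heavily spammed phrases earn stricter filtering by default, which matches what I observe in practice for terms like mesothelioma versus everyday queries.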

Search engines face new problems every day. Apart from web spam and search engine spam, they see DDoS attacks, excessive bot activity, scrapers, etc. To stay on top, they need to keep working on stronger methods to combat spam.
