How Google handles scrapers – useful information from the Google team
It's been a long battle between Google, webmasters, and the content thieves who scrape information from a website and display it on their own sites to get traffic. Webmasters have been complaining about this problem for a long time. As far as I know, Google already does a good job with content thieves and scraper sites, and now they have opened up about how they tackle the problem internally.
There are two types of duplicate-content problems: duplicates within a site and duplicates on external sites. Duplicate content within a site is easy to fix, since we have full control over it: find every area that might produce two pages with the same content, then block one version from being crawled or remove the links pointing to the duplicate.
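The within-site check can even be automated. As a minimal sketch (the URLs and page bodies here are hypothetical examples), hashing each page body and grouping URLs that share a hash flags exact duplicates:

```python
# Find exact-duplicate pages within a site by hashing page bodies
# and grouping URLs that share a hash.
import hashlib

# Hypothetical crawl results: URL -> page body
pages = {
    "/article.html":         "<html>original story</html>",
    "/article.html?print=1": "<html>original story</html>",
    "/about.html":           "<html>about us</html>",
}

def duplicate_groups(pages):
    """Return groups of URLs whose bodies are byte-for-byte identical."""
    groups = {}
    for url, body in pages.items():
        digest = hashlib.md5(body.encode()).hexdigest()
        groups.setdefault(digest, []).append(url)
    return [urls for urls in groups.values() if len(urls) > 1]

print(duplicate_groups(pages))  # [['/article.html', '/article.html?print=1']]
```

Once a group is found, one version can be blocked from crawling and the internal links pointed at the other.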
External sites are always a problem, since we have no control over them. Google says they now effectively track down potential duplicates, give maximum credit to the original source, and filter out the rest of the duplicates.
If you find a site ranking above you with your own content, Google suggests:
- Check if your content is still accessible to our crawlers. You might unintentionally have blocked access to parts of your content in your robots.txt file.
- Check your Sitemap file to see whether you made changes for the particular content that has been scraped.
- Check if your site is in line with our webmaster guidelines.
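The first check, whether your own content is still crawlable, can be scripted with Python's standard-library robots.txt parser. A minimal sketch (the rules and paths here are hypothetical examples):

```python
# Verify that Googlebot can still reach the pages you expect to rank,
# and that only the intended sections are blocked.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A page you expect to rank should be fetchable...
print(parser.can_fetch("Googlebot", "/articles/my-original-story.html"))  # True
# ...while the deliberately blocked section is not.
print(parser.can_fetch("Googlebot", "/private/draft.html"))  # False
```

If the first call returns False for a page you want indexed, your robots.txt is blocking your own content, which is exactly the mistake Google warns about.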
For more information, read the official blog post.
.info domains banned in Google? – Webmasters report .info domains getting penalized
Some active members of WebmasterWorld are discussing a potential penalty on .info domain names. For about two weeks Google has apparently been experimenting with .info domains, removing them from search results for certain periods to see how much spam that keeps out of the results. One affected webmaster writes:
“This has happened to my .info domain since last night and I am really upset at it.
All my 300 keywords have stopped working in Google but my site still appearing in Google with site:www.example.info and www.example.info searches.
Yesterday I received 600 visitors from Google and today only 4.
My .info domain is one and half years old. Till, Today I regularly update it with unique content and don’t promote it much as I am already receiving number of visitors.. I never did spamming or adopted prohibited ways to promote the site.
I do Free directory submissions and very seldom links exchanges.
Can any expert tell me please what why it has happened to my site?
Is it a permanent problem or temporarily.?
I will be thankful for any help and guidance “
Did Google really ban .info domains? I doubt it; it would be very difficult for Google to ever attempt something like that. If you search for the global registry you can see afilias.info ranking in the top 3, which indicates the .info extension itself is not banned. More likely, because the extension is abused so heavily, Google removed some individual domains.
Google has more than 200,000 Servers
According to an interview with Google Fellow Jeff Dean, it is estimated that Google has more than 200,000 servers in its various datacenters around the globe.
“Google doesn’t reveal exactly how many servers it has, but I’d estimate it’s easily in the hundreds of thousands. It puts 40 servers in each rack, Dean said, and by one reckoning, Google has 36 data centers across the globe. With 150 racks per data center, that would mean Google has more than 200,000 servers, and I’d guess it’s far beyond that and growing every day. “
Is the TPR penalty back? – Google's Toolbar PageRank reduction seems to be visible again
It seems Google's TPR penalty is back. Around February this year Google introduced a new type of penalty for sites that buy or sell links, called the TPR (Toolbar PageRank Reduction) penalty: Google reduces the toolbar PageRank of sites suspected of buying or selling backlinks. The penalty now appears to be active again, and more sites are affected.
A member has reported a PageRank reduction on his website here: forums.digitalpoint.com/showthread.php?t=862890.
If you see TPR affecting your site, please post here. We are doing some research on this and would love to hear feedback from others. We want to see whether the TPR penalty affects only sites that buy and sell links, or whether it also hits innocent sites caught in between. We have seen some sites lose toolbar PR without any link selling or buying, which is why we need to ask.
SEG
Viacom's threat might kill Internet freedom – the YouTube lawsuit
Viacom has sued YouTube and its owner Google for allowing copyrighted videos to be posted on the site. Viacom claims more than 150,000 copyrighted videos have been posted to YouTube, and it is seeking a billion dollars in damages for illegal viewing of those videos.
“Viacom claimed YouTube consistently allowed unauthorised copies of popular television programming and movies to be posted on its website and viewed tens of thousands of times.
It said it had identified more than 150,000 such abuses which included clips from shows such as South Park, SpongeBob SquarePants and MTV Unplugged.
The company says the infringement also included the documentary An Inconvenient Truth which had been viewed “an astounding 1.5 billion times”. “
According to the court document, the lawsuit "threatens the way hundreds of millions of people legitimately exchange information".
I agree with Google's claim above. The whole Internet is built on sharing information; just because someone illegally posted a copyrighted video on a popular video-hosting site does not mean the site is responsible for it. If that were the case, there would be no forums, blogs, or message boards, because every site that runs a message board would have to verify each and every comment posted, since some of them might be copyrighted.
Dow Jones: “When we filed this lawsuit, we not only served our own interests, we served the interests of everyone who owns copyrights they want protected.”
Well, I am not sure what the interest of other copyright owners is here; I feel that comment should speak for Viacom only. Google's policy has always been "don't be evil"; they grew up with that motto. Today they are way ahead of all the other search engines because users are their number-one priority. Even though Google's search engine indexes millions of copyrighted pages and stores them in its database, no one complains, since Google does what is best for Internet users.
Video-sharing websites are a great way to share legitimate information across the Internet. Just because someone posted some copyrighted videos, it is not fair to blame YouTube entirely.
Well, I can give a number of reasons why Viacom is completely wrong with this lawsuit.
1. If YouTube had to verify all the millions of videos posted on its site, it could not run the site at all. There are so many places where legitimate information is shared: forums, blog comments, news sites, and so on. If every single piece of information had to be verified, there would be no good information sites left on the web.
2. The word "copyright" itself can kill the way the Internet works. Today anything and everything is copyrighted, and it becomes impossible to share legitimate information if all of it must be verified first. I feel copyright as currently enforced kills Internet freedom, and there should be new laws for Internet information sharing that protect the freedom to share.
3. After facing the lawsuit last year, Google added an automated copyright-detection tool that stops some copyrighted videos from being uploaded. This is a very legitimate move and the best YouTube can do to prevent copyrighted videos; it also shows that Google and YouTube do not want copyrighted videos on their sites.
4. YouTube has a very clear copyright policy for videos hosted on its site. From that policy:
Commercial Content Is Copyrighted

"The most common reason we take down videos for copyright infringement is that they are direct copies of copyrighted content and the owners of the copyrighted content have alerted us that their content is being used without their permission. Once we become aware of an unauthorized use, we will remove the video promptly. That is the law.

Some examples of copyrighted content are:
- TV shows – including sitcoms, sports broadcasts, news broadcasts, comedy shows, cartoons, and dramas; includes network and cable TV, pay-per-view, and on-demand TV
- Music videos, such as the ones you might find on music video channels
- Videos of live concerts, even if you captured the video yourself – even if you took the video yourself, the performer controls the right to use his/her image in a video, the songwriter owns the rights to the song being performed, and sometimes the venue prohibits filming without permission, so this video is likely to infringe somebody else's rights
- Movies and movie trailers
- Commercials
- Slide shows that include photos or images owned by somebody else

A Few Guiding Principles
- It doesn't matter how long or short the clip is, or exactly how it got to YouTube. If you taped it off cable, videotaped your TV screen, or downloaded it from some other website, it is still copyrighted, and requires the copyright owner's permission to distribute.
- It doesn't matter whether or not you give credit to the owner/author/songwriter—it is still copyrighted.
- It doesn't matter that you are not selling the video for money—it is still copyrighted.
- It doesn't matter whether or not the video contains a copyright notice—it is still copyrighted.
- It doesn't matter whether other similar videos appear on our site—it is still copyrighted.
- It doesn't matter if you created a video made of short clips of copyrighted content—even though you edited it together, the content is still copyrighted."
You can see they are very clear about copyright and even provide links for copyright complaints.
5. As of March there are over 3 billion videos hosted on YouTube. Viacom claims 150,000 copyrighted videos, which is only a tiny fraction (about 0.005%) of everything posted. How can you expect YouTube to dig through that rubble to find copyrighted videos, when it is actually the uploader who is supposed to care about this?
6. Most of the videos posted on YouTube are of very low quality, and they are also very short. I don't believe that short, low-quality clips posted to YouTube will kill a media business like Viacom.
7. Some people suggest a stronger verification system, such as asking for credit-card details before allowing uploads. I know many people who are unwilling to enter their card details even for legitimate online purchases; how can you expect them to hand over card details just to upload a video? That is a bit too much, and anything like it would stop legitimate users from enjoying the freedom of the Internet.
Viacom should withdraw the lawsuit to avoid humiliation in court; I don't think there is anything substantial against YouTube or Google that will prove Viacom's claim. If Viacom ever wins this lawsuit, which looks highly unlikely, the way the whole Internet works will change and everyone will start suing each other, which would be a disaster for the Internet.
Let's all stand behind YouTube and Google and wish them success with their defence. YouTube should come out of this clean, for the welfare of the Internet.
Search Engine Genie
Joomla hacking problem reported – a warning message
A poster in the WebWorkshop forum wrote this:
"I have uncovered some malicious activity on my website which seems to be based around a Joomla/server vulnerability. I am still analysing the extent of the problem but here is what I have found so far.

After performing a backlink check on my website I noticed a lot of links coming into the website with an anchor text of "F". Many of these websites seem to be genuine businesses (whether they actually are or not is still being debated), however the link itself was hidden in a mass of hidden links only visible by disabling CSS. When I say a mass of links I am talking 100s.

After further investigation I found the cause of the problem, a script file called phpgw.php. Somehow the server has been hacked and the file called phpgw.php had been placed in a folder called "images/stories". From what I can see this script pulls in the template file for the website and modifies the code to contain the spam links.

The story continues… I pulled up the access logs for the website and there was only one reference to the phpgw file from the IP address 212.62.97.20, a Saudi Arabian company who seem to be known for content spamming and malicious linking, see the following URL: http://www.projecthoneypot.org/i_b387d0cd6f471d4ce6e0535228689b7d

Whether this is a server issue or a Joomla issue is still unclarified (I assume it's a bit of both) but I warn Joomla users to disable CSS, check for spammy links, and check the server for the phpgw.php file. I'm still looking into the situation so I'll update you all if I find out anything else."
This looks like an issue that needs immediate attention, since link injection is not only bad for your site but very bad for SEO. If Google crawls your site and finds links to spammy websites, it may ban your site temporarily, or in rare cases permanently, from its index. We had a client face the same problem: his site was hacked, and he received the following email from Google.
"Dear site owner or webmaster of ***********,

While we were indexing your webpages, we detected that some of your pages were using techniques that are outside our quality guidelines, which can be found here: http://www.google.com/webmasters/guidelines.html. This appears to be because your site has been modified by a third party. Typically, the offending party gains access to an insecure directory that has open permissions. Many times, they will upload files or modify existing ones, which then show up as spam in our index.

The following is some example hidden text we found at *****************

*
*

In order to preserve the quality of our search engine, we have temporarily removed some of your webpages from our search results."
The mail from Google was actually longer; it is cut short here. Matt Cutts, head of Google's webspam team, also posted an entry on his blog on how to help hacked sites: http://www.mattcutts.com/blog/helping-hacked-sites/
You can see from Matt's post that Google is not happy with hacked websites carrying malicious and spammy links. I warn everyone who uses a frequently targeted content management system such as Drupal, WordPress, or Joomla to patch every possible vulnerability.
If you are using WordPress, I recommend downloading the latest version from http://wordpress.org/download/ and installing it on your server.
For Drupal, too, upgrading to the latest version is the fix.
For Joomla, if reinstalling the latest version from scratch is a bit difficult, I recommend at least patching the known loopholes using the security extensions here: http://extensions.joomla.org/index.php?option=com_mtree&task=listcats&cat_id=1802&Itemid=35
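Since the backdoor in the forum report sat in images/stories, one quick sanity check is to scan your upload directories for PHP files, which have no business being there. A minimal sketch, assuming a layout where images/ should hold only media files:

```python
# Scan a CMS document root for PHP scripts hiding in upload folders
# (the phpgw.php backdoor was found under images/stories).
import os

def find_suspicious_php(docroot, upload_dirs=("images",)):
    """Return paths of .php files under directories that should only hold media."""
    hits = []
    for d in upload_dirs:
        base = os.path.join(docroot, d)
        for root, _dirs, files in os.walk(base):
            for name in files:
                if name.lower().endswith(".php"):
                    hits.append(os.path.join(root, name))
    return hits
```

Running this against your document root and finding anything at all is a strong hint the server has been compromised; media folders should never contain executable scripts.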
Keep your site safe: Google and every other search engine loves sites that are user-friendly and safe for browsing.
Google Advisor site diagnostics tool – Google’s free tool
Google now offers a diagnostics report for any site. For example, for our site searchenginegenie.com, Google reports the following:
www.google.com/safebrowsing/diagnostic?site=http://searchenginegenie.com
Safe Browsing
Diagnostic page for searchenginegenie.com/
What is the current listing status for searchenginegenie.com/?
This site is not listed as suspicious.
What happened when Google visited this site?
Of the 2 pages we tested on the site over the past 90 days, 0 page(s) resulted in malicious software being downloaded and installed without user consent. The last time Google visited this site was on 05/10/2008, and suspicious content was never found on this site within the past 90 days.
Has this site acted as an intermediary resulting in further distribution of malware?
Over the past 90 days, searchenginegenie.com/ did not appear to function as an intermediary for the infection of any sites.
Has this site hosted malware?
No, this site has not hosted malicious software over the past 90 days.
Though the report sounds great for our site, I am not sure how much I can believe Google on this.
I ran the same check for serials.ws, a popular crack-download site. That site has f**ked up my system a couple of times just for downloading some useless cracks, so I am not sure how Google gives it the following certification.
Safe Browsing
Diagnostic page for serials.ws/
What is the current listing status for serials.ws/?
This site is not listed as suspicious.
What happened when Google visited this site?
Of the 1310 pages we tested on the site over the past 90 days, 0 page(s) resulted in malicious software being downloaded and installed without user consent. The last time Google visited this site was on 05/26/2008, and suspicious content was never found on this site within the past 90 days.
Has this site acted as an intermediary resulting in further distribution of malware?
Over the past 90 days, serials.ws/ did not appear to function as an intermediary for the infection of any sites.
Has this site hosted malware?
No, this site has not hosted malicious software over the past 90 days.
No malicious stuff on this site, according to Google? That is very weird, since this site is well known for malicious downloads. I have no idea what criteria Google uses to judge the quality of a site.
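The diagnostic page follows a simple URL pattern, as the searchenginegenie.com example above shows, so checking any domain is just string assembly. A tiny sketch:

```python
# Build the Safe Browsing diagnostic URL for any domain,
# following the pattern shown in the examples above.
def diagnostic_url(domain):
    return "http://www.google.com/safebrowsing/diagnostic?site=http://" + domain

print(diagnostic_url("searchenginegenie.com"))
# http://www.google.com/safebrowsing/diagnostic?site=http://searchenginegenie.com
```

Paste the resulting URL into a browser to see the same four-question report Google produced for the two sites above.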
Is this a manual review? – WebmasterWorld member spots a Google employee visit
A WebmasterWorld member noticed a visit from a Mountain View Google IP running a site: search on one of his subdomains. If you don't know what a site: search is, it is used to check the pages Google has indexed for a particular domain.
For example, site:www.searchenginegenie.com will reveal all the pages indexed for searchenginegenie.com; this is the standard method used in Google.
Kidder says : “I just noticed a Mountain View IP running a fairly specific site: query on one of our subdomains. I checked the IP and it came up as Google. Does this sound like a manual review? “
Tedster, the Google forum administrator there, replied yes. Since he is a veteran in this industry, we can believe his word. I feel manual visits from Google are just normal, especially for a webmaster site like ours, since we write articles on Google and run tools that use Google to deliver results.
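If you want to spot this kind of visit in your own logs, the referrer header often carries the query when someone clicks through from a Google results page. A minimal sketch, assuming combined-format access logs; the log lines and the IP address below are made-up examples, not real Google addresses:

```python
# Pull out (ip, query) pairs for visits referred by a Google site: search.
import re

LOG = [
    '66.102.7.104 - - [10/May/2008] "GET /page.html HTTP/1.1" 200 '
    '"http://www.google.com/search?q=site:sub.example.com"',
    '10.0.0.5 - - [10/May/2008] "GET /page.html HTTP/1.1" 200 "-"',
]

def site_query_visits(lines):
    """Return (ip, query) pairs where the referrer was a Google site: search."""
    hits = []
    for line in lines:
        m = re.search(r'^(\S+).*"http://www\.google\.com/search\?q=(site:[^"&]+)', line)
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits

print(site_query_visits(LOG))  # [('66.102.7.104', 'site:sub.example.com')]
```

A WHOIS lookup on any IP this surfaces then tells you whether the visitor really came from Google's network, which is what the member did.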
Yahoo’s Petabyte index – Bigger is not always better
Yahoo has breached the petabyte mark. Yahoo's VP for data, Mr. Wagan, announced that Yahoo's search index has crossed the petabyte mark. In 2005 Google announced that its data was 3 times larger than Yahoo's, but after that it went silent and started talking about search quality instead.
Now Yahoo is bragging again that it has the capacity to store tens of petabytes of data and is ready for the future.
For anyone who doesn't know, a petabyte is 1,024 terabytes, and 1 terabyte is 1,024 gigabytes.
To put it simply: imagine you have a 50 KB image on your website; Yahoo's index holds the data equivalent of 20,000,000,000 such images. In other words, Yahoo's current data cluster is equivalent to roughly 20 billion 50 KB images.
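The arithmetic behind that figure, worked out in Python (note the 20-billion number uses decimal units, 1 PB = 1,000 TB; with the binary units quoted above, 1 PB = 1,024 TB, it comes out to roughly 22 billion images):

```python
# How many 50 KB images fit in one petabyte?

# Decimal units: 1 PB = 10**15 bytes, 1 KB = 1000 bytes.
petabyte = 10 ** 15
image = 50 * 1000  # one 50 KB image
print(petabyte // image)  # 20000000000, i.e. 20 billion

# Binary units: 1 PB = 2**50 bytes, 1 KB = 1024 bytes.
print(2 ** 50 // (50 * 1024))  # 21990232555, roughly 22 billion
```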
To read more on Yahoo's bragging, go here. Though Yahoo claims a huge index, I am sure Google's index is much bigger than Yahoo's. Given the competition between these two rivals, we can probably expect an announcement from Google about its own index capacity soon.
Google opens up more of its secrets
Google's Vice President of Engineering for Search Quality, Mr. Udi Manber, has opened up about the inner workings of search quality at Google. He broadly explains how Google works and the various methods used to rank a website. The highlight of his post is the way he describes the factors used to return documents relevant to a query.
1. Pagerank: PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of “measuring” its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is also called the PageRank of E and denoted by PR(E).
PageRank, per Udi Manber, is still in use, though no longer as the primary factor; it is now just one part of a much bigger algorithm.
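For readers unfamiliar with the algorithm, here is a toy power-iteration sketch of the PageRank idea on a made-up three-page link graph; real PageRank operates at web scale and, as Manber says, is only one signal among many:

```python
# Toy PageRank via power iteration on a tiny hypothetical link graph.
DAMPING = 0.85  # the classic damping factor from the original paper

def pagerank(links, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: PR(page)}."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start with uniform rank
    for _ in range(iterations):
        new = {p: (1 - DAMPING) / n for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += DAMPING * pr[p] / n
            else:  # split this page's rank among its outlinks
                for q in outs:
                    new[q] += DAMPING * pr[p] / len(outs)
        pr = new
    return pr

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
print(ranks)  # C, linked to by both A and B, collects the most PageRank
```

The total rank always sums to 1, which is the "numerical weighting" over the whole document set that the definition above describes.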
2. Ability to handle phrases: per Mr. Manber, Google's ability to handle phrases plays an important role in ranking a webpage. The better the algorithm understands the phrase, the better the results will be.
3. Synonyms, diacritics, spelling mistakes – Google considers synonyms, spelling errors, and similar factors to return relevant results. (Diacritics: a mark, such as the cedilla of façade or the acute accent of resumé, added to a letter to indicate a special phonetic value or to distinguish words that are otherwise graphically identical.)
4. query models (it’s not just the language, it’s how people use it today)
5. Time models (some queries are best answered with a 30-minute-old page, and some are better answered with a page that stood the test of time). I have seen this more and more with Google recently: for some keywords Google ranks new documents immediately, then over a few days the document drops out of the index, only to return later in whatever position Google decides is best for its users.
6. personalized models (not all people want the same thing).
“we made significant changes to the PageRank algorithm in January”
I feel this is where the testing was not done thoroughly, because the inner pages of a lot of high-quality sites are now showing a grey bar in the Google toolbar PageRank display. It is very ambiguous what Google thinks about these pages. Are they no longer worth anything to Google? Are they penalized in some way? Do they need to be fixed somehow?
Google has left us guessing with this particular change IMHO.
Thanks anyway to Mr. Manber for an excellent writeup; we are very happy to see this:
http://googleblog.blogspot.com/2008/05/introduction-to-google-search-quality.html