seo problems seo blog
Another excellent post by Brett Tabke of webmasterworld.com on duplicate content issues with search engines, Friday, March 11, 2005
   
 

A great post on WebmasterWorld by Brett Tabke explains how search engines treat duplicate content. It is worth a read for everyone.

What is dupe content?
a) Strip duplicate headers, menus, footers (e.g. the template).
This is quite easy to do mathematically. You just look for string patterns that match on more than a few pages.
b) Content is what is left after the template is removed.
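The template-stripping step can be sketched in a few lines of Python. This is only an illustration of the idea, not how any search engine actually does it: it assumes that whole lines are the unit of comparison, and the `strip_template` name, the `min_pages` cutoff of 3 (standing in for "more than a few pages") and the sample pages are all made up for the example.

```python
from collections import Counter

def strip_template(pages, min_pages=3):
    """Remove lines that occur on at least `min_pages` pages.

    `pages` maps a URL to the page's text. Lines are the unit of
    comparison purely for simplicity (an assumption of this sketch).
    """
    # Count, for each distinct line, how many pages contain it.
    page_counts = Counter()
    for text in pages.values():
        unique_lines = {line.strip() for line in text.splitlines() if line.strip()}
        page_counts.update(unique_lines)

    # Lines shared by enough pages are treated as the template.
    template = {line for line, n in page_counts.items() if n >= min_pages}

    # Whatever is not template is treated as the page's content.
    content = {}
    for url, text in pages.items():
        kept = [line.strip() for line in text.splitlines()
                if line.strip() and line.strip() not in template]
        content[url] = "\n".join(kept)
    return content

# Hypothetical three-page site sharing a menu and a footer.
pages = {
    "/a": "Home | About | Contact\nWidgets are great.\nCopyright Example Co",
    "/b": "Home | About | Contact\nGadgets are better.\nCopyright Example Co",
    "/c": "Home | About | Contact\nGizmos are best.\nCopyright Example Co",
}
content = strip_template(pages)
```

After the call, each entry in `content` holds only the one sentence that is unique to that page; the shared menu and footer lines are gone.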
Comparing content is done the same way, with pattern matching. The core is the same type of routine that makes up compression algos like Lempel-Ziv (LZ).
This type of pattern matching is sometimes referred to as a sliding dictionary lookup. You build an index of a page (a dictionary) based on (most probably) words. You then start with the lowest denominator and try to match it against other words in other pages.
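To make the word-based comparison concrete, here is a rough sketch. It is not an LZ-style sliding dictionary; it swaps in a simpler, well-known stand-in: overlapping word "shingles" compared with Jaccard similarity. The function names, the shingle size of 3 and the sample sentences are assumptions for the example, and nothing here is confirmed to match what any engine runs.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping `size`-word sequences in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short to form a full shingle: use the words as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0  # two empty texts: treated as "nothing to compare"
    return len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog"
near_copy = "the quick brown fox leaps over the lazy dog"
unrelated = "search engines index billions of pages every day"

print(similarity(original, original))   # 1.0
print(similarity(original, near_copy))  # 0.4
print(similarity(original, unrelated))  # 0.0
```

Changing a single word knocks out three of the seven shingles on each side, which is why the near copy scores 0.4 rather than something close to 1.0. A larger `size` makes the check stricter, a smaller one makes it looser.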
How close is duplicate content?
A few years ago, an intern (*not* Pugh) who helped work on the dupe content routines (2000?) wrote a paper (now removed). The figure 12% was used. Even after studying it, we are left to ask how that 12% was arrived at.
Cause for concern with some sites?
Absolutely. People who should worry:
a) repetitive content for language purposes.
b) those who do auto-generated content with slightly different pages (such as weather sites, news sites, travel sites).
c) geo-targeted pages on different domains.
d) multiple top-level domains.
Can I get around it with random text within my template?
Debatable. I have heard some say that if a site of any size (more than 20 pages) does not have a detectable template, you are subject to another quasi penalty.
When is dupe content checked?
I feel it is checked as a background routine. It is a routine that could easily run 24x7 on hundreds of machines if they wanted to crank it up that high. I am almost certain there is a granularity setting to it where they can dial up or dial down how closely they check for dupe content. When you think about it, this is not a routine that would actually have to be run all the time, because once they flag a page as a dupe, that would take care of it for a few months until they came back to check again. So I agree with those who say it isn't a set pattern.
Additionally, we also agree that G's indexing isn't as static as it used to be. We are in the "update all the time" era, where the days of GG pressing the button are done because it is pressed all the time. The tweaks are on-the-fly now; it's pot luck.
What does Google do if it detects duplicate content?
Penalizes the second one found (with caveats). (As with almost every Google penalty, there are exceptions we will get to in a minute.)
What generally happens is the first page found is considered to be the original prime page. The second page will get buried deep in the results.
The exception (as always), we believe, is high PageRank. It is generally believed by some that mid-PR7 is considered the "white list" where penalties are dropped on a page and, quite possibly, an entire site. This is why it is confusing to SEOs when someone says they absolutely know the truth about a penalty or algo nuance. The PR7/whitelist exception takes the arguments and washes them.
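The "first found wins, unless the copy has high PageRank" belief described above can be written down as a small sketch. To be clear, this models the post's conjecture, not confirmed Google behavior. The `Page` class, the `pick_original` function, the `whitelist_pr` cutoff of 7 and the example URLs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    first_seen: int  # crawl order or timestamp; lower means found earlier
    pagerank: int    # toolbar-style 0-10 score

def pick_original(duplicates, whitelist_pr=7):
    """Split a group of duplicate pages into (original, spared, buried).

    The earliest-seen page is treated as the original. Later copies are
    "buried" unless their PageRank reaches `whitelist_pr`, which models
    the speculated PR7 whitelist exception.
    """
    if not duplicates:
        raise ValueError("need at least one page")
    ordered = sorted(duplicates, key=lambda p: p.first_seen)
    original = ordered[0]
    spared = [p for p in ordered[1:] if p.pagerank >= whitelist_pr]
    buried = [p for p in ordered[1:] if p.pagerank < whitelist_pr]
    return original, spared, buried

# Hypothetical group of three pages carrying the same content.
group = [
    Page("http://example.com/copy", first_seen=20, pagerank=3),
    Page("http://example.com/original", first_seen=10, pagerank=4),
    Page("http://example.org/big-site-copy", first_seen=30, pagerank=8),
]
original, spared, buried = pick_original(group)
```

Here the page seen first is kept as the original even though its PageRank is low, the PR8 copy escapes the penalty, and the PR3 copy is the one that gets buried.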
Who is best at detecting dupe content?
Inktomi used to be the undisputed king, but since G changed their routines (late 2003/Florida?), G has detected the tiny page to the large duplicate page without fail.
On the other hand, I think we have all seen some classic dupe content that has slipped by the filters with no explanation apparent.
For example, these two pages:
The original: http://www.webmasterworld.com/forum3/2010.htm
The duplicate: http://www.searchengineworld.com/misc/guide.htm
The 10,000 unauthorized rips (10k is the best count, but probably higher): Successful Site in 12 Months with Google Alone
All in all, I think the dupe content issue is far overrated and easy to avoid with quality original content. If anything, it is a good way to watch a competitor get penalized.


 
   
 

posted by power @ 12:43 PM

 

2 Comments:
  • At 1:05 PM, Blogger NY Party Shuttle said…

    Interesting subject. Our site has several good quality incoming links and has good content on the home page, a blog, and internal pages. However, we are not in the top 1000 Google listings for our main keywords. On Yahoo, by comparison, we are in the top 100 listings for each major keyword. I'm trying to figure out why we are being penalized in Google. At first, I thought it was the sandbox, but this has been going on way too long (we launched the site early last summer). Here's my question:

    We have two sites:
    www.atlanticcitypartyshuttle.com
    and
    www.newyorkpartyshuttle.com

    The one I'm worried about is NYPS. You mention duplicate pages, of which there are none between the two sites, but there are several common links back and forth. Also, ACPS is the older site (by about 3 months last year). Any ideas on why we're not showing up higher in Google?

     
  • At 7:39 PM, Blogger Ricjie said…

    My site http://www.oasisoflove.com is being promoted via articles that I write. The articles are distributed around the net on article directories.

    Since I have the same articles on my site, is it best not to have them on the site?

     
