Matt Cutts discusses PR sculpting:



Matt Cutts talks about the best ways to stop Google from crawling your content, and how to remove content from the Google index once it has been crawled.
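
For context, the standard robots.txt way to keep compliant crawlers away from a section of your site looks like this (the /private/ path is just an example):

    User-agent: *
    Disallow: /private/

Note that Disallow: only stops crawling; as the discussion below shows, Google may still index a blocked URL based on signals from other pages.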

Sebastian explains that topic pretty well:

As for password-protected content, are you sure that you don't index it based on 3rd party signals like ODP listings or strong inbound links?



You totally forgot to mention the neat X-Robots-Tag, which allows outputting REP tags like "noindex" in the HTTP header, even for non-HTML resources like PDFs or videos. That's an invention Google can be very proud of. :)
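
For illustration, a minimal sketch of an HTTP response that carries such a directive for a PDF:

    HTTP/1.1 200 OK
    Content-Type: application/pdf
    X-Robots-Tag: noindex

The header keeps the PDF off the SERPs, something a robots meta tag can't do for non-HTML resources.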


@Ian M
Actually, Google is experimenting with Noindex: in robots.txt, but the current implementation is "improvable".


@Google

Currently Google interprets Noindex: in robots.txt as (Disallow: + Noindex:); see the sketch after this list. I think that's completely wrong, because:

1. It's not compliant with the Robots Exclusion Standard.

2. It confuses Webmasters, because "noindex" in robots.txt means something completely different from "noindex" in meta tags or HTTP headers.

3. Mixing crawler directives and indexer directives this way is a plain weak point that will produce misunderstandings, resulting in traffic losses for Webmasters and less compelling content available to searchers. All indexer directives (noindex, nofollow, noarchive, noodp, unavailable_after, etc.) require crawling when they're put anywhere else. I've done Webmaster support for ages, and I assure you that Webmasters will not get it. If nobody understands and adopts it, it's as useless as Yahoo's robots-nocontent class name, which only 500 sites on the whole Web make use of.



4. The REP's "noindex" tag has an implicit "follow" that Google ignores in robots.txt for technical reasons (it's impossible to follow links from uncrawled pages). When I put a robots meta tag with a "noindex" value, Google rightly follows my links, passes PageRank and anchor text to them, and just doesn't list the URL on the SERPs. When I do the same in robots.txt, Google behaves totally differently, for no apparent reason. (Of course there's a reason, but I want to keep this statement simple.)
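
To make the difference concrete, here is a sketch of both notations (the /archive/ path is hypothetical). The experimental robots.txt directive, which Google currently treats as "don't crawl and don't index":

    User-agent: Googlebot
    Noindex: /archive/

And the established robots meta tag, which means "crawl the page, follow its links, just don't list the URL":

    <meta name="robots" content="noindex" />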

Having said all that, I appreciate it very much that Google works on the evolution of robots.txt. Kudos to Google! However, please don't assign crawler-directive semantics to established indexer directives; that doesn't work out. I see the PageRank problem, and I think I know a better procedure to solve it. If you're interested, please read my "RFC" linked above. ;)

@all

Do not make use of experimental robots.txt directives unless you really know what you're doing, and that includes monitoring Google's experiment very closely. If you have the programming skills, you're better off using X-Robots-Tags to steer indexing and deindexing of your resources at the site level. X-Robots-Tags work with HTML content as well as with all other content types.
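
On Apache with mod_headers, for instance, the header can be attached to whole groups of resources; a sketch (the file extensions are just examples):

    # deindex all PDF and Word documents served from this site
    <FilesMatch "\.(pdf|doc)$">
      Header set X-Robots-Tag "noindex"
    </FilesMatch>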
