Maximum page size search engine crawlers can crawl

One of the most frequently asked questions in forums and on message boards is: what is the maximum depth a search engine will crawl into a page? In other words, what is the maximum size of an HTML or PDF file that the top search engines can index?



1. Google: According to our latest research, Google has a maximum crawl and cache depth of only 1 MB per file (excluding images and graphics). The limit used to be 100 KB, was then increased to 250 KB, then to 500 KB, and the latest update puts it at 1 MB per file.

2. Yahoo: Yahoo overtakes Google by a long way. Its indexing and caching limit is 5 MB; see the screenshots below.







3. MSN Search: MSN is very unpredictable, but in our experiments it cached files of up to 3 MB. We never tested above that; someone on their search quality team could probably give a definitive answer.

I don't think we need to worry about any other search engines. I am sure this data will be useful to someone out there at some point. We have some PDFs and large documents of our own that need to be indexed, so it is very important for us to know the cache limit.
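
For anyone who wants to check their own files against these numbers, here is a minimal sketch in Python. It only compares a file size with the limits reported in this post; the figures are our own observations, not official limits published by any search engine, and treating 1 MB as 1024 * 1024 bytes is an assumption. The function names are made up for this example.

```python
import os

# Cache limits as reported in this post (our own observations,
# not official figures). 1 MB is assumed to be 1024 * 1024 bytes.
MB = 1024 * 1024
REPORTED_CACHE_LIMITS = {
    "Google": 1 * MB,
    "Yahoo": 5 * MB,
    "MSN": 3 * MB,
}


def engines_within_limit(size_bytes, limits=REPORTED_CACHE_LIMITS):
    """Return the engines whose reported limit is at least size_bytes."""
    return sorted(name for name, limit in limits.items() if size_bytes <= limit)


def check_file(path):
    """Look up a local file's size and compare it with the reported limits."""
    return engines_within_limit(os.path.getsize(path))
```

Calling check_file on a local PDF returns the list of engines whose reported limit the file fits under. An empty list means the file is larger than every limit listed above.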

SEO Blog Team,

2 Comments:

Anonymous said...

That sounds incorrect. You can find documents larger than 10 MB in Google.

Here is a 6 MB document
https://www.google.com/search?q=http%3A%2F%2Fcontent.hearthnhome.com%2Fdownloads%2FinstallManuals%2Fman_mtvernonAE.pdf

10:24 AM  
power said...

There is a difference between cached and indexed. I can see that Google has indexed the page you are talking about, but it is not cached.

Yahoo, however, has cached the document; without caching it, they could not report its exact size. If you look at Google search results for PDFs, large files are not cached. Please do a search in Google and you will understand what I am saying.

12:43 PM  
