One of the most frequently asked questions on forums and message boards is how much of a page a search engine will crawl. In other words, what is the maximum size of an HTML or PDF file that the top search engines will index?
1. Google: As per our latest research, Google has a maximum crawl and cache limit of only 1 MB per file (excluding images and graphics). It used to be 100 KB, then they increased it to 250 KB, then to 500 KB, and the latest update is 1 MB per file.
2. Yahoo: Yahoo overtakes Google by a long way. Its indexing and caching limit is 5 MB.
3. MSN Search: MSN is very unpredictable, but in our experiments it cached up to 3 MB. We never tested above that size, so probably someone on their search quality team can give the real limit.
I don't think we need to worry about any other search engines. I am sure this data will be useful to someone out there at some point. I know we have some PDFs and large documents that need to be indexed, so it is very important that we know the cache limit for them.
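
If you want to check your own PDFs and large documents against these numbers, here is a minimal Python sketch. The limits in it are just the figures from our experiments above, not official values from any search engine. The names CACHE_LIMITS, engines_exceeded and check_file are made up for this example, and we assume 1 MB means 1024 * 1024 bytes, since we did not pin down which convention each engine uses.

```python
import os

# Assumption: 1 MB = 1024 * 1024 bytes.
MB = 1024 * 1024

# Limits as reported in this post (our own experiments, not official figures).
CACHE_LIMITS = {
    "Google": 1 * MB,
    "Yahoo": 5 * MB,
    "MSN": 3 * MB,  # largest size we saw cached; we never tested above it
}


def engines_exceeded(size_bytes, limits=CACHE_LIMITS):
    """Return a sorted list of engines whose reported limit is below size_bytes."""
    return sorted(name for name, limit in limits.items() if size_bytes > limit)


def check_file(path):
    """Look up a file's size on disk and compare it with the reported limits."""
    return engines_exceeded(os.path.getsize(path))
```

For example, engines_exceeded(2 * MB) flags only Google, while a 4 MB file would be over the reported limits for both Google and MSN. An empty list means the file fits within all three.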