Google knows the web is big: an informative post on the Google blog
Google runs one of the biggest websites, and it has long been known that the web is big. The first Google index in 1998 already had 26 million pages, and by 2000 the Google index had reached the one billion mark. In the eight years since, Google has seen a lot of big numbers about how much content is really out there. Recently, even its search engineers stopped in awe at just how big the web is these days, when the systems that process links on the web to find new content hit a milestone: 1 trillion unique URLs on the web at once!

So how many unique pages does the web really contain? No one knows. Strictly speaking, the number of pages out there is infinite, because some sites (a web calendar with a "next day" link, for instance) can generate new pages endlessly. Google does not index every one of those trillion pages; many of them are similar to each other or represent auto-generated content. But Google is proud to have the most comprehensive index of any search engine, and its goal has always been to index all of the world's data.

To keep up with this volume of information, Google's systems have come a long way since the first set of web data it processed to answer queries. Back then, everything was done in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google's index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day.

This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, Google does the computational equivalent of fully exploring every intersection of every road in the United States. Google's distributed infrastructure allows applications to efficiently traverse a link graph with many trillions of connections, or quickly sort petabytes of data, just to prepare to answer the most important question: your next Google search.
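To make the batch PageRank computation mentioned above a little more concrete, here is a minimal sketch of the textbook power-iteration form of PageRank on a toy link graph. It is only an illustration, not Google's actual implementation: the function name `pagerank`, the four-page `toy_web` graph, the iteration count, and the damping factor of 0.85 (a commonly cited value) are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a dict of page -> rank for a small link graph.

    links maps each page to the list of pages it links to.
    This is the textbook power-iteration form, not Google's production code.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump" term).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a links to b and c, b to c, c to a, d to c.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)

for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Each pass over the graph touches every link once, which is why a batch run over 26 million pages could fit on one workstation, while a graph with trillions of connections needs distributed infrastructure.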
http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html