This section is for that ol' fav from Stanford... PageRank
Extracting knowledge from the World Wide Web (02/07/2009)
Google - The sheer size of the web has led to a situation where even simple statistics about it are unknown, for example, its size or the percentage of pages in a certain language. The ability to sample web pages or web servers uniformly at random is very useful for determining statistics. For example, we can use random URLs to estimate the distribution of the length of web pages, the fraction of documents in various Internet domains, or the fraction of documents written in various languages. We can also determine the fraction of web pages indexed by various search engines by testing the engines for the presence of pages chosen uniformly at random.
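To make the idea of estimating from random URLs a little more concrete, here is a minimal sketch in Python. It assumes a uniform random sample of URLs is already in hand (getting one is the hard part and is not shown). The function name `domain_fractions` and the sample URLs are made up for illustration and do not come from the paper. The sketch estimates the fraction of documents per top-level domain, with a simple binomial standard error.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def domain_fractions(sample_urls):
    """Estimate the share of pages per top-level domain.

    Assumes sample_urls was drawn uniformly at random from the web
    and that every URL includes a scheme (e.g. "http://").
    Returns {tld: (estimated_fraction, standard_error)}.
    """
    # Take the last dot-separated label of each hostname as the TLD.
    tlds = [urlsplit(url).hostname.rsplit(".", 1)[-1] for url in sample_urls]
    counts = Counter(tlds)
    n = len(tlds)

    estimates = {}
    for tld, k in counts.items():
        p = k / n
        # Standard error of a sample proportion (binomial approximation).
        se = math.sqrt(p * (1 - p) / n)
        estimates[tld] = (p, se)
    return estimates


# Hypothetical sample; a real one would hold thousands of URLs.
sample = [
    "http://example.com/a.html",
    "http://example.com/b.html",
    "http://example.org/index.html",
    "http://example.edu/paper.html",
]

estimates = domain_fractions(sample)
for tld, (p, se) in sorted(estimates.items()):
    print(f"{tld}: {p:.2f} +/- {se:.2f}")
```

The same pattern would apply to page length or language: compute the statistic on each sampled page and average, since a uniform sample makes the sample mean an unbiased estimate of the web-wide value.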
The Anatomy of a Large-Scale Hypertextual Web Search Engine (02/07/2009)
Sergey Brin and Lawrence Page - To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date.