The Anatomy of a Search Engine

Introduction
1. Web Search Engines Scaling Up: 1994-2000
2. Google Scaling with the Web
3. Design Goals
  1. Improved Search Quality
  2. Academic Search Engine Research
System Features
1. PageRank, Bringing Order to the Web
  1. Description of PageRank Calculation
  2. Intuitive Justification
2. Anchor Text
3. Other Features
  1. Location info for all hits
  2. Keep track of visual details such as font size
  3. Store full raw HTML
Related Work
1. Information Retrieval
2. Differences Between the Web and Well Controlled Collections
System Anatomy
1. Architecture Overview
2. Major Data Structures
  1. BigFiles
  2. Repository
  3. Document Index
  4. Lexicon
  5. Hit Lists
  6. Forward Index
  7. Inverted Index
3. Crawling the Web
4. Indexing the Web
5. Searching
  1. The Ranking System
  2. Feedback
Results and Performance
1. Storage Requirements
2. System Performance
3. Search Performance
Conclusions
1. Future Work
  1. Query caching, smart disk allocation, subindices
  2. Which old pages should be recrawled, which new pages should be crawled
  3. Using proxy caches to build search databases
  4. Features of commercial search engines, such as boolean operators, negation, and stemming
  5. Relevance feedback and clustering
  6. User context, result summarization
  7. Extending the use of link structure and link text
2. High Quality Search
3. Scalable Architecture
4. A Research Tool