-
Introduction
- Web Search Engines Scaling Up: 1994-2000
- Google Scaling with the Web
-
Design Goals
- Improved Search Quality
- Academic Search Engine Research
-
System Features
-
PageRank, Bringing Order to the Web
- Description of PageRank Calculation
- Intuitive Justification
- Anchor Text
-
Other Features
- Location info for all hits
- Keep track of visual details such as font size
- Store full raw HTML
-
Related Work
- Information Retrieval
- Differences Between the Web and Well Controlled Collections
-
System Anatomy
- Architecture Overview
-
Major Data Structures
- BigFiles
- Repository
- Document Index
- Lexicon
- Hit Lists
- Forward Index
- Inverted Index
- Crawling the Web
- Indexing the Web
-
Searching
- The Ranking System
- Feedback
-
Results and Performance
- Storage Requirements
- System Performance
- Search Performance
-
Conclusions
-
Future Work
- Query caching, smart disk allocation, subindices
- Which old pages should be recrawled and which new pages should be crawled
- Using proxy caches to build search databases
- Features supported by commercial search engines, such as boolean operators, negation, and stemming
- Relevance feedback and clustering
- User context and result summarization
- Extending the use of link structure and link text
- High Quality Search
- Scalable Architecture
- A Research Tool