Uni-Processor
Statically Scheduled Pipelines
Classic 5-stage Pipeline
Data Hazards
Control Hazards
Structural Hazards
Precise Exception
out of order instruction completion
Superpipelined & Superscalar
Branch Prediction
Static instruction scheduling
local
global
Dynamically Scheduled Pipelines
enforcing data dependencies: Tomasulo algorithm
Speculative execution: Execution beyond unresolved branches
Adding Speculation to Tomasulo algorithm
Dynamic memory disambiguation
Explicit register renaming
Checkpointing
Register fetch after instruction issue
speculative instruction scheduling
memory disambiguation
beating the data-flow limit: value prediction
multiple instructions per clock
dealing with complex ISAs
VLIW Micro-architecture
Duality of Dynamic and Static Techniques
VLIW Architecture
Loop Unrolling
Software pipelining
Non-cyclic VLIW Scheduling
Predicated Execution
Speculative memory disambiguation
Exception
EPIC Micro-architecture
Vector Micro-architecture
Multi-Processor
Memory Hierarchies
The pyramid of memory levels
Memory access locality
Memory hierarchy coherence
Memory inclusion
Cache hierarchy
Cache mapping and organization
Replacement policies
Write policies
Cache hierarchy performance
Classification of cache miss
non-blocking (lockup-free) caches
Cache prefetching and preloading
Virtual Memory
Motivation for virtual memory
Operating Systems' View of Virtual Memory
Virtual Address Translation
Memory Access Control
Hierarchical Page Tables
Inverted page table
Translation Lookaside Buffer
Virtual-address caches with physical tags
Virtual-address caches with virtual tags
Coherence and Memory Consistency
Background
Shared-memory communication model
Hardware components
Coherence and Memory Access Atomicity
why is coherence in multiprocessors so hard
Cache Protocols
Snooping protocols
Directory protocols (cc-NUMA)
Memory access atomicity
Plain Coherence
Sequential Consistency
Formal model for sequential consistency
Access ordering rules for sequential consistency
Memory access buffering
Synchronization
Basic synchronization primitives
Hardware-based synchronization
Software-based synchronization
Relaxed Memory Consistency Models
Not relying on synchronization
Relying on synchronization
Speculative violations of memory orders
Conservative memory model enforcement in OoO processors
Speculative violations of memory orders
introduction
what is computer architecture
Components of parallel architecture
processors
memory
interconnects
parallelism in architecture
Instruction-level parallelism (ILP)
Thread-Level Parallelism (TLP)
Vector and array processors
Performance
Benchmarking
Reporting performance for a set of programs
Reporting speedups
Amdahl's law
Parallel speedup
Technological Challenges
Power and energy
Reliability
Wire Delays
Design Complexity
Limits of miniaturization and the CMOS end-point