Computers execute machine code, sequences of bytes encoding the low-level operations that manipulate data, manage memory, read and write data on storage devices, and communicate over networks.
GNU toolchain
http://en.wikipedia.org/wiki/GNU_toolchain
【Better Explained】
Gnu as Document
http://sourceware.org/binutils/docs-2.16/as/
GNU make
GNU Compiler Collection (GCC)
The C preprocessor (cpp)
gcc -E file.i
The C compiler (cc1)
gcc -S file.s
The GNU Assembler(GAS,as)
gcc -c file.o
GNU linker (GNU ld)
gcc -o file
GNU Binutils
GNU Bison
GNU m4
GNU Debugger (GDB)
objdump maybe the best partner
GNU build system (autotools)
Autoconf
Autoheader
Automake
Libtool
AT&T assembly
【Better Explained】
操作数不能同时为 Memory reference
Accessing Information
x86 registers
%eax和%ah是整体和部分的关系
Operand Specifiers
immediate
In ATT-format assembly code, these are written with a ‘$’ followed by an integer using standard C notation, for example, $-577 or $0x1F.
register
Register, denotes the contents of one of the registers, either one of the eight 32-bit registers (e.g., %eax) for a double-word operation, one of the eight 16-bit registers (e.g., %ax) for a word operation, or one of the eight single-byte register elements (e.g., %al) for a byte operation.
memory reference
Operand forms
Data Movement Instructions
Data movement instructions
With sign expansion, the upper bits of the destination are filled in with copies of the most significant bit of the source value.
Arithmetic and Logical Operations
Integer arithmetic operations
Load Effective Address
leal 的目的操作数必须是register
Unary Operations
Binary Operations
Shift Operations
Special Arithmetic Operations
Control
Condition Codes
Accessing the Condition Codes
SET指令
CMP指令
TEST指令
Jump Instructions
Translate C language in assembly language
Conditional Branches
Loops
Conditional Move Instructions
Switch Statements
Array Allocation and Access
Basic Principles
Pointer Arithmetic
Nested Arrays
Fixed-Size Arrays
Variable-Size Arrays
Heterogeneous Data Structures
Structures
Unions
Data Alignment
Procedures
Stack Frame Structure
Transferring Control
Register Usage Conventions
Recursive Procedures
Out-of-Bounds Memory References and Buffer Overflow
x86-64
Processor Architecture
List of CPU architectures
http://en.wikipedia.org/wiki/List_of_CPU_architectures
Embedded CPU architectures
ARM's ARM Architecture
Intel's 8051 architecture
Zilog's Z80 architecture
Western Design Center's 65816 architecture
Hitachi's SuperH architecture
Axis Communications' ETRAX CRIS architecture
Power Architecture (formerly PowerPC)
Microcomputer CPU architectures
x86
IA-32
x86-64
AMD64
Intel 64
Advanced RISC Machines' (originally Acorn) ARM
Motorola's 6800 and 68000 architectures
Power Architecture (formerly IBM POWER and PowerPC)
Workstation/Server CPU architectures
Intel's Itanium architecture (formerly IA-64)
Sun Microsystems's SPARC architecture
Power Architecture (formerly IBM POWER and PowerPC)
DEC's Alpha architecture
Mini/Mainframe CPU architectures
IBM's System/360, System/370, ESA/390 and z/Architecture (1964-present)
DEC's PDP-11 architecture, and its successor, the VAX architecture
Mixed-core CPU architectures
IBM's Cell architecture
List of Instruction sets
http://en.wikipedia.org/wiki/List_of_instruction_sets
Intel
AMD
ARM
Donald Knuth
IBM
Motorola
ISA
http://en.wikipedia.org/wiki/Instruction_set
Cpu Design
http://en.wikipedia.org/wiki/CPU_design
【Better Explained】
CPU design focuses on these areas:
datapaths (such as ALUs and pipelines)
control unit: logic which controls the datapaths
Memory components such as register files, caches
Clock circuitry such as clock drivers, PLLs, clock distribution networks
Pad transceiver circuitry
Logic gate cell library which is used to implement the logic
A CPU design project generally has these major tasks:
Programmer-visible instruction set architecture, which can be implemented by a variety of microarchitectures
Architectural study and performance modeling in ANSI C/C++ or SystemC
High-level synthesis (HLS) or RTL (e.g. logic) implementation
RTL Verification
Circuit design of speed critical components (caches, registers, ALUs)
Logic synthesis or logic-gate-level design
Timing analysis to confirm that all logic and circuits will run at the specified operating frequency
Physical design including floorplanning, place and route of logic gates
Checking that RTL, gate-level, transistor-level and physical-level representations are equivalent
Checks for signal integrity, chip manufacturability
Y86
Y86 Instruction set Architecture
【Better Explained】
Y86 Programs
Some Y86 Instruction Details
push %esp
pop %esp
Processor's State
Program Registers
Condition Codes
Program Counter (PC)
Memory
Program States
AOK
HLT
ADR
INS
Instruction set
Four movl instructions
Four integer operation instructions
The seven jump instructions
Six conditional move instructions
The call instruction pushes the return address on the stack and jumps to the destination address.
The ret instruction returns from such a call.
The pushl and popl instructions
The halt instruction
Instruction Encoding
【Better Explained】
Initial byte
the higher 4 bits , code part
the lower 4 bits ,function part
register specifier byte
4-byte constant word
Exceptions
Sequential Y86 Implementations
Organizing Processing into Stages
Fetch
取指
Decode
译码
Execute
执行
Memory
访存
Write back
写回
PC update
SEQ Hardware Structure
Fetch
Hardware Unit
Instruction memory
PC incrementer
Control logic blocks
Program stat
Decode
Hardware Unit
Register file
A
B
Control logic blocks
srcA
srcB
Program stat
Execute
Hardware Unit
ALU
CC
Control logic blocks
ALU B
ALU fun
ALU A
Program stat
Memory
Hardware Unit
Virtual Memory System
Control logic blocks
Mem. control
Addr
Data
Program stat
Write back
Hardware Unit
Register file
E
M
Control logic blocks
dstM
dstE
Program stat
PC update
SEQ Timing
【Better Explained】
The processor never needs to read back the state updated by an instruction in order to complete the processing of this instruction.
Some rule in Y86
PC is loaded with a new instruction address every clock cycle
CC is loaded only when an integer operation instruction is executed.
Virtual memory is written only when an rmmovl pushl or call instruction is executed.
Hardware unit that require Clock Sequence control
Clocked registers
PC
CC
Random-access memories
Virtual memory system
The register file
SEQ stage implementations
Fetch Stage
seq_fetch.png
Decode & Write back
seq_decode&writeback.png
Execute Stage
seq_execute.png
Memory Stage
seq_memory.png
PC update Stage
seq_PC.png
Pipelined Y86 Implementations
SEQ+: Rearranging the Computation Stages
Circuit Retiming
Inserting Pipeline Registers
Rearranging and Relabeling Signals
Next PC Prediction
Branch Prediction
Pipeline Hazards
Data Hazard
Program registers
Control Hazard
PC
Mispredicted branches
Ret instructions require special handing
Avoiding Data Hazards by Stalling
Avoiding Data Hazards by Forwarding
Data forwarding
Bypassing
Load/Use Data Hazards
One class of data hazards cannot be handled purely by forwarding,because memory reads occur late in the pipeline. 言外之意,就是内存中的值 valM 不像 valE 从一开始出现就是被钉死了的。本质原因还是出来的晚,CSAPP言简意赅。
Load inerlock
Exception Handling
Inner Exception
Halt
illegal Instruction that combine code and fun
Fetch,read or write a illegal address
Excepting instruction
Exception Handler
This is the one for your OS
Some details
The higher priority,the deeper of a instruction,that will be reported to your OS
A excepting instruction occurs after a mispredicted branch
The different stage in pipeline will be updated by different instruction. For instance ,if there is some error in one stage of pipeline made by a malicious instruction will effect other instruction that should not be executed.
Some mechanism
Take stat and other information to Program stat and W_register
PIPE Stage Implementations
PC Selection and Fetch Stage
Decode and Write-Back Stage
Execute Stage
Memory Stage
Pipeline Control Logic
Special logic control cases
Processing ret
The pipeline must stall until the ret instruction reaches the write-back stage.
Load/use hazard
The pipeline must stall for one cycle between an instruction that reads a value from memory and an instruction that uses this value.
Mispredicted branches
By the time the branch logic detects that a jump should not have been taken,several instructions at the branch target will have started down the pipeline. These instructions must be removed from the pipeline.
Instruction squashing
Exception
【Better Explained】
Bad stuff
There are two stage will make exception:Fetch and Memory
There are three stage will update the new program stat:Execute Memory and Write-back
If an instruction leads an exception ,we forbid the later instruction updating the programmer-visible stat. In addition , when the excepting instruction reaches the write-back stage,we stop the application.
Detecting Special Control Conditions
Pipeline Control Mechanisms
Combinations of Control Conditions
Control logic implementation
Logic Design
Tools
Hardware Description Language HDL
Verilog
VHDL
Hardware Control Language HCL
【Better Explained】
The different between HCL's Combinational Circuits and C's logic expression
组合电路 随输入 实时 变化 ,C 语言的 逻辑表达式只在程序执行到时求值。
组合电路 操作数只有0和1 ,C 中0表FALSE,something else 表示TREUE
C语言逻辑操作有短路属性,组合电路无
Signal Declarations
boolsig name ’C-expr’
intsig name ’C-expr’
Quoted Text
quote ’string’
Expressions
HCL Boolean expressions
0
Logic value 0
1
Logic value 1
name
Named Boolean signal
int -expr in {int -expr 1,int -expr 2, . . . ,int -expr k}
Set membership test
int -expr 1 == int -expr 2
Equality test
int -expr 1 != int -expr 2
Not equal test
int -expr 1 < int -expr 2
Less than test
int -expr 1 <= int -expr 2
Less than or equal test
int -expr 1 > int -expr 2
Greater than test
int -expr 1 >= int -expr 2
Greater than or equal test
!bool -expr
NOT
bool -expr 1 && bool -expr 2
AND
bool -expr 1 || bool -expr 2
OR
HCL Integer expressions
Numbers
Named integer signals
Case expressions
[bool -expr 1 : int -expr 1;bool -expr 2 : int -expr 2;...;];
Blocks
bool name = bool -expr;
int name = int -expr;
Logic synthesis
General Principles of Pipelining
Pipeline
Limitations of Pipelining
Nonuniform Partitioning
Diminishing Returns of Deep Pipelining
Pipelining a System with Feedback
【Better Explained】
Introducing pipelining into a system containing feedback paths is a peril
We must deal with feedback effects properly
Data dependency
Control dependency
Hardware Units
Circuit
Logic Gates
Combinational Circuits
Bit equal
MUX,multiplexor
Word-Level Combinational Circuits
Arithmetic logic unit ALU
Sequential circuit
Memory device
【Better Explained】
Read and write a register at the same time
我们会看到一个从旧值到新值的变化
Clocked registers
PC
CC
Random-access memories
Virtual memory system
The register file
The instruction memory
Clocking
Optimizing Program Performance
http://en.wikipedia.org/wiki/Program_optimization
Common theme
Trade-offs
Bottlenecks
When to optimize
Macros
Time taken for optimization
Platform dependent and independent optimizations
【Better Explained】
Writing Efficient Programs
Bentley's rules
Algorithms and Data structures
Write source code that the compiler can effectively optimize.
Parallel Computing
Understanding Processors
【Better Explained】
Boundary
Latency bound
Throughput bound
the reciprocal of issue time
Overall Operation
Nehalem
Superscalar
Multi operation per clock-cycle
Out-of-order
ICU Instruction Control Unit
Instruction Cache
Fetch Control
Branch prediction
Speculative execution
Instruction Decode
FIFO
Retired
Flushed
Retirement unit
Register file
EU Execution Unit
Functional units
【Better Explained】
Register renaming
Load
Store
FP add + integer
FP mul/div +integer
Branch +integer
Data cache
Functional Unit Performance
【Better Explained】
Latency
deep in pipeline
Issue time
Fully pipeline
An Abstract Model of Processor Operation
Transformation from machine code to data-flow
Other performance stuffs
Understanding Memory
【Better Explained】
Write/read dependency
the different
Load Store
Store Load
Load unit
Store unit
Store buffer
Life in the Real World: Performance Improvement Techniques
High-level design. Choose appropriate algorithms and data structures for the problem at hand. Be especially
vigilant to avoid algorithms or coding techniques that yield asymptotically poor performance.
Basic coding principles. Avoid optimization blockers so that a compiler can generate efficient code.
Eliminate excessive function calls. Move computations out of loops when possible. Consider selective compromises of program modularity to gain greater efficiency.
Eliminate unnecessary memory references. Introduce temporary variables to hold intermediate results. Store a result in an array or global variable only when the final value has been computed.
Low-level optimizations.
Try various forms of pointer versus array code.
Reduce loop overhead by unrolling loops.
Find ways to make use of the pipelined functional units by techniques such as iteration splitting.
A final word of advice to the reader is to be careful to avoid expending effort on misleading results.
VRAM output is produced by shifting the entire
contents of the internal buffer in sequence
VRAM allows concurrent reads and writes to the
memory
Nonvolatile Memory
PROM
EPROM
EEPROM
Flash
ROM
Firmware
Disk Storage
Disk drive
platter
Two surfaces
magnetic recording material
spindle
revolutions per minute (RPM)
Track
Sector
Gap
formatting bits
Cylinder
Disk Capacity
Recoding density
Track density
Areal density
Multiple zone recording
Recoding zone
Disk Operation
Read/write head
At any point in time, all heads are positioned on the same cylinder
Disks read and write data in sector-sized blocks
Actuator arm
radial axis
Seek
Head crash
Access time
Seek time
Rotational latency
Transfer time
Logical Disk Blocks
disk controller
a small buffer on the controller
Firmware
logical block number
(surface, track, sector) triple
logical blocks
Accessing Disks
Memory-mapped I/O
I/O port
Direct Memory Access,DMA
Solid State Disks
USB
SATA
Flash chip
Flash translation layer
Block
Page
Locality
【Better Explained】
they tend to reference data items that
are near other recently referenced data items, or that were recently referenced themselves
principle of locality
temporal locality
a memory location that is referenced once is likely to be referenced again multiple times in the near future
spatial locality
if a memory location is referenced once, then the program is likely to reference a nearby memory location in the near future
Locality of References to Program Data
Stride-1 reference pattern
Sequential reference pattern
stride-k reference pattern
Locality of Instruction Fetches
Summary of Locality
Programs that repeatedly reference the same variables enjoy good temporal locality
the smaller the stride the better the spatial locality
The smaller the loop body and the greater the number of loop iterations, the better the locality
The Memory Hierarchy
Caching
【Better Explained】
chunk
block
Transfer unit
Cache Hits
Cache Misses
Replacing
Evicting
Victim block
Replacement policy
LRU
Kinds of Cache Misses
Cold cache
Warmed up
Compulsory miss/Cold miss
Conflict miss
Working set
Capacity miss
Cache Management
compiler
Register
MMU
TLB
hardware logic built into the caches
L1
L2
L3
OS & address translation hardware on the CPU
DRAM
Cache Memories
Generic Cache Memory Organization
General organization of cache (S; E; B;m)
cache
cache set
cache line
block
m address bits t tag bits, s set index bits, b block offset bits
Direct-Mapped Caches
Set Selection
s
Line Matching
valid bit
tag bit
block offset bit
Word Selection
Line Replacement on Misses
Conflict Misses
Thrash
Set Associative Caches
E-way set associative cache
Set Selection
Line Matching
Word Selection
Line Replacement on Misses
LRU
LFU
Fully Associative Caches
Set Selection
Only one
Line Matching
Word Selection
Issues with Writes
Write hit
Write through
Write back
Dirty bit
Write miss
Write allocate
Not write allocate
Core i7 Cache Hierarchy
Core 0
L1 i -cache
L1 d -cache
L2 Unified cache
L3 Unified cache
Performance Impact of Cache Parameters
Metrics
Miss rate
Hit rate
Hit time
Miss penalty
Cache Size
Block Size
Associativity E
L1 L2 8
L3 16
Write Strategy
Writing Cache-friendly Code
Make the common case go fast
Minimize the number of cache misses in each inner loop
Repeated references to local variables are good because the compiler can cache them in the register file (temporal locality)
Stride-1 reference patterns are good because caches at all levels of the memory hierarchy store data
as contiguous blocks (spatial locality)
The Impact of Caches on Program Performance
The Memory Mountain
Part II
Running Programs on a System
Linking
【Better Explained】
Time of the Linking
Compile time
Load time
Run time
Separate compilation
Benefits of Understanding Linking
Build large programs
Avoid dangerous programming errors
Understand how language scoping rules are implemented
Understand other important systems concepts
Exploit shared libraries
Linking
Symbol resolution
C++ Java Mangling demangling
Multiply Defined Global Symbols
Uninitialized global variables get weak symbols.
Functions and initialized global variables get strong symbols.
Rule 1: Multiple strong symbols are not allowed.
Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.
Rule 3: Given multiple weak symbols, choose any of the weak symbols.
Resolve References with Static Libraries
U
E
D
Relocation
Relocating sections and symbol definitions
The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules.
Relocating symbol references within sections
Relocation Entries
relocation types
R 386 PC32
R 386 32
Relocating PC-Relative References
Relocating Absolute References
Static Linking
Linking with Static Libraries
The general rule for libraries is to place them at the end of the command line.
On the other hand, if the libraries are not independent, then they must be ordered
Libraries can be repeated on the command line if necessary to satisfy the dependence requirements.
Dynamic Linking
Shared libraries
.interp
JNI
Linux API
PIC
GOT
.data
PLT
.test
PIC Data References
performance disadvantages
PIC Function Calls
lazy binding
Loading and Linking Shared Libraries from Applications
Life in real world
Distribution Software
Building high-performance web server
Object Files
Tools for Manipulating Object Files
objdump
readelf
ar
strings
strip
nm
size
ldd
Object file formats
a.out
COFF
PE
ELF
Relocatable object file
Typical ELF relocatable object file.png
symbol table
Global symbols defined by module that located file in C or Class in C++ Java
Not include static global variables
Global symbols defined by some other module
Local symbols
Only Include static global variables and functions. not local variables
Executable object file
Typical ELF executable object file
objdump -dx
execve loader _start
Linux run-time memory image.png
Shared object file
Exceptional Control Flow
【Better Explained】
control transfer
control flow of the processor.
“smooth” sequence
ECF
Hardware level
operating systems level
application level
Benefits of understand ECF
understand important systems concepts
understand how applications interact with the operating system
write interesting new application programs
understand concurrency
understand how software exceptions work
Tools for Manipulating Processes
strace
ps
top
pmap
/proc
Exceptions
【Better Explained】
An exception is an abrupt change in the control flow in response to some change in the processor’s state.
processor’s state
event
related to current instruction
unrelated to current instruction
The differences to Procedure call
Return
Procedure call,pop return address from the stack
Depending on the class of exception
Exception handlers run in kernel mode
Push some additional processor state onto the stack,EFLAGS
If control is being transferred from a user program to the kernel, all of these items are pushed onto the kernel’s stack rather than onto the user’s stack.
It looks like meaning that exception occurs in user mode rather than kernel mode
Asynchronous exceptions occur as a result of events in I/O devices that are external to the processor
Synchronous exceptions occur as a direct result of executing an instruction
system-level functions
system calls
The standard C library
Exception Handling
exception number
exception table
exception table base register
exception handler
Exception return
depending on the type of event that caused the exception
Icurr
Inext
Abort
Classes of Exceptions
Interrupts
Signal from I/O device
Async
Handler returns to next instruction
Traps
Intentional exception
Sync
Handler returns to instruction following the syscall
Faults
Potentially recoverable error
Sync
Handler either reexecutes current instruction or aborts.
Aborts
Nonrecoverable error
Sync
Handler returns to abort routine
Exceptions in Linux/IA32 Systems
exception types
0 to 31 exceptions defined by the Intel architects
0 Divide error
Fault
13 General protection fault
Segmentation faults
Fault
14 Page fault
Fault
18 Machine check
Abort
32 to 255 interrupts and traps defined by Linux
128 0x80 System call
Trap
Linux/IA32 System Calls
By convention, %eax %ebx, %ecx, %edx, %esi,%edi, and %ebp
Processes
Context
program’s codeand data stored in memory
the contents of its general-purpose registers
the program counter
status registers
the floating-point registers
user’s stack
kernel’s stack
environment variables
various kernel data structures
page table
process table
file table
abstractions
Logical control flow
A sequence of PC values
Corresponded exclusively to instructions contained in our program’s executable object file or in shared objects linked into our program dynamically at run time.
Instances
Exception handlers, processes, signal handlers, threads, and Java processes
Concurrent Flows
A logical flow whose execution overlaps in time with another flow
concurrency
The general phenomenon of multiple flows executing concurrently
multitasking / Time slicing
The notion of a process taking turns with other processes
time slice
Each time period that a process executes a portion of its flow
parallel flows
If two flows are running concurrently on different processor cores or computers
Private address space
Private
Can not be accessed by other process
0x08048000
0x00400000
User and Kernel Modes
Processors typically provide this capability with a mode bit in some control register that characterizes the privileges that the process currently enjoys.
/proc
exports the contents of many kernel data structures as a hierarchy of text files
/sys
exports additional low-level information about system buses and devices
Process scheduling
At certain points during the execution of a process, the kernel can decide to preempt the current process and restart a previously preempted process
Context Switches
(1) saves the context of the current process
(2) restores the saved context of some previously preempted process
(3) passes control to this newly restored process
Process Control
Obtaining Process IDs
Creating and Terminating Processes
process states
Running.
Stopped
Terminated
fork
Call once, return twice
Concurrent execution
Duplicate but separate address spaces
Shared files
process hierarchy
Reaping Child Processes
zombie
waitpid
Determining the Members of the Wait Set
pid > 0
pid = -1
Modifying the Default Behavior
0
WNOHANG
WUNTRACED
Checking the Exit Status of a Reaped Child
WIFEXITED(status)
WEXITSTATUS(status)
WIFSIGNALED(status)
WTERMSIG(status)
WIFSTOPPED(status)
WSTOPSIG(status)
Error Conditions
-1
ECHILD
EINTR
Putting Processes to Sleep
sleep
0
>0
pause
Loading and Running Programs
execve
called once and never returns or -1
Typical organization of the user stack when a new program starts.png
getenv
setenv
unsetenv
Using fork and execve to Run Programs
shell
The read step reads a command line from the user
The evaluate step parses the command line and runs programs on behalf of the user.
Web servers
Process Groups
Every process belongs to exactly one process group
getpgrp
setpgid
Signals
Linux signals
A signal is a message that notifies a process that an event of some type has occurred in the system.
Default action
Dumping core
Years ago, main memory was implemented with a technology known as core memory.
an historical term that means writing an image of the code and data memory segments to disk.
SIGTRAP
SIGABRT
SIGFPE
SIGSEGV
ignore
SIGCHLD
SIGCONT
SIGURG
SIGWINCH
stop until next SIGCONT
SIGSTOP
SIGTSTP
SIGTTIN
SIGTTOU
neither be caught nor ignored
terminate*
SIGKILL
stop until next SIGCONT*
SIGSTOP
Each signal type corresponds to some kind of system event. Low-level hardware exceptions are processed by the kernel’s exception handlers and would not normally be visible to user processes. Signals provide a mechanism for exposing the occurrence of such exceptions to user processes.
Signal Terminology
Sending a signal
The kernel sends (delivers) a signal to a destination process by updating some state in the context of the destination process.
delivered reasons
(1) the kernel has detected a system event
a divide-by-zero error
the termination of a child process
(2) A process has invoked the kill function
A process can send a signal to itself.
How to send
the kill Program
the Keyboard
the kill Function
the alarm Function
Receiving a signal
A destination process receives a signal when it is forced by the kernel to react in some way to the delivery of the signal.
pending signal
A signal that has been sent but not yet received
Signal Handling Issues
Pending signals can be blocked
Unix signal handlers typically block pending signals of the type currently being processed by the handler
Pending signals are not queued
The crucial lesson is that signals cannot be used to count the occurrence of events in other processes.
System calls can be interrupted
Portable Signal Handling
sigaction
Only signals of the type currently being processed by the handler are blocked
As with all signal implementations, signals are not queued
Once the signal handler is installed, it remains installed
Interrupted system calls are automatically restarted whenever possible
Race
A good method to expose your race in code
Coin of sleep
Nonlocal Jumps
An important application of nonlocal jumps is to permit an immediate return from a deeply nested function call, usually as a result of detecting some error condition
Another important application of nonlocal jumps is to branch out of a signal handler to a specific code location, rather than returning to the instruction that was interrupted by the arrival of the signal.
It uses main memory efficiently by treating it as a cache for an address space stored on disk, keeping only the active areas in main memory, and transferring data back and forth between disk and memory as needed
It simplifies memory management by providing each process
with a uniform address space
It protects the address space of each process from corruption by other processes
Benefits of understand Virtual Memory
Virtual memory is central
Virtual memory is powerful
Virtual memory is dangerous
Address Spaces
LAS
VAS
PAS
Core i7/Linux Memory System
Core i7 Address Translation
Linux Virtual Memory System
Linux Virtual Memory Areas(also called segments)
An area is a contiguous chunk of existing (allocated) virtual memory whose pages are related in some way
task_struct
mm_struct
pgd
vm_area_structs
vm_start
vm_end
vm_prot
vm_flags
vm_next
Linux Page Fault Exception Handling
Is virtual address legal?
segmentation fault:
accessing a non-existing page
Is the attempted memory access legal?
protection exception:
normal page fault
Memory Mapping
【Better explained】
initializes the contents of a virtual memory area by associating it with an object on disk
swap file/space/area,maintained by the kernel.
A object mapped into physical memory that the physical pages are not necessarily contiguous
A virtual memory area that a shared object is mapped into is often called a shared area. Similarly for a private area
Mapping object
Regular file in the Unix filesystem
Anonymous file
created by the kernel, that contains all binary zeros
no data is actually transferred between disk and memory
demand-zero pages
Shared Objects
shared object
It is visible to any other processes that have also mapped the shared object into their virtual memory
The changes are also reflected in the original object on disk
private object
not visible to other processes
not reflected back to the object on disk
private copy-on-write
The fork Function
The execve Function
Delete existing user areas
Map private areas
Map shared areas
Set PC
User-level Memory Mapping
Dynamic Memory Allocation
brk ptr
Explicit allocators
malloc calloc realloc free
sbrk
Why Dynamic Memory Allocation?
Often we do not know the sizes of certain data structures until the program actually runs
Allocator Requirements and Goals
Requirements
Handling arbitrary request sequences
Making immediate responses to requests
Using only the heap
Aligning blocks (alignment requirement)
The allocator must align blocks in such a way that they can hold any type of data object. On most systems, this means that the block returned by the allocator is aligned on an eight-byte (double-word) boundary.
Not modifying allocated blocks
Goals
【Better Explained】
aggregate payload
peak utilization
Maximizing throughput
throughput is defined as the number of requests that it completes per unit time
Maximizing memory utilization
In fact, the total amount of virtual memory allocated by all of the processes in a system is limited by the amount of swap space on disk
Fragmentation
internal fragmentation
external fragmentation
Implementation
Free block organization
Implicit Free Lists
block
header
payload
padding
Explicit Free Lists
Simple Segregated Storage
Placement
first fit
next fit
best fit
Segregated Fits
Buddy Systems
Splitting
Coalescing
false fragmentation
immediate coalescing
deferred coalescing
Coalescing with Boundary Tags
footer
Getting Additional Heap Memory
sbrk
Implicit allocators /Garbage collectors
Garbage Collector Basics
directed reachability graph
a set of root nodes
a set of heap nodes
conservative garbage collectors
Mark&Sweep Garbage Collectors
Conservative Mark&Sweep for C Programs
Common Memory-Related Bugs in C
Dereferencing Bad Pointers
Reading Uninitialized Memory
Allowing Stack Buffer Overflows
Assuming that Pointers and the Objects they Point to Are the Same Size
Making Off-by-One Errors
Referencing a Pointer Instead of the Object it Points To
Misunderstanding Pointer Arithmetic
Referencing Nonexistent Variables
Referencing Data in Free Heap Blocks
Introducing Memory Leaks
Physical and Virtual Addressing
Memory management unit (MMU)
Page table base register (PTBR)
Address Translation
Address translation with a page table.png
Page hit.png
Page fault.png
Integrating Caches and VM.png
Translation lookaside buffer (TLB)
Components of a virtual address that are used to access the TLB.png
Multi-level Page Tables
This scheme reduces memory requirements
If a PTE in the level-1 table is null that represents a significant potential savings
only the level-1 table and the most heavily used level-2 page tables need to be cached in main memory
Virtual address (VA)
Virtual page offset (VPO)
Virtual page number (VPN)
Physical address (PA)
Physical page number (PPN)
Physical page offset (PPO)
The Role of VM
VM as a Tool for Caching
Page
virtual page (VP)
Unallocated
Cached
Uncached
physical page (PP)page frames
DRAM SRAM
Page Tables
page table entry (PTE)
Page Hits
Page Faults
swapping or paging
swapped in (paged in)
swapped out (paged out)
demand paging
Allocating Pages
Locality
working set or resident set
thrashing
VM as a Tool for Memory Management
demand paging and separate virtual address spaces
Simplifying sharing
Simplifying loading
Simplifying linking
Simplifying memory allocation
VM as a Tool for Memory Protection
Permission bits
SUP
READ
WRITE
segmentation fault
more bits for other process access
Part III
Interaction and Communication Between Programs
System-Level I/O
【Better Explained】
Benefits of understand Unix I/O
understand other systems concepts
Sometimes you have no choice but to use Unix I/O
there are problems with the standard I/O library that make it risky to use for network programming.
Unix file is a sequence of some bytes
All I/O devices, such as networks, disks, and terminals, are modeled as files
This elegant mapping allows kernel to export a simple, low-level API that enables all input and output to be performed in a uniform and consistent way
R I/O
W.Richard Stevens
Unix I/O
Opening and Closing Files
descriptor
Closing files
free the data structures
restore the descriptor
Reading and Writing Files
file position
EOF
There is no explicit “EOF character” at the end of a file
read
0 EOF
Reading File Metadata
stat
Sharing Files
Descriptor table
File table
v-node table
I/O Redirection
dup2
Standard I/O
Stream is a pointer a structure of type FILE
Full duplex
Restriction 1:Input functions following output functions.
Restriction 2:Output functions following input functions
?
Network Programming
The Client-Server Programming Model
Networks
WAN
Router
LAN
Bridged Ethernet
Some twisted pairs of wires
Multiple Ethernet segments
Ethernet segment
Some twisted pairs of wires
Hub
Bridge
Internet Protocol
Naming scheme
48-bit address
internet addresses
Delivery mechanism
Frame
Header
Playload
Encapsulation
The Global IP Internet
IP Addresses
scalar IP address in a structure
dotted-decimal representation
network byte order
inet_aton
inet_ntoa
Internet Domain Names
ICANN
Internet domain name hierarchy
HOSTS.txt
DNS
gethostbyname
gethostbyaddr
Internet Connections
Socket Address Structures
sockaddr
SA
Steven
sockaddr_in
in_addr
The Sockets Interface
socket
bind
Web Servers
Web Basics
HTTP
HTML
Hyperlinks
WWW
Tim Berners-Lee
MIME
URL
?
&
Web Content
Fetch a disk file and return its contents to the client
static content
serving static content
Run an executable file and return its output to the client
dynamic content
serving dynamic content
/
HTTP Transactions
Concurrent Programming
【Better Explained】
Application-level concurrency
Accessing slow I/O devices
Interacting with humans
Reducing latency by deferring work
Servicing multiple network clients
Computing in parallel on multi-core machines
concurrent programs
Processes
each logical control flow is a process that is scheduled and maintained by the kernel
Since processes have separate virtual address spaces, communicating with each other flows by IPC
I/O multiplexing
applications explicitly schedule their own logical flows in the context of a single process
Since the program is a single process, all flows share the same address space
Logical flows are modeled as state machines that the main program explicitly transitions from state to state as a result of data arriving on file descriptors.
Threads
Threads are logical flows that run in the context of a single process and are scheduled by the kernel.
scheduled by the kernel like process flows
sharing the same virtual address space like I/O multiplexing flows
Concurrent ProgrammingWith Processes
Pros and Cons of Processes
file tables are shared
separate address spaces for processes
Avoiding accidentally overite the virtual memory of another process
make it more difficult for processes to share state information
Slower because the overhead for process control and IPC is high.
Traditional view of a process
Process context
Data registers
Condition codes
Stack pointer (SP)
Program counter (PC)
Kernel context
Process ID (PID)
VM structures
Open files
Signal handlers
brk pointer
Code, data, and stack
stack
shared libraries
run-time heap
read/write data
read-only code/data
Concurrent ProgrammingWith I/O Multiplexing
Pros and Cons of I/O Multiplexing
event-driven designs give programmers more control over the behavior of their programs than process-based designs
I/O multiplexing runs in the context of a single process
do not require a process context switch to schedule a new flow
coding complexity
Concurrent ProgrammingWith Threads
【Better Explained】
main thread
peer thread
pool of peers
The main impact of this notion of a pool of peers is that a thread can kill any of its peers, or wait for any of its peers to terminate.
thread routine
Posix Threads
A joinable thread like a zombie process
A joinable thread can be reaped and killed by other
threads. Its memory resources (such as the stack) are not freed until it is reaped by another thread
A detached thread cannot be reaped or killed by other threads. Its memory resources are freed
automatically by the system when it terminates.
Mapping Variables to Memory
Global variables
Local automatic variables
Local static variables
Threads Memory Model
Thread 1 context:
Data registers
general-purpose registers
Condition codes
stack 1
On my view ,the stack is the thread routine call stack,in other worlds ,it is just a ordinary function call stack.
????
SP1
PC1
TID1
Thread 2 context:
Data registers
general-purpose registers
Condition codes
stack 2
SP2
PC2
TID2
Shared code and data
shared libraries
run-time heap
read/write data
read-only code/data
Kernel context:
VM structures
Open files
Signal handlers
brk pointer
PID
Synchronizing Threads with Semaphores
Sequential Consistency
Synchronization error
Progress Graphs
critical section
safe trajectory
unsafe trajectory
Semaphores
Edsger Dijkstra
Proberen
Verhogen
Semaphore invariant
Binary semaphore
Mutex
Counter semaphore
Schedule Shared Resources
Forbidden region
Producer-consumer model
Using Threads for Parallelism
Strong scaling
Weak scaling
Thread-unsafe functions
Failing to protect shared variables
Semaphore
it does not require any changes in the calling program
the additional synchronization will slow down the program
Relying on state across multiple function invocations
The only way to make a function such as rand thread-safe is to rewrite it
Returning a pointer to a static variable
Rewrite it.
Lock-and -copy
the additional synchronization will slow down the program
Such as gethostbyname return a pointer to a complex structure,if we want to copy the entire Hierarchy ,we need deep copy.
In terms of the class of multiple invocations functions ,it doesn't work.
Calling thread-unsafe functions
Functions relying on multiple function invocations will be unsafe
Functions that Involved shared variables and static variable can make it safe by using semaphore and lock-and-copy
Reentrancy
Reentrant function
Reentrant functions are characterized by the property that they do not reference any shared data when they are called by multiple threads
Reentrant functions are typically more efficient than non-reentrant
thread-safe functions because they require no synchronization operations
explicitly reentrant
all function arguments are passed by value (i.e., no pointers)
all data references are to local automatic stack variables (i.e., no references to static or global variables)
Implicity reentrant
Function arguments can be passed by reference (that is, we allow them to pass pointers)
the calling threads must be careful to pass arguments that pointers to non-shared data
Library Functions in Threaded Programs
Rewrite
Lock-and-copy-deep copy
Race
A race occurs when the correctness of a program depends on one thread reaching point x in its control flow before another thread reaches point y
Golden rule
Threaded programs must work correctly for any feasible trajectory
Deadlock
a collection of threads are blocked, waiting for a condition that will never be true
Consistent lock will avoid the deadlock if you use mutex