One of the transactions can succeed or fail.
No 3rd status
Consistency
In Data
Defined by the user
Referential integrity ( foreign keys )
In Reads
After commit data. get should return exactly data committed
Isolation
Read Uncommitted
Dirty Read
Read Committed
Repeatable Read
Phantom Reads
SnapShot
Serializable
Read phenomena
Durability
Changes made by committed transactions must be persisted in durable non volatile storage
WAL - Write ahead log before do it. when crash have log and cannot lost data.
Store data after computing not store event ( query )
expensive because when store data are indexes, data, files, ... when have change
that why dbms using...
write-ahead-log segments
only log version change
Asynchronous snapshot
Lostable data but high performance than wal
AOF ( Redis using ) record event ( query ) and will trigger it when crash
When write data to disk OS will be write to cache after that plush to disk. we can be lost data when system crash
using fsync ( force sync ) to force write to disk without cache but it will be slow
Can be config db engine ( innodb,... )
Default config mostly depend on os
Indexes
Clustered index
B-tree or HashMap has pointer to the Heap
Indexes store index of page and row
MYSQL InnoDB this is primary key
They will store all data in record
Other index not primary is non-clustered index
It has pointer to the clustered index ( MYSQL )
Better if set auto-increment because Data will be store ( order by ) this key.
very bad if insert value with id not increment
Non-clustered index
Only store value of column in index
pointer to clustered index
Need go to clustered index to get other data if needed
How to db choose index
Will be scan table if have a little data
will choose an index if the data on one side that is very small
Other will be using multiple index
Select two column have two index
Explain Query
Key vs non-Key column index
BitMap Index Scan
Use bitwise to calculate and | or
Bloom Filter
Index Scan
Table Scan
B+Tree
Btree
Storage
Heap
Page.
IO operation always fetch minimum a page (8/16kb) not fetch only single row
Column Base
Writes are slower
OLAP
Compress greatly
Amazing for aggregation
Inefficient query multi-columns
Row Base
Optimize for read/writes
OLTP
Compression isn't efficient
Aggregation isn't efficient
Efficient queries multi-columns
Primary key
Compare
Concurrency Control
Shared Lock
Exclusive Locks
Level locking
Row
Table
Page
Pessimistic
Locking start when start transaction. everyone else need wait for lock release and continues
Good when transactions often conflict
Optimistic
when start transaction they got number current version of transaction. when commit system will compare with current version if not change this transaction can be apply. but if change transaction need do again
Good when transactions are less conflict
Dead Locks
When two transactions lock each other
Two-phase Locking
Offset slow
Should using where > id because offset will fetch all data and filter not filter in storage engines
Partitioning
Split to multiple table same database server
Typpe of split
Horizontal ( Split rows )
Vertical ( Split columns )
By Range
By List
By Hash
Pros
Query faster when accessing single partition
Easy bulk loading ( attach partition )
Archive old data very easy
Cons
Change schema is hard
Ineffective when scan all data
Hard for update -> move data between partitions is hard