-
Flow
-
Client Manager
-
request
- check authentication
- process available
- check database loading
- wait for required resource
- send query to query manager
-
response
- stores the partial results in buffer and start sending
- stops connection, give a readable explanation
- release resource
- Query Manager
-
Query Parser
- check syntax
- check keywords are used in right order
- table exists
- fields of table exists
- operations of the types of the fields are possible
- check user authorization to read/write
- transformed into an internal representation(tree)
-
Query Rewriter
-
pre-optimize the query
-
rules
- view merging
- subquery flattening
- removal of unnecessary operators
- redundant join elimination
- constant arithmetic evaluation
- partition pruning
- materialized view rewrite
- custom rule
- olap transformation
- avoid unnecessary operations
- help the optimizer to find the best possible solution
-
Query Optimizer
-
statistic
-
basic
- minimum unit
- a page
- 4 or 8 kilobytes
- values
- number of rows/pages in a table
- column
- distinct data values
- length of data values
- min
- max
- average
- data range information
- min
- max
- average
- information on the indexes of the table
-
advanced
- histogram
- most frequent value
- quantiles
-
Cost Based Optimization
-
estimate
- memory usages
- CPU
- CPU Architecture
- disk I/O
-
factor
- Indexes
- B+Tree
- Bitmap
- dynamically temporary indexes
- etc...
- Access Path
- Full Scan
- Range Scan
- Unique Scan
- Access by Row Id
- others
- Join Operators
- Relation
- inner relation
- right data set
- outer relation
- left data set
- table
- index
- intermediate result
- Merge Join
- produced a sorted result
- Complexity
- sorted
- O(N+M)
- has to be sorted
- O(N*Log(N) + M*Log(M))
- Hash Join
- O(N+M)
- enough memory
- Nested Join
- O(N*M)
- the inner relation must be the smallest one
-
Choose
- Most of the time an optimizer doesn't find the best solution but a "good" one
- Dynamic Programming
- Greedy Algorithm
-
Heuristic
- genetic algorithm
-
Query Plan Cache
- store the plan to avoid use-less re-computations of the same query
-
Query Executer
- executed compiled plan when there is enough resource
-
Data Manager
-
Problem
-
can't get data at any time
- ACID
- get and keep data in memory buffers
-
Cache Manager
- Buffer Pool
-
Prefetching
- cache manager needs to get the data in memory before the query executor use them
-
strategy
- speculative prefetching
- sequential prefecthing
-
Purging data
-
Buffer-Replacement strategies
- LRU(Least Recently Used)
- Improvement
- LRU-K
- a weight is put on the number of times the data was used
- Other Algorithms
- 2Q
- CLOCK
- MRU
- LRFU
- Least Recently and Frequently Used
-
Transaction Manager
-
ACID
-
Atomicity
- all or nothing
-
Isolation
- transaction A and B run at the same time, A and B must be the same whether A finishes before/after/during B
- levels
- Serializable
- Two Transactions have its own world
- Repeatable read
- phantom read
- break of isolation between transactions is only about new data, not the existing ones
- Read committed
- non-repeatable read
- if a transaction A readd a data D and then this data is modified(or deleted) and committed by a transaction B, if A reads data D again it will see the modification(or deletion) made by B on the data
- Read Uncommitted
- Dirty Read
- If a transaction A reads a data D and then this data D is modified by a transaction B (that is not committed and still running), if A reads data D again it will see the modified value. If transaction B is rolled back, then data D read by A the second time doesn’t make no sense since it has been modified by a transaction B that never happened (since it was rolled back).
-
Durability
- one the transaction is committed, the data stay in the database no matter what happen
-
Consistency
- only valid data are written to the database
-
Concurrency Control
-
write operations on the same data
- if (at least) one of the transactions is modifying a data read by other transactions, the database needs to find a way to hide this modification from the other transactions. Moreover, it also needs to ensure that this modification won’t be erased by another transaction that didn’t see the modified data.
-
solution
- concept
- take into account that a transaction can be cancelled
- execute the conflicting parts in a certain order
- reorder the operations inside the conflicting transactions to reduce the size of the conflicting parts
- check if the parts of 2 (or more) transactions are in conflict because they’re reading/modifying the same data.
- monitor all the operations of all the transactions
- every time a transaction is created or cancelled
- method
- Lock Manager
- Pessimistic locking
- shared lock
- if a transaction needs only to read a data A
- it “shared locks” the data and reads the data
- if a second transaction also needs only to read data A
- it “shared locks” the data and reads the data
- if a third transaction needs to modify data A
- it “exclusive locks” the data but it has to wait until the 2 other transactions release their shared locks to apply its exclusive lock on data A
- exclusive lock
- flow
- it’ll have to wait until the first transaction releases the data
- if another transaction also needs this data
- it locks the data
- if a transaction needs a data
- cons
- it forces other transactions that only want to read the same data to wait
- Deadlock
- solution
- timeout
- Two-phase Locking
- phases
- growing phase
- a transaction can obtain locks, but can’t release any lock
- shrinking phase
- a transaction can release locks (on the data it has already processed and won’t process again), but can’t obtain new locks
- rules
- to release the locks that aren’t used anymore to reduce the wait time of other transactions waiting for these locks
- to prevent from cases where a transaction gets data modified after the transaction started and therefore aren’t coherent with the first data the transaction acquired.
- all the exclusive locks must be released at the end of the transaction.
- Data Versioning
- idea
- every transaction can modify the same data at the same time
- each transaction has its own copy (or version) of the data
- if 2 transactions modify the same data, only one modification will be accepted, the other will be refused and the associated transaction will be rolled back (and maybe re-run).
- pros
- reader transactions don’t block writer transactions
- writer transactions don’t block reader transactions
- there is no overhead from the “fat and slow” lock manager
- example
- MVCC
- Multiversion Concurrency Control
-
Log Manager
-
Log record
-
composition
- each page on disk (that stores the data, not the log) has id of the log record (LSN) of the last operation that modified the data.
- Type (ARIES Log)
- UndoNxtLSN (ARIES Log)
- REDO
- a way replay the operation
- UNDO
- a way to remove the effect of the operation
- PrevLSN
- A link to the previous log record produced by the same transaction
- PageID
- the location on disk of the modified data
- TransID
- the id of the transaction that produced the operation
- LSN
- Log Sequence Number
-
example
-
WAL
-
Write-Ahead Logging Protocol
- rules
- Each modification into the database produces a log record, and the log record must be written into the transaction log before the data is written on disk
- The log records must be written in order; a log record A that happens before a log record B must but written before B
- When a transaction is committed, the commit order must be written on the transaction log before the transaction ends up successfully
-
enhanced version of WAL
- ARIES
- Algorithms for Recovery and Isolation Exploiting Semantics
- aims
- good performance when writing logs
- having a fast and reliable recovery
-
reasons of rollback transaction
- user cancelled it
- server or network failure
- the transaction has broken the integrity of the database
- deadlocks
-
Log Buffer
- avoid that log writing becomes a major bottleneck
-
Flow
- query executor asks for a modification
- The cache manager stores the modification in its buffer
- The log manager stores the associated log in its buffer
- At this step, the query executor considers the operation is done (and therefore can ask for other modifications)
- Then (later) the log manager writes the log on the transaction log. The decision when to write the log is done by an algorithm
- Then (later) the cache manager writes the modification on disk. The decision when to write data on disk is done by an algorithm
- Policies
- type
- NO-STEAL
- buffer manager needs to wait until the commit order to write everything at once
- STEAL
- the data are written step-by-step on disk
- Force
- must be done before the commit
- No-Force
- might be done after the commit
- because in case of crashes it’s still possible to recover the transaction with the REDO logs
- impact on recovery
- STEAL/No-Force
- UNOD
- REDO
- highest performance
- more complex logs and recovery processes
- STEAL/FORCE
- UNDO
- NO-STEAL/NO-Force
- REDO
- NO-STEAL/FORCE
- nothing
- worst performance
- huge amount of ram
-
Recovery
-
crash
- The Undo Pass
- rolls back all transactions that were incomplete at the time of the crash
- starts with the last logs of each transaction and processes the UNDO logs in an anti-chronological order (using the PrevLSN of the log records).
- The Redo pass
- uses the REDO to update the database to the state it was before the crash
- REDO logs are processed in a chronological order (using the LSN)
- LSN(page_on_disk)>=LSN(log_record)
- the data has already been written on disk before the crash
- LSN(page_on_disk)<LSN(log_record)
- the page on disk is updated
- The analysis pass
- recreate the timeline
- reads the full transaction log
- which transaction to rollback
- which data needed to be written on disk
-
normally
- reason
- manually
- lock manager
- network failure
- UNDO and REDO in 2 in memory table
- transaction table
- stores the state of all current transactions
- dirty page table
- stores which data need to be written on disk
- Data Access Manager
-
SQL
- inteval 30 minute
- ALTER TABLE tablename AUTO_INCREMENT = 1
-
盡量不用 select *
- 多一個欄位,時間差六倍
-
avoid or
-
使用 in 取代 or
-
or
- O(n)
-
in
- O(log n)
- in個數 < 200
-
使用 union 取代 or
- merge index 往往很弱智
-
example
- Select * from opp WHERE phone='010-88886666' or cellPhone='13800138000';
- Select * from opp WHERE phone='010-88886666' union Select * from opp WHERE cellPhone='13800138000';
- 少用 count(*)
-
統計
-
即時
- 用 cache, 雙向更新, 離峰跑基底
-
非即時
- 使用單獨表,定期重算
-
Limit
-
Limit 10000,10
- 偏移量越大越慢
-
optimization
- SELECT id from <table> WHERE id > 10000 LIMIT 10;
-
高效分頁
- select * from post WHERE id >= ( select sql_no_cache id from post limit 80000,1 ) limit 10 ;
-
Union
-
UNION ALL
- 不去重覆
-
UNION
- 去重覆
-
Join
- 高 concurrent DB 不 JOIN 兩個以上 table
- JOIN fields should be the same data type
-
example
- MySQL> Select * from tag JOIN tag_post on tag_post.tag_id=tag.id JOIN post on tag_post.post_id=post.id WHERE tag.tag=‘二手玩具’;
- MySQL> Select * from tag WHERE tag=‘二手玩具’;
MySQL> Select * from tag_post WHERE tag_id=1321;
MySQL> Select * from post WHERE post.id in (123,456,314,141)
-
Group By
- 分組
-
自動排序
-
去除排序
- order by null
- 數字對數字,字串對字串
-
Load Data
- no index faster than has index
- load data is 20 times faster than insert
- 盡量不用 insert ... select
- 使用 join 改寫子查詢
-
Slow Query
-
index useful
- 不在索引列做運算
-
Like
- 避免前綴 %
- 避免負向查詢
- avoid type conversion
- 是否使用最優的 Join 表
-
Tools
- Inception SQL
- SysSchema
- Workbench
-
NoSQL
-
BASE
- Basic Availability
- Soft state
- Eventual consistency
- 2 phase commit
-
flexible schema
- TB 級數據
- zero downtime deployment
- sharding
-
MySQL
-
VIEW
-
Algorithms
- Merge
- template
- undefined
-
driver
-
mysqlnd
- mysql native driver
- let your int field to be int in php
-
table
- name by lower case
-
field
- separated by _
- ENUM and SET
- 避免使用 Null
- 少用並拆分 Text/Blob
-
盡量不用 foreign key
- 由程式限制
-
primary key
- unsigned integer
- avoid reserved words
-
option
- row format
-
collate
- utf8_general_ci
- utf8_unicode_ci
-
charset
- UTF8
- utf8mb4
- emoji
-
index
-
盡量不用 foreign key
- 由程式限制
- uniq_[field name]
- idx_[field name]
- MySQL will create indexes automatically even for foreign keys.
-
leftmost prefix
- (col1, col2, col3)
- col1
- col1, col2
- col1, col2, col3
-
data
-
data types
- string
- CHAR
- fix length
- faster query
- VARCAHR
- indeterminate length
- mobile phone
- won't be calculated
- could be fuzzy search
- like '%324%'
- TEXT
- ENUM
- number
- tinyint
- 1 bit
- int
- 4 bit
- bigint
- facebook id
- 8 bit
- date
- DATETIME
- '1000-01-01' to '9999-12-31'
- save in current timezone
- TIMESTAMP
- '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC
- save UTC in storage, change to connection's timezone for retrieval
- YEAR
- DATE
- IP
- Unsigned Integer
- INET_ATON
- address to number
- INET_NTOA
- number to address
- save raw data
-
Engine
-
innodb
- row lock
- ACID
-
myISAM
- table lock
-
Sql
- orderby 無法使用索引,則使用 filesort
-
Replication
-
slave 無法知道是否與 master 同步
-
solution
- Semi-Sync
- master 事務提交需要 slave ack
- 網路超時後備庫降級為異步復制
- 並沒有解決異步復制的根本缺陷
- 網路回復後需追趕日誌,漏十秒可能需要十分鐘的追趕
- GTID
-
多點寫入
-
garara
- 影響效能
- 多引擎
-
Prohibition
- store procedure
- view
- trigger
- event
-
Analysis
- show profile;
- mysqlsla;
- mysqldumpslow;
- explain;
- show slow log;
- show processlist;
- show query_response_time(percona);
-
systemtap
- iotime.stp
-
component
-
Core component
-
Security Manager
- authentication
-
File System Manager
- Disk I/O
-
Client Manager
- managing client connection
-
Process Manager
- pool of processes/threads
-
Network Manager
- network I/O
-
Memory Manager
- efficient memory manager for handling a large amount of memory
-
Query Component
-
Query Parser
- check query validation
-
Query rewriter
- pre-optimize a query
-
Query Optimizer
- optimize a query
-
Query Executer
- compile and execute a query
-
Data Manager
-
Transaction Manager
- handle transactions
-
Cache Manager
- put data in memroy before using
- put data in memory before writing them on disk
-
Access Manager
- access data on disk
-
Tools
-
Backup Manager
- saving and restoring a database
-
Recovery Manager
- restarting the database in a coherent state after a crash
-
Monitor Manager
- logging the activity
- monitor tools
-
Administrator Manager
-
storing metadata
- table schema
- tools to manage databases, schemas, tablespaces
-
Basic
-
ACID
-
Atomicity
- Transaction 為單位
- RDBMS 以 Transaction 為單位進行修復
- 保證系統數據從一個正確狀態(consistence state) 到下一個正確狀態
-
Consistency
- unique constraint
- foreign key
- 保證系統在移動失敗時,能安全地回到本來的正確狀態
-
Isolation
- 同一筆資料,不會同時被兩個 Transaction 改動
- 避免 race condition
- deadlock detect
-
Durability
- 一旦 commited 資料改動,除非存儲空間受損,否則永不流失
- C10K Problem
-
Stored Procedure
- sp
- REDO Log
-
更改 Schema
-
使用 CTAS 取代 alter table
- Crate Table As Select ...
- 小心 foreign key, table privilege, table setting
- 使用 Trigger 同步新建的 table
-
Backup
- disaster recovery
-
Replication
- Availability
-
Scale Up
-
partition
-
Horizontal
-
criteria
- range
- list
- composite
- round-robin
- hash
-
shard
- save partition tables in separate instances
-
Vertical
- split by columns
-
Reference
- http://coding-geek.com/how-databases-work/
- http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf