1. Flow
    1. Client Manager
      1. request
        1. check authentication
        2. process available
        3. check database loading
        4. wait for required resource
        5. send query to query manager
      2. response
        1. stores the partial results in buffer and start sending
        2. stops connection, give a readable explanation
        3. release resource
    2. Query Manager
    3. Query Parser
      1. check syntax
      2. check keywords are used in right order
      3. table exists
      4. fields of table exists
      5. operations of the types of the fields are possible
      6. check user authorization to read/write
      7. transformed into an internal representation(tree)
    4. Query Rewriter
      1. pre-optimize the query
        1. rules
          1. view merging
          2. subquery flattening
          3. removal of unnecessary operators
          4. redundant join elimination
          5. constant arithmetic evaluation
          6. partition pruning
          7. materialized view rewrite
          8. custom rule
          9. olap transformation
      2. avoid unnecessary operations
      3. help the optimizer to find the best possible solution
    5. Query Optimizer
      1. statistic
        1. basic
          1. minimum unit
          2. a page
          3. 4 or 8 kilobytes
          4. values
          5. number of rows/pages in a table
          6. column
          7. distinct data values
          8. length of data values
          9. min
          10. max
          11. average
          12. data range information
          13. min
          14. max
          15. average
          16. information on the indexes of the table
        2. advanced
          1. histogram
          2. most frequent value
          3. quantiles
      2. Cost Based Optimization
        1. estimate
          1. memory usages
          2. CPU
          3. CPU Architecture
          4. disk I/O
        2. factor
          1. Indexes
          2. B+Tree
          3. Bitmap
          4. dynamically temporary indexes
          5. etc...
          6. Access Path
          7. Full Scan
          8. Range Scan
          9. Unique Scan
          10. Access by Row Id
          11. others
          12. Join Operators
          13. Relation
          14. inner relation
          15. right data set
          16. outer relation
          17. left data set
          18. table
          19. index
          20. intermediate result
          21. Merge Join
          22. produced a sorted result
          23. Complexity
          24. sorted
          25. O(N+M)
          26. has to be sorted
          27. O(N*Log(N) + M*Log(M))
          28. Hash Join
          29. O(N+M)
          30. enough memory
          31. Nested Join
          32. O(N*M)
          33. the inner relation must be the smallest one
      3. Choose
        1. Most of the time an optimizer doesn't find the best solution but a "good" one
        2. Dynamic Programming
        3. Greedy Algorithm
        4. Heuristic
          1. genetic algorithm
      4. Query Plan Cache
        1. store the plan to avoid use-less re-computations of the same query
    6. Query Executer
      1. executed compiled plan when there is enough resource
    7. Data Manager
      1. Problem
        1. can't get data at any time
          1. ACID
        2. get and keep data in memory buffers
    8. Cache Manager
      1. Buffer Pool
      2. Prefetching
        1. cache manager needs to get the data in memory before the query executor use them
        2. strategy
          1. speculative prefetching
          2. sequential prefecthing
      3. Purging data
        1. Buffer-Replacement strategies
          1. LRU(Least Recently Used)
          2. Improvement
          3. LRU-K
          4. a weight is put on the number of times the data was used
          5. Other Algorithms
          6. 2Q
          7. CLOCK
          8. MRU
          9. LRFU
          10. Least Recently and Frequently Used
    9. Transaction Manager
      1. ACID
        1. Atomicity
          1. all or nothing
        2. Isolation
          1. transaction A and B run at the same time, A and B must be the same whether A finishes before/after/during B
          2. levels
          3. Serializable
          4. Two Transactions have its own world
          5. Repeatable read
          6. phantom read
          7. break of isolation between transactions is only about new data, not the existing ones
          8. Read committed
          9. non-repeatable read
          10. if a transaction A readd a data D and then this data is modified(or deleted) and committed by a transaction B, if A reads data D again it will see the modification(or deletion) made by B on the data
          11. Read Uncommitted
          12. Dirty Read
          13. If a transaction A reads a data D and then this data D is modified by a transaction B (that is not committed and still running), if A reads data D again it will see the modified value. If transaction B is rolled back, then data D read by A the second time doesn’t make no sense since it has been modified by a transaction B that never happened (since it was rolled back).
        3. Durability
          1. one the transaction is committed, the data stay in the database no matter what happen
        4. Consistency
          1. only valid data are written to the database
      2. Concurrency Control
        1. write operations on the same data
          1. if (at least) one of the transactions is modifying a data read by other transactions, the database needs to find a way to hide this modification from the other transactions. Moreover, it also needs to ensure that this modification won’t be erased by another transaction that didn’t see the modified data.
        2. solution
          1. concept
          2. take into account that a transaction can be cancelled
          3. execute the conflicting parts in a certain order
          4. reorder the operations inside the conflicting transactions to reduce the size of the conflicting parts
          5. check if the parts of 2 (or more) transactions are in conflict because they’re reading/modifying the same data.
          6. monitor all the operations of all the transactions
          7. every time a transaction is created or cancelled
          8. method
          9. Lock Manager
          10. Pessimistic locking
          11. shared lock
          12. if a transaction needs only to read a data A
          13. it “shared locks” the data and reads the data
          14. if a second transaction also needs only to read data A
          15. it “shared locks” the data and reads the data
          16. if a third transaction needs to modify data A
          17. it “exclusive locks” the data but it has to wait until the 2 other transactions release their shared locks to apply its exclusive lock on data A
          18. exclusive lock
          19. flow
          20. it’ll have to wait until the first transaction releases the data
          21. if another transaction also needs this data
          22. it locks the data
          23. if a transaction needs a data
          24. cons
          25. it forces other transactions that only want to read the same data to wait
          26. Deadlock
          27. solution
          28. timeout
          29. Two-phase Locking
          30. phases
          31. growing phase
          32. a transaction can obtain locks, but can’t release any lock
          33. shrinking phase
          34. a transaction can release locks (on the data it has already processed and won’t process again), but can’t obtain new locks
          35. rules
          36. to release the locks that aren’t used anymore to reduce the wait time of other transactions waiting for these locks
          37. to prevent from cases where a transaction gets data modified after the transaction started and therefore aren’t coherent with the first data the transaction acquired.
          38. all the exclusive locks must be released at the end of the transaction.
          39. Data Versioning
          40. idea
          41. every transaction can modify the same data at the same time
          42. each transaction has its own copy (or version) of the data
          43. if 2 transactions modify the same data, only one modification will be accepted, the other will be refused and the associated transaction will be rolled back (and maybe re-run).
          44. pros
          45. reader transactions don’t block writer transactions
          46. writer transactions don’t block reader transactions
          47. there is no overhead from the “fat and slow” lock manager
          48. example
          49. MVCC
          50. Multiversion Concurrency Control
    10. Log Manager
      1. Log record
        1. composition
          1. each page on disk (that stores the data, not the log) has id of the log record (LSN) of the last operation that modified the data.
          2. Type (ARIES Log)
          3. UndoNxtLSN (ARIES Log)
          4. REDO
          5. a way replay the operation
          6. UNDO
          7. a way to remove the effect of the operation
          8. PrevLSN
          9. A link to the previous log record produced by the same transaction
          10. PageID
          11. the location on disk of the modified data
          12. TransID
          13. the id of the transaction that produced the operation
          14. LSN
          15. Log Sequence Number
        2. example
      2. WAL
        1. Write-Ahead Logging Protocol
          1. rules
          2. Each modification into the database produces a log record, and the log record must be written into the transaction log before the data is written on disk
          3. The log records must be written in order; a log record A that happens before a log record B must but written before B
          4. When a transaction is committed, the commit order must be written on the transaction log before the transaction ends up successfully
        2. enhanced version of WAL
          1. ARIES
          2. Algorithms for Recovery and Isolation Exploiting Semantics
          3. aims
          4. good performance when writing logs
          5. having a fast and reliable recovery
      3. reasons of rollback transaction
        1. user cancelled it
        2. server or network failure
        3. the transaction has broken the integrity of the database
        4. deadlocks
      4. Log Buffer
        1. avoid that log writing becomes a major bottleneck
        2. Flow
          1. query executor asks for a modification
          2. The cache manager stores the modification in its buffer
          3. The log manager stores the associated log in its buffer
          4. At this step, the query executor considers the operation is done (and therefore can ask for other modifications)
          5. Then (later) the log manager writes the log on the transaction log. The decision when to write the log is done by an algorithm
          6. Then (later) the cache manager writes the modification on disk. The decision when to write data on disk is done by an algorithm
          7. Policies
          8. type
          9. NO-STEAL
          10. buffer manager needs to wait until the commit order to write everything at once
          11. STEAL
          12. the data are written step-by-step on disk
          13. Force
          14. must be done before the commit
          15. No-Force
          16. might be done after the commit
          17. because in case of crashes it’s still possible to recover the transaction with the REDO logs
          18. impact on recovery
          19. STEAL/No-Force
          20. UNOD
          21. REDO
          22. highest performance
          23. more complex logs and recovery processes
          24. STEAL/FORCE
          25. UNDO
          26. NO-STEAL/NO-Force
          27. REDO
          28. NO-STEAL/FORCE
          29. nothing
          30. worst performance
          31. huge amount of ram
      5. Recovery
        1. crash
          1. The Undo Pass
          2. rolls back all transactions that were incomplete at the time of the crash
          3. starts with the last logs of each transaction and processes the UNDO logs in an anti-chronological order (using the PrevLSN of the log records).
          4. The Redo pass
          5. uses the REDO to update the database to the state it was before the crash
          6. REDO logs are processed in a chronological order (using the LSN)
          7. LSN(page_on_disk)>=LSN(log_record)
          8. the data has already been written on disk before the crash
          9. LSN(page_on_disk)<LSN(log_record)
          10. the page on disk is updated
          11. The analysis pass
          12. recreate the timeline
          13. reads the full transaction log
          14. which transaction to rollback
          15. which data needed to be written on disk
        2. normally
          1. reason
          2. manually
          3. lock manager
          4. network failure
          5. UNDO and REDO in 2 in memory table
          6. transaction table
          7. stores the state of all current transactions
          8. dirty page table
          9. stores which data need to be written on disk
    11. Data Access Manager
  2. SQL
    1. inteval 30 minute
    2. ALTER TABLE tablename AUTO_INCREMENT = 1
    3. 盡量不用 select *
      1. 多一個欄位,時間差六倍
    4. avoid or
      1. 使用 in 取代 or
        1. or
          1. O(n)
        2. in
          1. O(log n)
          2. in個數 < 200
      2. 使用 union 取代 or
        1. merge index 往往很弱智
        2. example
          1. Select * from opp WHERE phone='010-88886666' or cellPhone='13800138000';
          2. Select * from opp WHERE phone='010-88886666' union Select * from opp WHERE cellPhone='13800138000';
    5. 少用 count(*)
    6. 統計
      1. 即時
        1. 用 cache, 雙向更新, 離峰跑基底
      2. 非即時
        1. 使用單獨表,定期重算
    7. Limit
      1. Limit 10000,10
        1. 偏移量越大越慢
        2. optimization
          1. SELECT id from <table> WHERE id > 10000 LIMIT 10;
      2. 高效分頁
        1. select * from post WHERE id >= ( select sql_no_cache id from post limit 80000,1 ) limit 10 ;
    8. Union
      1. UNION ALL
        1. 不去重覆
      2. UNION
        1. 去重覆
    9. Join
      1. 高 concurrent DB 不 JOIN 兩個以上 table
      2. JOIN fields should be the same data type
      3. example
        1. MySQL> Select * from tag JOIN tag_post on tag_post.tag_id=tag.id JOIN post on tag_post.post_id=post.id WHERE tag.tag=‘二手玩具’;
        2. MySQL> Select * from tag WHERE tag=‘二手玩具’; MySQL> Select * from tag_post WHERE tag_id=1321; MySQL> Select * from post WHERE post.id in (123,456,314,141)
    10. Group By
      1. 分組
      2. 自動排序
        1. 去除排序
          1. order by null
    11. 數字對數字,字串對字串
    12. Load Data
      1. no index faster than has index
      2. load data is 20 times faster than insert
      3. 盡量不用 insert ... select
    13. 使用 join 改寫子查詢
    14. Slow Query
      1. index useful
        1. 不在索引列做運算
        2. Like
          1. 避免前綴 %
        3. 避免負向查詢
        4. avoid type conversion
      2. 是否使用最優的 Join 表
      3. Tools
        1. Inception SQL
        2. SysSchema
        3. Workbench
  3. NoSQL
    1. BASE
      1. Basic Availability
      2. Soft state
      3. Eventual consistency
    2. 2 phase commit
    3. flexible schema
      1. TB 級數據
      2. zero downtime deployment
    4. sharding
  4. MySQL
    1. VIEW
      1. Algorithms
        1. Merge
        2. template
        3. undefined
    2. driver
      1. mysqlnd
        1. mysql native driver
        2. let your int field to be int in php
    3. table
      1. name by lower case
      2. field
        1. separated by _
        2. ENUM and SET
        3. 避免使用 Null
        4. 少用並拆分 Text/Blob
        5. 盡量不用 foreign key
          1. 由程式限制
        6. primary key
          1. unsigned integer
        7. avoid reserved words
      3. option
        1. row format
        2. collate
          1. utf8_general_ci
          2. utf8_unicode_ci
        3. charset
          1. UTF8
          2. utf8mb4
          3. emoji
      4. index
        1. 盡量不用 foreign key
          1. 由程式限制
        2. uniq_[field name]
        3. idx_[field name]
        4. MySQL will create indexes automatically even for foreign keys.
        5. leftmost prefix
          1. (col1, col2, col3)
          2. col1
          3. col1, col2
          4. col1, col2, col3
      5. data
        1. data types
          1. string
          2. CHAR
          3. fix length
          4. faster query
          5. VARCAHR
          6. indeterminate length
          7. mobile phone
          8. won't be calculated
          9. could be fuzzy search
          10. like '%324%'
          11. TEXT
          12. ENUM
          13. number
          14. tinyint
          15. 1 bit
          16. int
          17. 4 bit
          18. bigint
          19. facebook id
          20. 8 bit
          21. date
          22. DATETIME
          23. '1000-01-01' to '9999-12-31'
          24. save in current timezone
          25. TIMESTAMP
          26. '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC
          27. save UTC in storage, change to connection's timezone for retrieval
          28. YEAR
          29. DATE
          30. IP
          31. Unsigned Integer
          32. INET_ATON
          33. address to number
          34. INET_NTOA
          35. number to address
        2. save raw data
    4. Engine
      1. innodb
        1. row lock
        2. ACID
      2. myISAM
        1. table lock
    5. Sql
      1. orderby 無法使用索引,則使用 filesort
    6. Replication
      1. slave 無法知道是否與 master 同步
        1. solution
          1. Semi-Sync
          2. master 事務提交需要 slave ack
          3. 網路超時後備庫降級為異步復制
          4. 並沒有解決異步復制的根本缺陷
          5. 網路回復後需追趕日誌,漏十秒可能需要十分鐘的追趕
    7. GTID
    8. 多點寫入
      1. garara
        1. 影響效能
    9. 多引擎
    10. Prohibition
      1. store procedure
      2. view
      3. trigger
      4. event
    11. Analysis
      1. show profile;
      2. mysqlsla;
      3. mysqldumpslow;
      4. explain;
      5. show slow log;
      6. show processlist;
      7. show query_response_time(percona);
      8. systemtap
        1. iotime.stp
  5. component
    1. Core component
      1. Security Manager
        1. authentication
      2. File System Manager
        1. Disk I/O
      3. Client Manager
        1. managing client connection
      4. Process Manager
        1. pool of processes/threads
      5. Network Manager
        1. network I/O
      6. Memory Manager
        1. efficient memory manager for handling a large amount of memory
    2. Query Component
      1. Query Parser
        1. check query validation
      2. Query rewriter
        1. pre-optimize a query
      3. Query Optimizer
        1. optimize a query
      4. Query Executer
        1. compile and execute a query
    3. Data Manager
      1. Transaction Manager
        1. handle transactions
      2. Cache Manager
        1. put data in memroy before using
        2. put data in memory before writing them on disk
      3. Access Manager
        1. access data on disk
    4. Tools
      1. Backup Manager
        1. saving and restoring a database
      2. Recovery Manager
        1. restarting the database in a coherent state after a crash
      3. Monitor Manager
        1. logging the activity
        2. monitor tools
      4. Administrator Manager
        1. storing metadata
          1. table schema
        2. tools to manage databases, schemas, tablespaces
  6. Basic
    1. ACID
      1. Atomicity
        1. Transaction 為單位
        2. RDBMS 以 Transaction 為單位進行修復
        3. 保證系統數據從一個正確狀態(consistence state) 到下一個正確狀態
      2. Consistency
        1. unique constraint
        2. foreign key
        3. 保證系統在移動失敗時,能安全地回到本來的正確狀態
      3. Isolation
        1. 同一筆資料,不會同時被兩個 Transaction 改動
        2. 避免 race condition
        3. deadlock detect
      4. Durability
        1. 一旦 commited 資料改動,除非存儲空間受損,否則永不流失
    2. C10K Problem
    3. Stored Procedure
      1. sp
    4. REDO Log
    5. 更改 Schema
      1. 使用 CTAS 取代 alter table
        1. Crate Table As Select ...
        2. 小心 foreign key, table privilege, table setting
        3. 使用 Trigger 同步新建的 table
    6. Backup
      1. disaster recovery
    7. Replication
      1. Availability
  7. Scale Up
    1. partition
      1. Horizontal
        1. criteria
          1. range
          2. list
          3. composite
          4. round-robin
          5. hash
        2. shard
          1. save partition tables in separate instances
      2. Vertical
        1. split by columns
  8. Reference
    1. http://coding-geek.com/how-databases-work/
    2. http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf