70-463 Exam Implementing DW (Chapter 1+2)

Logical Design
1. Schema
  1. Star
  2. Snowfake
2. Granularity
3. Auditing & Lineage
4. Dimension
  1. Hierarchy
    1. Date
    2. Geography
  2. Discretization
    1. Age
    2. Income
  3. Column types
    1. Keys
    2. Name columns
    3. Attributes
    4. Member properties
    5. Lineage columns
  4. SCD
    1. type 1 - no historical data
    2. type 2 - store historical data by adding rows
    3. type 3 - store historical data by adding columns
5. Fact
  1. column types
    1. Foreign keys
    2. Measures
    3. Lineage columns (optional)
    4. Business key columns from the primary source table (optional)
  2. measure types
    1. Additive
      1. Sale Amount
      2. Product Quantity
    2. Non-Additive
      1. Price
      2. Balance
      3. Age
  3. Many-to-Many Relationship
    1. Create intermediate dimensions
Implementation
1. Database Configuration
  1. Simple log mode
  2. Data Staging
2. Dimensions
  1. Simple sequential integers for surrogate keys
  2. Computed columns
    1. Discretize continuous values
    2. Column values that are constantly changing i.e. Calculate the age based on birthdate
  3. Comfort/Shared dimensions
3. Fact
  1. Inferred member
  2. Remove foreign key constraints in production
  3. Retain the foreign key constraints during the development and testing phases
4. Performance
  1. Indexing
    1. Filtered index for non clustered index
    2. Utilise parallel processing with hash joins
  2. Indexed Views
    1. SCHEMABINDING
    2. Indexing
    3. COUNT_BIG()
  3. Columnstore Indexes
    1. The columnstore index is divided into units called segments.
    2. Limits
      1. Nonclustered only
      2. Only one columnstore per table
      3. If table is partitioned, the columnstore index must be partition aligned
      4. not allowed on indexed views
      5. Can’t be a filtered index
      6. data type limitations
  4. Partitions
    1. Large fact tables
    2. Partitioning over dates is most common
    3. Max 15,000 partitions
    4. Methods
      1. Create in advance
      2. Sliding window scenario
  5. Batch Processing
    1. 1000 rows per batch.
    2. available for
      1. Filter
      2. Project
      3. Scan
      4. Local hash (partial) aggregation
      5. Hash inner join
      6. (Batch) hash table build
  6. Data Compression
    1. Row
      1. OLTP Systems
    2. Page
      1. DW
    3. Unicode
      1. Save 50% of space
5. Auditing
  1. Usage of Data Lineage
    1. User
    2. Login
    3. Create Time
    4. Load Time
    5. Row count