-
Logical
Design
-
Schema
- Star
- Snowfake
- Granularity
- Auditing & Lineage
-
Dimension
-
Hierarchy
- Date
- Geography
-
Discretization
- Age
- Income
-
Column types
- Keys
- Name columns
- Attributes
- Member properties
- Lineage columns
-
SCD
- type 1 - no historical data
- type 2 - store historical data by adding rows
- type 3 - store historical data by adding columns
-
Fact
-
column types
- Foreign keys
- Measures
- Lineage columns (optional)
- Business key columns from
the primary source
table (optional)
-
measure types
-
Additive
- Sale Amount
- Product Quantity
-
Non-Additive
- Price
- Balance
- Age
-
Many-to-Many
Relationship
- Create intermediate dimensions
-
Implementation
-
Database
Configuration
- Simple log mode
- Data Staging
-
Dimensions
- Simple sequential
integers for
surrogate keys
-
Computed
columns
- Discretize continuous values
- Column values that
are constantly changing
i.e. Calculate the age
based on birthdate
- Comfort/Shared dimensions
-
Fact
- Inferred member
- Remove foreign key constraints
in production
- Retain the foreign key constraints
during the development and
testing phases
-
Performance
-
Indexing
- Filtered index for non clustered index
- Utilise parallel processing with hash joins
-
Indexed Views
- SCHEMABINDING
- Indexing
- COUNT_BIG()
-
Columnstore Indexes
- The columnstore index is
divided into units called segments.
-
Limits
- Nonclustered only
- Only one columnstore per table
- If table is partitioned,
the columnstore index
must be partition aligned
- not allowed on indexed views
- Can’t be a filtered index
- data type limitations
-
Partitions
- Large fact tables
- Partitioning over dates
is most common
- Max 15,000 partitions
-
Methods
- Create in advance
- Sliding window scenario
-
Batch Processing
- 1000 rows per batch.
-
available for
- Filter
- Project
- Scan
- Local hash (partial) aggregation
- Hash inner join
- (Batch) hash table build
-
Data Compression
-
Row
- OLTP Systems
-
Page
- DW
-
Unicode
- Save 50% of space
-
Auditing
-
Usage of
Data Lineage
- User
- Login
- Create Time
- Load Time
- Row count