1. This work is a mind map on data warehouse testing, based on my understanding of the subject and within a particular context. Its purpose is to help teams consider the different aspects of data warehouse testing.
    1. This work by Chris Saunders is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
      1. Version 2013.01.08
    2. The usual disclaimer applies: this work may not be accurate, complete or up to date, and no responsibility is taken for the contents of external links. There is currently no plan to update this work.
  2. References
    1. Data Warehouse & Dimensional Modeling
      1. Online HTML - SQL Server - Data Warehousing (Microsoft)
      2. Online HTML - Creating and Using Data Warehouses - SQL 2000 (Microsoft)
      3. Online HTML - Dimensional Databases (IBM)
      4. PDF (670 pages) - Dimensional Modeling: In a Business Intelligence Environment (IBM)
    2. Data Warehouse Testing (Interesting reading, keeping in mind "context" can vary between projects)
      1. test2008 - Agility in Testing - India
        1. PDF (6 pages) - Data Warehouse / ETL Testing: Best Practices. Anindya Mookerjea & Prasanth Malisetty
      2. Developer IQ
        1. Online - Data Warehouse Testing. Manoj Philip Mathen
          1. PDF - Data Warehouse Testing. Manoj Philip Mathen - infosys.com
      3. Universal Young Minds - Data Warehousing Technologies - Testing (last section)
        1. PDF - Data Warehouse Testing Approach and Test Process Guidelines
  3. Terminology
    1. Data Warehouse or Enterprise Data Warehouse
      1. is a central repository of data created by integrating data from multiple disparate sources.
      2. dimensional approach
        1. supporters are referred to as “Kimballites”. Ralph Kimball’s approach: the data warehouse should be modeled using a dimensional model/star schema
        2. multi-dimensional (also called star schema, star-join schema or data cube)
          1. The multi-dimensional model consists of one or more fact tables referencing any number of dimension tables.
          2. Fact table
            1. consists of the measurements, metrics or facts of a business process.
            2. each record in the fact table is unique
          3. Dimension
            1. provides structured labeling information to otherwise unordered numeric measures.
            2. primary functions of dimensions: to provide filtering, grouping and labeling
      3. normalized approach (also known as 3NF model)
        1. supporters are referred to as “Inmonites”. Bill Inmon's approach: the data warehouse should be modeled using an E-R model/normalized model.
    2. SQL Server Analysis Services, SSAS
      1. is an online analytical processing (OLAP) and data mining tool from Microsoft
      2. Query Languages
        1. Data definition language (DDL)
          1. is XML based and supports commands such as create, alter, delete, and process.
        2. Data manipulation language (DML)
          1. is a family of syntax elements, similar to a computer programming language, used for inserting, deleting and updating data in a database
          2. Multidimensional Expressions (MDX)
          3. Language Integrated Query (LINQ, pronounced "link")
          4. Structured Query Language (SQL)
          5. Data Mining Extensions (DMX)
    3. SQL Server Reporting Services, SSRS
      1. is a server-based report generation software system from Microsoft.
      2. available up to Visual Studio 2010 only
      3. supports up to SQL Server 2008 R2
        1. Need to understand how newer versions relate to this: is it backward compatible, and can it be developed in VS2010 & VS2012?
    4. SQL Server Integration Services, SSIS
      1. is a platform for building enterprise-level data integration and data transformation solutions.
      2. creates or maintains SSIS packages
    5. Extract/Transform/Load, ETL
      1. Extract
        1. extracts the data from the source systems, normally into a Store database
      2. Transform
        1. applies a series of rules or functions to the extracted data, normally writing the results into a Load database
      3. Load
        1. loads the transformed data, normally into the Data Warehouse
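The three ETL stages above can be sketched as plain functions, with in-memory lists standing in for the Source, Store, Load and Data Warehouse databases. All field names and the cleansing rule here are illustrative assumptions, not from the source:

```python
def extract(source_rows):
    """Extract: copy raw rows from the source system into a 'Store' area."""
    return [dict(row) for row in source_rows]

def transform(store_rows):
    """Transform: apply rules/functions, producing rows for a 'Load' area."""
    loaded = []
    for row in store_rows:
        loaded.append({
            "customer_id": int(row["customer_id"]),   # type conversion rule
            "name": row["name"].strip().upper(),      # example cleansing rule
        })
    return loaded

def load(load_rows, warehouse):
    """Load: append the transformed rows into the warehouse table."""
    warehouse.extend(load_rows)
    return warehouse

source = [{"customer_id": "1", "name": "  alice "}]
warehouse = load(transform(extract(source)), [])
print(warehouse)  # [{'customer_id': 1, 'name': 'ALICE'}]
```

In a real pipeline each stage would read from and write to its own database; keeping them as separate steps is what makes each stage testable on its own.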
  4. What to test
    1. Verify schedule job sequence
    2. Data warehouse update
      1. Initial load or refresh
      2. Delta update
    3. ETL
      1. Data quality (completeness of data)
        1. Missing data
          1. Record counts
          2. Checksum
          3. Accuracy
          4. NULL
          5. blanks
          6. data truncation due to field lengths
          7. etc
          8. Analyse any rejected data
          9. e.g. mismatched data fields
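The completeness checks above (record counts, checksums, NULLs/blanks, truncation) can be sketched as straight comparisons between source and target row sets. The checksum scheme and the length threshold are illustrative assumptions:

```python
import hashlib

def row_checksum(row):
    """Checksum over a row's values (illustrative scheme, not a standard)."""
    joined = "|".join(str(v) for v in row)
    return hashlib.md5(joined.encode()).hexdigest()

def completeness_report(source_rows, target_rows, max_len=50):
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        # source rows whose checksum never reached the target => missing data
        "missing": sorted(
            {row_checksum(r) for r in source_rows}
            - {row_checksum(r) for r in target_rows}
        ),
        # NULL / blank values in the target
        "null_or_blank": sum(
            1 for r in target_rows for v in r if v is None or str(v).strip() == ""
        ),
        # values at the field-length limit => possible truncation
        "possible_truncation": sum(
            1 for r in target_rows for v in r if len(str(v)) >= max_len
        ),
    }

src = [("1", "alice"), ("2", "bob")]
tgt = [("1", "alice")]
print(completeness_report(src, tgt))
```

A report like this also gives a starting point for analysing rejected data: each missing checksum can be traced back to the source row that produced it.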
        2. Data duplication
          1. Record counts
          2. Checksum
          3. Accuracy
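A duplicate-record check can be sketched by counting occurrences of each row (or of its checksum) in the target and flagging anything seen more than once; the row layout here is an assumption:

```python
from collections import Counter

def find_duplicates(rows):
    """Return rows (as tuples) that appear more than once, with their counts."""
    counts = Counter(tuple(r) for r in rows)
    return {row: n for row, n in counts.items() if n > 1}

rows = [("1", "alice"), ("2", "bob"), ("1", "alice")]
print(find_duplicates(rows))  # {('1', 'alice'): 2}
```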
        3. Transforms
          1. Data types
            1. Numeric
              1. Accurate length & decimal points
            2. Calculations
              1. Accurate length & decimal points
            3. Strings
              1. Verify TRIM, left and right
          2. The description for each transform defined
          3. Different complexity
            1. Data copy
            2. Simple transformation
            3. Complex transformation
            4. Note: could compare with EXTRACT function in Excel?
          4. Business requirements to transform logic
            1. Sample data from live
            2. Full live data
            3. automated data profiling and data feed
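The data-type, length/decimal-point and TRIM checks above can be expressed as small predicates run over transformed values; the limits chosen here are hypothetical:

```python
from decimal import Decimal

def check_numeric(value, max_digits=10, decimal_places=2):
    """Verify the length and decimal points of a numeric value."""
    d = Decimal(str(value))
    _sign, digits, exponent = d.as_tuple()
    return len(digits) <= max_digits and -exponent <= decimal_places

def check_trimmed(value):
    """Verify that left and right TRIM has been applied."""
    return value == value.strip()

assert check_numeric("1234.56")
assert not check_numeric("1.234")      # too many decimal places
assert check_trimmed("abc")
assert not check_trimmed("  abc ")     # leading/trailing spaces survive
```

Predicates like these can be applied row by row to sampled live data, or to the full live data set as part of automated data profiling.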
        4. Constraints
          1. Data types across fields
        5. Data obfuscation
          1. Risk of making sensitive data accessible in the data warehouse
          2. Understand security levels
          3. Consider generated test data if production data poses a security risk
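One common obfuscation approach (an assumption here, the source does not name one) is deterministic masking: replace each sensitive value with a salted one-way hash, so joins between tables still line up while the raw value stays hidden. A minimal sketch with an illustrative salt and field:

```python
import hashlib

SALT = "test-env-salt"  # illustrative; keep real salts out of source control

def mask(value):
    """Deterministically mask a sensitive value with a salted SHA-256 hash."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:12]

row = {"customer_id": "42", "email": "alice@example.com"}
masked = {**row, "email": mask(row["email"])}
# same input always masks to the same token, so referential joins survive
assert mask("alice@example.com") == masked["email"]
assert masked["email"] != row["email"]
```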
      2. Structure
        1. Databases
          1. Source DBs
          2. Store DB
          3. Load DB
          4. Data Warehouse DB
        2. Tables
          1. Facts
          2. Dimensions
      3. Areas to test
        1. Business Functionality
        2. Data Quality
        3. Performance (also see Schedule)
          1. Different volumes to predict future performance
          2. performance profile may not be linear
          3. Load testing
          4. Scalability
          5. Multiple users
        4. Parameters
        5. Settings
        6. Regression Testing
          1. Automation should be considered
          2. Known source data sets compared to expected data sets
          3. Multiple regression test suites
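Comparing known source data sets to expected data sets can be automated as a simple diff: run the ETL over the fixed input, then report rows produced that were not expected and expected rows that were not produced. A sketch assuming rows are comparable tuples:

```python
def regression_diff(actual_rows, expected_rows):
    """Compare an ETL run against a stored expected data set."""
    actual, expected = set(actual_rows), set(expected_rows)
    return {
        "unexpected": sorted(actual - expected),  # produced but not expected
        "missing": sorted(expected - actual),     # expected but not produced
    }

expected = [("1", "ALICE"), ("2", "BOB")]
actual = [("1", "ALICE"), ("3", "CAROL")]
diff = regression_diff(actual, expected)
print(diff)  # {'unexpected': [('3', 'CAROL')], 'missing': [('2', 'BOB')]}
```

Each regression test suite would pair one known source data set with its expected output, and a passing run is an empty diff.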
    4. Verify rejected records
    5. Schedule
      1. Performance
        1. Duration of Full refresh jobs
        2. Duration of Delta update jobs
      2. Job failure
        1. Auto-restarting of jobs
        2. Error logging
      3. Verify email notifications
    6. Deployment
      1. Verify environment installation / upgrade
    7. Data Warehouse Reports
      1. Not complete
  5. How to test
    1. Test data
      1. Full data set
      2. Sub-set of full data set
      3. Test data - manipulation of data to cover test conditions
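Manipulating data to cover test conditions can be scripted: clone a clean base row and mutate it into the edge cases the checks above need (blank, NULL, truncation-length, negative values). Field names and values here are illustrative:

```python
import copy

BASE_ROW = {"customer_id": "1", "name": "alice", "balance": "10.00"}

def variant(**overrides):
    """Clone the base row and apply targeted mutations for one test condition."""
    row = copy.deepcopy(BASE_ROW)
    row.update(overrides)
    return row

test_rows = [
    variant(),                  # happy path
    variant(name=""),           # blank field
    variant(name=None),         # NULL field
    variant(name="x" * 300),    # truncation risk
    variant(balance="-10.00"),  # negative amount
]
print(len(test_rows))  # 5
```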
    2. Not complete