-
This work is a mind map on data warehouse testing,
based on my understanding on the subject and within a particular context.
The purpose of this work is to allow teams to consider
different aspects around data warehouse testing.
-
This work by Chris Saunders is licensed under a Creative
Commons Attribution-ShareAlike 3.0 Unported License.
- Version 2013.01.08
- Usual disclaimer applies; that this work may not be accurate,
complete or up-to-date or for the contents of external links.
Currently there is no plan is to update this work.
-
References
-
Data Warehouse & Dimensional Modeling
- Online HTML - SQL Server - Data Warehousing (Microsoft)
- Online HTML - Creating and Using Data Warehouses - SQL 2000 (Microsoft)
- Online HTML - Dimensional Databases (IBM)
- PDF (670 pages) - Dimensional Modeling: In a Business Intelligence Environment (IBM)
-
Data Warehouse Testing (Interesting reading, keeping
in mind "context" can vary between projects)
-
test2008 - Agility in Testing - India
- PDF (6 pages) Data Warehouse / ETL
Testing: Best Practices. Anindya Mookerjea
& Prasanth Malisetty
-
Developer IQ
-
Online - Data Warehouse
Testing. Manoj Philip Mathen
- PDF - Data Warehouse Testing
. Manoj Philip Mathen - infosys.com
-
Universal Young Minds - Data Warehousing
Technologies - Testing (last section)
- PDF - Data Warehouse Testing Approach
and Test Process Guidelines
-
Terminology
-
Data Warehouse or
Enterprise Data
Warehouse
- is a central repository of data which is created
by integrating data from multiple disparate sources.
-
dimensional
approach
- supporters referred as “Kimballites”.
Ralph Kimball’s approach is the data
warehouse should be modeled using a
Dimensional Model/star schema
-
multi-dimensional
(also called star
schema, star-join
schema or data
cube)
- The multi-dimensional
consists of one or more
fact tables referencing
any number of
dimension tables.
- Fact table
- consists of the measurements,
metrics or facts of a business
process.
- each record in the
fact table is unique
- Dimension
- provides structured labeling
information to otherwise
unordered numeric measures.
- primary functions of
dimensions: to provide
filtering, grouping and
labeling
-
normalized approach
(also known as 3NF model)
- supporters referred as “Inmonites”.
Bill Inmon's approach; the data warehouse
should be modeled using an
E-R model/normalized model.
-
SQL Server Analysis
Services, SSAS
- is an Online Analytical Processing from Microsoft
-
Query Languages
-
Data definition language (DDL)
- is XML based and supports commands such
as create, alter, delete, and process.
-
Data manipulation language (DML)
- is a family of syntax elements similar to a computer
programming language used for inserting, deleting
and updating data in a database
- Multidimensional Expressions (MDX)
- Language Integrated Query (LINQ, pronounced "link")
- Structured Query Language (SQL)
- Data Mining Extensions (DMX)
-
SQL Server Reporting Services, SSRS
- is a server-based report generation software system from Microsoft.
- available up to Visual Studio 2010 only
-
support up to SQL Server 2008 R2
- Need to understand how newer versions relate to this;
is it backward compatible can it be developed in VS2010 & VS2012
-
SQL Server Integration Services, SSIS
- is a platform for building enterprise-level data integration
and data transformations solutions.
- creates or maintains SSIS packages
-
Eport/Transform/Load, ETL
-
Extract
- extracting the data from the source systems into normally a Store Database
-
Transform
- applies to a series of rules or functions to the extracted data into normally a Load Database
-
Load
- load the data into into normally the Data Warehouse
-
What
to
test
- Verify schedule job sequence
-
Data warehouse update
- Initial load or refresh
- Delta update
-
ETL
-
Data quality
(completeness of data)
-
Missing data
- Record counts
- Checksum
- Accurate
- NULL
- blanks
- data truncation due to field lengths
- etc
- Analyse any rejected data
- eg, data miss match fields
-
Data duplication
- Record counts
- Checksum
- Accurate
-
Transforms
- Data types
- Numeric
- Accurate length & decimal points
- Calculations
- Accurate length & decimal points
- Strings
- Verify TRIM, left and right
- The description for each transform defined
- Different Complexity
- Data copy
- Simple transformation
- Complex transformation
- Note: could compare with EXTRACT function in excel?
- Business Requirements to Transform Logic
- Sample data from live
- Full live data
- automated data profiling and data feed
-
Constraints
- Data types across fields
-
Data obfuscation
- Making sensitive data accessible in data warehouse
- Understand security levels
- Consider test generated data if security risk of production data
-
Structure
-
Databases
- Source DBs
- Store DB
- Load DB
- Data Warehouse DB
-
Tables
- Facts
- Dimensions
-
Areas to test
- Business Functionality
- Data Quality
-
Performance (also see Schedule)
- Different volumes to predict future performance
- performance profile may not be linear
- Load testing
- Scalability
- Multiple users
- Parameters
- Settings
-
Regression Testing
- Automation should be considered
- Known source data sets compared
to expected data sets
- Multiple regression test suites
- Verify rejected records
-
Schedule
-
Performance
- Duration of Full refresh jobs
- Duration of Delta update jobs
-
Job failure
- Auto-restarting of jobs
- Error logging
- Verify email notifications
-
Deployment
- Verify environment installation / upgrade
-
Data Warehouse Reports
- Not complete
-
How
to
test
-
Test data
- Full data set
- Sub-set of full data set
- Test data - manipulation of data to cover test conditions
- Not complete