wissen.leben | WWU Münster 


Data Warehouse Detective

DWD logo Within the scope of our project Data Warehouse Schema Design we have developed a tool called Data Warehouse Detective (DWD) to support the design of normalized data warehouse schemata. Version 1.0 of this tool has been created in the course of a project seminar involving six students during summer term 2006.

Conceptual DW design with DWD can be sketched as follows. First, DWD imports meta-data (tables, attributes, keys, foreign keys) of selected operational databases via ODBC and enriches these meta-data based on an analysis of database instances to detect missing keys, foreign keys, and functional dependencies (which are often not properly declared in practice). Afterwards, the designer chooses relevant attributes to populate the forthcoming data warehouse, and based on the functional dependencies recognized previously, DWD synthesizes multidimensional conceptual DW schemata for these relevant attributes, including dimension hierarchies. Multidimensional normal forms are used throughout this process to guarantee that all fact schemata under design really fit the underlying data sources and to gain control over optional dimension levels with NULL values (which allows to avoid summarizability problems and inconsistent queries).

DWD is useful in two directions: On the one hand, it allows to design new schemata based on analyzed data sources. On the other, it supports benchmarking of existing schemata against multidimensional normal forms. The following screen shot illustrates benchmarking performed by DWD. The lower part of the screen shows a hand-made DW fact schema in the banking domain while the upper part shows a normalized schema synthesized by DWD based on an analysis of the underlying data sources. Both schemata represent account balances of individual accounts per day along with corresponding dimension hierarchies. Benchmarking here leads to two observations (highlighted in red by DWD in the lower schema): First, the hand-made schema is based on the assumption that regions can be assigned uniquely to countries, while the underlying data contradicts this assumption with regions that cross countries (e.g., the Alps). Thus, loading the dimension hierarchies of the lower schema is doomed to fail. Second, the assignment of the age of customers as a measure in an account related fact schema leads to redundancies as the age of a customer will be repeated for every of her accounts. Here, normalization with DWD creates a second fact schema, reporting all customer related measures including age (not shown in the figure).

DWD Benchmarking

A multimedia manual in German, which explains all use cases of DWD, is available upon request (20MB, Windows only).

Please don't hesitate to contact us if you'd like to see DWD in action!


Publications related to this project

  • T. Haselmann, J. Lechtenbörger, G. Vossen: Data Warehouse Detective: Schema Design Made Easy (Proc. 12th BTW, pp. 606-608), 2007 (Abstract) (Download)
  • Philipp Borgschulte: SQL-Basierte Analyse von 1:n-Beziehungen in operationalen Datenbanken, 2006


Impressum | © 2008 WWU Münster
Universität Münster
Schlossplatz 2 · 48149 Münster
Tel.: +49 (251) 83-0 · Fax: +49 (251) 83-3 20 90
E-Mail: