Data warehouse architectures


Data architectures for business intelligence systems

Starting from William H. Inmon's definition of the data warehouse[1], this article first presents the hub-and-spoke approach as the basic architecture in the data warehouse environment. Building on this, it shows the various interpretations of the basic architecture and the relevant concepts of the "Enterprise Data Warehouse"[1], the "Dimensional Data Warehouse"[2] and the "Data Vault approach"[3]. Finally, the Data Vault approach is explained in more depth.

Core data warehouse with hub-and-spoke architecture

Probably the best-known definition of a data warehouse was given by William H. Inmon in 1992 in "Building the Data Warehouse". Inmon defines the data warehouse as follows:

"A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process."[1]

Various architecture and modeling concepts have been developed to build a data warehouse database that meets these requirements. The basic feature shared by the concepts considered in this article is the hub-and-spoke architecture shown in the following figure:

Hub-and-spoke architecture

The core data warehouse serves as the "hub" and is responsible for integration, quality assurance and distributing data to the data marts. The data marts are the "spokes": they are application-oriented and contain predefined business enrichments and aggregations.
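The division of labor between hub and spokes can be illustrated with a deliberately small Python sketch. The two source systems, their field names and the single quality rule are hypothetical, and in-memory dictionaries stand in for what would be database tables in a real core data warehouse. The point is only that integration and quality checks happen once in the hub, and every mart is derived from that integrated result.

```python
from collections import defaultdict

# Hypothetical source extracts: two operational systems describe the same
# customers with different field names.
CRM_ROWS = [{"cust_no": "C1", "name": "Acme"}, {"cust_no": "C2", "name": "Globex"}]
ERP_ROWS = [
    {"customer": "C1", "amount": 100.0},
    {"customer": "C2", "amount": 250.0},
    {"customer": "C1", "amount": 50.0},
    {"customer": None, "amount": 10.0},  # fails the quality check below
]


def integrate(crm_rows, erp_rows):
    """Hub: map both sources onto one customer key and reject invalid rows."""
    customers = {r["cust_no"]: r["name"] for r in crm_rows}
    orders = [
        {"customer_id": r["customer"], "amount": r["amount"]}
        for r in erp_rows
        if r["customer"] in customers  # simple referential quality check
    ]
    return customers, orders


def sales_mart(customers, orders):
    """Spoke: revenue per customer name, aggregated for one use case."""
    revenue = defaultdict(float)
    for o in orders:
        revenue[customers[o["customer_id"]]] += o["amount"]
    return dict(revenue)


def order_count_mart(orders):
    """Another spoke, fed from the same integrated core."""
    counts = defaultdict(int)
    for o in orders:
        counts[o["customer_id"]] += 1
    return dict(counts)


customers, orders = integrate(CRM_ROWS, ERP_ROWS)
print(sales_mart(customers, orders))  # {'Acme': 150.0, 'Globex': 250.0}
print(order_count_mart(orders))       # {'C1': 2, 'C2': 1}
```

Adding a further mart only means writing another function over `customers` and `orders`; the sources are not touched again.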

Overview of the data architectures

The main architectures in the data warehouse environment are based on the hub-and-spoke approach described above:

Essential architectures in the data warehouse environment

The core data warehouse integrates the data and populates the data marts, which act as the basis for data analysis and reporting.

Enterprise Data Warehouse: The aim of Inmon's architecture proposal is to create a company-wide data model for the core data warehouse. The data is organized in the third normal form (3NF) and is available in atomic form. The data marts can be easily constructed from this data pool. Key figure definitions apply across the entire model. The result is a complex model with a large number of tables and relationships. Further development and maintenance of the system are therefore time-consuming.
Dimensional data warehouse: Kimball proposes a dimensional data model based on star schemas for the core data warehouse, i.e. the data is stored in dimension and fact tables. "Conformed dimensions" can be used across use cases with different fact tables, i.e. in different business contexts. Changes to the sources usually require extensive development work. Dimensional modeling is also frequently used at data mart level in other architectural approaches.
Data Vault approach: Dan Linstedt developed the basic architecture further. In the Data Vault approach he not only addresses data modeling aspects but also proposes suitable process models. Because only a few modeling patterns are used, the model is highly flexible with regard to changes. The Data Vault method therefore enables the agility often demanded by business departments and makes it easier to adapt business intelligence solutions to changing requirements.
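What a conformed dimension means in Kimball's dimensional model can be shown with a small star schema. This is a minimal sketch using an in-memory SQLite database; the table names (`dim_product`, `fact_sales`, `fact_inventory`) and the data are hypothetical. Two fact tables from different business processes share the same product dimension, so their key figures can be reported side by side ("drill across") against identical dimension rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Conformed dimension shared by both fact tables
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    -- Two fact tables from different business processes
    CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product, quantity INTEGER);
    CREATE TABLE fact_inventory (product_key INTEGER REFERENCES dim_product, on_hand INTEGER);
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO fact_sales VALUES (1, 5), (1, 3), (2, 7);
    INSERT INTO fact_inventory VALUES (1, 20), (2, 4);
""")

# Drill across: both processes are reported against the same product rows.
rows = con.execute("""
    SELECT d.product_name,
           (SELECT SUM(quantity) FROM fact_sales s
             WHERE s.product_key = d.product_key) AS sold,
           (SELECT SUM(on_hand) FROM fact_inventory i
             WHERE i.product_key = d.product_key) AS on_hand
    FROM dim_product d
    ORDER BY d.product_key
""").fetchall()
print(rows)  # [('Widget', 8, 20), ('Gadget', 7, 4)]
```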

Principles of the Data Vault architecture

The basic architecture shown below with Raw Vault and Business Vault follows the hub-and-spoke approach.

Basic architecture with Raw Vault and Business Vault

The Raw Vault takes over the data from the staging area. It identifies relationships between objects and stores the descriptive attributes in historized form. This creates the "single version of facts". After business logic (e.g. aggregation and transformation) has been applied, the Business Vault represents the "single point of truth". The BI layer accesses the data through an access layer that provides it from the Business Vault. Special data requirements can also be served by the access layer directly from the Raw Vault.

The different layers of the architecture can be further explained as follows:

Staging Area

The main task of the staging area is to make the delivered data available to the loading process unchanged but in an optimized form. This optimization can take the form of a relational structure on the target system, which avoids or minimizes media breaks and network traffic. However, a landing zone at file system level is also possible, provided that it is accessible from the ETL server.

The data structures of a relational staging area mirror the sources. Integrity constraints and content transformations are deliberately omitted; only the speed and functional reliability of the loading process matter. As a result, new sources can be connected quickly in line with the agile approach, existing sources can be managed in parallel, and data can be reorganized flexibly.
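A relational staging table along these lines might look like the following sketch. It assumes a hypothetical CSV delivery and uses an in-memory SQLite database; the table name `stg_crm_customer` and the metadata columns `load_ts` and `record_source` are assumptions (common in Data Vault projects, but not prescribed by the text above). All columns are plain `TEXT` with no keys or constraints, so duplicates and malformed values are loaded as delivered rather than rejected.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Hypothetical delivery file from a source system (note the duplicate and "n/a").
DELIVERY = "id,name,amount\n1,Acme,100\n2,Globex,n/a\n2,Globex,n/a\n"

con = sqlite3.connect(":memory:")
# Staging mirrors the source columns: everything TEXT, no primary key, no checks.
# load_ts and record_source are assumed technical metadata columns.
con.execute("""
    CREATE TABLE stg_crm_customer (
        id TEXT, name TEXT, amount TEXT,
        load_ts TEXT, record_source TEXT
    )
""")


def load_to_staging(con, raw_csv, record_source):
    """Copy the delivery 1:1 into staging, adding only load metadata."""
    load_ts = datetime.now(timezone.utc).isoformat()
    reader = csv.DictReader(io.StringIO(raw_csv))
    rows = [(r["id"], r["name"], r["amount"], load_ts, record_source) for r in reader]
    con.executemany("INSERT INTO stg_crm_customer VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


loaded = load_to_staging(con, DELIVERY, "crm.customers.csv")
print(loaded)  # 3
```

Typing, deduplication and key assignment are left to the Raw Vault load, which keeps this step fast and generic.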

Raw Vault

The content of the data remains unchanged in the Raw Vault. Its task is to integrate and historize the information.

The data is therefore assigned to business objects, which are modeled as hubs with satellites or as links with satellites. Hubs contain the keys that uniquely identify an entity; the associated satellites contain the descriptive information and provide the necessary historization. Links connect hubs and have their own satellites that describe the relationship. Within these structures, the information is converted to the target data types. References back to a relational staging area can provide lineage functionality for a limited time.
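The three structure types can be sketched as SQLite DDL. All table and column names here are hypothetical, and the hash key scheme (MD5 over trimmed, upper-cased, `;`-delimited business keys) is an assumption chosen for illustration; it is a widespread convention, but the article does not prescribe a particular hash function or normalization.

```python
import hashlib
import sqlite3


def hash_key(*business_keys):
    """Assumed hash key scheme: MD5 over trimmed, upper-cased, ';'-joined keys."""
    normalized = ";".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hubs: only the business key, its hash key and load metadata
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT,
                               load_ts TEXT, record_source TEXT);
    CREATE TABLE hub_order (order_hk TEXT PRIMARY KEY, order_id TEXT,
                            load_ts TEXT, record_source TEXT);
    -- Link: the relationship between the two hubs
    CREATE TABLE link_customer_order (customer_order_hk TEXT PRIMARY KEY,
                                      customer_hk TEXT REFERENCES hub_customer,
                                      order_hk TEXT REFERENCES hub_order,
                                      load_ts TEXT, record_source TEXT);
    -- Hub satellite: descriptive attributes, historized by load timestamp
    CREATE TABLE sat_customer (customer_hk TEXT REFERENCES hub_customer,
                               load_ts TEXT, hash_diff TEXT,
                               name TEXT, city TEXT,
                               PRIMARY KEY (customer_hk, load_ts));
    -- Link satellite: attributes describing the relationship itself
    CREATE TABLE sat_customer_order (customer_order_hk TEXT
                                         REFERENCES link_customer_order,
                                     load_ts TEXT, hash_diff TEXT,
                                     order_channel TEXT,
                                     PRIMARY KEY (customer_order_hk, load_ts));
""")

# The link's hash key is derived from the business keys of both hubs.
link_hk = hash_key("C1", "O1")
```

Because every hub, link and satellite follows the same few patterns, the DDL above can be generated from metadata rather than written by hand.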

All tables in the Raw Vault are structured uniformly and loaded with uniform "insert only" processes, which use different hash value comparisons depending on the target type. These processes are therefore particularly well suited to automation.
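The following sketch shows how the comparison differs by target type, assuming MD5 hashing and hypothetical table names. For a hub, the hash key is compared, and a business key is inserted only if it is not yet known. For a satellite, a hash diff over the descriptive attributes is compared with the most recent row, and a new version is inserted only if something changed. Neither function ever issues an `UPDATE` or `DELETE`.

```python
import hashlib
import sqlite3


def hash_key(*business_keys):
    # Hub hash key: keys are trimmed and upper-cased before hashing (assumption).
    return hashlib.md5(";".join(str(k).strip().upper() for k in business_keys)
                       .encode("utf-8")).hexdigest()


def hash_diff(*attributes):
    # Satellite hash diff: values only trimmed, so case changes count as changes.
    return hashlib.md5(";".join(str(a).strip() for a in attributes)
                       .encode("utf-8")).hexdigest()


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT, load_ts TEXT);
    CREATE TABLE sat_customer (customer_hk TEXT, load_ts TEXT, hash_diff TEXT,
                               name TEXT, city TEXT, PRIMARY KEY (customer_hk, load_ts));
""")


def load_hub(con, customer_id, load_ts):
    """Insert the business key only if its hash key is not present yet."""
    hk = hash_key(customer_id)
    con.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?)",
                (hk, customer_id, load_ts))
    return hk


def load_sat(con, hk, name, city, load_ts):
    """Insert a new satellite version only if the hash diff has changed."""
    diff = hash_diff(name, city)
    latest = con.execute(
        "SELECT hash_diff FROM sat_customer WHERE customer_hk = ? "
        "ORDER BY load_ts DESC LIMIT 1", (hk,)).fetchone()
    if latest is None or latest[0] != diff:
        con.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
                    (hk, load_ts, diff, name, city))


# Three deliveries: the second one is unchanged and produces no new version.
for ts, city in [("2024-01-01", "Berlin"), ("2024-02-01", "Berlin"), ("2024-03-01", "Munich")]:
    load_sat(con, load_hub(con, "C1", ts), "Acme", city, ts)

print(con.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0])  # 1
print(con.execute("SELECT load_ts, city FROM sat_customer ORDER BY load_ts").fetchall())
# [('2024-01-01', 'Berlin'), ('2024-03-01', 'Munich')]
```

Since the logic depends only on the target type and not on the business content, the same two routines can be generated for every hub and satellite.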

Business Vault

The Business Vault is loaded after the Raw Vault and resides in the same schema. Its task is to implement the business rules that represent the business requirements; unlike the Raw Vault, it changes the content of the data. KPIs are derived from the key figures delivered into the Raw Vault, and data from different systems is aggregated and consolidated.
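As an illustration, the sketch below consolidates order amounts from two hypothetical source systems held unchanged in Raw Vault satellites. A business rule (a fixed USD-to-EUR rate, purely an assumed example value) converts the amounts, and the derived KPI "revenue per customer" is persisted in a separate Business Vault table. Table names and data are hypothetical, and SQLite again stands in for the warehouse database.

```python
import sqlite3

USD_TO_EUR = 0.9  # hypothetical business rule: fixed conversion rate

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Raw Vault: key figures exactly as delivered by two source systems
    CREATE TABLE sat_order_shop_eu (order_hk TEXT, customer_hk TEXT, amount_eur REAL);
    CREATE TABLE sat_order_shop_us (order_hk TEXT, customer_hk TEXT, amount_usd REAL);
    INSERT INTO sat_order_shop_eu VALUES ('o1', 'c1', 100.0), ('o2', 'c2', 40.0);
    INSERT INTO sat_order_shop_us VALUES ('o3', 'c1', 50.0);

    -- Business Vault: derived KPI in its own table
    CREATE TABLE bv_customer_revenue (customer_hk TEXT, revenue_eur REAL);
""")

# Apply the business rules: convert currency, consolidate both systems, aggregate.
con.execute("""
    INSERT INTO bv_customer_revenue
    SELECT customer_hk, SUM(amount_eur)
    FROM (
        SELECT customer_hk, amount_eur FROM sat_order_shop_eu
        UNION ALL
        SELECT customer_hk, amount_usd * ? FROM sat_order_shop_us
    )
    GROUP BY customer_hk
""", (USD_TO_EUR,))

revenue = dict(con.execute(
    "SELECT customer_hk, revenue_eur FROM bv_customer_revenue ORDER BY customer_hk"
).fetchall())
print(revenue)  # {'c1': 145.0, 'c2': 40.0}
```

The Raw Vault satellites stay untouched; if the business rule changes, the Business Vault table can be rebuilt from them.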

Access layer

The access layer, which reads data from the Business Vault and, where needed, the Raw Vault, can also be implemented as a virtual structure. It follows the requirements of the BI infrastructure used, usually has a dimensional structure and is organized by use case.

Individual use cases (in our experience, mainly atomic-level reporting, regulatory reporting or exports) can also access the Business Vault or the Raw Vault directly.
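A virtual access layer can be sketched with SQL views, assuming SQLite and hypothetical table names. The view `dim_customer` combines a hub with the current version of its satellite, and `fact_customer_revenue` exposes a Business Vault KPI, so BI tools see a dimensional structure without any data being copied.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Minimal Raw Vault and Business Vault tables (illustrative content)
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT);
    CREATE TABLE sat_customer (customer_hk TEXT, load_ts TEXT, name TEXT, city TEXT);
    CREATE TABLE bv_customer_revenue (customer_hk TEXT, revenue_eur REAL);
    INSERT INTO hub_customer VALUES ('h1', 'C1'), ('h2', 'C2');
    INSERT INTO sat_customer VALUES
        ('h1', '2024-01-01', 'Acme', 'Berlin'),
        ('h1', '2024-03-01', 'Acme', 'Munich'),
        ('h2', '2024-01-01', 'Globex', 'Hamburg');
    INSERT INTO bv_customer_revenue VALUES ('h1', 145.0), ('h2', 40.0);

    -- Access layer as views: dimension = hub + current satellite version
    CREATE VIEW dim_customer AS
    SELECT h.customer_hk, h.customer_id, s.name, s.city
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_hk = h.customer_hk
    WHERE s.load_ts = (SELECT MAX(x.load_ts) FROM sat_customer x
                       WHERE x.customer_hk = h.customer_hk);

    -- Fact view exposing the Business Vault KPI
    CREATE VIEW fact_customer_revenue AS
    SELECT customer_hk, revenue_eur FROM bv_customer_revenue;
""")

# A BI tool would query the views like an ordinary star schema.
rows = con.execute("""
    SELECT d.customer_id, d.city, f.revenue_eur
    FROM fact_customer_revenue f JOIN dim_customer d USING (customer_hk)
    ORDER BY d.customer_id
""").fetchall()
print(rows)  # [('C1', 'Munich', 145.0), ('C2', 'Hamburg', 40.0)]
```

If query performance becomes an issue, individual views can later be materialized as tables without changing their interface for the BI layer.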

The blog entry "Data Vault Basics" explains the Data Vault architecture in more detail and illustrates it with a use case.

[1] W. H. Inmon: Building the Data Warehouse (1992)
[2] R. Kimball, M. Ross: The Data Warehouse Toolkit, 3rd Edition (2013)
[3] D. Linstedt: Data Vault Series 1 - Data Vault Overview (2002)
