The Data Vault concept with example

Back to overview

The Data Vault method and its most important elements

In this article, the Data Vault method is first introduced in principle and the modeling approach is explained using a case study.

A Data Vault model consists of the following three components:

Data Vault element Description
Hubs contain the business keys or natural keys such as invoice number, customer number or employee number. However, no descriptive attributes are included.
Links are used to map relationships between hubs and therefore only contain foreign key attributes on hubs.
Satellites contain descriptive attributes such as customer information or product descriptions and are assigned to hubs or links.

Simple scenario for the use of a data vault

The following case study serves to illustrate the basic aspects of modeling. The fictitious "Diamond Agency" is a large event agency that sells unusual products and events for leisure activities, allowing its customers to experience "special moments". The product portfolio ranges from skydiving and race car driving to candlelight dinners. The company generated a turnover of €125 million last year and employs 250 people.

The basic system landscape

The agency operates a classic Enterprise Resource Management (ERP) solution to handle operational processes. Building on this, a Supplier Relationship Management (SRM) solution is used for supplier relationship management. Customer processes are supported by a Customer Relationship Management (CRM) system. Events are presented to customers via a web store, which also handles the ordering process in the front end.

System landscape of the Diamon agency

All data is integrated into the data warehouse. The analytical applications provide classic standard reporting. In addition, various data mining applications are used. For example, purchasing behavior is analyzed using shopping basket analyses and association rules are extracted from the database. The product portfolio and presentation in the store are optimized based on these analyses.

Overview of the modeling scenario

As shown above, the Diamond agency's data warehouse is based on the Data Vault architecture approach. The case study picks up on an aspect of customer management and illustrates the Data Vault modeling methodology and shows how changes are incorporated into the model. The following three iterations are therefore run through to illustrate this:

  1. The basic model is created in the first iteration.
  2. Due to a change in one of the source systems, the Data Vault model must be adapted in the second iteration.
  3. In the third iteration, the Data Vault model is supplemented due to a specification gap.

Starting from the basic model, the example thus addresses the main triggers for adjustments and shows how the changes are incorporated into the Data Vault model.

Iteration 1

The first step is to record the customers, as the most important stakeholders of the Diamond agency, in the data model:

  • Customers have attributes such as first name and surname and are clearly assigned to a region. The customers and the corresponding regional assignments are maintained centrally in the ERP system.
  • Unfortunately, the assignments of customers to account managers are not currently stored in the operational systems. However, this connection has been requested in the evaluations for some time, as there is no transparency about the customer base among the account managers.
  • A change request to provide the "Assign customer to account manager" functionality in the CRM system was commissioned.

The scenario described above is mapped in the Data Vault model in iteration 1 as follows:

  • The customer and region business objects are modeled as hubs. Only the artificial keys ("hub ID"), the company-wide unique business keys ("customer number" and "region number"), the load time stamps and the supplying source systems are stored in the hubs.
  • The satellite tables contain the descriptive attributes of the source systems and map the complete history in the data warehouse. The foreign key for the hub ("Customer ID" or "Region ID"), the load time stamp and the data source are stored in the satellite tables.
  • The relationship between customer and region is modeled as a customer/region link. In addition to the artificial key "Link ID", the table contains the foreign keys ("Customer ID" and "Region ID") of the hub tables Hub Customer and Hub Region.

The following conceptual model results for the first iteration:

Data Vault after the first iteration step

Iteration 2

  • The CRM department has done a great job: The change request to provide the functionality "Assign customer to account manager" has been implemented, tested and is already in the CRM production system!
  • The data warehouse model can be extended to include the relationship between customer and customer advisor.

In iteration 2, the relationship between customer and customer advisor is supplemented as follows:

  • The customer advisor business object is added as a hub.
  • The Satellite table Account manager details contains the descriptive attributes of the account manager.
  • The additional link table "Link customer/customer advisor" shows the relationship between the customer and the assigned customer advisor.

Adding the entities described above results in the following model for iteration 2:

Data Vault after the second iteration step

Iteration 3

Users have noticed that the federal state with the corresponding state capital cannot be found in the data warehouse model. These attributes of the region must be added to the data warehouse model. In iteration 3, the missing attributes are added to the model as follows:

→ The federal state business object is added as a satellite and attached to the Region hub.

After the third iteration, the model looks as follows:

Data Vault after the third iteration step

Summary

Using the Diamond agency as an example, it was shown that the application of the Data Vault modeling approach results in a high degree of flexibility: As a rule, adjustments to the requirements, such as the inclusion of additional attributes, can be made without structural adjustments to existing tables. New attributes can be implemented via additional satellites. With classic 3NF models, such changes usually require structural adjustments to tables and the migration of existing data. The additive extension shown has a minimal impact on the overall system. This is also accompanied by a minimal regression test after the additive adjustment.

Data management processes are standardized due to the use of fewer modeling patterns. ETL processes can be changed configuratively or generated (in-factory generator). gene²). The iterative approach shown is supported by agile methods in the development process: Agility is an integral part of the Data Vault method!

Do you have any questions? We have the answers!

Please write to us. We look forward to hearing from you!