The Data Vault method and its main elements
In this article, the Data Vault method is first introduced in principle and the modeling approach is explained using a case study.
A Data Vault model consists of the following three components:
||Hubs contain the business keys or the natural keys such as invoice number, customer number or employee number. However, no descriptive attributes are included.
||Links are used to map relationships between hubs and thus contain only foreign key attributes on hubs.
||Satellites contain descriptive attributes such as customer information or product descriptions and are assigned to hubs or links.
Simple scenario for the use of a Data Vault
The following case study serves to illustrate the basic aspects of modeling. The fictitious “Diamond Agency” is a large event agency that sells unusual products and events for leisure activities and thus allows its customers to experience “special moments”. The product portfolio ranges from skydiving and race car driving to candlelight dinners. Last year, the company generated sales of €125 million and employs 250 people.
The basic system landscape
The agency operates a classic Enterprise Resource Management (ERP) solution to handle operational processes. Building on this, a supplier relationship management (SRM) solution is used for supplier relationship management. Customer processes are supported by a customer relationship management (CRM) system. The events are presented to the customers via a web store, which also handles the ordering process in the front end.
System landscape of the Diamon agency
All data is integrated into the data warehouse. The analytical applications provide classic standard reporting. In addition, various data mining applications are used. For example, shopping behavior is analyzed using shopping cart analyses and association rules are extracted from the data stock. The product portfolio and the presentation in the store are optimized based on these analyses.
Overview of the modeling scenario
As shown above, Diamond Agency’s data warehouse is based on the Data Vault architectural approach. The case study takes an aspect of customer management and illustrates the Data Vault modeling methodology and how changes are incorporated into the model. Therefore, the following three iterations are run through to illustrate this:
- The basic model is created in the first iteration.
- Due to a change in one of the source systems, the Data Vault model must be adjusted in the second iteration.
- In the third iteration, the Data Vault model is supplemented due to a specification gap.
Starting from the basic model, the example thus addresses the main triggers for adjustments and shows how the changes are incorporated in the Data Vault model.
In the first step, the customers, as the most important actors of the Diamond agency, should be captured in the data model:
- Customers have attributes such as first name and last name and are clearly assigned to a region. The customers and the corresponding regional assignments are maintained centrally in the ERP system.
- Unfortunately, the assignments of customers to account managers are not currently stored in the operational systems. However, this connection has been requested in the evaluations for some time now, as there is no transparency regarding the customer base among the account managers.
- A change request to provide the “Assign customer to account manager” functionality in the CRM system was commissioned.
The scenario described above is represented in the Data Vault model in Iteration 1 as follows:
- The Customer and Region business objects are modeled as hubs. Only the artificial keys (“Hub ID”), the company-wide unique business keys (“Customer number” and “Region number”), the loading time stamps and the supplying source systems are stored in the hubs.
- The satellite tables include the descriptive attributes of the source systems and map the complete history in the data warehouse. The foreign key to the hub (“Customer ID” or “Region ID”), the load time stamp and the data source are stored in the satellite tables.
- The relation between customer and region is modeled as link customer/region. In addition to the artificial key “Link ID”, the table contains the foreign keys (“Customer ID” and “Region ID”) of the hub tables Hub Customer and Hub Region.
For the first iteration, the following conceptual model emerges:
Data Vault after the first iteration step
- The CRM department has done quite a job: The change request to provide the functionality “Assign customer to account manager” has been implemented, tested and is already in the CRM production system!
- The data warehouse model can be extended to include the relationship between customer and account manager.
In iteration 2, the relationship between customer and account manager is completed as follows:
- The Customer Service Representative business object is added as a hub.
- The Account Manager Details satellite table contains the descriptive attributes of the account manager.
- The additional link table “Link customer/account manager” maps the relation between the customer and the assigned account manager.
Adding the described entities results in the following model for iteration 2:
Data Vault after the second iteration step
The users have noticed that in the data warehouse model the federal state with the corresponding state capital cannot be found. These attributes of the region must be added to the data warehouse model. In iteration 3, the missing attributes are added to the model as follows:
→ The business object State is added as a satellite and attached to the hub Region.
After the third iteration, the model looks like this:
Using the example of the Diamond agency, it was shown that a high degree of flexibility results from the application of the Data Vault modeling approach: As a rule, adjustments in the requirements, such as the inclusion of additional attributes, can be made without structural adjustments to existing tables. New attributes can be implemented via additional satellites. In classical 3NF models, such changes usually require structural adjustments to tables and migration of existing data. The shown additive extension has a minimal impact on the overall system. This is also accompanied by a minimal regression test after the additive adjustment has been made.
Due to the use of few modeling patterns, data management processes are standardized. ETL processes can be changed or generated (in-factory Generator Gen²) configuratively. The iterative approach shown is supported by agile methods in the development process: Agility is an integral part of the Data Vault method!