The data extracted from source systems
A record of all changes in data over all time. May also contain an operational data store.
Business context is applied to the data
Data is changed and enriched by applying business rules
The structure of the data is changed to enable standard reporting tools to easily access the data
Business context is applied to make the data easier to consume
Detailed business view
Good architecture is modular, separating concerns across its components. This helps ensure each component does one job well, which in turn helps the components work together, delivering a valuable experience to users. Modular architecture is also easier to maintain.
A useful data warehouse does not attempt to take data directly from source systems to end-users in one hit; instead, it uses modular components we call data layers. While these modular components have many specific names (eg “staging”, “foundation”, “presentation”), what ties them together is their layered nature: each one does a single job well before passing data on to another layer.
Many of the ideological battles of the past (eg Inmon vs Kimball) were founded on an assumption that one methodology must rule them all. As soon as we think about a data warehouse in terms of layers, we are free to choose the optimal methodology for the job each layer is doing.
Here are the data layers within the Optimal Data Engine, and the methodology we implement in each.
Data from your source systems comes in all shapes, sizes and frequencies. These systems often change their structure without your knowledge or permission. Subsequent layers must deal flexibly with all these changes, but first let’s create something reliable for those change-tolerant layers to point at.
Time Variant Layer
A data warehouse serves as an organisation’s memory. Because source systems change their data structures frequently, and come and go entirely, we need to make a “persistent copy” of that memory for the future. This allows subsequent layers to reconstruct a point-in-time view of your organisation at a later date, answering the question “what did we know about these products three months ago?” without breaking a sweat.
This layer is not currently within the Optimal Data Engine, as customer environments differ significantly. Generally, we recommend Change Data Capture (CDC) to produce this layer from the source systems, and following [these guidelines] makes it easier to configure your Optimal Data Engine.
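The change capture described above can be sketched in plain Python, assuming a simple snapshot-comparison approach rather than any specific CDC tool; the table contents, keys and column names are all illustrative:

```python
from datetime import datetime

def capture_changes(previous, current, load_time):
    """Compare two source snapshots keyed by primary key and emit one
    time-variant record per insert or update (deletes omitted for brevity)."""
    changes = []
    for key, row in current.items():
        if key not in previous or previous[key] != row:
            changes.append({"key": key, "row": row, "loaded_at": load_time})
    return changes

# Yesterday's and today's snapshots of a hypothetical product table.
yesterday = {1: {"name": "Widget", "price": 10},
             2: {"name": "Gadget", "price": 20}}
today     = {1: {"name": "Widget", "price": 12},   # price changed
             2: {"name": "Gadget", "price": 20},   # unchanged
             3: {"name": "Sprocket", "price": 5}}  # new row

history = capture_changes(yesterday, today, datetime(2024, 1, 2))
```

Because records are only ever appended, replaying `history` for any date reconstructs the point-in-time view the layer exists to provide.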
This is the first layer created, managed and loaded by the Optimal Data Engine.
This layer implements the concepts of the Data Vault methodology, as popularised by Dan Linstedt and Hans Hultgren. Its goal is to load combinations of hub, satellite and link tables, with each set or ensemble describing a source system table.
We chose this methodology for the job to be done at this layer because this process needs to be automatic, and free from the intervention of business logic.
The raw vault provides a record of fact: it represents to the business owners what the data is.
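A minimal sketch of the automated hub-and-satellite load described above, using in-memory dicts in place of tables; the hashing scheme and record layout are illustrative assumptions, not the Optimal Data Engine’s actual implementation:

```python
import hashlib
from datetime import datetime

def hash_key(business_key: str) -> str:
    """Data Vault hubs commonly store a hash of the business key."""
    return hashlib.md5(business_key.encode()).hexdigest()

def load_ensemble(hub, satellite, business_key, attributes, load_time):
    """Insert the business key into the hub (if new) and append the
    attributes to the satellite whenever they differ from the latest row.
    No business logic is applied: the load is purely mechanical."""
    hk = hash_key(business_key)
    if hk not in hub:
        hub[hk] = {"business_key": business_key, "loaded_at": load_time}
    history = satellite.setdefault(hk, [])
    if not history or history[-1]["attributes"] != attributes:
        history.append({"attributes": attributes, "loaded_at": load_time})

hub, satellite = {}, {}
load_ensemble(hub, satellite, "PROD-001",
              {"name": "Widget", "price": 10}, datetime(2024, 1, 1))
load_ensemble(hub, satellite, "PROD-001",
              {"name": "Widget", "price": 12}, datetime(2024, 1, 2))
```

Note the load needs no knowledge of what the data means, which is exactly why this layer can be generated automatically.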
This layer also provides business context to the data it contains.
In this second layer of the Optimal Data Engine, we employ the Data Vault methodology again, but here we apply business rules that change the data to represent what the business owners want.
This allows us to add important business logic like consolidation of entities (eg master data management), application of data quality rules and the enrichment of the data.
Business rules are only ever applied in the Business Vault layer, which isolates and governs them.
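One such rule, entity consolidation, can be sketched as follows; the matching rule (a cleaned email address) and record shapes are hypothetical stand-ins for real master data management logic:

```python
def consolidate(records, match_key):
    """Group raw-vault rows that share a matching key and keep the most
    recently loaded value for each attribute: a simple stand-in for
    master data management."""
    masters = {}
    for rec in sorted(records, key=lambda r: r["loaded_at"]):
        key = match_key(rec)
        masters[key] = {**masters.get(key, {}), **rec["attributes"]}
    return masters

# Two raw-vault customer rows that are really the same person.
raw = [
    {"attributes": {"name": "A. Smith", "email": "A.Smith@x.com "},
     "loaded_at": 1},
    {"attributes": {"name": "Alice Smith", "email": "a.smith@x.com"},
     "loaded_at": 2},
]
golden = consolidate(raw, lambda r: r["attributes"]["email"].strip().lower())
```

Because the rule lives only in this layer, the raw vault’s record of fact is untouched and the rule itself remains auditable in one place.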
Once business rules have been applied in the Business Vault layer, data is almost ready for presentation to end-users, but needs a few more transformations. The Reporting Layer changes the structure of the data so it more closely resembles the business’ understanding of the events, people and things that make it up.
In this layer we restructure the data to present it as standard star-schema dimensional models for self-service reporting (based on the Kimball Group’s dimensional methodology), denormalised summary tables for dashboards, or denormalised detailed tables for analytical modelling.
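The restructuring can be sketched with in-memory rows; in practice this would be a SQL transformation, and the table and column names here are illustrative only:

```python
def build_star(orders, customers):
    """Restructure vault-style data into a star schema: a customer
    dimension with surrogate keys, and a fact table of orders that
    references the dimension rather than the business key."""
    dim_customer, surrogate = [], {}
    for i, (bk, attrs) in enumerate(sorted(customers.items()), start=1):
        surrogate[bk] = i
        dim_customer.append({"customer_sk": i, "customer_id": bk, **attrs})
    fact_orders = [
        {"customer_sk": surrogate[o["customer_id"]], "amount": o["amount"]}
        for o in orders
    ]
    return dim_customer, fact_orders

customers = {"C1": {"name": "Alice"}, "C2": {"name": "Bob"}}
orders = [{"customer_id": "C1", "amount": 100},
          {"customer_id": "C2", "amount": 40}]
dim, fact = build_star(orders, customers)
```

The point of the layout change is consumption: standard reporting tools join facts to dimensions on surrogate keys without needing to understand the vault structure behind them.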
Not all parts of the business view the customer the same way, and not every unit calls products by the same names. Instead of fighting endlessly, or writing tough-to-maintain bespoke code, this layer presents data from the Reporting Layer exactly as each part of the business would like to see it.
Typically this layer is managed by your Business Intelligence tool.
Detailed technical view
[data layers picture]
The business vault design is reverse engineered from the desired Reporting layer. Each dimension or fact table results in a business vault ensemble. Each attribute in the reporting layer is stored in the business vault.
The reporting layer is a star schema model of dimensions and facts. It is merely a new layout of the business vault.
For most dimensions and facts there is a one-to-one relationship between the business vault ensemble and the reporting layer table. In the case of evolving events, one business vault ensemble results in a dimension, an accumulating snapshot fact, and a movement fact, if they are all desired. Ordinarily, bridge tables are generated directly from link tables in the business vault. In unusual cases the bridge table may have its own ensemble.
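The ensemble-to-table mapping above lends itself to simple metadata; this hypothetical configuration (all names illustrative) shows the one-to-one case alongside the evolving-event case:

```python
# Hypothetical metadata: which reporting-layer tables each business
# vault ensemble produces. Table and ensemble names are illustrative.
ENSEMBLE_OUTPUTS = {
    "customer": ["dim_customer"],            # ordinary one-to-one case
    "order_event": [                         # evolving event produces three:
        "dim_order",                         #   a dimension
        "fact_order_snapshot",               #   an accumulating snapshot fact
        "fact_order_movement",               #   a movement fact
    ],
}

def reporting_tables(ensemble: str) -> list:
    """Look up the reporting-layer tables generated from an ensemble."""
    return ENSEMBLE_OUTPUTS.get(ensemble, [])
```

Driving the reporting layer from metadata like this keeps the mapping explicit and makes the generation step mechanical.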
Time Variant Layer
- Stores one record for every change over all time
- Is persistent, records are never deleted
- Must be able to be reconciled to the Source Data layer
- Change in the structure of the data is allowed
- Changing of data values is not allowed
- Must be able to be reconciled to the Time Variant layer
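The reconciliation rules above can be sketched as a per-key content-hash comparison; this assumes the compared columns are identical in both layers (real checks would exclude load-time metadata), and the row shapes are illustrative:

```python
import hashlib

def reconcile(source_rows, layer_rows, key):
    """Compare per-key content hashes between a source extract and a
    layer; return the keys whose content disagrees (or is missing)."""
    def digest(row):
        return hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()
    src = {r[key]: digest(r) for r in source_rows}
    tgt = {r[key]: digest(r) for r in layer_rows}
    return [k for k in src if src[k] != tgt.get(k)]

source = [{"id": 1, "price": 10}, {"id": 2, "price": 20}]
layer  = [{"id": 1, "price": 10}, {"id": 2, "price": 25}]  # id 2 disagrees
mismatches = reconcile(source, layer, "id")
```

A check like this, run after every load, is what makes the “must be able to be reconciled” rules enforceable rather than aspirational.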