Imagine a data architect designing the structure of a data warehouse. The architect has to write down all potential dimensions, static and changing dimensions. Also, she has to consider how often changing dimensions undergo alternations. Furthermore, would keeping a history of these varying values be necessary? And how quickly current and historical data should be retrieved? In other words, she has to plan for static and slowly changing dimensions (SCDs).
An SCD is a table dimension that evolves. There are many types of SCDs in a Data Warehouse solution. Below we describe some of them.
Slowly Changing Dimension Type 0: Keep Original
A dimension classified as type 0, such as Full Name, Date of Birth, Social Security Number, etc., is characterized by its original values remaining unchanged.
Slowly Changing Dimension Type 1: Overwrite
A dimension categorized as type 1, such as Customer Email Address, involves the updating of its values where the previous ones are overwritten. The update removes the previous value. When storage efficiency is a primary concern, the overwrite dimension becomes essential.
For example, a customer for a retail company decides to update their email address, due to a recent job change, resulting in a new email domain.
Table Record
Customer ID | Customer Name | Customer Email |
1 | Peter | peter@technology-hub.com |
New Table Record
Customer ID | Customer Name | Customer Email |
1 | Peter | peter@strucutreddata.com |
The Customer ID and Customer Name are static dimensions that will always remain unchanged.
Slowly Changing Dimension Type 2: Add New Row
A new row is appended each time a dimension value changes. Consequently, this helps to maintain a historical record of all values. Moreover, the “Is Active” dimension designates the currently active data record. The “Is Active” dimension is one option. Another option is the “Version” dimension, where the maximum version designates the most recent data record update. While data ranges, an alternative would indicate the period during which the data record period is active.
An example featuring the “Is Active” dimension:
Row ID | Customer ID | Customer Name | Customer Email | Is Active |
1 | 1 | Peter | peter@technology-hub.com | No |
2 | 1 | Peter | peter@structureddata.com | Yes |
Another example employing the “Version” dimension:
Row ID | Customer ID | Customer Name | Customer Email | Version |
1 | 1 | Peter | peter@technology-hub.com | 1 |
2 | 1 | Peter | peter@structureddata.com | 2 |
Lastly, an example to demonstrate the data ranges option:
Surrogate ID | Customer ID | Customer Name | Customer Email | Start Date | End Date |
1 | 1 | Peter | peter@technology-hub.com | 01-Feb-2023 | 01-Jan-2024 |
2 | 1 | Peter | peter@structureddata.com | 01-Jan-2024 | NULL |
As one sees, the currently active record has the End Data value set to NULL.
Slowly Changing Dimension Type 3: Add New Dimension
A new dimension is introduced to the table, representing an existing dimension. Consequently, the newly added dimension would hold the updated values of the dimension it mirrors.
This type is suitable when one or two members of the table undergo regular updates. However, it is not preferable if more than two members require changing. In such situations, type 2 would be the better choice.
Original Table:
Customer ID | Customer Name | Customer Email |
1 | Peter | peter@technology-hub.com |
Table with a new dimension:
Customer ID | Customer Name | Current Customer Email | Original Customer Email |
1 | Peter | peter@structureddata.com | peter@technology-hub.com |
As can be observed, the dimension “Customer Email” preserves only one historical value.
Slowly Changing Dimension Type 4: Add History Table
In this type, two tables divide the data records: an active table, that holds current records, and a history table, which stores past records. The active table is used for day-to-day data manipulation operations such as inserts, updates, and deletes. While the history table is used for historical queries. Importantly, once a record is deleted from the active table, it instantly moves to the history table. Subsequently, the history table is a repository of complete historical records.
Active Table
Row ID | Customer ID | Customer Name | Customer Email | Start Date | End Date |
3 | 1 | Peter | peter@oracle.com | 01-June-2023 | NULL |
History Table
Row ID | Customer ID | Customer Name | Customer Email | Start Date | End Date |
1 | 1 | Peter | peter@technology-hub.com | 01-Feb-2022 | 01-Jan-2023 |
2 | 1 | Peter | peter@structuredata.com | 01-Jan-2023 | 01-June-2023 |
The history table holds all records of the customer named “Peter”. Peter actively moved between three companies.
Slowly Changing Dimensions Type 5: Types 4 and 1 Combined
Type 5 merges type 4 and type 1, utilizing a base dimension table and a mini-dimension table. The base dimension table has a dimension with reference key values to the mini-dimension table. Moreover, type 1 is used to alter a referenced value.
Customer Table
Customer ID | Customer Name | Customer Email | Employer |
1 | Peter | peter@oracle.com | 3 |
2 | Mary | mary@oracle.com | 3 |
Company Table
Company ID | Company Name |
1 | Technology Hub |
2 | Structured Data |
3 | Oracle |
To overwrite an “Employer” value, we would have to reference another key from the company table.
Now what happens with the history of updating the reference keys? Each time the “Employer” value changes, a fresh record is added to a history table:
Customer ID | Company ID | Start Date | End Date |
1 | 1 | 01-Feb-2022 | 01-Jan-2023 |
1 | 2 | 01-Jan-2023 | 01-June-2023 |
1 | 3 | 01-June-2023 | NULL |
As seen above, we use type 4 to trace full employment history, via the history table.
Slowly Changing Dimensions Type 6: Types 1, 2 and 3 Combined
It is possible to combine types 1, 2, and 3. Again, to clarify, Type 1 involves overwriting a value, while type 2 entails adding a new row. And finally, type 3 involves a new attribute to the table. Do you have any ideas on how these could be combined?
Recalling one of the drawbacks of type 1, it cannot track any previous values. For type 2, one has to navigate the start and end date columns to trace the preceding value. Lastly, as for type 3, it can only track the previous value. So, what happens when we combine these types? Do we overcome the drawbacks?
Example of Combined Types:
Row ID | Customer ID | Customer Name | Current Customer Email | Original Customer Email | Start Date | End Date | Is Active |
1 | 1 | Peter | peter@technology-hub.com | NULL | 01-Feb-2022 | 01-Jan-2023 | No |
2 | 1 | Peter | peter@structuredata.com | peter@technology-hub.com | 01-Jan-2023 | 01-June-2023 | No |
3 | 1 | Peter | peter@oracle.com | peter@structuredata.com | 01-June-2023 | Null | No |
Combining all three types would result in the following benefits:
- Viewing historical and current data within a single report.
- Resolves the limitations of types 1 and 3, by keeping track of previous values all in the same table.
- Overcomes the limitation of type 2, by adding the preceding value of a changing dimension in each row.
The article discusses the concept of slowly changing dimension (SCD) within Data warehousing. Focusing on the five SCD types. Type 0 keeps original values unchanged. Additionally, type 1 overwrites current values. While Type 2 adds new rows to the table to keep history. And Type 3 adds new dimensions which mirror existing ones. Furthermore, type 4 tracks changes via a history table. Lastly, Types 5 and 6 combine other types.