Site icon Acacia Tech Group

Slowly Changing Dimension

Imagine a data architect designing the structure of a data warehouse. The architect has to write down all potential dimensions, static and changing dimensions. Also, she has to consider how often changing dimensions undergo alternations. Furthermore, would keeping a history of these varying values be necessary? And how quickly current and historical data should be retrieved? In other words, she has to plan for static and slowly changing dimensions (SCDs).

An SCD is a table dimension that evolves. There are many types of SCDs in a Data Warehouse solution. Below we describe some of them.

Slowly Changing Dimension Type 0: Keep Original

A dimension classified as type 0, such as Full Name, Date of Birth, Social Security Number, etc., is characterized by its original values remaining unchanged.

Slowly Changing Dimension Type 1: Overwrite

A dimension categorized as type 1, such as Customer Email Address, involves the updating of its values where the previous ones are overwritten. The update removes the previous value. When storage efficiency is a primary concern, the overwrite dimension becomes essential.

For example, a customer for a retail company decides to update their email address, due to a recent job change, resulting in a new email domain.

Table Record

Customer IDCustomer NameCustomer Email
1Peterpeter@technology-hub.com

New Table Record

Customer IDCustomer NameCustomer Email
1Peterpeter@strucutreddata.com

The Customer ID and Customer Name are static dimensions that will always remain unchanged.

Slowly Changing Dimension Type 2: Add New Row

A new row is appended each time a dimension value changes. Consequently, this helps to maintain a historical record of all values. Moreover, the “Is Active” dimension designates the currently active data record. The “Is Active” dimension is one option. Another option is the “Version” dimension, where the maximum version designates the most recent data record update. While data ranges, an alternative would indicate the period during which the data record period is active.

An example featuring the “Is Active” dimension:

Row IDCustomer IDCustomer NameCustomer EmailIs Active
11Peterpeter@technology-hub.comNo
21Peterpeter@structureddata.comYes

Another example employing the “Version” dimension:

Row IDCustomer IDCustomer NameCustomer EmailVersion
11Peterpeter@technology-hub.com1
21Peterpeter@structureddata.com2

Lastly, an example to demonstrate the data ranges option:

Surrogate IDCustomer IDCustomer NameCustomer EmailStart DateEnd Date
11Peterpeter@technology-hub.com01-Feb-202301-Jan-2024
21Peterpeter@structureddata.com01-Jan-2024NULL

As one sees, the currently active record has the End Data value set to NULL.

Slowly Changing Dimension Type 3: Add New Dimension

A new dimension is introduced to the table, representing an existing dimension. Consequently, the newly added dimension would hold the updated values of the dimension it mirrors.

This type is suitable when one or two members of the table undergo regular updates. However, it is not preferable if more than two members require changing. In such situations, type 2 would be the better choice.

Original Table:

Customer IDCustomer NameCustomer Email
1Peterpeter@technology-hub.com

Table with a new dimension:

Customer IDCustomer NameCurrent Customer EmailOriginal Customer Email
1Peterpeter@structureddata.competer@technology-hub.com

As can be observed, the dimension “Customer Email” preserves only one historical value.

Slowly Changing Dimension Type 4: Add History Table

In this type, two tables divide the data records: an active table, that holds current records, and a history table, which stores past records. The active table is used for day-to-day data manipulation operations such as inserts, updates, and deletes. While the history table is used for historical queries. Importantly, once a record is deleted from the active table, it instantly moves to the history table. Subsequently, the history table is a repository of complete historical records.

Active Table

Row IDCustomer IDCustomer NameCustomer EmailStart DateEnd Date
31Peterpeter@oracle.com01-June-2023NULL

History Table

Row IDCustomer IDCustomer NameCustomer EmailStart DateEnd Date
11Peterpeter@technology-hub.com01-Feb-202201-Jan-2023
21Peterpeter@structuredata.com01-Jan-202301-June-2023

The history table holds all records of the customer named “Peter”. Peter actively moved between three companies.

Slowly Changing Dimensions Type 5: Types 4 and 1 Combined

Type 5 merges type 4 and type 1, utilizing a base dimension table and a mini-dimension table. The base dimension table has a dimension with reference key values to the mini-dimension table. Moreover, type 1 is used to alter a referenced value.

Customer Table

Customer IDCustomer NameCustomer EmailEmployer
1Peterpeter@oracle.com3
2Marymary@oracle.com3

Company Table

Company IDCompany Name
1Technology Hub
2Structured Data
3Oracle

To overwrite an “Employer” value, we would have to reference another key from the company table.

Now what happens with the history of updating the reference keys? Each time the “Employer” value changes, a fresh record is added to a history table:

Customer IDCompany IDStart DateEnd Date
1101-Feb-202201-Jan-2023
1201-Jan-202301-June-2023
1301-June-2023NULL

As seen above, we use type 4 to trace full employment history, via the history table.

Slowly Changing Dimensions Type 6: Types 1, 2 and 3 Combined

It is possible to combine types 1, 2, and 3. Again, to clarify, Type 1 involves overwriting a value, while type 2 entails adding a new row. And finally, type 3 involves a new attribute to the table. Do you have any ideas on how these could be combined?

Recalling one of the drawbacks of type 1, it cannot track any previous values. For type 2, one has to navigate the start and end date columns to trace the preceding value. Lastly, as for type 3, it can only track the previous value. So, what happens when we combine these types? Do we overcome the drawbacks?

Example of Combined Types:

Row IDCustomer IDCustomer NameCurrent Customer EmailOriginal Customer EmailStart DateEnd DateIs Active
11Peterpeter@technology-hub.comNULL
01-Feb-2022
01-Jan-2023No
21Peterpeter@structuredata.competer@technology-hub.com01-Jan-202301-June-2023No
31Peterpeter@oracle.competer@structuredata.com01-June-2023NullNo

Combining all three types would result in the following benefits:

The article discusses the concept of slowly changing dimension (SCD) within Data warehousing. Focusing on the five SCD types. Type 0 keeps original values unchanged. Additionally, type 1 overwrites current values. While Type 2 adds new rows to the table to keep history. And Type 3 adds new dimensions which mirror existing ones. Furthermore, type 4 tracks changes via a history table. Lastly, Types 5 and 6 combine other types.

References

[1] Slowly Changing Dimensions (SCD). Slowly Changing Dimensions (SCD) is a… | by Arun Kumar Pandey (Ph.D.) | Medium

[2] Database-datapipeline-ETL/Database/7.Slowly-Changing-dimension.ipynb at main · arunp77/Database-datapipeline-ETL (github.com)

[3] Populate slowly changing dimensions in Azure Synapse Analytics pipelines – Training | Microsoft Learn

[4] So you want Slowly Changing Dimension? | DoltHub Blog

Exit mobile version