Late Arriving Dimensions- Design Approach
June 16, 2012 Leave a comment
This post will describe how to handle Late Arriving Dimensions, for a Data Mart, and how we did it in our project
What we did it was taken via the design approach defined on the GURU webiste for all KIMBALL here is the link i will try to theoretical y define the approach stated and what we did to suit our needs
http://www.kimballgroup.com/html/designtipsPDF/DesignTips2006/KU78LateArrivingDimensionRows.pdf
What is a Late Arriving Dimensions(LAD)
LAD are the dimensions, where in FACT Table records are populated priot to population of the dimension records , this statement in itself is very confusing especially for the people in the OLTP world, but hey in OLAP world this is something we have to deal with will site and example from the real life project and try to convince you
In the INSURANCE industry Site a policy lifecycle, a policy is been issued to a person and a and he has the POLICYNUMBER with him, we have a 15 day period wherein the Policy document will be delivered to the client and his approval sought, the entry for this would be made/ confirmed only when the policy document is delivered/ accepted by the client so in these 15 days if any claim is made on the POLICY NUMBER then it creates a ORPHAN record in terms of Dimension not having the entry for that record.
So the scenario is as below with the table structures
TableDim1: (KeyCol, BusinessKey, Description)
Keycol: Is the Surrogate Key Column on the dimension column, this is a TYPE1 SCD, popualted on a daily basis.
BusinessKey: This is the KeyCol as in Upstream OLTP System.( This is the Key value which has been generated on the OLTP)
TableFact1: (KeyColFact, KeyColDim1, KeyColDim2,FactValue)
KeyColDim1: Is the key column as coming from the Dimension(Primary Key–> Foreign key relation)
KeyColDim2: This is the key column same as above for a different dimension
FactVaue: This is the fact the fact value of the measure.
Now to define the design approach what we did:
The TableDim1, would have a default row something values like:
0, 0,UNKNOWN.
On the data Load ETL, now when the FACT value comes in, and a lookup happens on the base of BUSINESS KEY with the Dimension table, the value is returned as NULL as the tableDIM1 hasn’t obtained the BusinessKey1 Record in it, we do a ISNULL on the fact row for the column replace it with a default 0 so the fact record looks like:
(0,4,2111.07), Note the first column is kEYDIM1 column which is looked up and we have got the default value there.
So far so good we have handled the FACT row well, and our data load/ cube processing wouldn’t fail as we have the constraints met properly (PrimayKey–> Foreign Key), bit the question is as and when the Dimension value arrives for that specific record how do the OLAP system know that it has arrived? to solve this again we took the help of GURU, and this is what we did,
a) Introduce a new table which capture all the misses for the BUSINESSKEY, and have a row each for them. This table would have the BUSINESSKEY captured, and FACTROW Key column also captured into it(KeyColFact).
b) Once the ETL load has finished traverse through all the fact rows, check for the BUSINESS Keys again doing a Lookup against the Dimension table( as the Dimension table record would have arrived now), obtain the new KeyDim1, and update the fact table on the basis of KeyColFact.
This was the easiest and safest way around, and can be easily enhanced to have a single table for all the lateArrivingdimensions by adding a new column to refer to the column names.