Friday 5 June 2015

Designing Data Cubes - Part 4

This is part of an on-going series on data cube design.  Click here for the previous article.

Use Entity-Relationship Diagrams to Guide the Dimension Design

In the previous post I discussed the need for eliminating unnecessary dimensions during the cube design phase. Once you have all the required dimensions, you need to arrange the components into hierarchies. In some cases, it's not always obvious if dimension elements belong together in the same dimension or in separate dimensions.

In an earlier post, I discussed the basics of ER Diagrams so I won't repeat that here. I have found taking the time to model the dimension elements properly makes the dimension design easier and helps avoid common design errors.

Situation 1:  Separate Dimensions

When two elements have a many-to-many relationship (such as Customer and Product), they absolutely should not be part of the same dimension. They are separate dimensions that connect to each other through the fact table. The temptation may be to put products as a child of customer so you can easily see which products each customer purchased, but when it's time to show all sales for Product A, it becomes a nightmare. I do not know of a valid exception to this principle, but I certainly have seen it ignored by new cube designers.


Situation 2:  Level-Based Hierarchy

When a series of entities are related as one-to-many going forward and one-and-only-one going backwards, they are part of the same dimension hierarchy. Further, it is a level-based hierarchy because each forward relationship is mandatory.

The example below shows part of the Canadian census geographic levels and how they are subdivided into smaller and smaller areas. While it's obvious they are part of the same dimension, working out the relationships helps determine the type of hierarchy to use.


Situation 3:  Parent-Child Hierarchy

Similar to Situation 2, entities that are related as zero-to-many going forward and one-and-only-one going backwards are also part of the same dimension hierarchy, but because it is optional at one or more levels, it is a parent-child hierarchy. Just like with families, some people have children and grand-children while other people do not. This asymmetry is not allowed in a level-based hierarchy.

The example below shows an account structure, where Account Group may be Asset, Account may be Bank Account, and Sub-account may be TD Bank Account #12345. Because one level of the hierarchy is optional (Sub-account), the entire hierarchy becomes parent-child.


Situation 4:  Member Properties (Synonyms)

This situation is not as intuitive as the previous examples, but I have seen it built improperly in cubes on multiple occasions. Entities relate to one base element in what appears to be two distinct hierarchies. The example below shows Customers rolling up to geographic regions and also through the Sales department.


Because both sets are based on Customer, it doesn't make sense to create two separate dimensions. Building one dimension with a default hierarchy and an alternate hierarchy looks like a reasonable option. However, before making that decision it is important to understand the relationships between the hierarchies. How does Region relate to Regional Sales Managers? In this case, it turns out that each RSM oversees one and only one Region, and simliarly for Sales VP's and Districts.


Since the geography and sales entities are therefore equivalent, we don't need two hierarchies. We just need alternate labels on one hierarchy. For example, in Microsoft cubes (SSAS), these alternate labels are called Member Properties. Some reports will show Districts and Regions while other reports show Sales VP's and RSM's, but the numbers in each bucket will always be identical.


I have found people who are new to cube design tend to shun data modeling exercises because it's time consuming and in some cases it appears unnecessary. However, understanding your data relationships clearly by building some simple ER diagrams can really help to design a cube that works efficiently.