Informatics Musings: Back to the Basics

This is part of a series of posts on some of the basic concepts of data management and business intelligence concepts. We've previously discussed the differences between OLAP and OLTP. In this post we look at the two main implementations of OLAP: Relational OLAP and Multi-dimensional OLAP.

Review of OLAP

On-Line Analytical Processing (OLAP) is a method of structuring data to permit quick aggregation queries. It makes use of the Star Schema to limit the number of query joins when connecting different data entities. Business Intelligence tools that make use of OLAP typically permit data to be arranged in rows, columns and pages, with the intersecting cells storing the aggregation results (like Excel Pivot Tables). These tools also allow interactive "drag and drop" functionality to move or add entities on rows to columns and have the aggregation results recalculate automatically.

The user experience for OLAP is similar regardless of the back-end OLAP implementation, but it is important to understand that there are two ways to store and access data in OLAP. The choice affects performance and scalability as the data mart grows and expands and it can affect which tools can be used on the front end.

Relational OLAP

Relational OLAP (ROLAP) uses standard relational databases and tables to store and deliver the data to the BI tool. However, instead of organizing the data in Third Normal Form (3NF) it is organized using a Star Schema. Any BI tool accessing a ROLAP database must be able to aggregate the cell intersection values on the fly.

Because ROLAP uses a standard relational database, the OLAP queries are written using standard SQL scripts. No special query language or functions are required.

Multi-dimensional OLAP

Multi-dimensional OLAP (MOLAP) uses a unique type of database to store and deliver the data to the BI tool. It does not consist of data tables per se but rather a multi-dimensional array that is designed to quickly aggregate the cell intersection values automatically.

Because MOLAP does not use a standard relational database, the OLAP queries are typically not written in SQL. Microsoft created a MOLAP query language for their own cube database (SSAS) called MDX and most other data cube vendors have since adopted it. (Oracle is the exception as they still use SQL with their cubes.) However MDX is not an industry standard. Microsoft owns it and they can modify it however and whenever they like.

MDX has some similarities to SQL but it is different enough that it requires dedicated training and practice. It is a very syntactic, meaning it's if you're missing a comma in a particular place it will cause the query to fail!

Comparison of ROLAP and MOLAP

Both ROLAP and MOLAP deliver all of the OLAP functionality to the end users, so from the users' perspective there is no apparent difference. There are differences in terms of performance though as well as in back-end maintenance considerations.

I have worked with multiple ROLAP and MOLAP clients across multiple database vendors and multiple BI tools. From this experience, these are the main differences I have witnessed.

Back-End Maintenance: Advantage MOLAP

MOLAP cubes can be implemented with relatively less back-end IT support than a ROLAP solution. While both require IT support, a ROLAP environment typically requires a small ETL department to maintain and load the data into the growing number of tables that support it. MOLAP also requires data loading and maintenance at the back-end, but it's usually possible for one DBA to maintain a series of MOLAP cubes.

For example, Prophix budgeting software, which sits on top of SSAS MOLAP cubes, actually permits the admin end users to do most of the data loading and maintenance processes themselves, thereby (almost) eliminating the need for IT support. I never seen this kind of functionality built on top of a ROLAP database.

Scalability: Advantage ROLAP

As a data mart grows in terms of number of fact records and number of entity members, both MOLAP and ROLAP performance can deteriorate. However, MOLAP seems to do worse when dimensions get large. While you can throw more hardware (memory and processors) at a MOLAP problem to improve performance on a large cube, it has a limited affect. Eventually the only option left is to split a MOLAP cube into smaller cubes, and that may not suit the business' analytical needs.

When ROLAP databases slow down, there is the added flexibility of being able to add more aggregate tables to the database. Aggregate tables calculate and store the sums of certain cell intersections so the BI tool doesn't have to make those calculations on the fly. This reduces the load on the BI tool and common reports that reference this aggregate table will generally load faster. These aggregate tables must be updated during the nightly ETL (Extract, Transform and Load) processes so the trade-off is a larger, slower and more complex ETL code. It also creates a more complex data model, as each BI report that is running slow may require its own new aggregate table in the background. If end users have direct access to query the data tables, these aggregate tables can make their queries more complex to build and if the aggregate tables include filter conditions, it can allow users to generate erroneous results if they are not aware of those hidden filters. While there are obviously trade-offs when considering aggregate tables, at least ROLAP provides the flexibility to choose between those trade-offs. MOLAP provides not such option.

When it comes to large-scale enterprise-wide data warehouses, MOLAP cannot keep up with the increasing data demands. ROLAP is the most common choice for these situations.

Query Language: Advantage ROLAP

ROLAP uses SQL to query the data while MOLAP uses MDX. This is a huge advantage for ROLAP for two reasons:

There are far more SQL developers in the world than MDX developers, so staffing your OLAP team is much easier.
SQL is a simpler language to learn than MDX as MDX is more syntactic than SQL. (Translation: MDX is a fussier language that is more difficult to debug!)

Summary

Each method has its own strengths and weaknesses. MOLAP is more common among small and mid-size companies while ROLAP is more common in large companies. The key is to understand your priorities and choose accordingly.

Informatics Musings

Friday, 1 May 2015

Back to the Basics - ROLAP vs MOLAP