Friday, 10 April 2015

Back to the Basics - OLAP vs OLTP

In the last article we looked at Third Normal Form. In this article we look at a de-normalized data structure for Business Intelligence data analysis called OLAP.

OLAP vs. OLTP

In the 1980's a new method of structuring data for analysis emerged called On-Line Analytical Processing (OLAP). It was in contrast to the prevailing method for structuring data, which was On-Line Transactional Processing (OLTP).

OLTP was designed to accomplish certain purposes with regard to databases, namely:
  • efficient data storage
  • fast performance when writing data (either creating or updating records)
  • rapid query response time for returning individual records
OLTP accomplishes these purposes by strictly adhering to Third Normal Form (3NF). No data element is stored twice anywhere in the database, which ensures efficient storage and writing performance.

The majority of database systems in the world are OLTP. Think of your bank's system that tracks your chequing account transactions or your grocery store's Point-of-Sale system at the check-out counter that tallies your scanned products. Transactions still rule the world and OLTP systems will always have a prominent place in it.

However, for all its strengths, OLTP has a major weakness -- it can be slow when trying to aggregate large amounts of transaction data. For instance, my bank system is very fast at processing a cash withdrawal at an ATM or showing my last 10 transactions for the month. However, if an analyst at the bank wants to know the average deposit amount across all personal chequing accounts in eastern Canada in 2014, that may take a while. In fact, it may slow down the entire system while the query crunches through the millions of records necessary to find the answer.

This is why OLAP was created. It had one purpose only:
  • fast performance for aggregation queries
Note there is nothing mentioned about efficient data storage or data writing performance. OLAP only cares about data aggregation performance. That single purpose is tremendously freeing because all other concerns become secondary. If storing the same data element in 3 places in the database will improve aggregation query performance, then we can do it. If it slows down data update performance, that's OK. If finding an individual record takes longer than in an OLTP system, who cares?

The main consequence of OLAP's singular purpose is that it does NOT follow 3NF. OLAP is a non-normalized data structure. Some Database Administrators (DBA's) are so used to using 3NF that they try to force normalization into OLAP, but that defeats the purpose of OLAP. Normalization slows queries because it introduces additional table joins, and joins are the enemy of fast query performance.

In the next article we will compare the data models that support OLTP and OLAP.




No comments:

Post a Comment