Friday 3 April 2015

Back to the Basics - Third Normal Form

This is the last in a series of articles on data normalization. Click here for the previous article on Second Normal Form.

Third Normal Form

The best-known form of data normalization is Third Normal Form (3NF). When a database is said to be normalized, it usually means it's in 3NF.

Using the example from the previous article, we had our list of Employee of the Year award winners in 2NF. However, it is not in 3NF yet. In order to achieve that level of normalization, every attribute must describe the key.

In this case, the compound key is Year and Winner. The attributes are Reason and Department.

The Reason attribute is describing the keys. Why did the award go to Annette in 2011? Because she implemented the ERP system on time and on budget.

However, does Department actually describe the key? Is IT describing the award that went to Annette in 2011? It's really describing Annette more than the award. This means the table is not in 3NF.

Does it matter that this data is not in 3NF? It does. When we try to ask a question of the data like, "Tell me which department each winner is from," there is a challenge when we reach George Smith. He appears twice and so does him department. When they are the same, it doesn't matter which department value we return, but what if they differ? What if Department equals Sales for the 2010 record but Merchandising for the 2014 record? Which Department is correct for George? Did he change departments between those dates? Or is one of the entries simply an error? How do we know which record is wrong and which is correct? If we need to update George's department, how many records do we need to update? This kind of query ambiguity is why 3NF is important.

To resolve this problem, we need to normalize the data structure. Since Department describes the employees and not the awards, we need to create a separate table for employees as below.


Now the Employee table has a key called Employee and the Department attribute describes the key. Both tables are now compliant with 3NF.

In honour of the creator of Third Normal Form, it can be summarized in one sentence, often referred to as the Database Administrator's Oath:

Each attribute will provide a fact about the key, the whole key, and nothing but the key, so help me Codd!

Benefits of Using 3NF


Note that George is listed twice in the Award table because he won two awards, but he is listed once in the Employee table. There is no longer any ambiguity about his department. It is stored once, so queries will always return the correct department for him and any changes to his department will only have to be written in one place. To repeat from our first article, the main benefits of using normalized data structures are as follows:

  • minimizing data storage
  • reducing data write times
  • ensuring data consistency
  • ensuring data queries would return meaningful results

Sometimes the purpose of our database varies from these goals though. What is data storage costs are cheap and we no longer care about reducing write times because we rarely update a record once it is written in the database? This is where de-normalized data structures come into play. We will look at one such example in the next article.



No comments:

Post a Comment