So I'm going to take a few posts and go back to the basics.
Data Normalization
The concept of data normalization was first described by Edgar F. Codd in 1970 while he was working at IBM; he is credited with creating the theoretical basis for relational databases. The goals of normalizing data structures include:
- minimizing data storage
- reducing data write times
- ensuring data consistency
- ensuring data queries would return meaningful results
First Normal Form
There are different definitions of first normal form (1NF) on the internet, but I find it easiest to think of it in these terms:
- Each field has only one value
- Each row has a unique key
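For example, suppose we track an award in a dataset like this (the winner's name and department are placeholders for illustration):

Year       | Winner | Department
2010, 2014 | Alex   | Engineering

The Year field holds two values in a single row, so the dataset violates 1NF.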
Separating the 2010 and 2014 information into distinct rows brings this dataset into compliance with 1NF, and the Year column provides a unique key for each row.
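The compliant version looks like this:

Year | Winner | Department
2010 | Alex   | Engineering
2014 | Alex   | Engineering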
However, let's take the dataset back one more year, to an unusual situation: two people, Sam and Maria, shared the award in 2009. Once again our dataset violates 1NF, because two fields, Winner and Department, hold multiple values.
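Crammed into one row, the 2009 data might look like this (again, the departments are placeholders):

Year | Winner     | Department
2009 | Sam, Maria | Engineering, Sales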
We solve the problem just like we did before, separating the 2009 record into two rows, one for each award recipient. However, Year is no longer a unique key: 2009 now shows up twice, so we need another field to uniquely identify each row. Winner works nicely, so Year and Winner together become the unique key. Whenever a key spans more than one field, it is called a compound key, and that leads nicely into the next article on Second Normal Form (2NF).
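After the split, the dataset looks like this:

Year | Winner | Department
2009 | Sam    | Engineering
2009 | Maria  | Sales

As a minimal sketch of the idea (in Python, with the departments still assumed), a compound key is just a multi-part unique identifier; keying a dict on a (year, winner) tuple enforces it naturally:

```python
# Award history keyed on the compound key (Year, Winner).
# Department values are illustrative, not from the original dataset.
awards = {}

def add_winner(year, winner, department):
    key = (year, winner)  # compound key: Year + Winner together
    if key in awards:
        raise ValueError(f"Duplicate key: {key}")
    awards[key] = {"department": department}

add_winner(2009, "Sam", "Engineering")
add_winner(2009, "Maria", "Sales")  # same year is fine; Winner differs
```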