Informatics Musings: Back to the Basics

In an earlier post we touched on the topic of Entity-Relationship Diagrams (ERDs). In this post we will explain one of the more common diagramming standards for ERDs. To begin though we need to understand the types of relationships between entities.

Entity Relationships

In ERDs, entities are shown as boxes and relationships between them are shown as lines. For example, the entity Author has a relationship to the entity Book.

While knowing that a relationship exists between these entities is helpful, it would be more helpful to convey information about the nature of that relationship in the diagram. For instance, does every author have a relationship to a book or only some authors? Can a book exist without an author? Can a single author write multiple books? Can a book be written by multiple authors?

These relationship questions boil down to two that apply to every entity relationship:

Is the relationship optional or mandatory?
Is a single relationship or multiple relationships possible per instance of each entity?

The first question deals with the minimum number of connections (0 if optional, 1 if mandatory) and the second question deals with the maximum number of connections (1 or many). We don't try to specify a firm maximum number because whether it's 2 or 2 million, the impact on the database design will be the same.

Many relationship diagramming methods exist for ERDs but I have found the Crow's Foot design the easiest for me to understand and the most commonly used in the companies I've worked with.

Crow's Foot Diagrams

The Crow's Foot connector uses 3 symbols to describe the minimum and maximum number of relationships: 0, 1 and < (for "many" -- I think of it as a greater-than sign)

Two symbols are used at the end of each line. The first is the minimum and the second is the maximum. There are four acceptable combinations of values as shown below.

Because the "many" symbol resembles a crow's foot, hence the name of this diagramming method.

Some people view the number zero as a letter "O" which stands for "Optional." Both interpretations are equivalent.

Now we can modify the relationship diagram for Authors and Books.

We can read the relationships in both directions. The symbol at the destination end of the line is the one that applies to that direction. These connectors show that each author writes one to many books and each book is written by one to many authors.

Note that these relationships are mandatory (i.e. minimum of one). If an author has not written any books, they cannot be considered an author. Similarly a book cannot exist without an author.

Note that the greater-than sign always means "many" even if it appears backwards (like a less-than sign).

Here is a finance example from a General Ledger data model.

In this system, financial accounts can have sub-accounts but they are not mandatory. For example, the account Office Supplies Expense may have sub-accounts called Copier Paper, Copier Toner and General Office Supplies. However the account Electricity Utility Expense has no sub-accounts because there isn't any further cost breakdown possible. This diagram reads that each account can have zero to many sub-accounts, but that each sub-account must belong to one and only one account. While sub-accounts are optional, once one is created then it has a mandatory relationship back to a single account. Sub-accounts cannot be orphans.

Next example: users and software licences.

Each user must have one and only one licence. (We will stick to the legal options only -- no pirated software!) If a person does not have a licence then they cannot be considered a user of that software. However a licence may be purchased and not assigned to any user. It may be waiting for a new hire who starts next quarter. Therefore a licence can be assigned to zero or one user.

You can begin to see that the definition of the entity can affect the definition of the relationship. The entity User contains an inherent assumption that must be understood in order for the relationship to be modeled correctly, specifically that there is no pirated software allowed.

The importance of entity definition shows up in this example as well. Here a customer buys one to many products, while a product may be bought by no one or many customers. If a person doesn't buy any of our products, they cannot be considered our customer. If however we want to track everyone who walks into our store, then we need a different model. Some people could walk in, look around and leave without buying and therefore without becoming customers. If we call everyone who enters our store "Shoppers" then that model would look like the following:

Now the connector from Shopper to Product becomes optional.

But What About Other Combinations?

You may have noticed that selecting two values (min, max) from three different symbols (0, 1, <) should give us more than just four combinations. Obviously the maximum cannot be less than the minimum so those situations are nonsensical. However, there still are two other combinations:

Zero and only Zero -- this effectively means there is no relationship between these two entities and therefore no line, so it is a non-existent situation in an ERD.
Many to Many -- this means the minimum is greater than one. This situation sounds plausible.

In fact, it's not hard to imagine an entity relationship where the minimum is 2. Consider the diagram below for a genealogy database:

Each parent can have one to many children, but each child must have 2 and only 2 parents. (We'll stick to genetic parents for simplicity.) What Crow's Foot symbol should we use for representing a "2-and-only 2" relationship?

Whenever you encounter a situation like this, it means you need to redefine your entity definitions. You need to take the model to a lower level of detail.

In this case, we need to ask why we created an entity called Parent. Why not two entities called Father and Mother? Or Man and Woman? In genealogy software, those distinctions are bound to be important for reporting.

Once we create two entities out of the Parent entity (Father and Mother), the relationship to Child simplifies to a minimum value of one. Now our "2-and-only-2" relationship has been resolved.

It's important to know that relational databases cannot resolve a connection where the minimum value is greater than 1. Whenever that shows up in your model, you have to create additional entities to simplify the relationships so the minimum connector is either 0 or 1.

Short-Form Description of Entity Relationships

One more important point. When describing an entity relationship in words, the practice is to summarize it by focusing on the maximum connector values only. Minimum values are omitted for simplicity as they have less effect on the data architecture design.

For example, we would say the entity Author has a many-to-many relationship with Book.

Account has a one-to-many relationship with Sub-account, and Sub-account has a many-to-one relationship with Account.

User has a one-to-one relationship with Licence.

While these little Crow's Foot symbols may seem insignificant, ERDs are incredibly helpful in understanding an organization's data and the business definition assumptions inherent in it. We will explore some applications of ERDs using Crow's Foot notation in future posts.

Informatics Musings

Friday, 8 May 2015

Back to the Basics - Crow's Foot Diagrams

Entity Relationships

Crow's Foot Diagrams

But What About Other Combinations?

Short-Form Description of Entity Relationships

No comments:

Post a Comment