Data Modelling – 101
In the first of a three part series, Cory Isaacson gives us a crash-course guide to Data Modelling.
Previously, I reviewed the meaning of data, and more importantly the key concept that all data is relational. To recap, data in an application has no meaning unless it is related to other data. With these relationships, the data can be used to meet the requirements of your application and the needs of your organisation. This concept applies to all types of DBMS engines – traditional Relational DBMSes, NoSQL, NewSQL, you name it.
To ensure your database structure is useful, meaningful and meets your particular application needs, it is critical that you create a data model. I have always found this vital in building a small project as an individual developer, and know from hard-won experience managing many software teams that it is even more important on large projects.
A data model is critical, everyone on the team needs to know it and understand it, if you are going to have a successful application that delivers as expected – functionally and with regards to performance.
So why bring so much attention to a data model and the data modelling process? Because in working with developers over the years (and particularly in recent years on a variety of BigData projects), I have found this to be a very important step in the application development process – a step that often gets skipped or done without due importance.
In this article, I will cover the fundamental concepts of data modelling, and the process for developing a workable data model.
The ideal data model
There are many types of data modelling approaches; some are very detailed with an extensive number of steps. The style I prefer is an Entity-Relationship data model that shows all entities (tables) and attributes (columns), with all the main relationships specified. This makes it easy to read and use, without making it over-complicated.
In the next sections, I’ll walk through an example model, using the streamline Entity-Relationship Modelling approach, tried and tested over years of work and 1000s of applications.
One other thing to keep in mind when going through this article is that there are numerous excellent data modelling tools on the market. You can use one
of these, or just create your model with any visual graph tool. I like to use an actual data modelling tool, as long as it conforms to the modelling process I have adopted. Such tools can save a lot of time, helping with things like naming consistency, validation of relationships, and other useful capabilities. There are many affordable options available.
DefinitionsThe Entity-Relationship Modelling approach is tried and tested, if you stick to a few simple rules then the basic process is easy to learn, and fast to implement. The place to start is with a few basic definitions:
- Entity-Relationship Model: The Entity-Relationship Model (or ER model) is a way of graphically representing the logical relationships of entities (or objects) in order to create a database.
- Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. Entities can be thought of as nouns. Examples: A computer, an employee, a song …
- Attribute: Entities…have attributes. Example: an employee entity might have a Social Security Number (SSN) attribute…
- Relationship: A relationship captures how entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: A known relationship between a company and a computer – a supervisor relationship between an employee and a department, or a performance relationship between an artist and a song…
You can see that an Entity-Relationship Model is made up of 3 things: Entities, Attributes and Relationships. Here are some additional notes that can be helpful in working with these concepts:
- An Entity is the core, and normally translates to a table or other discrete data structures (such as an Object in a NoSQL database).
- An Entity must be uniquely identifiable.
- You can think of each instance of an Entity as its own object or thing that you are representing in your database.
- Entities have Attributes, which are additional data elements that describe or define an Entity.
- Entities have Relationships between them, based on matching Attribute values. For example, CustomerOrder Entity may be related to a Customer Entity, using the Customerld Attribute.
- With these basic definitions in hand, we can look at how Relationships work in Entity Relationship Modelling, and how it all fits in with the Entity Relationship Modelling Process.
The types of relationships
Now let’s look at the types of Relationships, the very core of Entity Relationship Modelling:
One-to-One Relationship: This is where a data element or objects relates to exactly one instance of an Entity.
One-to-Many Relationship: An instance of an Entity can relate to many instances of another Entity. Here are some examples: A Car has many Doors, a Company has many Employees. The One-to-Many Relationship is the most common type found in an Entity-Relationship Model.
Many-to-Many Relationship: In this case, one instance from Entity A can relate to many instances in Entity B, and each of the instances of Entity B can in turn be related to many instances of Entity A. A simple example is as follows: A CustomerOrder instance can contain many Product instances, and Product instance can be included in many CustomOrder instances. This is by far the most complex of the three Relationshop types, and is easily resolved with a special Join Entity, an Entity that is created to break such a Many-to-Many Relationship down into two One-to-Many Relationships. Extending our simple example, you can add a CustomOrderLine Entity, containing one order line per Product purchased, and now end up with two One-to-Many Relationships:
CUSTOMER ORDER | CUSTOMERORDERLINE AND PRODUCT | CUSTOMERORDERLINE.
As you can see, there are only three types of Relationships that can exist in a database. Later on we will discuss how to normalize your Entity-Relationship Model, to streamline and simplify your model, such that the model has only One-to-Many Relationships once the process is complete.
Entity-Relationship Modelling: The process
The basic process for Entity-Relationship Modelling has these simple steps:
- Discover Entities
- Discover Attributes
- Discover Relationships
Then there is a final step to normalize your model to ensure it is correct and will function. (I also add one last step which is to then de-normalize the model for performance and convenience, a technique I will cover in a later article).
So how do you go about this? It’s fairly easy, and after a while it becomes second nature – you may even find yourself “thinking relationally” about lots of software problems.
The key is the word discover, as that is how you do it. You look around at all of the things involved in the system you are modelling, and start noting down Entities. As you do that, you inevitably start to discover Attributes for your Entities, and then you can discover Relationships for your Entities.
Let’s say you want to start your own online Music site, organizing all of the coolest new songs for users.As we look at the area of Music, we can discover all sorts of obvious Entities:
Then we can discover Attributes for these Entities, for example the Artist and Song Entities might look like this:
– Title– Genre
You’ll notice that I have added an id Attribute to each Entity, which typically is just a sequential key. This makes working with the database far simpler, especially when describing relationships with primary keys and foreign keys (much more about that in a future article).
With this much of the model done, we can start to discover Relationships. Here are two obvious ones:
- Artist to Song: One-To-Many
- Album to Song: One-To-Many
However, if you review this model carefully, you will see that it is over-simplified, even for this rudimentary example. The flaw in the logic is that a single Song can have more than one Artist, so really the Artist to Song Relationship is Many-to-Many. That is the first step in normalizing your model, resolving any Many-to-Many Relationships. There are other rules for normalizing your data model, and I’ll cover that in depth in a future article.
Hopefully you can see that with not much work, you can easily define the basic structure of your database in a very short time. We don’t really have a workable data model yet, but this section did review the most important steps in the discovery process.
Wrapping it up
This article covered the basics of Entity-Relationship Modelling, providing you with the basic structure and process. In future articles I will review a much more in-depth example of a complete Entity-Relationship Model, and delve into database normalization, an important part of the modelling process.