More modelling!

Data Modelling – 103

Cory Isaacson
Abstract image via Shutterstock

For the next instalment of Cory Isaacson’s Data Modelling series, we’re keeping our ‘Angry Shards’ database normalised while adding complexity.

For the past several entries in my column I have covered the fundamentals of practical data modelling techniques. Each of the basic principles of Entity Relationship Modelling has been covered so far, including the idea that all data is relational. This time I will delve into more advanced normalization and de-normalization concepts, extending the Angry Shards game data model example.

The Angry Shards Game Database

In the previous article, I built the start of the Angry Shards game data model, a demonstration database used at CodeFutures. The first few normalized tables – shown in Figure 1 – were built out, including the player, game, and player_game entities.

Screen Shot 2015-05-06 at 15.54.26

I’m going to be adding some new cool features to the model, to support a richer game experience for the user – and of course to demonstrate some new data modelling techniques.

Adding New Features to a Data Model

Let’s add a new feature to the model: support for multiple rounds in the game. This way, as players get better at the game and advance levels, we can track how well they do in each round. To make this more interesting for the player and to encourage competition, let’s record the highest score the player achieves, as well as the highest level obtained.

For our initial attempt at this, we will add some columns to the player_game table. To support up to 3 rounds for each game, let’s add three new columns to the player_game table: game_round_score_1, game_ round_score _2, and game_round_score_3. Each of these columns will hold the resulting score for that particular round of the game. Similarly, to record the highest level and score, we will add high_level and high_score columns to the player_game table. The new model is shown in Figure 2.

I’m sure you may have already noticed this is not the ideal approach, as it causes several problems and limitations in our data model:

  • The game is limited to 3 levels, unless we continue adding a new column each time another level is designed in. This is a hassle, and makes the model very inflexible.
  • It’s not easy to track any other information about each round of play without adding multiple columns for each piece of data needed. For example, if we want to know how long it took the player to play a particular round, this approach would acquire 3 additional columns: game_round_time_1, game_ round_time_2, and game_round_time_3. Therefore, as we enrich the game experience, the model will become more and more unwieldy and cluttered – making it harder for the development team to understand as well.
  • Recording the high_score and high_level on the player_game table is just plan incorrect, as it directly violates the rules of data model normalization.Screen Shot 2015-05-06 at 16.19.52

The reason? Because the high_score and high_level are not related to player_game, but rather belong with player table, since a player can have at most one and only one value for these attributes. Besides breaking theoretical rules, there is a far more practical side to this common error: each time you want to record the high_score and high_level values (at the end of each game), the application must update every player_game row for the player. This is incredibly inefficient as an active player could have 100s of player_game rows, resulting in a large number of updates to the database (and updates are almost always the least efficient operation in a production database).

Given that our objective is a highly flexible model that will support the life of the application and be as easy to work with as possible, we will apply some simple normalization techniques to fix it.

The first step is to move the high_score and high_ level columns to the player table. This associates these columns with the table to which they are truly related, as they can only occur once and only once per player instance.

SEE ALSO: Data Modelling – 101

The next step is to normalize the repeating fields for recording the scores for each round or level. Since the actual relationship between a player_game to a round or level of play is one-to-many, we need to add a new table: player_game_round. We will add the following attributes to the table, resulting in the model shown in Figure 3:

  • player_game_round_id: Sequential ID, matching our standard for other tables.
  • status_code: The status of the round, a numeric code value representing in-progress, complete, abandoned etc.
  • round_play_time: The length of time it took the player to complete the round.
  • round_score: The score that the player achieved for the round.
  • round_level: The level of the game that the round was for.
  • player_game_id: The foreign key to the immediate parent table (player_game). This allows us to express the relationship of a given player_game instance to its player_game_round instances.

You can see that, after applying our normalization techniques, this version is much improved. Now we can flexibly add as many levels to the game as we like, and track various attributes about how a player does while playing those levels. In addition, we can track the high_score and high_level for each player – very useful information that we can provide to the end user.Screen Shot 2015-05-06 at 16.21.05
It should be obvious that with a little work and thought, database normalization techniques can dramatically improve your data model, making it more flexible, efficient, and much easier to understand by your development group.

De-normalize your model?!

Now that we have covered database normalization, what about de-normalization? Why would you want to do this? When do you use it? These are good questions, as at this point in the process you will have likely spent a lot of time and effort normalizing your model.

While database normalization techniques are extremely workable, and provide the backbone of data modelling, the truth is that in some cases a fully normalized model can become unwieldy or inefficient. Based on this, after I finish normalizing a data model, I then de-normalize it as required using the following rule as a guideline: “De-normalize a data model for convenience and performance.”

By de-normalize, we mean breaking some of the database normalization rules, typically duplicating certain data elements in the structure.

Most of the time de-normalization occurs during the development process, when application developers run into functions that are difficult to implement in the fully normalized structure, or when performance bottlenecks are encountered. By selectively de-normalizing your model where needed, you can speed up the development process and often make your database perform dramatically faster. Some common ways you can de-normalize your data model are:

  • Create a special table that joins selected attributes together, providing easy and fast access. This can eliminate some expensive joins for frequently accessed data, making the job of your developers easier at the same time.
  • Resolve challenging relationship structures, such as a social network graph, and links between entities.
  • Store pre-computed values for fast access to specific dashboard or statistical information.

SEE ALSO: Data Modelling – 102

These are just a few of the ways you can use de-normalization, providing you with a preliminary understanding of the topic. Next time I will detail some actual de-normalization examples to provide more information on this topic.

Wrapping it up

In this article, I covered database normalization techniques, and also when to selectively use de-normalization. These approaches make a data model far more meaningful, easier to work with, and increase flexibility – plus help to maximize performance.

Practice in using these techniquees is the best way to learn them inside-and-out – soon you will become a fluent data-modeller, able to apply these concepts to virtually any DBMS. With this knowledge, you’ll be well prepared to deal with scaling your data, maintaining a solid structure that can work well in distributed environments.

Cory Isaacson
Cory Isaacson is CEO/CTO of CodeFutures Corporation, maker of dbShards, a leading database scalability suite providing a true “shared nothing” architecture for relational databases. Cory has authored numerous articles in a variety of publications including SOA Magazine, Database Trends and Applications, and recently authored the book Software Pipelines and SOA (Addison Wesley). Cory has more than twenty years experience with advanced software architectures, and has worked with many of the world’s brightest innovators in the field of high-performance computing. Cory can be reached at: [email protected]

Inline Feedbacks
View all comments