Modelling matters

All data is relational

Cory Isaacson

Why data relationships are critical to any successful database design.

Today you have more choices than ever when it comes to DBMS engines. There are the “traditional” RDBMS engines (like MySQL, PostgreSQL, Microsoft SQL Server, Oracle…), NoSQL/NewSQL engines (MongoDB, Redis, Cassandra…), and BigData engines (Hadoop, Spark…), with more entrants every quarter.

This array of choices can be dizzying for an architect. How do you know which engine, or engines, are right for your application? Do you rely on a single DBMS engine, or do you need more than one to fit your requirements? More importantly, how do you ensure you are using the selected engine properly?

With all these types of DBMS engines available, there is one thing that remains constant in my mind: all data is relational.

This idea may sound surprising, and ever so “1980s”, but bear with me while I explain my reasoning. I think you’ll see that this concept applies to all DBMS engines, and that understanding and applying it can help you tame your database tier in the most effective way for your application.

The meaning of data

Data is only useful if it has meaning – meaning that is interpreted and used by your application. For data to have meaning, it must be related to other data. In fact, I would go so far as to say that it’s a relational world we live in, and data in a database is best described as a representation of that world.

For example, let’s say you have a piece of data: colour

Immediately, you will ask: the colour of what? It could be the colour of a chair, an image, an animal… almost anything.

Now let’s say we are talking about a car, and by colour we are referring to an attribute of a specific car. Through this relationship the data has meaning: now we understand what we are talking about.

Of course there are lots of more complex relationships than this in the world. For example, a single car manufacturer produces many cars; this is a one-to-many relationship. A customer can buy many cars; this is another one-to-many relationship. In a single family there can be many drivers, and many cars; this describes a many-to-many relationship between car and driver.
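These relationships can be sketched directly in code. The following is a minimal illustration only: the class names (Manufacturer, Car, Driver, RelationshipDemo) and the sample values are hypothetical, chosen to mirror the examples above rather than taken from any real system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes, named for illustration only.
class Manufacturer {
  final String name;
  // One-to-many: one manufacturer produces many cars.
  final List<Car> cars = new ArrayList<>();

  Manufacturer(String name) {
    this.name = name;
  }

  // Creates a car and records it on the "many" side of the relationship.
  Car produce(String model, String colour) {
    Car car = new Car(this, model, colour);
    cars.add(car);
    return car;
  }
}

class Car {
  final Manufacturer manufacturer; // the "one" side of one-to-many
  final String model;
  final String colour;             // colour has meaning because it belongs to this car
  // Many-to-many: a car can have many drivers...
  final List<Driver> drivers = new ArrayList<>();

  Car(Manufacturer manufacturer, String model, String colour) {
    this.manufacturer = manufacturer;
    this.model = model;
    this.colour = colour;
  }
}

class Driver {
  final String name;
  // ...and a driver can drive many cars.
  final List<Car> cars = new ArrayList<>();

  Driver(String name) {
    this.name = name;
  }

  // Records both directions so the two sides stay consistent.
  void drives(Car car) {
    cars.add(car);
    car.drivers.add(this);
  }
}

class RelationshipDemo {
  public static void main(String[] args) {
    Manufacturer maker = new Manufacturer("Acme Motors");
    Car hatchback = maker.produce("Hatchback", "red");
    Car estate = maker.produce("Estate", "blue");

    Driver alex = new Driver("Alex");
    Driver sam = new Driver("Sam");
    alex.drives(hatchback);
    alex.drives(estate);
    sam.drives(estate);

    System.out.println(maker.name + " produces " + maker.cars.size() + " cars");
    System.out.println(estate.model + " has " + estate.drivers.size() + " drivers");
    System.out.println(alex.name + " drives " + alex.cars.size() + " cars");
  }
}
```

Note that the many-to-many relationship needs bookkeeping on both sides (the drives method), whereas the one-to-many relationship needs only a list on one side and a single reference on the other.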

In a social network, a friend is associated with other friends, another example of a one-to-many relationship. However, this is only partially true, because each friend in the network can have many friends. Therefore, friend to friend is also a many-to-many relationship. (This description is the basic definition of a social graph, a very special type of relationship management.) The common saying “you are who you know” embodies the very idea that data has meaning through relationships.

Just take a look around you: there are data relationships everywhere. This is how the world is organized, and how we understand things. Even as you read this article, you are one of many readers, yet another one-to-many relationship.

Why do data relationships matter?

Since data relationships are inherent in everything around us, and even in our relationships with other people, it’s easy to see how important they are, especially in computer science and application development.

Taking this further, you even organize your application code using data relationships. Take this simple code example:

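import java.util.ArrayList;
import java.util.List;

class Person {
  int id;
  String firstName;
  String lastName;
  String birthDate;
  String createDate;

  // Friend to friend: each person can have many friends.
  // Initialized here so that friends.add(...) does not fail on a null list.
  List<Person> friends = new ArrayList<>();
}

...
Person joe = new Person();
joe.id = 1;
joe.firstName = "joe";
joe.lastName = "jones";
...

Person miranda = new Person();
miranda.id = 2;  // each person needs a distinct id
miranda.firstName = "miranda";
miranda.lastName = "jones";
...

// Record the friendship in both directions.
joe.friends.add(miranda);
miranda.friends.add(joe);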

In the example you can see that the Person class has various pieces of data (attributes: firstName, lastName, etc.), and also a list of friends that each person can have. You use this type of organization every day in your own coding; I’m sure it’s second nature if you have been coding for any amount of time. Yet if you examine the structure of this code snippet, it is full of data relationships. The attributes of any object are related pieces of data, and lists provide further relationships. It just goes on and on.

When working with your database, data relationships are even more important – you and every member of your team need to understand the data stored in the database, how to create it and how to access it. Thus data relationships are critical in your database design, providing meaning and structure to your data, regardless of the type of DBMS engine(s) you are using.

What is less obvious is that application performance often depends directly on proper database design: storing commonly related data items together allows fast storage and retrieval.
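As a rough in-memory analogy, and not a benchmark or model of any particular DBMS engine, the sketch below keeps the same records in two layouts: a flat list that must be scanned in full, and a map that keeps related records together under one key. The names Purchase, PurchaseStore and GroupingDemo are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of a customer buying a car.
class Purchase {
  final int customerId;
  final String carModel;

  Purchase(int customerId, String carModel) {
    this.customerId = customerId;
    this.carModel = carModel;
  }
}

class PurchaseStore {
  // Flat layout: every purchase in one list, in arrival order.
  private final List<Purchase> all = new ArrayList<>();
  // Grouped layout: purchases by the same customer are kept together.
  private final Map<Integer, List<Purchase>> byCustomer = new HashMap<>();

  void add(Purchase purchase) {
    all.add(purchase);
    byCustomer.computeIfAbsent(purchase.customerId, id -> new ArrayList<>()).add(purchase);
  }

  // Examines every stored purchase, related or not.
  List<Purchase> findByScan(int customerId) {
    List<Purchase> result = new ArrayList<>();
    for (Purchase p : all) {
      if (p.customerId == customerId) {
        result.add(p);
      }
    }
    return result;
  }

  // A single keyed lookup returns the whole related group.
  List<Purchase> findByGroup(int customerId) {
    List<Purchase> group = byCustomer.get(customerId);
    return group == null ? new ArrayList<Purchase>() : group;
  }
}

class GroupingDemo {
  public static void main(String[] args) {
    PurchaseStore store = new PurchaseStore();
    store.add(new Purchase(1, "Hatchback"));
    store.add(new Purchase(2, "Estate"));
    store.add(new Purchase(1, "Coupe"));

    // Both layouts give the same answer; they differ in how much data is touched.
    System.out.println("scan:  " + store.findByScan(1).size() + " purchases");
    System.out.println("group: " + store.findByGroup(1).size() + " purchases");
  }
}
```

The scan touches every record on each query, while the grouped lookup touches only the records that belong together. Deciding which items are "commonly related" is exactly the kind of question a data model answers.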

Data modelling is critical

You may have already guessed that I am somewhat of a data modelling fanatic. I learned data modelling techniques early in my career, and have been applying them ever since. A developer or architect who is skilled at data modelling (which takes practice for sure) can quickly analyze and design almost any application.

A common misconception is that when using NoSQL or “NewSQL” DBMS engines, data modelling no longer matters. Data modelling is often viewed as something that stifles rapid development, and unfortunately has fallen out of fashion with many application developers. The common idea has been to store “free form” objects in a DBMS, and to “work it out as you go”.

Nothing could be further from the truth – the more relaxed the structure of your DBMS, the more rigorous you must be with your data model design. A traditional RDBMS engine enforces some relationship structure by its very definition, but with newer NoSQL or NewSQL engines, there are few if any rules enforced at all. This means that all database rules end up in application code, a very challenging approach even for the best developers. The data relationships are still needed, and it becomes the burden of every developer on the team to enforce them in a “free form” database design.
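To make that burden concrete, here is a minimal sketch of a relationship rule living in application code. FreeFormStore is a hypothetical in-memory stand-in for a schemaless store, not the API of any real NoSQL product, and the key scheme ("manufacturer:1", "car:100") is likewise an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a schemaless store; not a real client API.
class FreeFormStore {
  private final Map<String, Map<String, Object>> documents = new HashMap<>();

  // The store accepts any document under any key: no rules are enforced here.
  void put(String key, Map<String, Object> document) {
    documents.put(key, document);
  }

  boolean exists(String key) {
    return documents.containsKey(key);
  }
}

class CarRepository {
  private final FreeFormStore store;

  CarRepository(FreeFormStore store) {
    this.store = store;
  }

  // The application, not the database, enforces that every car
  // refers to a manufacturer that actually exists.
  void saveCar(String carId, String manufacturerId, String colour) {
    if (!store.exists("manufacturer:" + manufacturerId)) {
      throw new IllegalArgumentException("Unknown manufacturer: " + manufacturerId);
    }
    Map<String, Object> car = new HashMap<>();
    car.put("manufacturerId", manufacturerId);
    car.put("colour", colour);
    store.put("car:" + carId, car);
  }
}

class IntegrityDemo {
  public static void main(String[] args) {
    FreeFormStore store = new FreeFormStore();
    CarRepository cars = new CarRepository(store);

    Map<String, Object> maker = new HashMap<>();
    maker.put("name", "Acme Motors");
    store.put("manufacturer:1", maker);

    cars.saveCar("100", "1", "red"); // accepted: manufacturer 1 exists

    try {
      cars.saveCar("101", "99", "blue"); // rejected: manufacturer 99 does not exist
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

The check only protects the data if every code path goes through CarRepository. Any developer who calls store.put directly bypasses it, which is why the whole team has to know and respect the data model.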

An all-too-common story I hear is from a team of developers who quickly started with a “free form” database design, only to have to rewrite the entire application a few months later. They found that data inconsistency, redundancy and difficulty accessing meaningful content ruined their application – making effective teamwork virtually impossible.

Extend this concept to a BigData database architecture, and the importance goes up by an order of magnitude. Using data relationships and intelligent data modelling effectively is the key to success as your data grows to terabytes and petabytes.

So for any application project, the first thing you need to concentrate on is a solid data model for your database. There are many approaches and tools available; the main point is to adopt an approach that works for you, and to use it effectively.

Wrapping it up

In this article, I hope you have seen the importance of data relationships, and why the truism that All Data is Relational is vital for effective database design and successful applications. There is much more to cover on this subject, and future articles will delve into common data modelling approaches, and exactly how performance and scalability are dependent on database design.

Author
Cory Isaacson is CEO/CTO of CodeFutures Corporation, maker of dbShards, a leading database scalability suite providing a true