Modelling matters

All data is relational

Cory Isaacson

Why data relationships are critical to any successful database design.

Today you have more choices than ever when it comes to DBMS
engines. There are the “traditional” RDBMS engines
(like MySQL, PostgreSQL, Microsoft SQL Server, Oracle…),
NoSQL/NewSQL engines (MongoDB, Redis, Cassandra…), and BigData
engines (Hadoop, Spark…), with more entrants every
quarter.

This array of choices can be dizzying for an
architect. How do you know which engine(s) are
right for your application? Do you rely on a single DBMS engine, or
do you need more than one to fit your requirements? More
importantly, how do you ensure you are using the selected engine
properly?

With all these types of DBMS engines available,
there is one thing that remains constant in my mind: All Data is
relational.

This idea may sound surprising, and ever so
“1980s”, but bear with me while I explain my reasoning. I think
you’ll see that this concept applies to all DBMS engines, and that
understanding and applying it can help you tame
your database tier in the most effective way for your
application.

The meaning of data

Data is only useful if it has meaning – meaning
that is interpreted and used by your application. For data to have
meaning, it must be related to other data. In fact, I would go so
far as to say that it’s a relational world we live in, and data in
a database is best described as a representation of that
world.

For example, let’s say you have a piece of data:
colour.

Immediately, you will ask the question: colour of
what? Well, it could be the colour of a chair, an image, an animal…
almost anything.

Now let’s say we are talking about a car, and by
colour we are referring to an attribute of a specific car. Through
this relationship the data now has meaning: we understand what
we are talking about.

Of course there are lots of more complex
relationships than this in the world. For example, a single car
manufacturer produces many cars; this is a one-to-many
relationship. A customer can buy many cars; this is another
one-to-many relationship. In a single family there can be many
drivers, and many cars; this describes a many-to-many relationship
between car and driver.
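
The relationships just described can be sketched in a few lines of
Java. This is only an illustration: the class names (Manufacturer,
Car, Driver, FamilyCarsDemo) and the sample values are assumptions
made up for the sketch, not part of any particular schema or library.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One-to-many: a manufacturer produces many cars; each car has one manufacturer.
class Manufacturer {
    final String name;
    final List<Car> cars = new ArrayList<>();

    Manufacturer(String name) {
        this.name = name;
    }

    // Creating the car through the manufacturer keeps both sides in step.
    Car produce(String colour) {
        Car car = new Car(this, colour);
        cars.add(car);
        return car;
    }
}

class Car {
    final Manufacturer manufacturer;             // the "one" side of one-to-many
    final String colour;                         // has meaning only as an attribute of this car
    final Set<Driver> drivers = new HashSet<>(); // many-to-many with Driver

    Car(Manufacturer manufacturer, String colour) {
        this.manufacturer = manufacturer;
        this.colour = colour;
    }
}

class Driver {
    final String name;
    final Set<Car> cars = new HashSet<>();       // many-to-many with Car

    Driver(String name) {
        this.name = name;
    }

    // Record the relationship in both directions so it stays consistent.
    void drives(Car car) {
        cars.add(car);
        car.drivers.add(this);
    }
}

class FamilyCarsDemo {
    public static void main(String[] args) {
        Manufacturer maker = new Manufacturer("Example Motors"); // hypothetical name
        Car red = maker.produce("red");
        Car blue = maker.produce("blue");

        Driver parent = new Driver("parent");
        Driver teenager = new Driver("teenager");
        parent.drives(red);
        parent.drives(blue);
        teenager.drives(red);

        System.out.println("Cars produced: " + maker.cars.size());
        System.out.println("Drivers of the red car: " + red.drivers.size());
        System.out.println("Cars the parent drives: " + parent.cars.size());
    }
}
```

Note that colour lives on Car rather than floating free, which is
the point made above: the value means something only because of
what it is attached to.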

In a social network, a friend is associated with
other friends, another example of a one-to-many relationship.
However, this is only partially true, because each friend in the
network can have many friends. Therefore, friend to friend is also
a many-to-many relationship. (This description is the basic
definition of a social graph, a very special type of relationship
management.) The common saying “you are who you know” embodies the
very idea that data has meaning through relationships.

Just take a look around you. There are data
relationships everywhere; this is how the world is organized, and
how we understand things. Even as you read this article, you are
one of many readers, yet another one-to-many
relationship.

Why do data relationships matter?

Since data relationships are inherent in
everything around us, and even between all of us as individuals in
our relationships with other people, it’s easy to see how important
this is – especially when dealing in computer science and
application development.

To take the analogy further, you
organize your application code using data relationships too. Take this
simple code example:

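import java.util.ArrayList;
import java.util.List;

class Person {
  int id;
  String firstName;
  String lastName;
  String birthDate;
  String createDate;

  // Initialized so that add() below does not throw a NullPointerException
  List<Person> friends = new ArrayList<>();
}

...
Person joe = new Person();
joe.id = 1;
joe.firstName = "joe";
joe.lastName = "jones";
...

Person miranda = new Person();
miranda.id = 2;
miranda.firstName = "miranda";
miranda.lastName = "jones";
...

// Friendship is recorded on both sides of the relationship
joe.friends.add(miranda);
miranda.friends.add(joe);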

In the example you can see that the Person class
has various pieces of data (attributes: firstName, lastName, etc.),
and also a list of friends that each person can have. You use this
type of organization every day in your own coding; I’m sure it’s
second nature if you have been coding for any amount of time. Yet
if you examine the structure of this code snippet, it is full
of data relationships. The attributes of any object are related
pieces of data, and lists provide further relationships. It just goes
on and on.

When working with your database, data
relationships are even more important – you and every member of
your team need to understand the data stored in the database, how
to create it and how to access it. Thus data relationships are
critical in your database design, providing meaning and structure
to your data, regardless of the type of DBMS engine(s) you are
using.

What is less obvious is that application
performance is often directly correlated to proper database design:
storing commonly related data items together makes for fast storage
and retrieval.
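
To make the performance point concrete, here is a minimal in-memory
sketch. It assumes a hypothetical CarStore that keeps the same
records in two layouts: a flat list that must be scanned, and a map
that groups each owner’s cars together. The names CarRecord,
CarStore, scanForOwner and lookupOwner are invented for the
illustration; a real DBMS achieves the same effect with indexes,
clustering or document embedding rather than a HashMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CarRecord {
    final int ownerId;
    final String colour;

    CarRecord(int ownerId, String colour) {
        this.ownerId = ownerId;
        this.colour = colour;
    }
}

class CarStore {
    // Flat layout: all cars in one list, in arrival order.
    private final List<CarRecord> all = new ArrayList<>();

    // Grouped layout: cars that belong to the same owner are kept together.
    private final Map<Integer, List<CarRecord>> byOwner = new HashMap<>();

    void add(CarRecord car) {
        all.add(car);
        byOwner.computeIfAbsent(car.ownerId, id -> new ArrayList<>()).add(car);
    }

    // Reads every record to find one owner's cars: work grows with total data size.
    List<CarRecord> scanForOwner(int ownerId) {
        List<CarRecord> result = new ArrayList<>();
        for (CarRecord car : all) {
            if (car.ownerId == ownerId) {
                result.add(car);
            }
        }
        return result;
    }

    // A single keyed lookup returns the related records together.
    List<CarRecord> lookupOwner(int ownerId) {
        return byOwner.getOrDefault(ownerId, Collections.emptyList());
    }
}

class CarStoreDemo {
    public static void main(String[] args) {
        CarStore store = new CarStore();
        store.add(new CarRecord(1, "red"));
        store.add(new CarRecord(2, "blue"));
        store.add(new CarRecord(1, "green"));

        // Both calls return the same cars; only the amount of work differs.
        System.out.println(store.scanForOwner(1).size());
        System.out.println(store.lookupOwner(1).size());
    }
}
```

Both methods give the same answer. The difference is that the
grouped layout was designed around the owner-to-car relationship, so
the common question is answered without touching unrelated data.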

Data modelling is critical

You may have already guessed that I am somewhat
of a data modelling fanatic. I learned data modelling techniques
early in my career, and have been using this understanding ever
since. A developer or architect who is skilled at data modelling
(which takes practice for sure) can quickly analyze and design
almost any application.

A common misconception is that when using NoSQL
or “NewSQL” DBMS engines, data modelling no longer matters. Data
modelling is often viewed as something that stifles rapid
development, and unfortunately has fallen out of fashion with many
application developers. The common idea has been to store “free
form” objects in a DBMS, and to “work it out as you go”.

Nothing could be further from the truth – the
more relaxed the structure of your DBMS, the more rigorous you must
be with your data model design. A traditional RDBMS engine enforces
some relationship structure by its very definition, but with newer
NoSQL or NewSQL engines, there are few if any rules enforced at
all. This means that all database rules end up in application code,
a very challenging approach even for the best developers. The data
relationships are still needed, and it becomes the burden of every
developer on the team to enforce them in a “free form” database
design.
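
As a hedged illustration of what “rules in application code” looks
like, the sketch below uses a plain map as a stand-in for a
schemaless store. FreeFormStore, CarRepository and the
“manufacturer:” / “car:” key prefixes are all hypothetical names
chosen for the example, and do not correspond to the API of any real
NoSQL engine. The store accepts anything, so the check that a car
refers to an existing manufacturer has to be written by hand.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a "free form" store: documents are just key/value maps,
// and nothing about their shape or references is checked.
class FreeFormStore {
    private final Map<String, Map<String, Object>> documents = new HashMap<>();

    void put(String key, Map<String, Object> document) {
        documents.put(key, document);
    }

    Map<String, Object> get(String key) {
        return documents.get(key);
    }

    boolean contains(String key) {
        return documents.containsKey(key);
    }
}

// The relationship rule lives here, in application code.
class CarRepository {
    private final FreeFormStore store;

    CarRepository(FreeFormStore store) {
        this.store = store;
    }

    void saveManufacturer(String manufacturerId, String name) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", name);
        store.put("manufacturer:" + manufacturerId, doc);
    }

    void saveCar(String carId, String manufacturerId, String colour) {
        // The store would happily accept a dangling reference,
        // so the application must reject it.
        if (!store.contains("manufacturer:" + manufacturerId)) {
            throw new IllegalArgumentException(
                "Unknown manufacturer: " + manufacturerId);
        }
        Map<String, Object> doc = new HashMap<>();
        doc.put("manufacturerId", manufacturerId);
        doc.put("colour", colour);
        store.put("car:" + carId, doc);
    }
}

class CarRepositoryDemo {
    public static void main(String[] args) {
        FreeFormStore store = new FreeFormStore();
        CarRepository cars = new CarRepository(store);

        cars.saveManufacturer("m1", "Example Motors");
        cars.saveCar("c1", "m1", "red");           // accepted

        try {
            cars.saveCar("c2", "missing", "blue"); // rejected by application code
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Every code path that writes a car must go through the same check,
and any developer who calls the store directly bypasses it. That is
the burden the paragraph above describes.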

An all-too-common story I hear is from a team of
developers who quickly started with a “free form” database design,
only to have to rewrite the entire application a few months later.
They found that data inconsistency, redundancy and difficulty
accessing meaningful content ruined their application – making
effective teamwork virtually impossible.

Extend this concept to a BigData database
architecture, and the importance goes up by an order of magnitude.
Using data relationships and intelligent data modelling effectively
as your data grows to terabytes and petabytes is the key to
success.

So for any application project, the first thing
you need to concentrate on is a solid data model for your database.
There are many approaches and tools available; the main point is to
adopt an approach that works for you, and to use it
effectively.

Wrapping it up

In this article, I hope you have seen the
importance of data relationships, and why the truism that All Data
is Relational is vital for effective database design and successful
applications. There is much more to cover on this subject, and
future articles will delve into common data modelling approaches,
and exactly how performance and scalability are dependent on
database design.

Author
Cory Isaacson
Cory Isaacson is CEO/CTO of CodeFutures Corporation, maker of dbShards, a leading database scalability suite providing a true “shared nothing” architecture for relational databases. Cory has authored numerous articles in a variety of publications including SOA Magazine and Database Trends and Applications, and recently authored the book Software Pipelines and SOA (Addison Wesley). Cory has more than twenty years’ experience with advanced software architectures, and has worked with many of the world’s brightest innovators in the field of high-performance computing. Cory can be reached at: cory.isaacson@codefutures.com