Mo' data, mo' costs

Database indexing

Cory Isaacson

What’s the key to creating a successful database? And what’s standing in the way of achieving it? Cory Isaacson lays it out for us.

As your database grows in size, the data stored
can be of tremendous value to your organization – if you can
retrieve the data in a meaningful way to reap the benefits. This is
the key to a successful database: being able to write to the
database at the required rate and volume, and then being able to
read the stored data in a high-performance, useful
manner.

Most often I see performance issues on database
reads, and in this article I’ll cover the most common solution to
this problem.

As a general rule, the place to start when you
encounter database read performance issues is to optimize your
single monolithic DBMS instance. This should be done prior to
scaling your database, and often provides a very quick return on
the effort. In other words, improving the performance of a
monolithic DBMS instance is normally very effective when you find
that database reads are slow but database writes are performing
acceptably.

Note: When your monolithic DBMS can no longer
accommodate your database write load, this is the time to consider
scaling your database for Big Data (the subject of many future
articles…).

The purpose of Database Indexing is to improve
database read performance. Understanding what an index is and how
it works is the key to implementing a successful Database Indexing
strategy.

What is an Index?

In a prior article, I discussed the Enemies of
Database Performance (see [link here]). The #1 Enemy is the
Table Scan, a sequential read of a table from top to bottom in
order to find the requested rows or objects. A Table Scan can
almost always be avoided by adding an Index.

Think of a Database Index as a separate,
optimized copy of a portion of a data structure or table, stored in
such a way as to optimize reading that data for a specific purpose.
It’s easiest to compare a Database Index to the card catalogue in
your local library – you look up the book you want to find in the
card catalogue, and it points you to the location of the book on
the appropriate shelf. The card catalogue is sorted in such a way
that finding the book you want is easy (such as by title or
author). Imagine what would happen if you could not find the book
in the card catalogue: your only alternative would be to search all
the shelves in the library – a very time-consuming task. In fact, a
Table Scan is exactly analogous to searching the entire library,
which gives you an idea of just why it causes such notable performance
degradation.

By creating an Index on your database, your
queries can search the Index to find the location of the rows you
are searching for, and then go to the actual location to retrieve
the rows – a highly efficient mechanism for retrieving the specific
data that your application needs.
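
To make the card catalogue comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `books` table, its columns, and the index name `idx_books_title` are hypothetical names chosen for illustration, and SQLite stands in for whatever DBMS you actually run. The sketch asks SQLite for its query plan before and after the index is created.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable detail of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the description text.
    return " | ".join(row[-1] for row in rows)

def build_library(n_books=1000):
    """Create a hypothetical in-memory 'books' table (the library shelves)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, shelf TEXT)"
    )
    conn.executemany(
        "INSERT INTO books (title, shelf) VALUES (?, ?)",
        ((f"Title {i:05d}", f"Shelf {i % 40}") for i in range(n_books)),
    )
    return conn

# Look up a book's shelf by its title.
LOOKUP = "SELECT shelf FROM books WHERE title = ?"

conn = build_library()

# Without an index, the only option is to read the whole table.
plan_before = query_plan(conn, LOOKUP, ("Title 00042",))

# The index plays the role of the card catalogue, sorted by title.
conn.execute("CREATE INDEX idx_books_title ON books (title)")
plan_after = query_plan(conn, LOOKUP, ("Title 00042",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should report a scan of `books` and the second a search that uses `idx_books_title`. Other DBMS engines expose the same information through their own EXPLAIN facilities.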

In just one recent example, I was working with a
customer on a database performance issue for a complex application
with over 1000 concurrent users. By properly implementing
just a small number of indexes (about 6 in total), we were able to
reduce database CPU load by about 70%, providing the
breathing room needed to allow the application to grow.

Therefore, having the correct indexes on your
database is critical to achieving the read
performance that your specific requirements demand. The remainder
of this article provides an overview of the cost of
indexing.

The cost of indexing

As covered above, an Index is a separate copy
of a specific portion of a table or data structure in your
database, optimized for a specific search path. In most DBMS
engines, database indexes are maintained by the DBMS itself; once
you define the Index, as each new row or object is added, the
portion of the data that the index requires is written to the
Database Index at the same time.

This makes it very easy to add and use indexing
in your database, but it does come at a cost. For each database
write, the DBMS engine must now write the actual data, and then
write the Index data as part of the same operation. For some types
of indexes, especially on large tables, adding new data
to an index can be very expensive in terms of updating the index
structure – a cost that can increase geometrically as the table or
store size grows. This invariably slows down database writes, and
is a common complaint I hear from developers suffering from slow
write performance. Just a single index on a large table or store
can dramatically slow write performance, while adding multiple
indexes can be disastrous, creating the dreaded geometric
performance degradation curve for database
writes.
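
As a rough illustration of the extra work each index adds, the sketch below loads the same rows into a hypothetical `orders` table with zero, one, and three indexes, then compares how many database pages SQLite ends up holding. The table, column names, and row counts are assumptions made for the example. Page count is only a proxy for the additional data the DBMS must write and maintain; it does not measure write latency or how the cost grows with table size.

```python
import sqlite3

def pages_after_load(n_rows, indexed_columns):
    """Load n_rows into a hypothetical 'orders' table and return the page count.

    indexed_columns must come from a fixed, trusted list, because the
    column names are interpolated directly into the CREATE INDEX statement.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, sku TEXT, placed_on TEXT)"
    )
    # Define the indexes first, so every insert below also updates them.
    for col in indexed_columns:
        conn.execute(f"CREATE INDEX idx_orders_{col} ON orders ({col})")
    conn.executemany(
        "INSERT INTO orders (customer, sku, placed_on) VALUES (?, ?, ?)",
        (
            (
                f"customer-{(i * 7919) % n_rows:06d}",
                f"sku-{(i * 104729) % 5000:05d}",
                f"2024-01-{i % 28 + 1:02d}",
            )
            for i in range(n_rows)
        ),
    )
    conn.commit()
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages

no_index = pages_after_load(5000, [])
one_index = pages_after_load(5000, ["customer"])
three_indexes = pages_after_load(5000, ["customer", "sku", "placed_on"])

print("pages with no index:     ", no_index)
print("pages with one index:    ", one_index)
print("pages with three indexes:", three_indexes)
```

Each added index should increase the page count, because every inserted row is written once to the table and once more to each index. Timing the same loads on your own DBMS and data is the more direct way to see the effect on write performance.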

Wrapping it up

Database Indexing is a critical topic when
optimizing a single, monolithic database, and as you will see in
future articles, it’s even more critical in high-performance
scaling for Big Data. After all, what is the benefit of building a huge
Big Data store if you can’t easily find what you need once your
database is loaded? You can also see that there is a definite cost
to indexing, something you must be aware of as you design your
index structure.

Therefore, it’s important to understand the basics about
indexes, how they work, and the various types of indexes available
– this will be the topic of next month’s column.

Author
Cory Isaacson
Cory Isaacson is CEO/CTO of CodeFutures Corporation, maker of dbShards, a leading database scalability suite providing a true “shared nothing” architecture for relational databases. Cory has authored numerous articles in a variety of publications including SOA Magazine, Database Trends and Applications, and recently authored the book Software Pipelines and SOA (Addison Wesley). Cory has more than twenty years’ experience with advanced software architectures, and has worked with many of the world’s brightest innovators in the field of high-performance computing. Cory can be reached at: cory.isaacson@codefutures.com