Mo' data, mo' costs
As your database grows, the data it stores can be of tremendous value to your organization, provided you can retrieve that data in a meaningful way. This is the key to a successful database: being able to write to the database at the required rate and volume, and then being able to read the stored data in a high-performance, useful manner.
Most often I see performance issues on database reads, and in this article I’ll cover the most common solution to this problem.
As a general rule, the place to start when you encounter database read performance issues is to optimize your single monolithic DBMS instance. This should be done before scaling your database, and it often provides a very quick return. In other words, tuning a monolithic DBMS instance is normally very effective when database reads are slow but database writes are performing acceptably.
Note: When your monolithic DBMS can no longer accommodate your database write load, this is the time to consider scaling your database for Big Data (the subject of many future articles…).
The purpose of Database Indexing is to improve database read performance. Understanding what an index is and how it works is the key to implementing a successful Database Indexing strategy.
What is an Index?
In a prior article, I discussed the Enemies of Database Performance (see [link here]). The #1 Enemy is the Table Scan, a sequential read of a table from top to bottom in order to find the requested rows or objects. A Table Scan can almost always be avoided by adding an Index.
Think of a Database Index as a separate, optimized copy of a portion of a data structure or table, stored in such a way as to optimize reading that data for a specific purpose. It’s easiest to compare a Database Index to the card catalogue in your local library: you look up the book you want in the card catalogue, and it points you to the location of the book on the appropriate shelf. The card catalogue is sorted so that finding the book you want is easy (such as by title or author). Imagine what would happen if you could not find the book in the card catalogue. Your only alternative would be to search all the shelves in the library, a very time-consuming task. In fact, a Table Scan is exactly analogous to searching the entire library, which gives you an idea of just why it causes such notable performance degradation.
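To make the card catalogue analogy concrete, here is a minimal Python sketch. It is not how any particular DBMS implements its indexes; the `table`, `catalogue`, `table_scan`, and `index_lookup` names and the book data are all made up for illustration. The "table" is an unsorted list of rows, and the "catalogue" is a separate sorted copy of just the title plus the row's position.

```python
import bisect
import random

# The "library shelves": rows stored in no useful order (hypothetical data).
table = [(f"Book {n:05d}", f"Aisle {n % 40}") for n in range(50_000)]
random.Random(0).shuffle(table)

def table_scan(title):
    """Walk the rows from top to bottom until the title turns up."""
    rows_examined = 0
    for row_title, shelf in table:
        rows_examined += 1
        if row_title == title:
            return shelf, rows_examined
    return None, rows_examined

# The "card catalogue": a separate, sorted copy of one column,
# where each entry points back to the row's position in the table.
catalogue = sorted((title, pos) for pos, (title, _) in enumerate(table))
catalogue_titles = [title for title, _ in catalogue]

def index_lookup(title):
    """Binary-search the sorted catalogue, then fetch the row it points to."""
    i = bisect.bisect_left(catalogue_titles, title)
    if i < len(catalogue_titles) and catalogue_titles[i] == title:
        _, pos = catalogue[i]
        return table[pos][1]
    return None

shelf, rows_examined = table_scan("Book 01234")
print("table scan examined", rows_examined, "rows and found", shelf)
print("index lookup found", index_lookup("Book 01234"))
```

Both paths return the same shelf, but the scan has to touch rows one at a time while the catalogue lookup needs only a handful of comparisons on the sorted titles.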
By creating an Index on your database, your queries can search the Index to find the location of the rows you are searching for, and then go to the actual location to retrieve the rows – a highly efficient mechanism for retrieving the specific data that your application needs.
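One way to see this switch from Table Scan to Index lookup is to ask the DBMS for its query plan. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient example engine; the `books` table, the `idx_books_title` index, and the `query_plan` helper are hypothetical names chosen for illustration, and other engines expose their plans through different commands.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, shelf TEXT)")
conn.executemany(
    "INSERT INTO books (title, shelf) VALUES (?, ?)",
    [(f"Title {i}", f"Shelf {i % 50}") for i in range(10_000)],
)

lookup = "SELECT shelf FROM books WHERE title = ?"

# Without an index on title, SQLite has no choice but to scan the table.
plan_before = query_plan(conn, lookup, ("Title 4242",))

# With an index on title, the same query can search the index instead.
conn.execute("CREATE INDEX idx_books_title ON books (title)")
plan_after = query_plan(conn, lookup, ("Title 4242",))

print("without index:", plan_before)
print("with index:   ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan reports a scan of `books` and the second reports a search using `idx_books_title`. The query text itself does not change; only the access path the engine picks does.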
In just one recent example, I was working with a customer on a database performance issue for a complex application with over 1000 concurrent users. By properly implementing just a small number of indexes (about 6 in total), we were able to reduce database CPU load by about 70%, providing the breathing room needed to allow the application to grow.
Having the correct indexes on your database is therefore critical to achieving the read performance your specific requirements demand. The remainder of this article provides an overview of the cost of indexing.
The cost of indexing
As covered above, an Index is a separate copy of a specific portion of a table or data structure in your database, optimized for a specific search path. In most DBMS engines, database indexes are maintained by the DBMS itself; once you define the Index, as each new row or object is added, the portion of the data that the index requires is written to the Database Index at the same time.
This makes it very easy to add and use indexing in your database, but it does come at a cost. For each database write, the DBMS engine must now write the actual data, and then write the Index data as part of the same operation. For some types of indexes, especially on large tables, updating the index structure to add new data can be very expensive, a cost that can increase geometrically as the table or store size grows. This invariably slows down database writes, and it is a common complaint I hear from developers suffering from slow write performance. Just a single index on a large table or store can dramatically slow write performance, while adding multiple indexes can be disastrous, creating the dreaded geometric performance degradation curve for database writes.
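The extra write work can be made visible with a small experiment. The sketch below again assumes SQLite via Python's `sqlite3` module as a stand-in engine, with a hypothetical `orders` table and a made-up `pages_after_load` helper. It loads the same rows with zero, one, and three indexes and reports how many database pages the engine ended up writing. Page count is only a rough proxy for write cost, and it says nothing about how the cost grows on very large tables, but it does show that every index adds data to be written on each insert.

```python
import sqlite3

def pages_after_load(index_columns, rows=20_000):
    """Load identical rows and return the number of pages the database uses."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, sku TEXT, placed_at TEXT)"
    )
    # Define the indexes first, so every insert must also update each one.
    for col in index_columns:
        conn.execute(f"CREATE INDEX idx_orders_{col} ON orders ({col})")
    conn.executemany(
        "INSERT INTO orders (customer, sku, placed_at) VALUES (?, ?, ?)",
        (
            (f"customer-{i % 997}", f"sku-{i % 211}", f"2024-01-{i % 28 + 1:02d}")
            for i in range(rows)
        ),
    )
    conn.commit()
    (page_count,) = conn.execute("PRAGMA page_count").fetchone()
    conn.close()
    return page_count

no_index = pages_after_load([])
one_index = pages_after_load(["customer"])
three_indexes = pages_after_load(["customer", "sku", "placed_at"])

print("pages with no index:     ", no_index)
print("pages with one index:    ", one_index)
print("pages with three indexes:", three_indexes)
```

The data loaded is identical in all three runs; the growth in pages comes entirely from the index copies the engine maintains alongside the table.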
Wrapping it up
Database Indexing is a critical topic when optimizing a single, monolithic database, and as you will see in future articles, it’s even more critical in high-performance scaling for Big Data. After all, what benefit is there in building a huge Big Data store if you can’t easily find what you need once your database is loaded? You can also see that there is a definite cost to indexing, something you must be aware of as you design your index structure.
Therefore, it’s important to understand the basics about indexes, how they work, and the various types of indexes available – this will be the topic of next month’s column.