Dig in, data crunchers

Scaling for Big Data: An introduction

Cory Isaacson

Database tech can make or break an app – but getting to grips with the fundamentals can be daunting. In this new series, Cory Isaacson breaks it all down in easily digestible chunks.

Welcome to the first article in my new column Scaling for Big Data. This column will be an exciting project, covering a variety of topics and techniques on scaling your database to meet the ever-challenging requirements of the rapid growth in transaction and data volumes. 

You could say that we are currently experiencing a data boom, with databases growing faster and larger than ever conceived even just a few years ago. Every application depends on its data. If you are one of those very clever developers building the next great social media app, working on the hottest new game technology, or concentrating on core business functions like e-commerce or traditional enterprise functionality, then database technology is critical to everything you do.

I know from hard work and experience that understanding database technology is often the key to success in an application, or the cause of untold frustration, long hours and outright failures. Doing it right can make you a huge success, and missing the mark can spell disaster. It can seem a daunting task to learn everything you need to know about database technology so that you can make the right decisions, yet the truth is that all database management systems (DBMS) work on the same principles and share the same concepts. Once you know the fundamentals, you can understand and use any database technology effectively, delivering on the promise of your application.

By way of introduction, I have been in the software industry for over 20 years, and have run many companies, from start-ups to established businesses. My focus has been on database technology, either in professional services firms or managing product companies. I have had the good fortune of working with and learning from some of the smartest technologists in the world, from data architects to application developers in a wide variety of fields. From the start I have always had a passion for database technology – the most critical element of any successful application, and often the one that presents the most technical challenges.

In my career I’ve worked with just about every database you can imagine, starting with Sybase, Microsoft SQL Server and Oracle, and in recent years focusing on the open source offerings, which of course include MySQL and PostgreSQL. And now with the global move to Big Data, I find it important to understand newer database technology options, including products such as Hadoop, MongoDB and Cassandra, and column databases like MonetDB and Infobright… the list really goes on and on.

Why is this important to you as a developer? Because at the very heart of your application is your database tier. It can make or break your application, and I hope that this column will help make your database a winner.

Today we are incredibly fortunate to have a wide range of strong choices for database technology, giving application developers the ability to scale data like never before. This array of options presents an incredible number of opportunities, but also many questions:

  • How do you know which option is best for your application?
  • Should you stick with traditional Relational Database Management System (RDBMS) options, or will newer offerings in the NoSQL or data analytics space provide a better fit?
  • How and when should you use indexing engines?
  • What about keeping your database reliable and operational for a 24x7 application?
  • Where does caching fit into the mix?

The truth is that no single database technology can meet all requirements, and indeed I find that most applications need to use more than one database technology. The reason is simple enough – different aspects of your application have different needs, and each DBMS engine is good at a particular type of job. Thus I find myself in the incredibly fortunate position of having an almost limitless number of topics available when covering Scaling for Big Data.

My objectives for the column are simple: to provide as much useful information as possible, information that you can directly apply to your database requirements. More specifically, I’ll be covering topics such as:

  • Why databases slow down.
  • Scaling with traditional Relational DBMS engines.
  • Database performance optimization techniques.
  • Database design for maximum performance and flexibility.
  • Using non-relational DBMS engines.
  • Big Data analytics.
  • Indexing engines, why they are important and how to use them.
  • Database caching opportunities and techniques.
  • Keeping your database highly available.
  • Database disaster recovery strategies.

The focus will be on practical articles that you can use to conquer your database challenges. I would also like to hear which topics you would find most helpful. Further, if you have a great idea for an entry in the column, I will review and consider it. Just email me at [email protected]; I’d really like to hear from you.

I hope you enjoy Scaling for Big Data, and that you find it helpful to your application development efforts. With a scalable database tier you can accomplish almost anything in the application development world, and together we can make that a reality.

Cory Isaacson
Cory Isaacson is CEO/CTO of CodeFutures Corporation, maker of dbShards, a leading database scalability suite providing a true “shared nothing” architecture for relational databases. Cory has written numerous articles for a variety of publications, including SOA Magazine and Database Trends and Applications, and recently authored the book Software Pipelines and SOA (Addison-Wesley). Cory has more than twenty years’ experience with advanced software architectures, and has worked with many of the world’s brightest innovators in the field of high-performance computing. Cory can be reached at: [email protected]
