Performance benchmarking and visualization
Tokutek’s VP of Engineering talks performance benchmarking and the importance of open source for the DB scene.
You have a talk on performance benchmarking coming up at Percona Live London. Can you tell us some of the performance tips you’re going to be talking about?
My talk covers the most important lessons I’ve learned over my 25-year career as a performance-oriented database developer. This includes best practices, simple techniques, things I’ve done wrong, and areas where I need to improve. One example is building an automated benchmark framework by starting small and adding specific functionality over time.
Another example is techniques for analyzing the results, as the number of metrics and measurements has grown substantially over time. My goal is for everyone attending to walk away with at least one or two ideas for improving their benchmarking or automation frameworks.
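As an illustration of the "start small" approach, here is a minimal Python sketch. It is not taken from the talk. The function names (`run_benchmark`, `write_results`), the sample workload, and the CSV layout are all hypothetical. It times a workload a few times and records the results in a format that later tooling could graph or compare.

```python
import csv
import io
import time


def run_benchmark(name, workload, iterations=5):
    """Time `workload` repeatedly and return one result row per iteration."""
    rows = []
    for i in range(iterations):
        start = time.perf_counter()
        workload()
        elapsed = time.perf_counter() - start
        rows.append({"benchmark": name, "iteration": i, "seconds": elapsed})
    return rows


def write_results(rows, out):
    """Write rows as CSV so later tooling (graphs, comparisons) can read them."""
    writer = csv.DictWriter(out, fieldnames=["benchmark", "iteration", "seconds"])
    writer.writeheader()
    writer.writerows(rows)


def sample_workload():
    # Hypothetical stand-in for a real database operation.
    sum(i * i for i in range(100_000))


results = run_benchmark("sum_of_squares", sample_workload, iterations=3)
buffer = io.StringIO()  # an open file would work the same way
write_results(results, buffer)
print(buffer.getvalue())
```

Further functionality, such as warm-up runs, more metrics, or scheduled execution, could then be layered on one piece at a time.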
What about visualization? What techniques are you using in this area?
Visualization is an interesting topic in benchmarking, as every good benchmark deserves a graph or two. I use gnuplot exclusively for graphing and charting; it’s so much better than doing graphs manually in Microsoft Excel (which was my old tool of choice).
The last hurdle I continue to struggle with is comparing two benchmark runs. If I showed you before-and-after graphs of a particular benchmark, you could easily spot irregularities between the two, yet doing the same in software is very difficult. This is a continual work in progress for me.
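One simple starting point for comparing two runs in software, sketched here in Python, is to flag intervals where a metric changes by more than a fixed relative tolerance. This is an illustrative assumption rather than the approach described in the interview. The function name `compare_runs`, the 10% tolerance, and the sample throughput figures are all made up for the example.

```python
def compare_runs(before, after, tolerance=0.10):
    """Return (index, before, after, relative_change) for each interval whose
    value differs from the baseline by more than `tolerance`."""
    flagged = []
    for i, (b, a) in enumerate(zip(before, after)):
        if b == 0:
            continue  # relative change is undefined for a zero baseline
        change = (a - b) / b
        if abs(change) > tolerance:
            flagged.append((i, b, a, change))
    return flagged


# Hypothetical per-interval throughput samples from two runs.
before = [1000, 1010, 990, 1005, 1000]
after = [1002, 1008, 700, 1001, 1150]

for i, b, a, change in compare_runs(before, after):
    print(f"interval {i}: {b} -> {a} ({change:+.1%})")
```

A fixed threshold like this catches large dips and spikes but not subtler differences, such as a change in variance, which is part of why the problem remains hard to automate.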
Could you tell us a bit about Tokutek and what you do there?
Founded in 2006, Tokutek is the premier high-performance database company, and we deliver Big Data processing power across the most important data management platforms. Our breakthrough technology lets you build a new class of applications that handle unprecedented amounts of incoming data and scale with the data processing needs of tomorrow.
Tokutek was formed to commercialize Fractal Tree indexing, which was developed at and is licensed from MIT, Stony Brook, and Rutgers. Fractal Tree indexing replaces B-tree indexing—legacy code dating back to the 1970s—that is used in virtually all leading database management systems. The resulting databases have dramatically improved performance (50x), reduced database size (90%), and increased operational efficiency to meet the requirements of today’s demanding applications.
As Vice President of Engineering at Tokutek, I am responsible for leading the company’s software development efforts. This includes improvements to the Fractal Tree indexing implementation, as well as Fractal Tree Indexes in MySQL (TokuDB) and MongoDB (TokuMX). I am also responsible for product management, infrastructure, and [of course] the benchmarking.
You claim that your performance measurement solutions for Tokutek are all non-proprietary. Is open source important to Tokutek, and in general in the field of DBs?
In my mind open source is critical to the success of Tokutek and database vendors in general. Here are some reasons why:
- Engineers want to be able to share past efforts, as a sort of living resume. They also enjoy learning from the work of other engineers and providing constructive criticism.
- Some companies are entirely unwilling to evaluate closed-source software. This is a big barrier to entry for new database solutions.
- Open source simply makes software better. Users of the community editions of TokuDB and TokuMX are constantly providing feedback, and sometimes bug reports, which makes the software better for enterprise and community customers alike.
Finally, a question that’s very important to many developers – what’s your view on programming to music – total distraction or concentration enhancement? If you do listen to music while coding, what kind?
Great question, and one that I’m sure varies from person to person. I usually program to music when I’m working on hard problems and to podcasts when the effort requires less intense concentration. My favorite coding music is a genre known as nerdcore, and MC Frontalot is always first on my playlist.