Clustrix CEO on the NoSQL hype: a return to sanity is emerging
Confused about the best big data solution for you? Robin Purohit explains why you shouldn't feel like a dork for picking SQL solutions, and weighs up the pros and cons of in-memory computing and real-time analytics solutions.
On the wish list for Big Data technology today: software
capable of handling huge volumes of information that moves at a
rapid velocity, all in real-time. MemSQL’s recent $35M Series B
funding round and Oracle and SAP HANA’s in-memory database options
are evidence of the increasing demand for solutions that can meet
these requirements. However, whilst many companies are all too
aware of their Big Data needs, many remain confused about the
technology they should adopt. Robin Purohit, CEO of NewSQL database
provider Clustrix, weighs in with his opinion on why this
uncertainty pervades the marketplace.
JAX: Why do you think so many companies are mixed up
about in-memory computing and real-time analytics?
Purohit: The Big Data industry is shifting its
focus from dealing with vast amounts of data to real-time
analytics. Having access to up-to-date data is a key competitive
advantage that promises real business benefits to companies of all
sizes. Scale-out SQL databases that are able to perform real-time
analytics on live operational data are critical enablers for
mainstream adoption of this trend. Smart uses of in-memory methods
together with flash storage promise high performance at commodity
prices.
The smart and cost-effective way to use in-memory for real-time
analytics is combining distributed architectures with RAM and flash
disk. Flash provides I/O latencies of 50 microseconds and costs for
terabyte configurations are rapidly approaching those of
high-performance hard disk drives.
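To make the RAM-plus-flash trade-off concrete, the expected read latency of a two-tier setup is the hit-ratio-weighted average of the two tiers. The sketch below uses the 50-microsecond flash figure quoted above; the 0.1-microsecond (100 ns) RAM latency and the function name are illustrative assumptions, not measurements of any particular product.

```python
# Assumed RAM access latency (~100 ns); the flash figure is the one quoted above.
RAM_LATENCY_US = 0.1
FLASH_LATENCY_US = 50.0


def effective_read_latency_us(ram_hit_ratio: float,
                              ram_us: float = RAM_LATENCY_US,
                              flash_us: float = FLASH_LATENCY_US) -> float:
    """Average read latency (microseconds) when a fraction of reads is served
    from RAM and the remainder falls through to flash."""
    if not 0.0 <= ram_hit_ratio <= 1.0:
        raise ValueError("ram_hit_ratio must be between 0 and 1")
    return ram_hit_ratio * ram_us + (1.0 - ram_hit_ratio) * flash_us


if __name__ == "__main__":
    for ratio in (0.0, 0.5, 0.9, 0.99):
        print(f"RAM hit ratio {ratio:.2f}: {effective_read_latency_us(ratio):.3f} us")
```

The point of the arithmetic is that even a modest RAM hit ratio pulls the average well below the flash latency, while flash keeps the miss penalty in the tens of microseconds rather than the milliseconds typical of spinning disks.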
A smart database can cache the hottest, most frequently accessed data
in RAM and “warm” operational data in flash, providing a substantial
speed-up in query execution in all cases. Equally important,
the data is persistently stored on flash disk, so it can survive
day-to-day infrastructure failures. This is particularly critical
for operational data.
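The caching scheme described above can be sketched as a small RAM tier in front of a durable flash tier. This is a toy model under stated assumptions: least-recently-used eviction is a policy chosen here for illustration, the flash tier is stood in for by a plain dict, and the class and method names are hypothetical. It is not how ClustrixDB implements caching.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small LRU cache in RAM in front of a larger
    'flash' tier. The flash tier is a plain dict here; a real system would
    persist it on SSD."""

    def __init__(self, ram_capacity: int):
        if ram_capacity < 1:
            raise ValueError("ram_capacity must be at least 1")
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()  # hot data, most recently used at the end
        self.flash = {}           # stand-in for the durable flash tier
        self.ram_hits = 0
        self.flash_reads = 0

    def put(self, key, value):
        # Write through to the durable tier first, so the value survives
        # losing RAM, then keep it in RAM as hot data.
        self.flash[key] = value
        self._cache(key, value)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)  # refresh recency
            self.ram_hits += 1
            return self.ram[key]
        value = self.flash[key]        # KeyError if the key was never written
        self.flash_reads += 1
        self._cache(key, value)        # promote warm data into RAM
        return value

    def _cache(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_capacity:
            self.ram.popitem(last=False)  # evict the least recently used entry

    def simulate_ram_loss(self):
        """Drop the RAM tier, as after a restart; the flash tier keeps every value."""
        self.ram.clear()
```

Writing through to the durable tier before caching is what lets simulate_ram_loss() discard all hot data without losing anything: later reads simply fall back to flash and re-warm the cache.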
Can you walk us through the pros and cons of
in-memory/ real-time analytics and the best use cases for
In order to be competitive, enterprises must be able to analyze
data in real-time from multiple sources including online
transactions, and the huge amounts of unstructured data flowing in
from the web, social media, mobile, forums and other sources.
In-memory databases are cost-prohibitive, requiring expensive RAM
that can only support transactions, not real-time analytics.
ClustrixDB offers the best of both worlds, providing real-time
analysis of transactional data with a combination of flash and RAM
designed to scale at low costs.
An SSD-backed memory approach is the right way to do in-memory
analytics for three primary reasons:
1) Hot data lives in memory, cold data a few microseconds away
2) SSDs provide the durability to be system-of-record
3) Scale-out allows running real-time analytics on
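One common way a scale-out database runs an analytic query across many servers is to hash-partition rows across nodes, let each node aggregate its own rows, and then merge the partial results (a scatter-gather pattern). The sketch below illustrates that general technique only; the function names and the (customer_id, amount) row shape are hypothetical, and it does not describe ClustrixDB's actual data distribution.

```python
import hashlib
from collections import defaultdict


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node by hashing the key. A stable digest is used because
    Python's built-in hash() for strings is randomized per process."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(rows, num_nodes: int):
    """Spread (customer_id, amount) rows across nodes by customer_id."""
    nodes = defaultdict(list)
    for customer_id, amount in rows:
        nodes[node_for(customer_id, num_nodes)].append((customer_id, amount))
    return nodes


def partial_totals(node_rows):
    """Work each node does locally: sum amounts per customer."""
    totals = defaultdict(float)
    for customer_id, amount in node_rows:
        totals[customer_id] += amount
    return totals


def total_by_customer(nodes):
    """Scatter the aggregate to every node, then merge the partial results."""
    merged = defaultdict(float)
    for node_rows in nodes.values():
        for customer_id, subtotal in partial_totals(node_rows).items():
            merged[customer_id] += subtotal
    return dict(merged)
```

For example, total_by_customer(distribute([("alice", 10.0), ("bob", 5.0), ("alice", 2.5)], 3)) gives alice a total of 12.5 and bob 5.0. Adding nodes spreads the per-node work further, which is the property that makes analytics on live data feasible as volumes grow.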
Why do you think people still want the familiarity of a
SQL is a long-established standard. Additionally, as a standard
query language, SQL requires no additional coding to manage the
database and SQL queries can be used to retrieve large amounts of
records from a database quickly and efficiently.
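As a small illustration of the declarative style being described, the query below states what result is wanted and leaves the scanning, grouping and sorting to the engine. Python's built-in sqlite3 module is used here only as a convenient SQL engine; it is not ClustrixDB or MySQL, and the orders table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0)],
)

# One declarative statement: the engine decides how to scan, group and sort.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 6
    ORDER BY total DESC
    """
).fetchall()

for customer, total in rows:
    print(customer, total)

conn.close()
```

The same statement would run with little or no change on most SQL databases, which is much of the familiarity argument: the query, not hand-written access code, carries the application logic.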
$10m of funding last year to help
develop its scale-out relational database. What have you put the
We recently retargeted our ClustrixDB software for commodity
servers, making it the first breakthrough scale-out SQL database
for the cloud. Our new funding has been used to further that
product development and support our rapidly growing customer base.
What’s your company roadmap for the year
With our initial software-only release behind us, we are setting
our sights on increasing the overall performance of our database to
take advantage of commodity hardware, as well as focusing on the
optimal use of in-memory tables.
You’ve said before that you want to “aggressively
pursue” the growing MySQL/NewSQL market. Who do you see as your
biggest potential users?
Clustrix has been serving production workloads since 2008. Our
largest customers have data sets with billions of rows, multiple
terabytes of data, and very high transaction rates. We are a good
fit for new data-driven websites, including e-commerce, social
media analytics and Ad Tech companies.
Can you give us a case study for usage of
We work with a company called nomorerack, an online
shopping destination that offers in-demand goods at deep
discounts. They chose ClustrixDB to prepare for the holiday
shopping season when the website receives 15 to 20 times higher
traffic volume. For nomorerack, the potential cost of downtime on
Cyber Monday was estimated at more than $500,000 per hour,
requiring a business-critical cloud database that can flexibly
expand online with peak demand. As a result of implementing
ClustrixDB, the company experienced a 300 percent increase in peak
workload on its database with zero downtime.
Where do you see the Big Data space going in the next
five years? Do you think there will always be a place for
We’re quickly moving toward a world where SQL is still relevant
and business-relevant data analysis will support the fast-moving,
dynamic customer data from operational and social media
interactions. Enterprises will want to analyze and act on this new
breed of dynamic data in real time.
After two years of relentless FUD from NoSQL evangelists that
“SQL doesn’t scale”, a return to sanity is emerging.
Google recently published work showing not only that SQL can
scale, but that it is the best approach for certain workloads.
Specifically, Google’s F1 database for AdWords enables much simpler
application development for very high concurrency OLTP and OLAP
workloads, saving those pricey engineers for truly valuable work.
And Facebook’s comments that relational databases are essential for
analytics added fuel to the SQL-on-Hadoop hype now being
promoted by every major Hadoop distribution.
Big Data developers don’t have to feel like they aren’t cool
because SQL is the right answer for their application. They can
focus on using NoSQL where it makes sense.