Clustrix CEO on the NoSQL hype: a return to sanity is emerging
Confused about the best big data solution for you? Robin Purohit explains why you shouldn't feel like a dork for picking SQL solutions, and weighs up the pros and cons of in-memory computing and real-time analytics solutions.
On the wish list for Big Data technology today: software capable of handling huge volumes of information that moves at a rapid velocity, all in real-time. MemSQL’s recent $35M Series B funding round and Oracle and SAP HANA’s in-memory database options are evidence of the increasing demand for solutions that can meet these requirements. However, whilst many companies are all too aware of their Big Data needs, many remain confused about the technology they should adopt. Robin Purohit, CEO of NewSQL database provider Clustrix, weighs in with his opinion on why this uncertainty pervades the marketplace.
JAX: Why do you think so many companies are mixed-up about in-memory computing and real-time analytics solutions?
Purohit: The Big Data industry is shifting its focus from dealing with vast amounts of data to real-time analytics. Having access to up-to-date data is a key competitive advantage that promises real business benefits to companies of all sizes. Scale-out SQL databases that are able to perform real-time analytics on live operational data are critical enablers for mainstream adoption of this trend. Smart uses of in-memory methods together with flash storage promise high performance at commodity infrastructure prices.
The smart and cost-effective way to use in-memory for real-time analytics is to combine distributed architectures with RAM and flash disk. Flash provides I/O latencies of 50 microseconds, and costs for terabyte configurations are rapidly approaching those of high-performance hard disk drives.
A smart database can cache the hottest, most frequently accessed data in RAM and “warm” operational data in flash, providing a substantial speed-up in query execution in all cases. Equally important, the data is persistently stored on flash disk so it can survive day-to-day infrastructure failures. This is particularly critical for operational data.
Can you walk us through the pros and cons of in-memory/real-time analytics and the best use cases for each one?
To be competitive, enterprises must be able to analyze data in real-time from multiple sources, including online transactions and the huge amounts of unstructured data flowing in from the web, social media, mobile, forums and other sources. In-memory databases are cost-prohibitive, requiring expensive RAM that can only support transactions, not real-time analytics. ClustrixDB offers the best of both worlds, providing real-time analysis of transactional data with a combination of flash and RAM designed to scale at low cost.
An SSD-backed memory approach is the right way to do in-memory analytics for three primary reasons:
1) Hot data lives in memory, with cold data a few microseconds away in SSDs
2) SSDs provide the durability to be the system of record
3) Scale-out allows running real-time analytics on the system-of-record database
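As an editorial illustration of the hot/warm tiering described above, here is a minimal Python sketch. It is not ClustrixDB's implementation: the `TieredStore` class, its LRU eviction policy, and the `simulate_ram_loss` helper are all assumptions made for the example, and a plain dictionary stands in for flash storage.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small LRU 'RAM' cache in front of a 'flash' dict.

    Hypothetical illustration only; names and policy are assumptions.
    """

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()   # hot tier, ordered least to most recently used
        self.flash = {}            # persistent tier; stands in for SSD storage
        self.ram_hits = 0
        self.flash_hits = 0

    def put(self, key, value):
        # Write through: the flash tier acts as the system of record.
        self.flash[key] = value
        self._promote(key, value)

    def get(self, key):
        if key in self.ram:
            self.ram_hits += 1
            self.ram.move_to_end(key)      # mark as most recently used
            return self.ram[key]
        value = self.flash[key]            # raises KeyError if never written
        self.flash_hits += 1
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_capacity:
            self.ram.popitem(last=False)   # evict the least recently used entry

    def simulate_ram_loss(self):
        # A node restart empties RAM but leaves the flash contents intact.
        self.ram.clear()


store = TieredStore(ram_capacity=2)
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(key, value)

store.get("a")             # served from flash, then promoted to RAM
store.get("a")             # served from RAM
store.simulate_ram_loss()
store.get("b")             # still readable: flash kept the data
```

The point of the sketch is the durability argument in the list: reads get faster when data is hot, but nothing is lost when the RAM tier disappears, because every write also lands in the persistent tier.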
Why do you think people still want the familiarity of a SQL database?
SQL is a long-established standard. Additionally, as a standard query language, SQL requires no additional coding to manage the database, and SQL queries can be used to retrieve large numbers of records from a database quickly and efficiently.
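To make the "no additional coding" point concrete, here is a small editorial sketch using Python's built-in `sqlite3` module rather than ClustrixDB. The `orders` table and its rows are invented for the example. A single declarative query does the grouping and summing that would otherwise need hand-written loops.

```python
import sqlite3

# In-memory SQLite database; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 14.5), ("carol", 50.0)],
)

# One query expresses the whole aggregation; the engine decides how to run it.
rows = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

for customer, order_count, total in rows:
    print(customer, order_count, total)

conn.close()
```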
Clustrix secured $10M of funding last year to help develop its scale-out relational database. What have you put the money towards?
We recently retargeted our ClustrixDB software for commodity servers, making it the first breakthrough scale-out SQL database for the cloud. Our new funding has been used to further that product development and support our rapidly growing customer base worldwide.
What’s your company roadmap for the year ahead?
With our initial software-only release behind us, we are setting our sights on increasing the overall performance of our database to take advantage of commodity hardware, as well as focusing on the optimal use of in-memory tables.
You’ve said before that you want to “aggressively pursue” the growing MySQL/NewSQL market. Who do you see as your biggest potential users?
Clustrix has been serving production workloads since 2008. Our largest customers have data sets with billions of rows, multiple terabytes of data, and very high transaction rates. We are a good fit for new data-driven websites, including e-commerce, social media analytics and Ad Tech companies.
Can you give us a case-study for usage of Clustrix?
We work with a company called nomorerack, an online shopping destination that provides in-demand goods at deep discounts. They chose ClustrixDB to prepare for the holiday shopping season, when the website receives 15 to 20 times higher traffic volume. For nomorerack, the potential cost of downtime on Cyber Monday was estimated at more than $500,000 per hour, requiring a business-critical cloud database that can flexibly expand online with peak demand. As a result of implementing ClustrixDB, the company handled a 300 percent increase in peak workload on its database with zero downtime.
Where do you see the Big Data space going in the next five years? Do you think there will always be a place for SQL?
We’re quickly moving toward a world where SQL remains relevant and where business data analysis must keep up with fast-moving, dynamic customer data from operational and social media interactions. Enterprises will want to analyze and act on this new breed of dynamic data in real time.
After two years of relentless FUD that “SQL doesn’t scale” by NoSQL evangelists, a return to sanity is emerging.
Both Google and Facebook recently published work showing not only that SQL can scale, but that it is the best approach for certain workloads. Specifically, Google’s F1 database for AdWords enables much simpler application development for very high concurrency OLTP and OLAP workloads, saving those pricey engineers for truly valuable work. And Facebook’s comments that relational databases are essential for analytics added fuel to the new SQL hype programming now being promoted by every major Hadoop distribution.
Big Data developers don’t have to feel like they aren’t cool because SQL is the right answer for their application. They can focus on using NoSQL where it makes sense.