Spark and Cassandra team up to make database analytics super fast
In a pairing described as ‘magical’, Apache Spark and Cassandra have pooled their resources to deliver analytics up to “100 times” swifter in-memory, and 10 times speedier on disk.
In an industry precedent-setting move, last week
DataStax trumpeted a partnership that will see the integration of
Apache Spark into the Cassandra database. In this interview,
Martin Van Ryswyk, Executive Vice President of Engineering at
DataStax, talks integration essentials, benefits to users, and
shifting enterprise approaches to big data.
JAX: How did this collaboration come about?
Van Ryswyk: We have been
following the evolution of Spark and are very impressed with both
the technology and the talent at Databricks. The two
technologies are a natural fit and we decided that users deserved a
strong solution backed by the leaders in both technologies.
Do you think there’s anything on the market
comparable to this new solution?
DataStax is the first NoSQL provider to offer faster
analytics for real-time data. In the NoSQL space, the only thing
somewhat comparable is Spark with HBase, but again, HBase is more
of a Hadoop data warehouse component whereas Cassandra is used for
online, always-on, transactional applications.
What will be the biggest challenges for
To ensure tight integration between Spark and
Cassandra, mapping of data is necessary. After that come high
availability and security considerations.
What will be the biggest benefits, and who in
particular do you think it will be useful for?
This new integration gives modern businesses an
alternative to relational databases to deliver near real-time data
analytics for their online applications. Here is what our users
say about how they benefit:
“The new Spark/Shark functionality on Cassandra is
giving our users a scalable and high-performance way to quickly
analyze our constantly growing data set. By moving from a
relational database, this new functionality will allow us to
deliver real-time data analytics where before our users relied on
time-delayed reports.” – Chanan Braunstein, Director of Next
Gen Homework Applications at Pearson Education
“What we all need is a generic way to run functions
over data stored in Cassandra. Sure, you could go grab Hadoop, and
be locked into articulating analytics/transformations as MapReduce
constructs. But that just makes people sad. Instead,
I’d recommend Spark. It makes people happy.” – Brian
O’Neil, CTO at Health Market Science
Can you see any other players in the industry
in particular following suit now you’ve set this precedent?
I can only speak to our own activity, but since this is
an industry-first collaboration, it is reasonable to assume
that other players will follow our lead.
Are you seeing a change in how companies value
and approach data in 2014?
We see an increase in modern enterprises using
data as a strategic asset to compete. Companies are moving towards
more “near term” analytics that can provide data insights in real
time, so they can respond quickly. Because of this, online
applications that interact with customers and collect data have
zero tolerance for downtime and must be capable of reaching and
interacting with their customers’ data no matter where they are