Throwing out sparks

DataStax Enterprise 4.5 out of the box

Lucy Carey

Inside the ‘lightning fast’ new release from the Cassandra database experts.


DataStax, the company behind the enterprise-flavoured edition of the Apache Cassandra distributed database management system, has just dropped version 4.5 of DataStax Enterprise. Announced at today’s Spark Summit 2014, this release is tuned to make deployment and operation easier. It also offers new integrated analytics functionality that gives users a scalable, high-performance way to quickly analyze mushrooming data sets. We spoke to Robin Schumacher, VP of Products at DataStax, for the full story.

JAX: What are the most significant upgrades to
DataStax Enterprise 4.5 in your opinion?

Schumacher: DataStax is the first highly scalable NoSQL provider to offer lightning-fast analytics on real-time data using Spark, and this integration enables up to 100X faster analytics. The other key additions are integration with external Hadoop installations, automated diagnostics and performance tuning with a new Performance Service, and expanded visual management via DataStax OpsCenter, which can now manage Cassandra clusters of more than 1,000 nodes with increased security.

Can you give us a deep dive into the tech?

Apache Spark is a data analytics framework that can run on top of the Hadoop Distributed File System. It’s an in-memory and on-disk data processing engine that runs much faster than Hadoop’s MapReduce, improving query response times so that decisions can be made faster. It’s built to run analytics queries across shared-nothing, distributed IT environments. This makes it a great fit for DataStax, as it takes the same scale-out architectural approach as Cassandra.

Combining Spark and Cassandra means that developers can build their applications to take advantage of in-memory support for both their transactions and the analytics on those transactions.

DataStax is contributing much of
what we’ve done with Spark and Cassandra back to those open source
communities. At the Spark Summit this week, we’re announcing that
we’re contributing our connectivity layer, datatype mapping, and
performance optimizations to open source.

There are certain things we have developed, however, that will remain in the commercial version of DataStax Enterprise, such as support for high-availability clustering for analytic workloads, production certification for Cassandra + Spark, and easy visual management of Cassandra-Spark clusters.

We also have new partnerships with Hortonworks and Cloudera, along with integration between their Hadoop platforms and DataStax Enterprise. This allows customers to link their operational data on Cassandra/DSE with historical information stored in Cloudera or Hortonworks data warehouses.

On the management side, we have now scaled our
OpsCenter visual management tool so that it can manage clusters up
to 1,000 nodes. In addition, we’re including built-in security that
controls what administrators can do on clusters.

We have also added a new Performance Service to simplify operations. Using the diagnostic information it provides, customers can more easily see how well their Cassandra clusters are performing and where bottlenecks may develop. It includes built-in performance objects, enforcement of best practices, setup recommendations, and tracing of the worst-performing statements.

What’s your definition of ‘hot’
data?

Hot data is created by customer actions on a company’s applications or services, such as a new transaction or a new web click. As each consumer completes an activity on the application, that data gets processed. At the same time, this information can be used for near real-time analytics. By delivering transactional and analytics data on the same database platform, with in-memory capabilities for both, DataStax enables companies to provide better services and to scale faster.

What are the biggest advantages of these Hadoop partnerships?

This is in direct response to customer requests. We have customers that use Hadoop for their historical data, and they wanted a simple, high-performance way to run analytics across both their operational data held in Cassandra tables and data held elsewhere, for example in a Hadoop Hive table. With these partnerships, they will be able to do that in the way that best suits them.

You said before that some of the challenges of
integrating Spark into DataStax software are high availability and
security considerations. How have these been tackled?

DSE 4.5 will include a new high availability
feature that will ensure no downtime for analytics workloads that
utilize Spark in DSE. From a security perspective, we’re
investigating adding external security support for Spark in an
upcoming DSE release.

What sector do your biggest business clients
operate in, and have you seen this evolve?

As a reminder, DataStax serves more than 500 customers, including 25 percent of the Fortune 100, and they span industries ranging from web 2.0 to healthcare, government, finance, retail and manufacturing. So we serve a broad spectrum of clients. However, our five key use cases revolve around the Internet of Things, recommendations, playlists, fraud detection and messaging.

What sort of considerations for business did
you take into account when building this software?

First, in our modern connected world, applications must never go down. A business can’t afford for its revenue stream to take a breather or go offline, so we provide the most highly available database on the market. Second, applications must perform very fast, since slow is the same as off and customers will immediately leave a lagging website or app. Third, we work to make using Cassandra a world-class experience that is easy and doesn’t require a huge operational overhead. Many of our largest customers, such as Netflix, keep their deployments running with only a few IT team members.

Who are the biggest enterprise rivals for
DataStax?

Open source Cassandra is currently our biggest competition. But we provide so much enterprise value with DSE that we are seeing fast adoption of our value-added features, such as in-memory computing and security. Secondly, we compete with Oracle, which still holds the lion’s share of this marketplace but is currently not equipped to handle the changing demands of today’s always-on world.

What can we expect in future versions of this
software?

For the future, we’re focused on four things: (1) providing an enterprise-ready, production-certified version of Cassandra for enterprise applications; (2) enabling first-class enterprise manageability for all database operations; (3) supplying developers with everything they need to quickly create modern web and mobile applications on Cassandra; and (4) continually supplying the features needed to support our key use cases.

