Throwing out sparks
DataStax Enterprise 4.5 out of the box
DataStax, the company behind the enterprise-flavoured edition of the Apache Cassandra distributed database management system, has just dropped version 4.5 of DataStax Enterprise. Announced at today’s Spark Summit 2014, this release is tuned to make deployment and operation easier. It also offers new integrated analytics functionality that gives users a scalable, high-performance way to quickly analyse mushrooming data sets. We spoke to Robin Schumacher, VP of Products at DataStax, for the full story.
JAX: What are the most significant upgrades to DataStax Enterprise 4.5 in your opinion?
Schumacher: DataStax is the first highly scalable NoSQL provider to offer lightning-fast analytics on real-time data using Spark, and this integration enables up to 100X faster analytics. The other key additions are integration with external Hadoop installations, automated diagnostics and performance tuning through a new Performance Service, and expanded visual management via DataStax OpsCenter, which can now manage Cassandra clusters with more than 1,000 nodes and offers increased security.
Can you give us a deep dive into the tech?
Apache Spark is a data analytics framework that can run on top of the Hadoop Distributed File System. It’s an in-memory and on-disk data processing engine that runs much faster than Hadoop’s MapReduce, improving query response times and enabling faster decisions. It’s built to run analytics queries across shared-nothing, distributed IT environments. This makes it a great fit for DataStax, as it takes the same scale-out architectural approach as Cassandra.
Combining Spark and Cassandra means that developers can build applications that take advantage of in-memory support both for their transactions and for the analytics on those transactions.
DataStax is contributing much of what we’ve done with Spark and Cassandra back to those open source communities. At the Spark Summit this week, we’re announcing that we’re contributing our connectivity layer, datatype mapping, and performance optimizations to open source.
There are certain things, however, that we have developed that will remain in the commercial version of DataStax Enterprise, such as support for high-availability clustering for analytic workloads, production certification for both Cassandra and Spark, and easy visual management of Cassandra-Spark clusters.
We also have new partnerships with Hortonworks and Cloudera, along with new integration between their Hadoop platforms and DataStax Enterprise. This allows customers to link their operational data in Cassandra/DSE with historical information stored in Cloudera or Hortonworks data warehouses.
On the management side, we have now scaled our OpsCenter visual management tool so that it can manage clusters up to 1,000 nodes. In addition, we’re including built-in security that controls what administrators can do on clusters.
We have also added a new Performance Service to simplify operations. Using the diagnostic information it provides, customers can more easily see how well their Cassandra clusters are performing and where potential bottlenecks may develop. This includes built-in performance objects, enforcement of best practices, setup recommendations and tracing of the worst-performing statements.
What’s your definition of ‘hot’ data?
Hot data is created by customer actions on a company’s applications or services, e.g. a new transaction or a new web click. As each consumer completes an activity in the application, that data gets processed. At the same time, it can be used for near real-time analytics. By delivering transactional and analytics data on the same database platform (with in-memory capabilities for both), DataStax enables companies to provide better services and scale faster too.
What are the biggest advantages of partnering with the Hadoop vendors?
This is in direct response to customer requests. We have customers who use Hadoop for their historical data, and they wanted a simple, high-performance way to run analytics across both the operational data held in Cassandra tables and the data held elsewhere, for example in a Hadoop Hive table. With these partnerships, they will be able to do that in the way that best suits them.
You have said before that some of the challenges of integrating Spark into DataStax software are high availability and security. How have these been tackled?
DSE 4.5 will include a new high availability feature that will ensure no downtime for analytics workloads that utilize Spark in DSE. From a security perspective, we’re investigating adding external security support for Spark in an upcoming DSE release.
What sector do your biggest business clients operate in, and have you seen this evolve?
As a reminder, DataStax serves more than 500 customers, including 25 percent of the Fortune 100, in industries ranging from web 2.0 to healthcare, government, finance, retail and manufacturing. So we serve a broad spectrum of clients. However, our five key use cases revolve around the Internet of Things, recommendations, playlists, fraud detection and messaging.
What sort of considerations for business did you take into account when building this software?
First, in our modern connected world, applications must never go down. A business can’t afford for its revenue stream to take a breather or go offline, so we provide the most highly available database on the market. Second, applications must perform very fast, since slow is the same as off and customers will immediately leave a lagging website or app. Third, we work to make using Cassandra a world-class experience that is easy and doesn’t require a huge operational overhead. Many of our largest customers, such as Netflix, keep their deployments running with only a few IT team members.
Who are the biggest enterprise rivals for DataStax?
Open source Cassandra is currently our biggest competition. But we provide so much enterprise value with DSE that we are seeing fast adoption of value-added features such as in-memory computing and security. Second, we compete with Oracle, which still holds the lion’s share of this market but is currently not equipped to handle the changing demands of today’s always-on world.
What can we expect in future versions of this software?
For the future, we’re focused on four things: (1) providing an enterprise-ready, production-certified version of Cassandra for enterprise applications; (2) enabling first-class enterprise manageability for all database operations; (3) supplying developers with everything they need to quickly create modern web and mobile applications on Cassandra; and (4) continually delivering the features needed to support our key use cases.