Ten questions on Cassandra
Originally developed by Facebook to power its Inbox feature, OSS Cassandra has gone on to become one of the best-known databases on the market. With the project having just celebrated its fifth birthday, we thought it was high time to give it a spotlight in the May edition of JAX Magazine. Here, Jonathan Ellis, Apache Cassandra project chair and CTO at DataStax (vendor of the Cassandra-based DataStax Enterprise), talks about the pivotal moments, present challenges, and the future of the technology.
JAX: What have been the most pivotal developments for Cassandra over the past five years?
Ellis: A lot has changed as Cassandra expanded from something used by a few social media companies to being deployed at thousands of companies. We've added indexes, compression, security features, virtual nodes, and more. But I think the two most noteworthy are the Cassandra Query Language (CQL) and Lightweight Transactions. CQL dramatically improves the Cassandra learning curve and meets relational developers halfway, as it were. Lightweight Transactions allow you to perform linearizable updates. Similar to relational transactions, Cassandra will guarantee that nobody will interfere with your work -- but without the heavy locks that sap your performance in the relational world.
I'm especially proud of Lightweight Transactions because open source has a reputation of being an imitator rather than an innovator, and Cassandra is the first system anywhere to deliver this.
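For readers who have not seen Lightweight Transactions, the following is a minimal sketch of how a conditional write looks in CQL. The `users` table and its columns are hypothetical and chosen only for illustration; the example assumes a keyspace is already selected.

```cql
-- Hypothetical table used only for this illustration
CREATE TABLE users (
    username text PRIMARY KEY,
    email text
);

-- Insert the row only if no row with this username exists yet
INSERT INTO users (username, email)
VALUES ('jdoe', 'jdoe@example.com')
IF NOT EXISTS;

-- Change the email only if it still holds the value we expect
UPDATE users
SET email = 'john@example.com'
WHERE username = 'jdoe'
IF email = 'jdoe@example.com';
```

Each conditional statement returns an `[applied]` column indicating whether the condition held and the write took effect.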
At the nontechnical level, I would immodestly cite DataStax as an important factor in Cassandra's growth. Since I founded the company four years ago, DataStax has consistently contributed generously to Cassandra development and community, and employs more Cassandra committers than any other organization.
What have been the biggest challenges for the technology?
I think the biggest challenge for Cassandra was how different it was to use until CQL was added 18 months ago. Sysadmins loved it because it never went down, but the development API really was hard to love. CQL gives you INSERT, SELECT, CREATE INDEX, ... it's really night and day easier to use.
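As a rough illustration of the statements mentioned above, here is a short CQL sketch. The `songs` table, its columns, and the sample values are hypothetical, and the example assumes a keyspace is already selected.

```cql
-- Hypothetical table used only for this illustration
CREATE TABLE songs (
    id uuid PRIMARY KEY,
    title text,
    artist text
);

-- Secondary index so rows can be queried by artist
CREATE INDEX ON songs (artist);

INSERT INTO songs (id, title, artist)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'Example Title', 'Example Artist');

SELECT title FROM songs WHERE artist = 'Example Artist';
```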
What do you still want to achieve with Cassandra?
I think right now Cassandra is still crossing the technology chasm between early adopters and the mainstream market. Over 25 of the Fortune 100 use Cassandra in mission-critical applications because Cassandra solves problems nobody else can, but they're still cautious. It's new territory for them. I want to expand the ecosystem of tools and resources and education until Cassandra is as ubiquitous as Oracle.
How have you seen the community grow, and which areas do you still want to penetrate?
The community has expanded globally, specifically in Japan, Russia and EMEA. Each of these locations hosted a Cassandra Summit last year with excellent turnouts, including more than 1,400 users at our Summit in San Francisco - with more expected to attend this year. I want to continue growing international communities wherever there is an interest in Cassandra, and expose new areas to the technology.
To date Cassandra has not paid enough attention to the Windows community -- while Cassandra runs on Windows and DataStax provides a Windows installer, it's clearly been a second-class citizen. It's easy for open-source developers to forget how large the Windows market is, just because of the echo chamber they live in; historically, Microsoft hasn't been very comfortable with open source, and vice versa. But that's changing, and we expect to make Windows a first-class platform for Cassandra with our 3.0 release in Q4 this year.
Do you think Cassandra will ever be big enough to pose a significant challenge to MongoDB?
I look at MongoDB in terms of two questions: Can Cassandra be as easy to use as MongoDB? And, can MongoDB ever compete with Cassandra at scale?
I think CQL has largely closed the ease-of-use gap with MongoDB. With the introduction of user defined types, Cassandra gives you the flexibility of a document design while also delivering the benefits of a typed schema that enterprises expect.
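A user defined type can be sketched in CQL roughly as follows. The `address` type, the `customers` table, and all field names and values are hypothetical, and the example assumes a Cassandra version with user defined type support and an already selected keyspace.

```cql
-- Hypothetical nested type, similar to an embedded document
CREATE TYPE address (
    street text,
    city text,
    zip_code text
);

-- The type is used like any other column type
CREATE TABLE customers (
    id uuid PRIMARY KEY,
    name text,
    home frozen<address>
);

INSERT INTO customers (id, name, home)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'Jane Doe',
        {street: '1 Main St', city: 'Austin', zip_code: '78701'});

-- Individual fields of the nested value can be selected
SELECT name, home.city FROM customers
WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;
```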
On the other hand, MongoDB's scaling and performance problems stem from fundamental architecture choices and will not be easily changed. We see a steady stream of migrations from MongoDB to Cassandra, like Rekko's recent one when they hit that wall.
What makes Cassandra stand out in the increasingly crowded NoSQL space?
Cassandra’s modern architecture makes it massively scalable, always available and easy to deploy. Cassandra enables Internet companies to deliver engaging customer experiences with recommendations, personalized content and fraud detection. Other NoSQL vendors hit the ceiling when scaling linearly and across geographies. Cassandra is infinitely scalable without the limitations of a shared architecture as with MongoDB and the complexities of HBase and Couchbase. In addition, Cassandra’s peer-to-peer architecture, unlike other NoSQL vendors, ensures 100% uptime while supporting workloads of any size.
Do you think Cassandra would have matured differently without the input of Apache?
The Apache Software Foundation is a trusted source of infrastructure software with years of experience helping people collaborate across the world, so Cassandra has certainly benefited from the involvement of the ASF.
Why do you think the growth of Cassandra has been so explosive?
There has been widespread acceptance of open source technology in enterprises, so that has helped drive adoption. Cassandra in particular is growing fast because it is maturing rapidly and the product simply works as advertised. Cassandra enables users to accomplish things that no other technology can. So it's a secret sauce that holds the key to groundbreaking advancements.
Do you think there's a shortage of devs with Cassandra skills?
Definitely. Indeed.com has displayed major growth in Cassandra jobs, but there is a shortage of talent. We need to help the community by emphasizing the importance of education in relational databases and Cassandra open source skills. For developers out there - if you want amazing career opportunities over the next 5-10 years, learn Cassandra.
What do you predict for the next five years of Cassandra?
Some things we'll probably see include expanding our support for Triggers into more general server-side code integration, support for explicitly archiving infrequently-used parts of tables, and optimizations for specific workloads such as time series data. We also expect to incorporate recent computer science research such as EPaxos and RAMP transactions to improve performance and provide new features.
What's most important, however, is the way Cassandra users will change the world over the next five years. Cassandra projects are embedded at leading enterprises that are fundamentally changing how people live - and this is where Cassandra will truly make its mark. It's all about the users implementing this revolutionary technology.
Jonathan Ellis: DataStax Chief Technology Officer & Co-Founder (@spyced)
As Chief Technology Officer and co-founder at DataStax, Jonathan sets the technical direction for the company and leads Apache Cassandra as project chair. Prior to DataStax, Jonathan led Rackspace's Cassandra team and built the Cassandra community into an open source success. Previously, Jonathan built an object storage system based on Reed-Solomon encoding for data backup provider Mozy that scaled to petabytes of data and gigabits per second throughput.