
Ten questions on Cassandra

Lucy Carey

Mapping the Cassandra journey, and looking ahead to the future.

Originally developed by Facebook to power its Inbox feature,
OSS Cassandra has gone on to become one of the best-known
databases on the market. With Cassandra having just celebrated its
fifth birthday, we thought it was high time to give it a spotlight
in the May edition of JAX Magazine. Here, Jonathan Ellis, Apache
Cassandra project chair and CTO at DataStax (vendor of the
Cassandra-based DataStax Enterprise), talks about the pivotal
moments, present challenges, and the future of the technology.

JAX: What have been the most pivotal
developments for Cassandra over the past five years?

Ellis: A lot has changed as
Cassandra expanded from something used by a few social media
companies to being deployed at thousands of companies. We’ve
added indexes, compression, security features, virtual nodes, and
more. But I think the two most noteworthy are the Cassandra
Query Language (CQL) and Lightweight Transactions. CQL
dramatically improves the Cassandra learning curve and meets
relational developers halfway, as it were. Lightweight
Transactions allow you to perform linearizable updates. As with
relational transactions, Cassandra guarantees that nobody will
interfere with your work, but without the heavy locks that sap
your performance in the relational world.
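
As a rough illustration of the conditional-update behavior Ellis describes, the sketch below mimics the semantics that CQL exposes as INSERT ... IF NOT EXISTS and UPDATE ... IF. It is a toy in-memory model, not Cassandra code: the ToyTable class and its method names are hypothetical, and the local mutex only stands in for the agreement that Cassandra reaches across replicas using Paxos.

```python
import threading


class ToyTable:
    """Hypothetical in-memory stand-in for a table; not a Cassandra API."""

    def __init__(self):
        self._rows = {}
        # Assumption: a local mutex replaces the cross-replica consensus
        # a real cluster would use. It only keeps this toy atomic.
        self._guard = threading.Lock()

    def insert_if_not_exists(self, key, value):
        """Mimics INSERT ... IF NOT EXISTS: True only if the row was created."""
        with self._guard:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True

    def update_if(self, key, expected, new_value):
        """Mimics UPDATE ... IF col = expected: applies only on a match."""
        with self._guard:
            if key not in self._rows or self._rows[key] != expected:
                return False
            self._rows[key] = new_value
            return True

    def get(self, key):
        with self._guard:
            return self._rows.get(key)


def claim_username(table, username, owner, results):
    # Every caller races to register the same username; only one can win.
    applied = table.insert_if_not_exists(username, owner)
    results.append((owner, applied))


if __name__ == "__main__":
    users = ToyTable()
    results = []
    threads = [
        threading.Thread(
            target=claim_username,
            args=(users, "spyced", f"client-{i}", results),
        )
        for i in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    winners = [owner for owner, applied in results if applied]
    print("winners:", len(winners))
    print("row owner matches winner:", users.get("spyced") == winners[0])
```

Whichever client wins the race, every other caller is told that its write was not applied, which is the "nobody will interfere with your work" guarantee in miniature.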

I’m especially proud of Lightweight Transactions
because open source has a reputation of being an imitator rather than
an innovator, and Cassandra is the first system anywhere to deliver

At the nontechnical level, I would immodestly
cite DataStax as an important factor in Cassandra’s growth. Since I
founded the company four years ago, DataStax has consistently
contributed generously to Cassandra development and community, and
employs more Cassandra committers than any other

What have been the biggest challenges
for the technology?

I think the biggest challenge for Cassandra was
how different it was to use until CQL was added 18 months ago.
 Sysadmins loved it because it never went down, but the
development API really was hard to love. CQL gives you INSERT,
SELECT, CREATE INDEX, … it’s really night and day easier to

What do you still want to achieve with

I think right now Cassandra is still crossing
the technology chasm between early adopters and the mainstream
market.  Over 25 of the Fortune 100 use Cassandra in
mission-critical applications because Cassandra solves problems
nobody else can, but they’re still cautious.  It’s new
territory for them.  I want to expand the ecosystem of tools
and resources and education until Cassandra is as ubiquitous as

How have you seen the
community grow, and which areas do you still want to

The community has expanded globally,
specifically in Japan, Russia and EMEA. Each of these locations
hosted a Cassandra Summit last year with excellent turnouts,
including more than 1,400 users at our Summit in San Francisco –
with more expected to attend this year. I want to continue growing
international communities wherever there is an interest
in Cassandra, and expose new areas to the technology.

To date Cassandra has not paid enough attention
to the Windows community — while Cassandra runs on Windows and
DataStax provides a Windows installer, it’s clearly been a
second-class citizen.  It’s easy for open-source developers to
forget how large the Windows market is, just because of the echo
chamber they live in; historically, Microsoft hasn’t been very
comfortable with open source, and vice versa.  But that’s
changing, and we expect to make Windows a first-class platform for
Cassandra with our 3.0 release in Q4 this year.

Do you think Cassandra will ever be big enough
to pose a significant challenge to MongoDB?

I look at MongoDB in terms of two questions: Can
Cassandra be as easy to use as MongoDB?  And, can MongoDB ever
compete with Cassandra at scale?

I think CQL has largely closed the ease-of-use
gap with MongoDB.  With the introduction of user defined
types, Cassandra gives you the flexibility of a document design
while also delivering the benefits of a typed schema that
enterprises expect.

On the other hand, MongoDB’s scaling and
performance problems stem from fundamental architecture choices and
will not be easily changed. We see a steady stream of
migrations from MongoDB to Cassandra, like Rekko’s recently, when
companies hit that wall.

What makes Cassandra stand out in the
increasingly crowded NoSQL space?

Cassandra’s modern architecture makes it
massively scalable, always available and easy to deploy. Cassandra
enables Internet companies to deliver engaging customer experiences
with recommendations, personalized content and fraud detection.
Other NoSQL vendors hit the ceiling when scaling linearly and
across geographies. Cassandra is infinitely scalable without the
limitations of a shared architecture as with MongoDB and the
complexities of HBase and Couchbase. In addition, Cassandra’s
peer-to-peer architecture, unlike other NoSQL vendors, ensures 100%
uptime while supporting workloads of any size.

Do you think Cassandra would have matured
differently without the input of Apache?

The Apache Software Foundation is a trusted
source of infrastructure software with years of experience helping
people collaborate across the world, so Cassandra has certainly
benefited from the involvement of the ASF.

Why do you think the growth of Cassandra has
been so explosive?

There has been widespread acceptance of open
source technology in enterprises, so that has helped drive
adoption. Cassandra in particular is growing fast because it is
maturing rapidly and the product simply works as advertised.
Cassandra enables users to accomplish things that no other
technology can. So it’s a secret sauce that holds the key to
groundbreaking advancements.

Do you think there’s a shortage of devs with
Cassandra skills?

Definitely. has displayed major
growth in Cassandra jobs, but there is a shortage of talent. We
need to help the community by emphasizing the importance of
education in relational databases and Cassandra open source skills.
For developers out there – if you want amazing career opportunities
over the next 5-10 years, learn Cassandra.

What do you predict for the
next five years of Cassandra?

Some things we’ll probably see include expanding
our support for Triggers into more general server-side code
integration, support for explicitly archiving infrequently-used
parts of tables, and optimizations for specific workloads such as
time series data.  We also expect to incorporate recent
computer science research such as EPaxos and RAMP transactions to
improve performance and provide new features.

What’s most important, however, is the way
Cassandra users will change the world over the next five years.
Cassandra projects are embedded at leading enterprises that are
fundamentally changing how people live – and this is where
Cassandra will truly make its mark. It’s all about the users
implementing this revolutionary technology.

Jonathan Ellis: DataStax Chief Technology
Officer & Co-Founder (@spyced)

As Chief Technology Officer and co-founder at
DataStax, Jonathan sets the technical direction for the company and
leads Apache Cassandra as project chair. Prior to DataStax,
Jonathan led Rackspace’s Cassandra team and built the Cassandra
community into an open source success. Previously, Jonathan built
an object storage system based on Reed-Solomon encoding for data
backup provider Mozy that scaled to petabytes of data and gigabits
per second throughput.

