Discussing the state of NoSQL databases with DataStax’s Billy Bosworth
From March’s issue of JAX Magazine, DataStax CEO Billy Bosworth discusses Apache Cassandra, MongoDB market share and why NoSQL no longer means ‘no security’.
Billy is responsible for the strategy, explosive growth, and day-to-day operations of DataStax. He has 20 years of experience in the database industry in roles ranging from DBA to senior executive. Prior to DataStax, Billy spent 6 years at Quest Software where his most recent role was VP and GM of the database business unit.
JAX Magazine: Hi Billy, thanks for taking the time to speak to us. First, could you tell us a little bit about DataStax?
Billy Bosworth: So, DataStax is the commercial company behind the open source database Apache Cassandra. And the sweet spot for Cassandra is powering real-time applications that generate transactions at hyper velocity. And we deliver that power with an architecture that is simple, flexible and most importantly extremely performant once you hit big data scale.
The company was started back in April 2010, and co-founded by the then and current Apache Cassandra chairman, Jonathan Ellis.
So the links run deep between Cassandra and DataStax?
Very. And we are very committed to the success of the community. In fact, I was just looking the other day at, from a budget standpoint, people standpoint, how much we spend on direct community activity, and it’s quite a lot, and that’s very important to us. So we want to see a very thriving and growing Cassandra community independent of whether or not those customers ultimately become DataStax customers. We do a lot for the community, for the community’s sake.
DataStax Enterprise and the free DataStax Community are both separate to Cassandra. What’s superior about them?
So DataStax Community is a way you can quickly consume and get started with pure Apache Cassandra. You can think of that as analogous to Fedora from Red Hat. So, you have the installers, you have a free version of our management tool, OpsCenter, you have sample applications, all within a very nice, ready-made bundle, ready for you to download and install. So basically that is just our way of helping people in the community who may not want to chase down all the individual bits from the various Apache projects, and GitHub, and the connectors, and the libraries and all that sort of thing. We want to make it just a very easy way to digest and get started with the open-source version.
Now, DataStax Enterprise is a little different. DataStax Enterprise is definitely the choice when you are ready to take your Apache Cassandra needs into a production environment. And everything we do in DataStax Enterprise is geared towards helping that application team. Everything they need, including the confidence, and security, and reliability of a company behind you, but also to have all the functionality you need in a single platform so that you can work with your data in the context of that application without having to ETL or move it around.
The big feature of DataStax Enterprise 3.0 is security. Were Cassandra and DataStax insecure before?
Uhh, yeah, pretty much all of these systems are actually! [Laughs] These NoSQL databases are problematic in that respect. It’s been definitely something that has been known, and worked around, but yes – to give you a very short answer, the ability to handle the type of security that you are accustomed to in the relational world simply has not existed yet in the NoSQL world. And so they can do it in different ways. Application guys are smart, they try as much as they can to use the other techniques and they’ll use security level at the application layer, which does add some complexity to the process. So this will now give them that same kind of trusted feel that they had with working with security in their relational databases, they’ll start to now have those capabilities directly inside of Cassandra.
Are those security improvements being pushed upstream to the Cassandra source?
They are. We’ve done a couple of things. For the Cassandra community, we have released three pretty major security features. [The first is internal authentication,] which is basically what everybody’s familiar with – I wanna create a user, I’m gonna give that user a password, and then they’re gonna have to log in to the database with that user ID and password. Very well understood authentication model – it’s been around forever in the relational world.
And that feature didn’t exist in Cassandra and DataStax up until now?
That is correct.
Is it implemented in other NoSQL databases? MongoDB or CouchDB…?
I don’t want to speak on their behalf, I’m not positive of how they do their implementation. I do not think so – I know that nobody has, as we go through this list, nobody’s gonna have the comprehensive solution that we have, but I’m actually not sure to the details and I don’t want to speak out of school.
That seems like quite an obvious oversight.
Well, it wasn’t so much an oversight as it was a design challenge. When these systems were built, you have to remember – the interesting thing about this NoSQL market is [in the early days] we had people running Cassandra in production environments, 0.4, 0.5 releases. That’s insane! The traditional enterprise applications, you would just never think about running a zero dot anything in production, right? Think about it from an application development standpoint.
But the need was so great, the technology challenge was so monumental, that they simply had to find a way to solve the problems. And so I would say it’s not so much an oversight as what you’re seeing now is a maturity. Now these things are finally coming to fruition – we’ve always known we needed it. We did an article back in April of 2012, titled “Why NoSQL Equals No Security”. And he said, almost the same way you did – the intro was “it seems security is an afterthought at best in the big data ecosystem”.
It really hasn’t been an afterthought, it’s just been, as I said, a maturity thing. And now we’re at that stage where we are ready to introduce that maturity both into the open source line, and with some enterprise features into DataStax Enterprise.
So the second thing – going back to what we’re giving to the community, the second thing is what’s called ‘internal object permissions’. This means that when you create an object inside of Cassandra, now you have the ability to take that internal user authentication that you created in step one, and you can say “now I want to give Elliot read permissions on this object”, or “I want to give Billy read and write permissions on this object”. That’s now fully available inside of Apache Cassandra.
And then the third one is also very important, and that’s client-to-node encryption. The ability to encrypt that data on the fly as it moves between the Cassandra node and the end application point.
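[The first two features described above, user logins and per-object permissions, can be illustrated with a small Python sketch. It is a toy in-memory model written for this article, not Cassandra's implementation or API: the class `ToyAuthStore`, its method names, the user names, passwords and the `orders` object are all hypothetical. Client-to-node encryption is not modelled.]

```python
import hashlib
import hmac
import os


class ToyAuthStore:
    """Hypothetical in-memory model of user logins and per-object grants."""

    def __init__(self):
        self._users = {}   # username -> (salt, password hash)
        self._grants = {}  # (username, object name) -> set of permissions

    @staticmethod
    def _hash(password, salt):
        # Salted, slow hash so the plain-text password is never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def create_user(self, username, password):
        salt = os.urandom(16)
        self._users[username] = (salt, self._hash(password, salt))

    def login(self, username, password):
        record = self._users.get(username)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison of the stored and supplied hashes.
        return hmac.compare_digest(expected, self._hash(password, salt))

    def grant(self, username, object_name, *permissions):
        if username not in self._users:
            raise KeyError(f"unknown user: {username}")
        self._grants.setdefault((username, object_name), set()).update(permissions)

    def is_allowed(self, username, object_name, permission):
        return permission in self._grants.get((username, object_name), set())


# Illustrative data only: two users with different rights on one object.
store = ToyAuthStore()
store.create_user("elliot", "s3cret")
store.create_user("billy", "hunter2")
store.grant("elliot", "orders", "read")
store.grant("billy", "orders", "read", "write")
```

[In this sketch `store.login("elliot", "s3cret")` succeeds while a wrong password fails, and only `billy` passes `store.is_allowed(..., "orders", "write")`. A real database enforces these checks on the server side rather than in application code.]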
How would you describe the state of the database market?
It’s been very exciting, I can tell you, coming from the relational market for the last 20 years, this has been fascinating and fun to watch this transition. I very much liken it – being an old guy – to when I first came out of college in ’92, and I was watching the revolution of the whole what we used to call “open systems databases”, which was Oracle and DB2, and Sybase, and later SQL Server. It’s really fun, it’s like watching that happen all over again.
I think what’s happening now, in 2013 and I’d say the end of 2012, the biggest shift that I’m starting to see – that is a very good thing – is people are realising that when they say ‘big data’, that is not a one-size catch-all bucket. There are definitely different characteristics that different technologies solve very well, and people are starting to understand those nuances a little bit better, which is great. So, Hadoop’s been around for quite a long time now actually, if you think about Google releasing their white papers back in 2003. And I would say now, the NoSQL movement is catching up to the mainstream mindset of people, as they think about big data. And they’re starting to rightfully now ask: “OK, wait a minute. Are you talking about big data analytics, which would be the Hadoop data warehouse world, or are you talking about big data transactions, which would be like the more classic Oracle type of world.”
And that is an important distinction that’s finally starting to catch hold, and a lot of us have been out there trying to educate the market on that. So just understanding that nuance is a good thing. It’s also a very crowded market, and getting more crowded, and what I tell people is that as they look at those graphs and charts that try and capture all the different players, one of the things I’d say is, if you really want to understand how they’re doing, go get ten documented use cases. I mean really documented, I don’t mean top X this or top leading that, go find ten use cases with companies and customers willing to talk in depth about what they’re doing with that technology.
My personal take is, if you can’t find ten for a given technology, skip it for now. Because I don’t know if it’s gonna make it or not – there’s just too much noise out there in the marketing side. What you really need to do is get under the covers and figure out who is using this stuff.
And that’s why I’m so proud of us. We can show you dozens of DataStax in-depth customers with names that everyone knows and understands, people like Netflix and Adobe and eBay and healthcare companies. And then we have hundreds more on the Cassandra side, if you go over to our community site called Planet Cassandra. So people right now, they’re getting so glassy-eyed over the marketing, and that is a great way to cut through the marketing. Get to the use cases.
10gen recently told us that they expect MongoDB to take 80% of the database market. How would you respond to that?
[Laughs] Let’s talk about use cases and customers and see how people are using this in production. That’s how I’d respond. I find that stuff… interesting.
It was a bold claim.
What does it cost you to make a claim? Nothing. What does it cost you to get customers lined up to talk about how they’re using your technology on an enterprise scale? A lot. You better be real, you better be doing it. By the way, I think they’re going to be great, I really do. I think they’re going to be a great company. But the use cases are different. I think about, why do we have such diversity in the real world, right? Why was MySQL popular after Oracle, and why do we have SQL Server and Oracle, and why – I’ve been around this business way too long to give serious credence to claims like that. I want to see use cases, I want to see things done in real life, I want to talk facts. I’d rather talk about what’s happening rather than what’s going to happen. Talking about what’s going to happen is fun and easy. Talking about what’s happening is hard and real.
So, relational databases – they’re still going to be around in 20, 30 years’ time, right?
I completely believe they will. I absolutely believe they will. These things have such long tails, my goodness. I can remember, again going back to ’92, developing this stuff – the claims that would happen, it seemed like every month, about the mainframe was gonna be dead and there was going to be no mainframe by the year 1995. And then there was going to be no mainframe after Y2K, because it was all going to be rewritten, and there’s going to be no mainframe – I dunno, I haven’t checked in a while, but the last time I looked, mainframe sales were flat for like 20 years? This stuff has a long tail. It is not easy to just say that a market that size just goes away. That is a pretty unrealistic way to think about things, number one; number two, there are still very good use cases for it. Very good use cases. And this is all about helping people find the right use cases for the right problems.
That’s why we exist. We want to be credible and trustworthy. What we say, we want people to be able to go and verify, and get help and understanding and deliver a solution that they can put into production. And what we’re seeing is, when people do that, often relational technology sits right alongside these other [modern] technologies. And I know my friend Mike Olson over at Cloudera, he says the same thing about Hadoop. He says, in the majority of cases I see, these technologies are living in an ecosystem with these traditional technologies, and I echo that, I see that exact same thing.
So, no, maybe I’ve just been around too long, and maybe I’ve been around the relational world for too long, but I just know these markets have very, very long tails.
Finally, when can we drop this NoSQL label and just talk about databases?
[Laughs] Man, I wish I could do that tomorrow. I’ve never liked it, I just – philosophically, I don’t like describing something by what it’s not. I think that’s a very poor way to describe anything in life, number one; number two, it’s actually inaccurate. There’s no reason that you can’t use SQL-like language, at least for parts of what you do, and in fact that’s precisely what we do with the language we have called CQL, which is the Cassandra Query Language. And the Cassandra Query Language is a subset of SQL. If you know SQL, you’re gonna look at CQL and go “Oh yeah, of course I get that. SELECT name FROM employee WHERE…”
So, there was a time when it was very hard to categorise this sort of stuff. None of us knew what to call them, and it just stuck, and I guess it’s going to stick with us for some time to come because we are really just now starting to cross the chasm, and when you get into that more mainstream mindset, people do look for nice ways to easily classify something and easily name something, and this name I think is going to be around for some time, actually. I would have much preferred – I loved the term ‘flexible schema’ actually. That, to me, makes the most sense, but that’s coming from an old, relational mind who loves the idea of not being beholden to that schema once I create it. And I love the idea of having a schema that can change from entry to entry. But, we’re stuck with it. My guess is it’s going to be with us for a while.
It’s a blessing and a curse, right? Because it’s a very useful marketing buzzword.
It is, and y’know it does help people understand categorically what you’re talking about when you say it. And then you have to start breaking down the nuances even further. The biggest thing I think is going to happen with all these databases is, it really is currently – and I think will remain – a world that is heterogeneous on the backend of these data stores. And what I mean by that is, your [JAX Magazine's] audience, your crowd, is getting very, very good at building services layers that are flexible, and that route the right workloads to the right databases.
So here’s the challenge that you have, and that a lot of people in the marketplace have: you guys may read the use case, you may go read our eBay use case, and then you’ll turn around and read an eBay use case about Teradata. And then you’ll turn around and read an eBay use case about, I dunno, PostgreSQL. And then you’ll start reading these same logos, and you’ll start seeing all this other technology. And you’re like, “wait a minute, I thought they just ran this? I thought they ran that?” And it’s no longer an either/or world.
Back in the day we had an application, that application talked to a database, and you picked a database. And when you wanted to move that data somewhere else you ETL’d it. That’s just not what’s happening now. Even in my smaller customers, I see services layers being built – very sophisticated services layers being built – that will route the data to the proper technology. So it’s not uncommon at all, in the same application, to see a workload going to a relational database, a workload going to DataStax, a workload going to Mongo[DB], a workload going to MySQL – in the same application. And so, back to the [10gen] comment about market share, how do you claim that? Does that mean “we are that database”?
It’s just going to be a much more complex world in that sense, but I think it’s going to be a better world for the application architects. Because now they do have this true ability to go find the right technology for the right piece of the application stack. And I think that’s wonderful, I would have loved to have that flexibility, back when I was developing in the 90s, that would have been fantastic. So I think it’s a very exciting time right now for that reason.
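[The services-layer idea described above, where one application routes each workload to the data store suited to it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any DataStax customer builds it: `Backend` is a hypothetical stand-in for a real database client, and the workload names `clickstream`, `product_catalog` and `billing` are invented for the example.]

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Backend:
    """Hypothetical stand-in for a real data store client."""
    name: str
    received: List[dict] = field(default_factory=list)

    def write(self, record: dict) -> str:
        # A real client would send the record over the network;
        # here it is only remembered so the routing can be inspected.
        self.received.append(record)
        return self.name


class WorkloadRouter:
    """Sends each named workload to its registered backend."""

    def __init__(self, default: Backend):
        self._default = default
        self._routes: Dict[str, Backend] = {}

    def register(self, workload: str, backend: Backend) -> None:
        self._routes[workload] = backend

    def write(self, workload: str, record: dict) -> str:
        # Unregistered workloads fall back to the default backend.
        backend = self._routes.get(workload, self._default)
        return backend.write(record)


# Illustrative wiring only: three stores behind one services layer.
relational = Backend("relational")
cassandra = Backend("cassandra")
document = Backend("document")

router = WorkloadRouter(default=relational)
router.register("clickstream", cassandra)
router.register("product_catalog", document)
```

[Here `router.write("clickstream", {...})` lands on the `cassandra` backend, while an unregistered workload such as `billing` falls back to the relational store. The application code calls one interface and the routing table decides where the data goes.]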
Thanks for talking to us, Billy!
This interview appeared in JAX Magazine: Pulling Together.