Spotlight on: Basho
CTO Justin Sheehy unlocks the Riak development story to date
Contrary to popular misconception, ‘NoSQL’ does not stand for ‘No SQL’. It’s actually shorthand for ‘Not only SQL’, in reference to the fact that it goes beyond the limits of what traditional ‘Relational Databases’ can do. Although many still see NoSQL as a buzzword, foresighted developers are increasingly stepping back to consider a full range of data storage solutions right at the start of a project. This summer, JAXmag sat down with the CTO of Basho Technologies, Justin Sheehy, to find out more about the company’s flagship NoSQL database Riak, and nail down a proper definition for the platform.
JAXmag: What’s the background story of Basho Technologies?
Sheehy: We’ve been around since January 2008. We’ve only had offices in the UK for a little less than half our history. We didn’t really expect to be a global company so early, to push out and have offices in other places. We have an office in London and we have an office in Tokyo, as well as our US offices. That’s been very much demand-driven. We found that the demand we were getting was very differently shaped in Europe and the UK on one hand, and Japan on the other. But in both cases, we were getting such interesting, complex inbound demand that we found it really necessary to go and be closer to people.
Open source is a big thing for Basho. We saw you recently open source Riak CS, and Riak has been open source for a lot longer. Is open source the main driver for adoption for you?
I think so much more of technology adoption right now, including in large enterprises, is really developer-led. The technology that has become critical in a lot of these companies is not decided in the first place by the CIO. It might be vetoed by them, but they are not the ones pulling things in. Being present in the minds and attention of developers is something that’s essential, and open source is a fundamental part of that game.
JAXmag: You started as a CRM company – what drove you to develop Riak?
Not quite. We were producing software as a service early on, not so much CRM but integrated with one. That’s why it’s easier to say CRM. But at that same time, those of us who had been on the technical side from the beginning of the company had been trying to figure out how to solve availability problems that weren’t just ours but that we were seeing become a huge deal across the industry, where more and more businesses are doing things in what I think of as “the web model of business”.
Even if they’re not website businesses or consumer-driven, they’re moving towards that. Even enterprises that are providing things for their own employees, for instance, have a new expectation that things are always present, that things are always fast enough, that things are always as good as the web. It’s changed what you can and cannot do, even in internal enterprise software, and that makes availability and scalability a much bigger priority than they were for a lot of people developing software just a few years ago.
That’s been the core priority of Riak ever since; we’ve built things up and around that.
In terms of Riak’s design goals, availability, scalability and predictability – have those always been core priorities, and have they changed over time?
Those goals have always been so much our primary goals that we’ve been slower than we might have been to do other things. We will look at almost any interesting change to either product (right now we have Riak and Riak CS), and if there’s a tradeoff to be made, we’ll almost always make it in favour of those principles – availability, predictability, scalability. So there are some features which might be appealing to some developers that we either don’t do or take longer to figure out how to do, because we don’t want to compromise. That’s what is actually happening now.
For instance, in the latest version of Riak, 1.4, [we] are adding more features around indexing and more features around new data structures you can use. One of the things early on in Riak was that the data modelling was very, very simple, and that’s the generous way of putting it. Many developers would find it a little bit spartan in terms of what kind of data modelling is natively and easily available to you. Many people figured out how to do extremely powerful things with it, but they had to figure it out. That’s because the traditional way of making it easy for developers, the way we would have implemented it, would have compromised those core principles.
We’ve now found new ways; we’re working with leading researchers on ways to add richer data types, more ability to do things that feel like strong consistency, or having data structures like sets, maps and counters that don’t compromise our availability and scalability story.
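As an illustration of the kind of counter described here, the following is a minimal sketch of a grow-only counter that replicas can update independently and merge later without coordination. It is not Riak's implementation; the `GCounter` class, its methods, and the replica names `node_a` and `node_b` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot maximum."""

    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, replica_id: str, amount: int = 1) -> None:
        # Each replica only ever increases its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Taking the maximum per slot makes merging commutative,
        # associative and idempotent, so replicas converge in any order.
        replicas = self.counts.keys() | other.counts.keys()
        return GCounter(
            {r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in replicas}
        )


# Two replicas accept writes independently (for example, during a partition).
a, b = GCounter(), GCounter()
a.increment("node_a", 2)
b.increment("node_b", 3)

# Merging in either order yields the same state.
merged = a.merge(b)
```

Because no replica has to wait for another before accepting an increment, a structure of this shape can stay writable during a partition, which is the property the answer above is concerned with.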
I don’t want to lump everybody under the NoSQL term, as they are all so different. But is making it easier for new customers to adapt to a different type of emerging database a big challenge?
The ways in which people are making those compromises are very different. There are things that fall under that weird umbrella of NoSQL. But there are other systems that have made very different compromises, that went for exactly the things that we are only starting to get good at now.
In some ways everyone has these challenges, but based on the priorities you’ve built your system around in the first place, which ones you have to tackle differ. I happen to think that by focusing on such foundational things first, we have an easier time building things like better data structures than someone that has built those things, and then later has to figure out how to build a foundation underneath.
But isn’t it very difficult to balance what you want and what the customer wants?
Absolutely – we’ve been very customer-driven throughout our existence. I often get asked what seems like a natural and silly question: ‘What verticals do you have?’ That’s like asking what verticals Oracle has. We end up in situations where it’s critical data, which is very different to the amorphous big data notion, which is all about piling your stuff up and it turns into a pony.
If your data stops moving and you stop being able to interact with it, your business stops. That’s usually not your biggest dataset; it’s, for example, sessions in a shopping cart or your patient records. The Danish Health Service uses Riak, and there are going to be a lot more companies that do that sort of thing.
So the situation in which Riak thrives is when you have to deal with critical data?
Yes. That’s exactly what Riak’s heritage has been all about. Almost everything bends in favour of making sure that when someone needs to interact and modify data in Riak, it succeeds.
That applies whether it’s payment processing (we have customers who do that sort of thing) or the sessions that drive your whole web-driven business, like game platform customers where every session interaction uses Riak. Not all of their data uses Riak, but the interactions they have to perform periodically to keep operating do.
Can you give us an example of this?
So in Denmark, they decided a few years ago to fundamentally modernise how patient records work, in an attempt to improve public health. They built a system based on Riak in multiple datacenters, storing all non-clinical patient data of every citizen or resident of the country – all their pharmacy prescriptions, all their physician visits, and all their emergency room visits.
This means that if you travel to a part of the country you don’t live in, go to the emergency room and you’re not capable of describing yourself, people can see what you are allergic to. That’s the kind of data that’s really fundamental to people’s health and safety. If you built this in a traditional central model, there would only be people in that hospital or region that could really practically do this. This is a few million people.
There are a couple of other countries we are working with, which I can’t talk about yet. This one has turned into a proving ground, and two other countries have gone to them and said, “How did you build this? We want to do the same thing or something related.” That’s working out nicely from our point of view, as we’ll probably be doing this sort of thing for three instead of one.
What were the reasons for choosing Erlang at the beginning of Riak’s development?
Riak is coded in Erlang, but our customers never really have to care about that. For us, it’s been a strategic, not-so-secret weapon. The most reliable software in the world is written in Erlang. That’s something that can be stood behind. Erlang was designed to be a high-level, rapid-development language that eased building systems that never stop, like phone switches. If you look at the primary goals we mentioned earlier, those are the same goals.
So for us, it was a straightforward technical choice. I get asked about this a lot, and I considered it heavily when a couple of us made this decision early on. The tradeoff that we thought we’d probably make is a loss of people ready to develop on this. It’s certainly a much smaller language in terms of developers that know it. Not tiny – there’s a reasonably vibrant community – but it’s not like Java or something. I could yell out on the street that I’m hiring Java programmers and people would come running to the front door.
We were going to choose something whose goals were so aligned with ours and that comes with batteries included for building things like this. Some of the things that come in the Erlang virtual machine and the Erlang standard libraries are perfectly suited for building Riak. We figured the only trade-off was that it would be harder to acquire talent… but it’s actually been kind of a hiring magnet for us.
It’s not that there’s this huge hidden pool of Erlang talent that we didn’t know about; that’s not the case. But even developers that hadn’t written any Erlang before have come to us and we’re like, “All we care about is being good at distributed systems, good at databases. And being a good enough programmer that learning another language isn’t off-putting.” I think every single one of our engineers has used at least two programming languages in earnest. Not always including Erlang before they start. A couple of ours were mostly in the Java world and have turned out fantastically for us.
So we started to see it as not nearly the hardest of the skills and experience that we push for. Because we’re in such an arcane area – distributed systems and databases – that side is more important. And the people that want to build things like Riak with us look at the fact we made that choice and like the reasons we made it. So the focus is, “I like that you’ve chosen the best tool here instead of the lowest common denominator tool that you could have used”. It shows that we have a different decision-making approach, and it shows that the kinds of people that we want building our software appreciate it.
How do you see the NoSQL market in general, on a personal level?
I don’t think that there is a NoSQL market, but there is a NoSQL movement. The ‘No’ is interesting, but the ‘SQL’ isn’t. We ended up in this weird monoculture for a couple of decades. For the first couple of decades that databases existed, so roughly the sixties through the eighties, there were lots of kinds of databases that were architecturally different, and then there was this massive consolidation to things like Oracle – all the same thing. Just platforms. And I’m not just talking about SQL here – [I mean] the whole architectural model – all the fundamental choices and priorities.
People could only make interesting choices below and above them. You’d choose things like your hardware and your operating systems, or languages and frameworks. It didn’t even occur to people that they could make choices about the database. It got to the point that, because of this monoculture, database textbooks described only that kind of database, just because that covered databases in general. People were getting trained on the fact that there just wasn’t an interesting choice.
There’s a whole generation of software developers that came up not actually understanding that there was another way of doing things – there was an architectural choice to be made there. I think that people got trained away from that idea.
And it’s funny, there are a few foundational things, from a few big companies, that kicked off the NoSQL movement, which is really, as I see it, a movement against that monoculture. It’s not against a particular choice – it’s about the idea that you can make all the same choices the same [time]. The interesting thing about Amazon’s Dynamo paper is this: I credit it with kicking off NoSQL, but not because of the technology. I think that they made a bunch of cool choices, and in fact we were influenced by a number of them. But to me, the big deal about that paper is that Amazon – a company extremely well known for not wasting money, for being very, very diligent in that way – thought that it was worth it to have a company that wasn’t retail led. This was pre-Amazon Web Services.
And this was for the retail site: to put a top team of their developers to write a database, instead of writing code for their retail site. That, to me, said something huge about the market. This said that a big retailer, an ordinary company, was not being served by the database market. They would much rather have bought this from a vendor, but they couldn’t, because of that monoculture. All the databases essentially worked the same, and they needed a different one – and that opened the door for other companies to think about making different choices. And that’s what I think created that big shift.
Where do you fit in among the other big database players like MongoDB?
MongoDB is an example of one database that made a very different set of early priorities, which have also paid off very differently, and very well for them. We’re all about critical data – that’s what we are all about. I don’t think that’s MongoDB’s focus. We’re all about this wildly great availability, scalability, being there for the data that absolutely can’t stop.
They’ve succeeded incredibly well from the beginning at straightforward developer appeal. Very pleasing, very easy to use, and so they’re doing very well in terms of adoption, and in terms of educating people that there’s more than one way to think about databases. I see them as serving a very complementary role to us in this way. Neither one of us would be well served by trying to go after the other’s market – we’re doing very well and making money and stuff at the moment, but neither of us has a market worth chasing at the moment. I’m glad that they are serving an additional role in this whole movement.
The big challenge is to get relational customers, using things like Oracle for example, to move across, and to accept this new culture. Is your goal to chip away at this competition?
Yes – and it’s not so much a negative thing from Oracle’s point of view, but the thing is, they are dominant in the market we’re present in, so that’s where much of the territory is going to come from. One thing we see is that a lot of the places that are feeling the kind of pain that would drive them to Riak have already essentially stopped using the relational features of any relational database. And that’s a really useful qualification for us – because when we go in and see that people are doing all kinds of rich, huge schemas with wildly interesting relational queries that are very ad hoc, and they’re happy about that, we often don’t chase them. We try hard to go after things that
Well, it’s a completely different set of users, but there’s absolutely a relationship between the two. We did not plan early on to release Riak CS. We were approached by a couple of very large service providers saying, “We’re trying to get into this cloud business – we’re trying to build these public cloud environments – and in this area of object storage particularly, we’re not seeing anything that is satisfying our requirements”. And they came to us for advice, because they were already using a database.
We looked at ourselves and said, well, there’s this pent-up demand for cloud-style object storage, and the hard parts of that are actually solved by Riak – we’re already eighty percent there in terms of what we knew we were building. So we went six months from those conversations to delivering profits by building Riak CS on top of Riak.
What challenges does Basho face?
I think that one of them is that we have so many options for what to do next, product-wise. Riak CS proved to us that Riak isn’t just a great database for solving customer problems, but also a distributed data platform for building other interesting, rich infrastructure products like Riak CS. So one of the challenges is figuring out which ones are next. Riak CS will not be the last product on top of Riak. We’ve seen the power of being able to deliver things on top of Riak like that. So that’s one of the big ones.