Spotlight on: Basho
CTO Justin Sheehy unlocks the Riak development story to date
Contrary to popular
misconception, ‘NoSQL’ does not stand for ‘No SQL’. It’s actually
shorthand for ‘Not only SQL’, in reference to the fact that it goes
beyond the limits of what traditional ‘Relational Databases’ can
do. Although many still see NoSQL as a buzzword, increasingly,
foresighted developers are stepping back to consider a full range
of data storage solutions right at the start of the project. This
summer, JAXmag sat down with the CTO of Basho Technologies, Justin
Sheehy, to find out more about the company’s flagship NoSQL
database Riak, and nail down a proper definition for the
JAX: What’s the background story of Basho?
Sheehy: We’ve been around since
January 2008. We’ve only had offices in the UK for a little less
than half our history. We didn’t really expect to be a global
company so early, to push out and have offices in other
places. We have an office in London and we have an office in Tokyo,
as well as our US offices. That’s been very much demand-driven. We
found that the demand was very differently shaped in
Europe and the UK on one hand, and Japan on the other. But in both
cases, we were getting such interesting, complex inbound demand
that we found it really necessary to go be closer to people.
JAX: Open source is a big thing for Basho. We saw
you recently open source Riak CS, and Riak has been open source for
a lot longer. Is open source the main driver for adoption for
Sheehy: I think so much more of technology adoption right now,
including in large enterprises, is really developer-led. The
technology that has become critical in a lot of these companies is
not decided in the first place by the CIO. It might be vetoed by
them, but they are not the ones pulling things in. Being present in
the minds and attention of developers is something that’s
essential, and open source is a fundamental part of that game.
JAXmag: You started as a CRM company – what
drove you to develop Riak?
Sheehy: Not quite. We were producing software as a service
early on, not so much CRM but integrated with one. That’s why it’s
easier to say CRM. But at the same time, those of us who had been
on the technical side from the beginning of the company had been
trying to figure out how to solve availability problems that
weren’t just ours, but that we were seeing become a huge deal across
the industry, where more and more businesses are doing things in what I
think of as “the web model of business”.
Even if they’re not website businesses or
consumer-driven, they’re moving towards that. Even enterprises that
are providing things for their own employees, for instance, have a new
expectation that things are always present, that things are always
fast enough, that things are always as good as the web. It’s
changed what you can and cannot do, even in internal enterprise
software, and that makes availability and scalability have a much
bigger priority than they did for a lot of people developing
software just a few years ago.
That’s been the core priority of Riak ever since;
we’ve built things up and around that.
JAX: In terms of Riak’s design goals, availability,
scalability and predictability – have those always been core
priorities, and have they changed over time?
Sheehy: Those have always been our primary goals, so much so
that we’ve been slower than we might have been to do other things.
We will look at almost any interesting change to either product
(right now we have Riak and Riak CS), and if there’s a tradeoff to be
made, we’ll almost always make it in favour of those principles –
availability, predictability, scalability. So there are some
features which might be appealing to some developers that we either
don’t do or take longer to figure out how to do, because we don’t
want to compromise. That’s what is actually happening now.
For instance, in the latest version of Riak, 1.4, [we]
are adding more features around indexing and more features around
new data structures you can use. One of the things early on in Riak
was that the data modelling was very, very simple, and that’s the
generous way of putting it. Many developers would find it a little
bit spartan in terms of what kind of data modelling is natively and
easily available to you. Many people figured out how to do
extremely powerful things with it, but they had to figure it out.
That’s because the traditional way of making it easy for developers,
the way we would have implemented it, would have compromised those
We’ve now found new ways; we’re working with leading
researchers on ways to add richer data types, more ability to do
things that feel like strong consistency, or having data structures like
sets, maps and counters that don’t compromise our availability and
JAX: I don’t want to lump
everybody under the NoSQL term, as they are all so different. But
is making it easier for new customers to adapt to a different type
of emerging database a big challenge?
Sheehy: The ways in which people are making those compromises
are very different. There are things that fall under that weird
umbrella of NoSQL. But there are other systems that have made very
different compromises, that went for exactly the things that we are
only starting to get good at now.
In some ways everyone has these challenges, but based
on the priorities you’ve built your system around in the first
place, which ones you have to tackle differ. I happen to think that
by focusing on such foundational things first, we have an easier
time building things like better data structures than someone who
has built those things first, and then later has to figure out how
to build a foundation underneath.
JAX: But isn’t it very
difficult to balance what you want and what the customer wants?
Sheehy: Absolutely – we’ve been very customer-driven
throughout our existence. I often get asked what seems like a
natural and silly question: ‘What verticals do you have?’ That’s like
asking what verticals Oracle has. We end up in situations where it’s
critical data, which is very different to the amorphous big data
notion, which is all about piling your stuff up and it turns into a
If your data stops moving and you stop being able to
interact with it, your business stops. That’s usually not your
biggest dataset; think of sessions in a shopping cart, for example, or
your patient records. The Danish Health Service uses Riak, and
there are going to be a lot more companies that do that sort of
JAX: So the situation in which
Riak thrives is when you have to deal with critical data?
Sheehy: Yes. That’s exactly what Riak’s heritage has been all
about. Almost everything bends in favour of making sure that when
someone needs to interact and modify data in Riak, it succeeds.
We have customers who do payment processing, that sort
of thing, or who use it for the sessions that drive their whole
web-driven business, like game platform customers where every
session interaction uses Riak. Not all of their data is in Riak, but
the interactions they have to do periodically to keep doing
what they are doing are.
JAX: Can you give us an example of
Sheehy: So in Denmark, they decided a few years ago to
fundamentally modernise how patient records work, in an attempt to
improve public health. They built a system based on Riak in
multiple datacenters, storing all non-clinical patient data of
every citizen or resident of the country – all their pharmacy
prescriptions, all their physician visits, and all their
emergency room visits.
This means that if you travel to a part of the country
you don’t live in, go to the emergency room and you’re not capable
of describing yourself, people can see what you are allergic to.
That’s the kind of data that’s really fundamental to people’s
health and safety. If you built this in a traditional central
model, only people in that hospital or region
could practically do this. This is a few million people.
There are a couple of other countries we are working
with, which I can’t talk about yet. This one has turned into a
proving ground, and two other countries have gone to them and said,
“How did you build this? We want to do the same thing or something
related.” That’s working out nicely from our point of view, as we’ll
probably be doing this sort of thing for three instead of one.
JAX: What were the reasons for
choosing Erlang at the beginning of Riak’s
Sheehy: Riak is coded in Erlang, but our customers never
really have to care about that. For us, it’s been a strategic,
not-so-secret weapon. The most reliable software in the world is
written in Erlang. That’s something that can be stood behind.
Erlang was designed to be a high-level, rapid-development language
that eases building systems that never stop, like phone switches.
If you look at the primary goals we mentioned earlier, those are
So for us, it was a straightforward technical choice.
I get asked about it a lot, and I considered it heavily when a couple
of us made this decision early on. The tradeoff that we thought
we’d probably be making was a loss of people ready to develop on this.
It’s certainly a much smaller language in terms of developers who
know it. Not tiny – there’s a reasonably vibrant community – but
it’s not like Java or something. I could yell out on the street
that I’m hiring Java programmers and people would come running to
the front door.
We were going to choose something whose goals were
so aligned with ours and that came with batteries included for building
things like this. Some of the things that come in the Erlang
virtual machine and the Erlang standard libraries are perfectly
suited for building Riak. We figured the only trade-off was that it
would be harder to acquire talent… but it’s actually been kind of a
hiring magnet for us.
It’s not that there’s this huge hidden pool of Erlang
talent that we didn’t know about; that’s not the case. But even
developers that hadn’t written any Erlang before have come to us,
and we were like, “All we care about is being good at distributed
systems, good at databases, and being a good enough programmer that
learning another language isn’t off-putting.” I think every single one
of our engineers has used at least two programming languages in
earnest, not always including Erlang, before they started. A couple of
ours were mostly in the Java world and have turned out
fantastically for us.
So we started to see it as not nearly the hardest
of the skills and experience that we push for. Because we’re in
such an arcane area – distributed systems and databases – that side
is more important. And the people that want to build things like
Riak with us look at the fact we made that choice and like the
reasons we made it. So the focus is, “I like that you’ve
chosen the best tool here instead of the lowest common denominator
tool that you could have used”. It shows that we have a different
decision making approach, and it shows that the kinds of people
that we want building our software appreciate it.
JAX: How do you see the NoSQL
market in general, on a personal level?
Sheehy: I don’t think that there is a NoSQL market, but there
is a NoSQL movement. The ‘No’ is interesting, but the ‘SQL’ isn’t.
We ended up in this weird monoculture for a couple of decades. For
the first couple of decades that databases existed, so roughly the
sixties through the eighties, there were lots of kinds of databases
that were architecturally different, and then there was this
massive consolidation to things like Oracle – all the same thing.
Just platforms. And I’m not just talking about SQL here – [I mean]
the whole architectural model – all the fundamental choices and
People could only make interesting choices below and
above them. You’d choose things like your hardware and your
operating systems, or languages and frameworks. It didn’t even
occur to people that they could make choices. It got to the point
that, because of this monoculture, database textbooks described only
that kind of database, as if that covered databases in
general. People were getting trained on the fact that there just
wasn’t an interesting choice.
There’s a whole generation of software developers that
came up not actually understanding that there was another way of
doing things – there was an architectural choice to be made there.
I think that people got trained away from that idea.
And it’s funny, there are a few foundational things, for
a few big companies, that kicked off the NoSQL movement, which is
really, as I see it, a movement against that monoculture. It’s not
against a particular choice – it’s against the idea that you have to
make all the same choices all the [time]. The interesting thing about
Amazon’s Dynamo paper – I credit it with kicking off NoSQL, but not
because of the technology. I think that they made a bunch of cool
choices, and in fact we were influenced by a number of them, but to
me, the big deal about that paper is that Amazon – a company
extremely well known for not wasting money, for being very, very
diligent in that way – thought that it was worth it to have a
company that wasn’t retail led. This was pre-Amazon Web Services.
And this was for the retail site: to put a top team of
their developers to work writing a database, instead of writing code
for their retail site. That, to me, said something huge about the
market. This said that a big retailer, an ordinary company, was
not being served by the database market. They would much rather
have bought this from a vendor, but they couldn’t, because of that
monoculture. All the databases essentially worked the same, and
they needed a different one – and that opened the door for other
companies to think about making different choices. And that’s what
I think created that big shift.
JAX: Where do you fit in among the other big
database players like MongoDB?
Sheehy: MongoDB is an example of one database that chose a very
different set of early priorities, which have also paid off very
differently, and very well for them. We’re all about critical data
– that’s what we are all about. I don’t think that’s MongoDB’s
focus. We’re all about this wildly great availability,
scalability, being there for the data that absolutely can’t
They’ve succeeded incredibly well from the beginning
at straightforward developer appeal. Very pleasing, very easy to
use, and so they’re doing very well in terms of adoption, and in
terms of educating people that there’s more than one way to think
about databases. I see them as serving a very complementary role to
us in this way. Neither one of us would be well served by trying to
go after the other’s market – we’re doing very well and making
money and stuff at the moment, but neither of us has a market worth
chasing at the moment. I’m glad that they are serving an additional
role in this whole movement.
JAX: The big challenge is to
get relational customers, using things like Oracle for example, to
move across, and to accept this new culture. Is your goal to chip
away at this competition?
Sheehy: Yes – and it’s not so much a negative thing from
Oracle’s point of view, but the thing is, they are dominant in the
market we’re present in, so that’s where much of the territory is
going to come from. One thing we see is that a lot of the places
that are feeling the kind of pain that would drive them to Riak have
already essentially stopped using the relational features of any
relational database. And that’s a really useful qualification for
us – because if we go in and see that people are doing all kinds of
rich, huge schemas with wildly interesting relational queries that
are very ad hoc, and they’re happy about that, we often don’t chase
them. We try hard to go after things that
Well it’s a completely different set of users, but
there’s absolutely a relationship between the two. We did not plan
early on to release Riak CS. We were approached by a couple of very
large service providers saying, “We’re trying to get into this cloud
business – we’re trying to build these public cloud environments –
and in this area of particularly object storage – we’re not seeing
anything that is satisfying our requirements”. And they came to us
for advice, because they were already using a database.
We looked at ourselves and said, well, there’s this
pent up demand for cloud style object storage, and the hard parts
of that are actually solved by Riak – we’re already eighty percent
there in terms of what we knew we were building. So we went six
months from those conversations to delivering profits by building
Riak CS on top of Riak.
JAX: What challenges does
Sheehy: I think that one of them is that we have so many
options for what to do next, product-wise. Riak CS proved to us that
Riak isn’t just a great database for solving customer problems, but
also a distributed data platform for building other interesting,
rich infrastructure products like Riak CS. So one of the challenges
is figuring out which ones are next. Riak CS will not be the last
product on top of Riak. We’ve seen the power of being able to
deliver things on top of Riak like that. So that’s one of the big