Why you shouldn’t use NoSQL just for the sake of it
How new database tech could help change the way we see the world – as long as we don’t screw it up.
Long-time Java advocate and University of Dundee lecturer Andy Cobley
explains why he’s teaching Cassandra, the reason he hopes it
doesn’t all go horribly wrong for NoSQL, and why he still stands
behind Oracle’s platform, fifteen years down the
JAXenter: What made you start
teaching Cassandra to your students?
Cobley: At one point, teaching
undergraduates, I realised that they had SQL in first year,
second year, third year and fourth year, and their eyes were glazing
over by the time they even got to third year. It’s a four-year
course in Scotland, where I teach, so I decided it would be nice
to put something different in – to let the students actually try
a database that wasn’t just another relational database, which is
exactly what they’d been doing before.
So I decided on Cassandra, which of course is
written in Java, and so was a nice fit for me. We were doing
server-side Java anyway, so it was a natural progression. Some of the
students took to it quite readily – and some of them rejected it
quite violently because it’s just so different!
Those that took to it have taken to it quite
well though, and gone on to do some very interesting things in the
A lot of students on the big data course come
from the SQL world, like database administrators, either in the
Microsoft or the Oracle space. It’s nice for them to see,
“Hang on, the world isn’t the way I thought it
was, and it’s changing!” And actually
that’s been incredibly well timed because when we started doing it
this January the SQL world had just taken off and big data world’s
Do you think NoSQL will eventually be the
default database technology?
We’re in an awkward situation. I’ve seen
technologies come and go, because I’m
quite old, and what tends to happen, and Gartner have said this,
is that a technology will rise and rise, then there’ll be a
disaster somewhere and interest will drop off. Then it’ll start
rising again, and eventually reach a plateau.
We don’t know where we are at the moment. But I
wouldn’t be surprised to find there are going to be some disasters.
Not because of the technology, but because of the misuse of the
technology: people picking the wrong system for them, and not
picking it for the right reason.
Not so long ago, I talked to a developer in the
games industry – I won’t say which particular company – and they
were saying that they really wanted to use Riak on their next
project. I said, “Why do you want to use it?” and they replied,
“Because I haven’t used it before”. That’s not the right answer.
The right answer is, “I’ve got a use case for this.”
Similarly, with Spine 2, the NHS is going to
replace Oracle with Riak, and with the best will in the world, the
NHS is notorious for making a mess of projects. If it’s implemented
correctly and they’re doing it for the right reason, as opposed to
just because they want to get rid of Oracle, there’s no reason why
it shouldn’t work. If that goes wrong, then there could be a
backlash, and people will equate the technology with the decision
to use it – and that’s the wrong conclusion. The technology is
fine, as long as it’s used in the right way. And that’s the same
with relational databases.
How have you seen Cassandra grow over the past
If you look at this [Cassandra] conference, it’s
full, and at the one in San Francisco I think there were around
1000 developers. People are just taking it up
left, right and centre, and I hope that they’re taking it up for
the right reasons. That’s the next problem!
The other worry is that people will try to
run it on a single server, when it’s not designed for that, because
that’s what they did before. So I think that the steps
DataStax are taking to train developers are very important: giving
developers the right mindset and the right way of thinking about their
development, rather than just trying to hack code together. It was
announced this week that there’s a training program that
DataStax are putting online, and it’s going to be so accessible.
You can just log on and do the training, and hopefully
through the community, certainly on iOS and Twitter and what have
you, they’re going to get the support they need to build the right
applications the right way.
When I started teaching, the only interfaces
were the Thrift interfaces, and we used Hector for teaching
purposes. The problem was that students took one look at Hector
and just screamed, because it looked so different to anything they’d done
before. It required a lot more in-depth knowledge of Java than
they’d really come across, and some concepts that they hadn’t dealt with
before. They were trying to learn Java and they suddenly had to use
this huge, deep end Java stuff that they hadn’t really come
I think the introduction of CQL, which we’ve
brought in, has kind of brought it round.
So students sort of feel at home because it
feels like they are writing SQL, and it feels like they are using
the JDBC driver, which they are used to
using. It allows them to get their projects written quicker and
not get stuck in a load of Java syntax.
They still have a learning curve, but I think
that’s only going to increase the uptake of Cassandra. It really
lowers the barrier for developers.
What do you think the future holds for
Well, if people do it right and it doesn’t go
horribly wrong, as I said, it gives developers the opportunity to
handle much bigger data sets than they previously could. I think
people are going to start discovering that they’ve got large data
sets in places that they don’t know about – for instance, I believe that
the UK government has just announced that environmental research
bodies are starting to build big data centres now. So we’re talking
mapping data, planning applications and what have you, starting to
discover links between where you put buildings and how it affects
I think there are a lot of areas we have yet to
discover, and I think that’s where it’s going to start to get
exciting, in those sort of areas that traditionally haven’t been
exciting in this field – they’re going to start finding a lot of
interesting answers once they start finding the tools to be able to
mine that data. Bottom line, I think there are a lot of fields out
there that have never heard of big data or NoSQL, and they’re going
to start using them. Technologies like Hadoop on top of databases
are going to lead us to discover unbelievable things in the future,
and to bring it back to Java, there’s going to be a lot of people
in Java jobs running Hadoop!
How far back do you and Java go?
Cobley: I first started using Java
around 1997/1998, and very quickly I wrote one of the very first
Java books, called ‘Java in Easy Steps’. That would have been in
about 1998, so that’s how far that goes back. I teach Java on the
server side, you know, JSP and servlets and all that
sort of thing, and recently, with a colleague of mine, I’ve got
into a business intelligence MSc, and that brought us round to
big data quite quickly.
People criticise Java for being a very verbose
language in comparison to modern languages like Ruby, and I
have to rant for Java and against PHP at the start of every term.
Java is verbose, it’s precise, and it’s like that for a reason.
It allows you to write code that you can verify to be correct, and
because you’ve declared all your variables, you’re not going to get
caught out by a sudden typo.
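To illustrate the point about declared variables, here is a minimal sketch. It is not taken from the interview, and the class and method names are hypothetical. Because every variable in Java must be declared before use, the compiler rejects a misspelled name before the program ever runs:

```java
public class DeclaredVariables {
    // Hypothetical helper: adds up line totals given in pence.
    static int totalPence(int[] linePence) {
        int total = 0; // declared once, with an explicit type
        for (int line : linePence) {
            total += line;
            // totl += line; // typo: would not compile ("cannot find symbol")
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalPence(new int[] {250, 199, 1}));
    }
}
```

Uncommenting the misspelled line makes compilation fail, so the mistake is caught at build time rather than surfacing in a running program.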
Do you think it will decline in the future, as
many are predicting?
I think Java’s got a huge future. Oracle
actually seem to have done quite a reasonable job in my opinion of
picking it up when they took it over from Sun. They’re now starting
to get a proper roadmap.
I run Cassandra on Raspberry Pi, and you know,
previously people were using OpenJDK and
if you actually look at the stats running Cassandra,
OpenJDK is really slow compared to it. Now
Oracle have the native hardware it runs a lot faster. I think on
the server side, for things like Cassandra, other big applications,
Tomcat et cetera, it’s proving that the
original vision of Java as a network application is starting to
I don’t think it’s quite as integrated to the
network as a programming language like Erlang, but it’s still
fairly there. But if I
understand it rightly then the original use of Java was going to be
set-top boxes for televisions, and again, if I understand things
correctly, most Blu-ray disc players have
Java built in.
So it’s basically fulfilled its
Absolutely! It started one way, came around
again. Regardless of what people say, [Java] is reliable. Obviously
there’s been scares about Java in the browser being insecure, but
that’s because you’re marrying two technologies together in some
ways – the browser technology with Java running in it.