Why you shouldn’t use NoSQL just for the sake of it
How new database tech could help change the way we see the world – as long as we don’t screw it up.
A long-time Java advocate, University of Dundee lecturer Andy Cobley explains why he’s teaching Cassandra, the reason he hopes it doesn’t all go horribly wrong for NoSQL, and why he still stands behind Oracle’s platform, fifteen years down the line.
JAXenter: What made you start teaching Cassandra to your students?
Cobley: At one point, teaching undergraduates, I realised that they had SQL in first year, second year, third year and fourth year, and their eyes were glazing over by the time they even got to third year. It’s a four-year course in Scotland, where I teach. I decided it would be nice to put something different in – let the students actually try something on the database side that wasn’t just a relational database, which is exactly what they’d been doing before.
So I decided on Cassandra, which of course was written in Java, and so was a nice fit for me. We were doing server side Java anyway, so it was a natural progression. Some of the students took to it quite readily – and some of them rejected it quite violently because it’s just so different!
Those that took to it have taken to it quite well though, and gone on to do some very interesting things in the Cassandra community.
A lot of students on the big data course come from the SQL world, like database administrators, either in the Microsoft or the Oracle space. It’s nice for them to see, “Hang on, the world isn’t the way I thought it was, and it’s changing!” And actually that’s been incredibly well timed, because when we started doing it this January the NoSQL world had just taken off and the big data world’s taken off.
Do you think NoSQL will eventually be the default database technology?
We’re in an awkward situation – I’ve seen technologies come and go, because I’m quite old – and what tends to happen, and Gartner have said this, is that technologies will rise and rise, then there’ll be a disaster somewhere and interest will drop off – then it’ll start rising again, and level off into a plateau.
We don’t know where we are at the moment. But I wouldn’t be surprised to find there are going to be some disasters. Not because of the technology, but because of the misuse of technology. People picking the wrong system for them, and not picking it for the right reason.
Not so long ago, I talked to a developer in the games industry – I won’t say which particular company – and they were saying that they really wanted to use Riak on their next project. I said, “Why do you want to use it?” and they replied, “Because I haven’t used it before”. That’s not the right answer. The right answer is, “I’ve got a use case for this”.
Similarly, with Spine 2, the NHS is going to replace Oracle with Riak, and with the best will in the world, the NHS is notorious for making a mess of projects. If it’s implemented correctly and they’re doing it for the right reason, as opposed to just because they want to get rid of Oracle, there’s no reason why it shouldn’t work. If that goes wrong, then there could be a backlash, and people will equate the technology with the decision to use it – and that’s the wrong conclusion. The technology is fine, as long as it’s used in the right way. And that’s the same with relational databases.
How have you seen Cassandra grow over the past few years?
If you look at this [Cassandra] conference, it’s full, and at the one in San Francisco, I think there were around 1000 developers there, and I think people are just taking it up left, right and centre, and I hope that they’re taking it up for the right reasons. That’s the next problem!
The other worry is that people will try to run it on a single server because that’s what they did before, when it’s not designed for that. So I think that the steps that DataStax are taking to train developers are very important: giving developers the right mindset and the right way of thinking about their development, rather than just trying to hack code together. It was announced this week that there’s a training program that DataStax are putting online, and it’s going to be so accessible. You can just log on and do the training, and hopefully through the community, certainly on iOS and Twitter and what have you, they’re gonna get the support they need to build the right applications the right way.
When I started teaching, the only interfaces were the Thrift interfaces and we used Hector for teaching purposes. And the problem was, students took one look at Hector and just screamed, because it looked so different to anything that they’d done before, and it required a lot more in-depth knowledge of Java than they’d really come across, and some concepts that they hadn’t dealt with before. They were trying to learn Java and they suddenly had to use this huge, deep-end Java stuff that they hadn’t really come across.
I think the introduction of CQL that we’ve brought in has kind of brought it round. So students sort of feel at home because it feels like they are writing SQL, they can feel like they are using the JDBC driver which they are used to using, and it allows them to get their projects written quicker and not get stuck in a load of Java syntax.
They still have a learning curve, but I think that’s only going to increase the uptake of Cassandra. It’s really lowered the barrier for developers.
What do you think the future holds for NoSQL?
Well, if people do it right and it doesn’t go horribly wrong, as I said, it gives developers the opportunity to handle much bigger data sets than they previously could. I think people are going to start discovering that they’ve got large data sets in places that they don’t know about – for instance, I believe that the UK government has just announced that environmental research bodies are starting to build big data centres now. So we’re talking mapping data, planning applications and what have you, starting to discover links between where you put buildings and how it affects the environment.
I think there are a lot of areas we have yet to discover, and I think that’s where it’s going to start to get exciting, in those sorts of areas that traditionally haven’t been exciting in this field – they’re going to start finding a lot of interesting answers once they start finding the tools to be able to mine that data. Bottom line, I think there are a lot of fields out there that have never heard of big data or NoSQL, and they’re going to start using them. Technologies like Hadoop on top of databases are going to lead us to discover unbelievable things in the future, and to bring it back to Java, there are going to be a lot of people in Java jobs running Hadoop!
How far back do you and Java go?
Cobley: I first started using Java around 1997/1998, and very quickly I wrote one of the very first Java books, called ‘Java in Easy Steps’, and that would have been in about 1998, so that’s how far that goes back. I teach Java on the server side, you know, JSP and servlets and all that sort of thing, and recently, with a colleague of mine, I’ve got into a business intelligence MSc, and that brought us round to big data quite quickly.
People criticise Java for being a very verbose language in comparison to these modern languages like Ruby, and I have to rant against PHP in favour of Java at the start of every term. With Java, it’s verbose, it’s precise, and it’s like that for a reason. It allows you to write code that you can verify to be correct, and because you’ve declared all your variables, you’re not going to get caught out by a sudden typo.
Do you think it will decline in the future, as many are predicting?
I think Java’s got a huge future. Oracle actually seem to have done quite a reasonable job in my opinion of picking it up when they took it over from Sun. They’re now starting to get a proper roadmap.
I run Cassandra on Raspberry Pi, and you know, previously people were using OpenJDK, and if you actually look at the stats running Cassandra, OpenJDK is really slow compared to Oracle’s JDK. Now that Oracle have native hardware support, it runs a lot faster. I think on the server side, for things like Cassandra, other big applications, Tomcat etcetera, it’s proving that the original vision of Java as a network application platform is starting to come true.
I don’t think it’s quite as integrated with the network as a programming language like Erlang, but it’s still fairly close. But if I understand it rightly, then the original use of Java was going to be set-top boxes for televisions, and again, if I understand things correctly, most Blu-ray disc players have Java built in.
So it’s basically fulfilled its purpose?
Absolutely! It started one way, came around again. Regardless of what people say, [Java] is reliable. Obviously there have been scares about Java in the browser being insecure, but that’s because you’re marrying two technologies together in some ways – the browser technology with Java running in it.