MongoDB’s Dwight Merriman: “We think all the NoSQL products are competitors”
We meet the king of the Silicon Alley startup scene to discuss NoSQL, breaking records, and why he believes the success of the humongous database is down to more than just good marketing.
NoSQL database MongoDB had even non-tech journalists biting for a scoop this fall when it emerged that it had received an eye-watering cash injection to the tune of $150 million following its latest round of funding. JAX Magazine caught up with the co-founder of what’s been dubbed ‘the King of New York startups’, Dwight Merriman, at October’s JAX London 2013 conference to hear firsthand his take on the NoSQL scene. (This article was originally published in JAX Magazine.)
JAXenter: Your JAX London talk was very focused on agile development. Why do you feel that NoSQL databases go hand in hand with agile development?
Well…I think they should go hand in hand. In the past they haven’t, because the traditional databases we use are relational and the theory was invented 40 years ago – and the products started to emerge 30 years ago. Agile development is much newer. It’s perhaps not fair to relational, but you can’t design for something that doesn’t exist yet very easily. I think it is challenging to agile development when your backend database for your online system is relational. It just wasn’t designed for that, and it couldn’t have been, as mentioned. So, I think for a lot of reasons today, we’re due for some new technologies on the datastore and database side.
Just the computer architectures we use today, the cloud computing – the software development that we were just talking about, the programming languages we use today – none of these things existed when relational was invented, so it doesn’t automatically fit elegantly with these things we use today for everything else which is newer. It’s a compliment to relational database theory that it’s the longest lived technology on the software side that we really use – which is amazing. But, on the other hand, everything else we use changes more often, and there’s new technologies. We’re not using the programming languages we were using 25 years ago typically. So I think it’s time for some new tools that just fit better with the rest of the stack that we use today. In both the literal stack, and the kind of metastack – methodologies such as iteration and agile development.
With MongoDB making record-breaking news this year (the company was recently crowned ‘King of New York startups’ by Bloomberg after a bar-setting $1.2 billion valuation), do you feel this will change you as a company? Do you think perceptions of you will change?
I think perceptions are changing…independent of the financing. The amount of enterprise usage of the product is getting to be pretty high – quite high. So that’s great. I think internally, nothing’s changed. I think our goal is consistent…
So the financing won’t affect your strategy?
No, I mean, we’ll use some of the proceeds from the financing to accelerate R&D, invest more in engineering of the core product. You know, databases are, I believe, large in scope projects, and require a lot of work – you know, how long did it take some of the relational products to get to full maturity? To that point where the next release of the product feels kind of like the last release. It took a while – I think it might have taken 15 years. So, there’s a lot of work to do just in that sense, to make the product optimal.
What more do you think you need to do to optimise MongoDB?
There’s so much – so many things. One is just maturing the technology as it stands. And the second is just adding additional capabilities or features. For maturing the technology, it’s things like more granular concurrency, or we want to do a revision of the storage engine to improve things like fragmentation, improve aspects there, changes there to improve performance. So, there’s a lot of core work on the kernel of the product that we want to do.
In addition, there’s features or new capabilities. It can mean little things like these new operators, or it could be big things. An example from the past was when we added the aggregation framework, or pipeline – aggregation pipeline is, I think, what we call it now. It provides you a way to declaratively run aggregations, or aggregate querying, or reporting on MongoDB, much like you would do with the GROUP BY operator in SQL. So that was a whole new subsystem, if you will.
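To make the GROUP BY comparison concrete, here is a minimal sketch of what a declarative pipeline looks like. The stage names `$match`, `$group` and `$sum` are MongoDB operators; everything else is an assumption made for illustration: the sample `orders` documents are invented, and `run_pipeline` is a hypothetical toy evaluator that runs in memory and supports only this small subset. It is not how the database itself is implemented.

```python
from collections import defaultdict

# Hypothetical sample documents, invented for illustration.
orders = [
    {"customer": "alice", "status": "shipped", "total": 30},
    {"customer": "bob", "status": "shipped", "total": 20},
    {"customer": "alice", "status": "shipped", "total": 15},
    {"customer": "bob", "status": "pending", "total": 99},
]

# A pipeline written in MongoDB's stage syntax. Roughly equivalent to:
#   SELECT customer, SUM(total) AS spent FROM orders
#   WHERE status = 'shipped' GROUP BY customer
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "spent": {"$sum": "$total"}}},
]


def resolve(doc, expr):
    """Resolve a "$field" reference against a document; return literals unchanged."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])
    return expr


def run_pipeline(docs, stages):
    """Toy evaluator: equality-only $match, and $group with $sum accumulators."""
    for stage in stages:
        (op, spec), = stage.items()  # each stage is a single-key dict
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            groups = defaultdict(lambda: defaultdict(int))
            for d in docs:
                bucket = groups[resolve(d, spec["_id"])]
                for field, acc in spec.items():
                    if field != "_id":
                        bucket[field] += resolve(d, acc["$sum"])
            docs = [{"_id": key, **fields} for key, fields in groups.items()]
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


result = run_pipeline(orders, pipeline)
```

Each stage describes what to compute, not how to loop over the data, which is the declarative quality mentioned above. With a real server, the same `pipeline` list would be handed to a driver rather than to this toy function, for example through PyMongo's `Collection.aggregate`.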
I think there’s a whole lot of work that we want to do regarding integration with other products. There’s a Hadoop connector for MongoDB today. I think we will continue to improve that, and do more and more integrations with other products like we’re doing today, with things like WebSphere and Informatica.
And then, a very important thing we want to do a lot of work on, and are working on, is operational management of large MongoDB clusters. And there’s a lot of facets to that, like one thing we’ve released recently was the backup service. So it’s MongoDB backed up to the cloud. It’s continuous backups with point-in-time recovery. It’s a system with a lot of functionality, and we’ll also be offering an on-premise version of that too.
When you’re looking at MongoDB, one thing you can’t escape is how hugely successful it’s been – and a part of that is how astute it’s been at marketing and branding itself. What do you say to people who criticise the company for winning over people with advertising over capabilities?
I think we’re not marketing driven at all. I mean, maybe we do a good job at it, but I think that the organisation is pretty product driven, and driven by needs and requests of users and the community and customer users. That would be the biggest driver of what we work on.
On virtually all the metrics I look at, MongoDB is the most popular of all the NoSQL databases. So it’s not surprising then, that a lot of people come to our booths at big events. A big reason that’s true is that developers like using the product.
One of the goals – when we started the project – we really felt like there was a need on the database. And there were a couple of needs. One was for scale – the ability to scale horizontally. The other part was, there was just a need for something, we felt, that worked better with the way that we write code today. So to me, it’s not all about scale – we also want to create something that has a certain ‘elegance’ to it (we often use that word internally). So we think a lot about how you interact with the database. It’s not all about, “Can I scale 2000 server clusters?”. Yes, you can, but that is, in our minds, necessary but not sufficient. We also want it to be faster and easier to write production applications and systems than it was before – in addition to being able to scale horizontally.
So, because of that, we’ve really focused on that, and I think developers like using the product when writing apps, independent of the scale. We have a lot of users who have very large MongoDB clusters – but we also have developers who have written applications many times using MongoDB where they only need one server, and they’ll never need more than that for that particular application. One server is big enough on the scale side. Why did they do that? Well, the reason that they did it is because it was the best and easiest way to write the app. And the fastest way to write the app. It wasn’t about scale in that case.
So I think that’s what’s unique about the product. It gives you this mix of two capabilities. One is the scale out property, and the other is making development easy and productive. And not in a prototyping sense, but in a production systems sense.
You recently changed your name from 10gen to MongoDB. Do you think that’s changed the overall approach to the MongoDB database by your company? Or do you think it’s served to strengthen the community around it?
I think the change was, in hindsight, a bit of a no-op. It was very important to us that the project has a separate identity from the company. We definitely think of them as separate entities.
For a long time, the company name and the product were different for legacy reasons. It was sort of an accident. But we kind of liked that – it was clear, this was the product, and this was the company. But at some point it was starting to become confusing. And the only thing we do is work on MongoDB related products and services. It’s all we do. So we thought we should just change the name and it would be less confusing.
Before MongoDB, your most well-known work was on DoubleClick. What would you say the big differences are working at each organisation?
I enjoyed working on both a lot – and still work on MongoDB, of course. I was CTO and co-founder of DoubleClick. I was CTO for the first ten years there, designing and working on the ad-serving systems with the team there. And I think a lot of the desire to create MongoDB came from those experiences there.
We were working with massive scale, serving 30 billion ads a day – it could never go down. And we didn’t really have the tools to deal with that scale, so we ended up writing them ourselves. So after that, there was a feeling shared by myself and Eliot – our CTO and co-founder – that it’s time for some new things. And that was a lot of the catalyst for the project. It’s really what I wish I had when building DoubleClick. But I do really enjoy working on MongoDB, basically because I’ve been a developer for my whole adult life. I really like technology – it’s kind of my favourite thing to work on.
Is it true you’re actually involved in coding MongoDB still?
Yes, absolutely. The time I spend has varied over time. As it’s grown, there’s a lot of demand for my time on the business side, so sometimes it gets squeezed to a very small percentage. There’s been times in the past when it was one third or one half of my time – like in the early days, I’d spend half of my time coding on MongoDB, and it’s dropped as we’ve gotten bigger. I’d like to actually get it back up to be a little higher than it is right now, which is some, but small amounts, because I do enjoy doing that.
You must have so many teams of engineers that can code at light speed for the database – is it more something you do for love these days than expediency?
Well, I like doing it – but hopefully it’s also useful! There’s also things like product definition, on the technical side. What is it, and what should it do, and those kind of things too which are not coding, but somewhat technical.
Going back to Google a little bit – Google arguably pioneered the use of NoSQL databases, but recently, it’s been returning to relational databases. What’s your opinion on this?
I don’t have a good visibility on what Google’s doing, but I think that they use a lot of products on the data layer internally. So they kind of invented MapReduce, and they did BigTable, and they also use relational things, so, I think they still use a lot of non-relational stuff internally, and they always had some relational too.
I’m not sure that they’re returning to it – I think that they’ve always had it in the mix – but you bring up a good [point], which is, Google, and a lot of these internet companies – Amazon, LinkedIn, Yahoo!, DoubleClick – had all written, basically internally, a NoSQL database of some sort, to deal with the scale that they needed to deal with. But these were not, at the time we started MongoDB, publicly available open source projects.
Part of what we wanted to do when we started MongoDB was start something similar to what these folks were using internally, that was generally available to everyone. And then at the same time, we had some ideas that were new on how to make it developer friendly.
You talked a little bit about how NoSQL has evolved – what do you think the future has in store for it? Will we see a reduction in diversity?
I think ‘one size fits all’ is over. It’s not going to consolidate down to one thing – but there will be some consolidation, or reduction there. We already have – even before NoSQL – we were already using more than one tool. In an enterprise, for example, you would have your relational database management system for OLTP and online things, and you would have data warehouse technology for business intelligence and reporting and analytics. So you already had two tools.
They were both relational, and they were both different code bases typically. I think what people are doing now is that they are adding into the mix an addition – a NoSQL database. A given company would probably evaluate several and then pick one that they want to standardise on internally, and they would add that to their tool box in addition to those other tools.
What we’re seeing though is that, for writing applications, they are often, for new projects, using the NoSQL tool more than any of the others. What we’re seeing, and what our goal is, is that MongoDB is, whether it’s a startup or a large company, their default for building new applications. It won’t for every application be the optimal tool for that use case, but we think it could be, for the majority of these cases, the right tool.
You’re going to default to something, right? We’ve seen organisations doing this. So for example, the Guardian [UK newspaper], when they write applications, they have a ‘Mongo first’ policy, which means that, by default, when they are writing a new app, the backing data store is MongoDB.
You can use something else if you have a good reason, but we’re the default. It’s not going to be optimal for 100% of use cases, but, if we put 100 post-its on the wall, and we write a use case on each one, the best choice for more of those post-its than any other will be MongoDB.
Cassandra’s creators have said that, in the future, they’re going to be encroaching more into your territory. Do you perceive any of the other databases out there as a threat to MongoDB?
We think all the NoSQL products are competitors. So, I think there’s always been competition there, and that’s always been the case – it’s not new. I think that most of the products are doing very well, so the whole space is growing.
One good thing is that you’ve got a lot of products, a lot of vendors, and they’re all growing. So from a betting on technologies point of view, that’s a really good thing. It means there’s a real space receptor there. When you have a technology and there’s only one vendor in the whole space, you have to ask yourself: well, why is that? Why is there no diversity there at all?
You’ve worked your magic with DoubleClick, and now MongoDB. Could you ever imagine moving into yet another field?
I think – I hope – to keep working on MongoDB for the next ten years and beyond. That’s my plan for the moment. I wouldn’t be surprised if I invest in some startups and things like that, in addition, in parallel. But that’s really what I want to focus on. It takes a long time for software technologies to reach maturity.
Because of legacy technologies, it can take a decade or two for something to have replaced it completely. Look at COBOL, how long did it take for that to go away? A long time, right? And in addition, how long does it take to make these products completely mature? It doesn’t mean that they’re not useful today though. We want to keep working on it, and just stay focused – and it will take time.
Image by Eric Auchard