One To Watch: Continuuity, big data for developers
We speak to the ex-Yahoo! and Facebook employees aiming to bring the power of Hadoop to the masses.
Hadoop’s evolution from a humble framework to (in the words of creator Doug Cutting) “the kernel of the mainstream data processing system” continues apace. The latest development is the emergence of so-called ‘Big Data-as-a-Service’ companies promising to take the hassle out of Hadoop, such as plucky startup Continuuity.
Founded around a year ago, the company launched at Hadoop Strata in October on a wave of slightly apprehensive media hype. Speaking to CEO Todd Papaioannou and CTO Jonathan Gray, however, it’s clear that Continuuity is more than just hot air.
For a start, Papaioannou and Gray are already influential pioneers in the data space. Papaioannou was previously VP and chief cloud architect at Yahoo, before leaving to work at Battery Ventures for three months (“It was fun to pretend to be a VC,” he says of the stint) and Gray worked with HBase at Facebook – a company which, it’s fair to say, has fairly big data problems.
However, they both left these comfortable jobs to found a new company aiming at a gap in the burgeoning ‘big data’ market. “Really, the first part of this year was heads-down, build an MVP and get some customers on it,” says Papaioannou. The next step was raising $10m from the likes of Andreessen Horowitz and Papaioannou’s ex-employers Battery Ventures to help expand the business (and perhaps obtain legitimacy in the eyes of the TechCrunch crowd).
The ambitious business plan is to take advantage of the existing Hadoop ecosystem to produce a stable platform for application development, which they call AppFabric. “There’s a ton of infrastructure software out there to do HBase, MapReduce, Yarn, Hives,” says Papaioannou. “All of that stuff is low-level, and we think of it as infrastructure, as a kind of data kernel.”
With AppFabric, this complex infrastructure is abstracted away. “The regular Java developer who is writing J2EE for the past decade doesn’t want to have to deal with kernel-level API, so they don’t have to worry about, ‘how do I store data?’ and worry about method calls that are like ‘bytes byte’ and things like that,” he asserts. “They want higher-level APIs.”
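To make the contrast concrete, here is a minimal sketch of a byte-oriented store wrapped in a typed, higher-level API. It is an illustration only, not Continuuity's code: `ByteStore`, `UserTable` and `User` are hypothetical names invented for this example, and JSON is assumed as the encoding purely for simplicity.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, Optional


class ByteStore:
    """Hypothetical low-level store: every key and value is raw bytes."""

    def __init__(self) -> None:
        self._cells: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._cells[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._cells.get(key)


@dataclass
class User:
    name: str
    visits: int


class UserTable:
    """Hypothetical higher-level API that hides the byte encoding."""

    def __init__(self, store: ByteStore) -> None:
        self._store = store

    def save(self, user_id: str, user: User) -> None:
        # Encoding to bytes happens here, once, instead of in every caller.
        payload = json.dumps(asdict(user)).encode("utf-8")
        self._store.put(user_id.encode("utf-8"), payload)

    def load(self, user_id: str) -> Optional[User]:
        raw = self._store.get(user_id.encode("utf-8"))
        if raw is None:
            return None
        return User(**json.loads(raw.decode("utf-8")))
```

A caller works only with `User` objects (`table.save("u1", User("Ada", 3))`) and never touches the byte arrays underneath, which is the kind of separation Papaioannou is describing.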
Papaioannou rejects the idea that understanding the underlying infrastructure is important, drawing parallels with higher-level programming languages. “If you write computer programs, do you really understand the message bus and the CPU architecture that’s hidden inside your motherboard? Or do you really think about objects and how they interrelate and use abstracts to abstract away some of the structure?”
Similarly, he adds, cloud providers like Heroku and AWS abstract away server infrastructure. “When you go to Amazon you don’t know how many disks you have, or what the networking topology is in the SNAP,” he says.
A screenshot from the Continuuity dashboard.
These custom APIs might worry those allergic to vendor lock-in, but Papaioannou insists that the company is “very sensitive” to the issue, and will be releasing the API source code as well as some of the technology powering Continuuity. In the near future, Papaioannou says, the company will be releasing Weave, a management layer on top of Yarn, and a runtime container called BigFlow which is “similar to Storm or S4”.
The company has already released a beta SDK with an accompanying Eclipse plugin. Applications can be tested on a local single-node instance of AppFabric before being deployed to Continuuity-hosted or on-premise private clouds. This will later be followed by a self-service public cloud, though pricing details are still thin on the ground (“we know what we’re going to charge, we’re just not telling anybody!” laughs Papaioannou).
Because Hadoop itself is written in Java, Java is the first language AppFabric supports, but more languages are likely to follow. “We think the first wave of developers are really the existing Java developers,” says Gray, but “the first very natural step for us is supporting additional JVM-based languages”.
“All of the APIs to the platform, whether that’s data ingestion, whether that’s querying, are all exposed via REST,” Papaioannou adds. “So you can actually integrate your application with pretty much anything that talks HTTP. So right now the deployable code is Java, just like it was in WebLogic and WebSphere, but you can integrate with any language out there that can make a REST call.”
The pair admit that while ‘big data’ is a useful marketing term, it’s thrown about with abandon. “The dirty secret of big data is,” says Gray, “99% of the use cases out there are really medium data. They’re not terabytes – people have gigabytes.”
“I’ve been doing this for over a decade now,” adds Papaioannou, “and actually I think the fundamental change is we’re moving from a data industry that requires schema at write time to a data industry that requires schema at read time.
“And you can abstract away, take away all of the marketing hyperbole, BS and all the rest of it – that’s really the fundamental shift that’s going on here.”
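The distinction Papaioannou draws can be sketched in a few lines. This is a simplified illustration under our own assumptions, not anything taken from Continuuity or Hadoop: the function names, the field names and the use of JSON lines as the raw format are all hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List


def write_with_schema(table: List[dict], record: dict) -> None:
    """Schema on write: the shape is fixed up front and enforced on insert."""
    schema = {"user": str, "clicks": int}
    for field, expected in schema.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"field {field!r} must be {expected.__name__}")
    # Only the declared fields are kept; anything else is lost at write time.
    table.append({field: record[field] for field in schema})


def read_with_schema(
    raw_lines: Iterable[str], schema: Dict[str, Callable]
) -> Iterator[dict]:
    """Schema on read: raw lines are stored untouched and interpreted per query."""
    for line in raw_lines:
        event = json.loads(line)
        try:
            yield {field: cast(event[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            # This line does not fit the schema chosen for this query; skip it.
            continue
```

With schema on read, the same stored lines can answer different questions later: one query might ask for `{"user": str, "clicks": int}` and another for `{"user": str, "ref": str}`, without anything having been reloaded or migrated in between.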