Storm gathered

Apache Foundation brings big data processor Storm into the fold

Lucy Carey

Real time data processing tool joins Hadoop and Zookeeper in the Apache nest.

The Apache Foundation has voted to usher real time data processing tool “Storm” into its incubator program, with a view to one day integrating it officially into the Foundation’s shiny open source universe.

Storm is certainly a promising podling for Apache, with high fault tolerance and scalability. It works by lining up jobs and exporting them to a cluster of computers, before rejigging everything back together into usable form.

Although architecturally it’s superficially similar to Hadoop, Storm aims to be the go-to tool for real time processing, whereas Hadoop excels in batch processing. Indeed, many companies use them as complementary systems – something that was cited in the rationale for Storm to join the Apache community.

Bolstering Storm’s case for integration, issues with Hadoop’s lack of real time functionality have been cited as a hole in the data processing ecosystem, and Storm is important for plugging this gap with software which, according to Nathan Marz, host of the Storm GitHub repository, “exposes a set of primitives for doing real time computation. Like how MapReduce greatly eases the writing of parallel batch processing, Storm’s primitives greatly ease the writing of parallel real time computation.”
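To make the batch versus real time distinction concrete, here is a minimal sketch in plain Python. It does not use Storm or its API: the function names and the sample click data are hypothetical, chosen only for illustration. The batch function needs the whole data set before it can answer, while the streaming function updates and emits a running count as each event arrives, which is the kind of computation Storm's primitives are meant to ease across a cluster.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def batch_count(urls: Iterable[str]) -> Counter:
    """Batch style: consume the whole data set, then compute once."""
    return Counter(urls)


def streaming_count(urls: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Streaming style: update and emit a running count per event."""
    counts: Counter = Counter()
    for url in urls:
        counts[url] += 1
        # A result is available immediately after every event.
        yield url, counts[url]


def click_source() -> Iterator[str]:
    """Hypothetical stand-in for an unbounded stream of click events."""
    yield from ["a.example/x", "b.example/y", "a.example/x"]


if __name__ == "__main__":
    for url, running_total in streaming_count(click_source()):
        print(url, running_total)
```

This single-process version leaves out everything that makes a real system hard, such as distributing work across machines and recovering from failures, which is where the article's claims about Storm's fault tolerance and scalability come in.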

Big guns currently utilising Storm include creators Twitter, which employs it to keep tabs on user clicks for every URL and domain, and Groupon, where, among other things, it’s used to build real-time data integration systems. Yahoo also employs Storm to supplement Hadoop’s batch processing. With around 50 high profile companies dependent on the program, there is little risk of it becoming an orphan project.

Given the impressive list of Storm adopters, there also shouldn’t be any issues for Apache in fostering community involvement – something it makes a point of doing for all incubating projects. An open, diverse, meritocratic community around a project is an essential factor when deciding whether podlings are ready to ‘graduate’. And once Storm is ensconced within the mighty vulture’s community-driven nest, interest from those who want more than Hadoop can offer will doubtless continue to grow.
