Apache Foundation brings big data processor Storm into the fold
Real-time data processing tool joins Hadoop and ZooKeeper in the Apache nest.
The Apache Software Foundation has voted to usher real-time data processing tool
“Storm” into its incubator
program, with a view to one day integrating it officially
into the Foundation’s shiny open source universe.
Storm is certainly a promising podling for
Apache, with high fault tolerance and scalability. It works by
queuing up jobs and distributing them across a cluster of computers, before
reassembling the results into usable form.
Although its architecture is superficially
similar to Hadoop, Storm aims to be the go-to tool for real-time
processing, whereas the former excels in
batch processing. Indeed, many companies use them as complementary
systems – something that was cited in the rationale for Storm to
join the Apache community.
Bolstering Storm’s case for integration, Hadoop’s
lack of real-time functionality has been cited as a
hole in the data processing ecosystem. Storm plugs
this gap with software which, according to Nathan
of the Storm GitHub repository, “exposes a
set of primitives for doing real time computation. Like how
MapReduce greatly eases the writing of parallel batch processing,
Storm’s primitives greatly ease the writing of parallel real time
computation.”
Big guns currently
utilising Storm include
creators Twitter, which employs it to keep tabs
on user clicks for every URL and domain, and Groupon, where, among
other things, it’s used to build real-time data integration
systems. Yahoo also employs Storm to supplement Hadoop in batch
processing. With around 50 high-profile companies dependent on the
program, there is little risk of it becoming an orphan.
Given the impressive list of Storm adopters,
Apache should also have no trouble fostering
community involvement – something it makes a point of doing for all
incubated projects. An open, diverse and meritocratic
community around a project is an essential factor
when deciding whether podlings are ready to
‘graduate’. And once Storm is ensconced within the mighty vulture’s
community-driven nest, interest from those who want more than
Hadoop can offer will doubtless continue to grow.