Is Hadoop losing its spark?
A 2015 survey by Gartner Inc. revealed that only 18 percent of respondents expressed a desire to either try out or adopt Hadoop in the next few years. However, this report is not the only one to suggest that Hadoop’s star is fading.
Newer big data frameworks such as Spark have started to gain momentum and, according to the Apache Software Foundation, companies are running Spark on clusters of thousands of nodes, with the biggest cluster encompassing nearly 8,000 nodes. Although many people rushed to write Hadoop’s obituary, market research firm MarketAnalysis.com announced in its June 2015 report that the Hadoop market was projected to grow at an annual rate of 58 percent, surpassing $1 billion by the year 2020.
The discussion about whether Spark is meant to replace or enhance Hadoop is still ongoing, with a third group of professionals claiming that Spark and Hadoop should be used together for boosted analytics and storage capabilities. The reality is that companies can take advantage of Hadoop’s capabilities if they integrate Spark with it; Hadoop enables Spark workloads to be deployed on available resources anywhere in a distributed cluster and eliminates the need to manually track individual tasks.
Data expert and best-selling author Bernard Marr explained in a Forbes article that many vendors offer both Spark and Hadoop and advise companies on which of the two they will find more suitable. Marr pointed out that even though Spark is developing very quickly, its security and support infrastructure is not as advanced as Hadoop’s, since Spark is still in its infancy.
“Hadoop was madness”
JAX Finance speaker John Davies told JAXenter.com that to him “Hadoop was madness”. “Apache Spark is part of the way back to common sense but much of the big data we have today is because we’re making the data bigger than it needs to be, we’ve been lazy. By making the data smaller, leaner and faster (Fast Data) we can run Spark several orders of magnitude faster than Hadoop with a fraction of the work and complexity to get there.”
One of Hadoop’s problems seems to be its lack of agility when it comes to offering better software. However, according to Gartner’s report, the skills gap remains a major adoption inhibitor; nearly 60 percent of respondents admitted that the lack of skills is what drives them away from Hadoop. Gartner estimated that it will take two to three years for the skills challenge to be addressed. “Beyond skills, demonstrating the value of Hadoop is the second-highest challenge.”
Many people believe that Spark and Hadoop are better off together, but each can also be used without the other. Hadoop does not need Spark because it includes not only a storage component, but also a processing component (MapReduce); conversely, Spark can be used without Hadoop, but it can also be complementary to it. Still, there’s a symbiotic connection between them that (to a certain extent) does not allow either one of them to fall into oblivion.
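For readers who have not worked with MapReduce, the processing model mentioned above can be illustrated with a short word count. This is only a plain-Python sketch of the map, shuffle and reduce steps running on a single machine; it is not Hadoop's actual API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases; a real cluster would spread them across nodes."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Spark and Hadoop", "Hadoop and MapReduce"]
    print(word_count(sample))
```

In a real Hadoop deployment the map and reduce steps would run in parallel over data stored in the cluster, whereas this sketch keeps everything in memory; it is meant only to show the shape of the computation.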