Apache Spark 2.2 is here and it comes bearing gifts. This release removes the experimental tag from Structured Streaming and focuses on usability, stability, and polish. Read on to find out what’s new in the third release on the 2.x line.
Big Data is changing. Buzzwords such as Hadoop, Storm, Pig and Hive are no longer the darlings of the industry; they are being replaced by a powerful duo, Fast Data and SMACK. Such rapid change in a (relatively) young ecosystem raises several questions: What is wrong with the current approach? What is the difference between Fast Data and Big Data? And what is SMACK?
Machine learning may sound futuristic, but it's not. Speech recognition in assistants such as Cortana and search in e-commerce platforms have already shown us the benefits and challenges that come with these technologies. In our machine learning series we introduce several tools that make all this possible. Second stop: MLlib, Apache Spark's scalable machine learning library.
Sparkling Water 2.0 aims to bring machine learning into the mainstream. This tool from H2O.ai offers an open-source algorithm development platform that helps companies use machine learning algorithms in their data analysis.
Software architecture is hard. Considering a change to yours? If you're looking at Apache Spark, it might be worth hearing what Alex Zhitnitsky has to say about the top 5 things you should consider before making the jump.
Data crunchers can rejoice at the arrival of Spark 1.4: support for R and Python 3, plus a load of clustering and container management improvements, top the highlights reel for this cluster computing framework.
Following a high-profile endorsement from Cloudera earlier this month, the Hadoop-oriented hatchling and MapReduce contender has been judged mature enough to take its next big steps.