Apache Spark 1.4 brings support for R and Python 3
Data crunchers can rejoice at the sight of Spark 1.4 – support for R, Python 3 plus a load of clustering and container management improvements all make their way to the top of the highlights reel for this cluster computing framework.
Apache Spark 1.4, the latest release, comes with a new R language API as well as Python 3 support. The announcement presents it as the fifth instalment of the 1.x series, with usability improvements to Spark’s core engine and help from over 210 contributors.
Improvements and features
SparkR is the package that adds R support in 1.4 and is based on Spark’s new DataFrame API. Developers using R will be able to write code that scales across numerous Spark nodes, with all of Spark’s input and output formats available. As of Spark 1.4, users programming in R can also call directly into Spark SQL and are encouraged to check out the programming guide.
Further extensions of the DataFrame API were made too, “with a focus on analytic and mathematical functions”. The full list of extensions is available here.
For Python fans, Python 3 is now supported, and Spark maintains backwards-compatibility with Python 2.6. While recent data may suggest that 2.6 support in Spark isn’t as useful in the wider ecosystem, contributor Josh Rosen states that the main motivation for keeping it was that 2.6 is the default Python on “a few” Linux distributions:
So far, I think the overhead of supporting 2.6 has been fairly minimal, mostly involving a handful of small changes such as not treating certain object as context managers (e.g. Zipfile objects).
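The kind of change Rosen describes can be sketched in a few lines. This is a minimal illustration, not code from Spark itself: the helper names read_member and build_sample_zip are hypothetical. In Python 2.6, zipfile.ZipFile cannot be used in a with statement (context manager support arrived in 2.7), so the archive is closed explicitly in a try/finally block, which behaves the same on 2.6 and 3.

```python
import io
import zipfile


def read_member(zip_bytes, member_name):
    """Return the raw bytes of one archive member (hypothetical helper).

    Avoids `with zipfile.ZipFile(...)`, which Python 2.6 does not support,
    and closes the archive explicitly instead.
    """
    archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    try:
        return archive.read(member_name)
    finally:
        archive.close()


def build_sample_zip():
    """Build a tiny in-memory archive so the sketch is self-contained."""
    buffer = io.BytesIO()
    archive = zipfile.ZipFile(buffer, "w")
    try:
        archive.writestr("hello.txt", "hello from spark")
    finally:
        archive.close()
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_member(build_sample_zip(), "hello.txt"))
```

The try/finally form is slightly more verbose than a with block, which is consistent with the "handful of small changes" Rosen mentions.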
Container management and clustering features have also been amped up in this release, as Docker and cluster support in Mesos get the green light. Spark on Mesos can now launch its executors from a Docker image, and applications can also be submitted in Mesos cluster mode.
Spark Streaming now has visual instrumentation graphs, and the debugging information in the UI has been seriously enhanced. Kafka and Kinesis support in 1.4 has been given a proper boost as well. Spark Core has also gained a REST API for application information, along with performance improvements from Project Tungsten.
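As a rough idea of how the REST API for application information might be consumed, here is a minimal Python 3 sketch. It assumes a running application whose UI is on the default port 4040 and that the application list is served under /api/v1/applications; the function names and the sample payload are hypothetical and only illustrate the shape of the response (a JSON array of objects with "id" and "name" fields).

```python
import json
from urllib.request import urlopen


def applications_url(host="localhost", port=4040):
    """Build the URL of the application list (host and port are assumptions)."""
    return "http://{0}:{1}/api/v1/applications".format(host, port)


def application_names(payload):
    """Extract application names from a JSON array returned by the API."""
    return [app["name"] for app in json.loads(payload)]


def fetch_application_names(host="localhost", port=4040):
    """Query a live Spark UI; needs a running application to succeed."""
    with urlopen(applications_url(host, port)) as response:
        return application_names(response.read().decode("utf-8"))


# Illustrative payload only, not output captured from a real cluster.
SAMPLE_RESPONSE = '[{"id": "app-0001", "name": "example-job"}]'

if __name__ == "__main__":
    print(applications_url())
    print(application_names(SAMPLE_RESPONSE))
```

Keeping the parsing separate from the network call means the JSON handling can be tried out without a cluster at hand.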
The Spark Team have thanked a shedload of organisations for benchmarking and integration testing: Intel, Palantir, Cloudera, Mesosphere, Huawei, Shopify, Netflix, Yahoo, UC Berkeley and Databricks. Spark 1.4 is available for download here.