Machine learning on an open source stack

Apache Software Foundation is bringing open source ML to the masses with PredictionIO

Jane Elizabeth

The Apache Software Foundation is opening up the field of machine learning with its new open source project, PredictionIO. But how does the project make it easier for newcomers to learn this devilishly complicated bit of coding? Through the clever use of templates, of course.

The Apache Software Foundation has announced a brand-new machine learning project, PredictionIO. Built on top of a state-of-the-art open source stack, this machine learning server is designed for developers and data scientists to create predictive engines for any machine learning task.

PredictionIO is designed to democratize machine learning. How? It provides a full stack, so developers can create deployable applications “without having to cobble together underlying technologies”. Making machine learning easier to use should widen its appeal and help ease the shortage of people who can build machine learning applications.

Apache PredictionIO

What can you do with Apache PredictionIO? Lots. It allows developers to quickly build an engine from a customizable template and deploy it as a web service in production. Once deployed, the engine responds to dynamic queries in real time. PredictionIO also evaluates and tunes multiple engine variants systematically.
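To make the "web service" part concrete, here is a minimal sketch of how a client might query a deployed engine over HTTP. It assumes the engine is running locally on port 8000 and accepts JSON queries at `/queries.json`, which is the default as far as we know; the query fields (`user`, `num`) are hypothetical and depend entirely on the engine template you use. The helper only builds the request, so you can inspect it without a running server.

```python
import json
import urllib.request

# Assumption: the engine was deployed locally and listens on port 8000.
ENGINE_URL = "http://localhost:8000/queries.json"


def build_query_request(query, url=ENGINE_URL):
    """Build (but do not send) the POST request for one prediction query."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(query, url=ENGINE_URL, timeout=5.0):
    """Send the query to a running engine and return the decoded JSON reply."""
    request = build_query_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical query: the accepted fields depend on the engine template.
example_request = build_query_request({"user": "1", "num": 4})
print(example_request.get_method(), example_request.full_url)
print(example_request.data.decode("utf-8"))
```

With an engine actually running, `send_query({"user": "1", "num": 4})` would return whatever JSON that engine's template defines as its prediction result.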

PredictionIO is built directly on top of Spark and Hadoop, and serves Spark-powered predictions from data. It can also unify data from multiple platforms, in batch or in real time, for comprehensive predictive analytics. Apache PredictionIO can be installed as a full machine learning stack, bundled with Apache Spark, MLlib, HBase, Spray and Elasticsearch.

PredictionIO is meant to simplify data infrastructure management. You can implement your own machine learning models and incorporate them seamlessly into your engine. It also speeds up machine learning modelling with systematic processes and pre-built evaluation measures.

How does it work?

PredictionIO is made up of three parts: the PredictionIO platform, the event server, and the template gallery.

  • The platform is the open source machine learning stack for building, evaluating, and deploying engines with machine learning algorithms.
  • The event server is the open source machine learning analytics layer for unifying events from multiple platforms. Typically, it continuously collects data from your application, in real time or in batch. A PredictionIO engine then uses that data to build predictive models with one or more algorithms.
  • The template gallery is the place to download engine templates for different types of machine learning applications.
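The event server's role in the list above is easier to picture with an example. The sketch below builds a single event and the HTTP request that would deliver it. It assumes a local event server on port 7070 that accepts JSON at `/events.json` with an `accessKey` query parameter, and an event shape with `event`, `entityType`, `entityId` and optional target and `properties` fields. The access key and the "rate" event are hypothetical placeholders, so check them against your own app before relying on this.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumption: a local event server listening on port 7070.
EVENT_SERVER_URL = "http://localhost:7070/events.json"


def build_event(event, entity_type, entity_id,
                target_entity_type=None, target_entity_id=None,
                properties=None, event_time=None):
    """Return a dict describing one application event."""
    payload = {
        "event": event,
        "entityType": entity_type,
        "entityId": entity_id,
    }
    # The target entity is optional, e.g. the item a user rated.
    if target_entity_type is not None and target_entity_id is not None:
        payload["targetEntityType"] = target_entity_type
        payload["targetEntityId"] = target_entity_id
    if properties:
        payload["properties"] = properties
    # Default to "now" in UTC, formatted as an ISO 8601 timestamp.
    when = event_time or datetime.now(timezone.utc)
    payload["eventTime"] = when.isoformat(timespec="milliseconds")
    return payload


def build_event_request(payload, access_key, url=EVENT_SERVER_URL):
    """Build (but do not send) the POST request that delivers one event."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        f"{url}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical event: user "u1" gives item "i1" a rating of 5.
rating = build_event("rate", "user", "u1", "item", "i1", {"rating": 5})
request = build_event_request(rating, access_key="YOUR_ACCESS_KEY")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it to a running event server. Keeping the build step separate from the send step makes the payload easy to log or test first.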

The template system is probably the most innovative part of PredictionIO. The templates do a lot of the hard work, which makes things easier for newcomers who are just testing the waters of machine learning.

To be perfectly honest, the PredictionIO website also has plenty of documentation to help. It’s a lot more newbie-friendly than most, which is in line with the project’s aim of bringing machine learning to the masses.

Where can I get it?

If you’re interested in trying it out for yourself, PredictionIO is available for download here as well as on GitHub.

Jane Elizabeth
Jane Elizabeth is an assistant editor for
