Countdown to ML in 5, 4, 3, 2, 1

Top 5 open source machine learning projects

Jane Elizabeth

Looking to improve your ML skills? Why not take a look at some of the most popular open source machine learning projects on GitHub? We’re taking a closer look at the top five projects to gauge the state of open source machine learning.

We’ve been over this a bunch of times, but it’s fair to say that machine learning is one of the hottest skills in tech right now. Earlier this year, Stack Overflow published results from a massive developer survey showing that ML specialists were second only to DevOps specialists in terms of pay.

Machine learning is experiencing something of a boom time, but open source can often be a bit confusing for newcomers. So, today we’re taking a closer look at the top five open source projects on GitHub to see how the field is developing and see where your help could be used. After all, open source succeeds thanks to collaboration between developers and programmers all around the world! And sometimes, that means helping out with the boring tasks like documentation.

Quick caveat: this list is for specific projects, not just collections of libraries or frameworks. So, several ranking results have been excluded on these arbitrary grounds, just because I felt like it.

Let’s get started!

1. TensorFlow – ★ 76.2K

It’s no surprise to find TensorFlow at the top of this list. It’s the most popular and celebrated machine learning project on GitHub by a mile.

Originally developed by the Google Brain team in Google’s Machine Intelligence Research organization, TensorFlow is an open source software library for numerical computation using data flow graphs. It comes with an easy-to-use Python interface and no-nonsense interfaces in other languages to build and execute computational graphs.
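To make the idea of a data flow graph a little more concrete, here is a toy sketch in plain Python. It is not TensorFlow code and none of the names (`Node`, `placeholder`, `constant`, `run`) come from TensorFlow; they are made up for illustration. It only shows the general pattern of building a graph of operations first and executing it later with concrete inputs.

```python
# Toy data flow graph: nodes are operations, edges carry values.
# Conceptual sketch only; these names are hypothetical, not TensorFlow's API.


class Node:
    """A graph node that computes a value from the outputs of its input nodes."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op                  # callable applied to the evaluated inputs
        self.inputs = tuple(inputs)   # upstream nodes this node depends on
        self.name = name

    def __add__(self, other):
        return Node(lambda a, b: a + b, (self, other), "add")

    def __mul__(self, other):
        return Node(lambda a, b: a * b, (self, other), "mul")


def placeholder(name):
    """A node whose value is supplied when the graph is run."""
    return Node(None, (), name)


def constant(value):
    """A node that always produces the same value."""
    return Node(lambda: value, (), "const")


def run(node, feed, cache=None):
    """Evaluate `node` by pulling values through the graph, caching shared nodes."""
    cache = {} if cache is None else cache
    if id(node) in cache:
        return cache[id(node)]
    if node.op is None:
        # Placeholder: look up the value fed in by the caller.
        result = feed[node.name]
    else:
        args = [run(upstream, feed, cache) for upstream in node.inputs]
        result = node.op(*args)
    cache[id(node)] = result
    return result


# Build the graph first (nothing is computed at this point) ...
x = placeholder("x")
w = constant(3.0)
b = constant(1.0)
y = x * w + b

# ... then execute it with a concrete input value.
result = run(y, {"x": 2.0})
print(result)
```

Separating graph construction from execution is what lets a real framework inspect, optimize, and distribute the computation before any numbers flow through it.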

“When we open-sourced TensorFlow we were hoping to build a machine learning platform for everyone in the world,” said Jeff Dean earlier this year. TensorFlow 1.0 is fast, flexible, and production-ready for a wide range of applications beyond its initial design. It also includes experimental APIs for Java and Go and new Android demos for object detection and localization, and camera-based image stylization.

2. scikit-learn – ★ 22.7K

Next on our list is scikit-learn, a Python module for machine learning. scikit boasts a number of simple and efficient tools for data mining and data analysis. The basic motivation behind scikit is For Science! And as such, it’s highly accessible and reusable across various contexts. Plus, it builds on well-known data science tools like NumPy, SciPy, and matplotlib.

Earlier this year, we talked to Adam Geitgey, the Director of Software Engineering at Groupon, about how developers could enter the field of machine learning.

“Definitely start by learning Python. It’s by far the most popular programming language today for machine learning,” Geitgey said. “For solving most machine learning problems (which don’t require deep learning), the answer is easy. You just need to install a few python libraries: scikit-learn, NumPy and pandas. These tools are free and designed to work well together.”

3. PredictionIO – ★ 10.6K

PredictionIO is a newcomer to this list, which makes its high ranking even more impressive. Last month, the Apache Software Foundation released PredictionIO to a great deal of fanfare. PredictionIO is built on top of a state-of-the-art open source stack. This machine learning server is designed for developers and data scientists to create predictive engines for any machine learning task.

Developers can create deployable applications “without having to cobble together underlying technologies” using the full stack and templates available. Built directly on Spark and Hadoop, PredictionIO allows developers to quickly build and deploy an engine as a web service in production with customizable templates. It is written in Scala.

PredictionIO is meant to simplify data infrastructure management. You can implement your own machine learning models and seamlessly incorporate them into your engine. It also speeds up machine learning modelling with systematic processes and pre-built evaluation measures.

4. Swift AI – ★ 5K

While Swift may be experiencing something of a reversal of fortunes, Swift AI continues to gain kudos on GitHub. Swift AI is a high-performance deep learning library written entirely in Swift, with support for all Apple platforms. MacBook users rejoice!

Admittedly, the repos are a little thin, especially compared to TensorFlow. However, Swift AI does boast an interesting tool for those interested in writing neural networks in Swift. The NeuralNet class contains a fully connected, feed-forward artificial neural network. With support for deep learning, the NeuralNet is designed for flexibility and use in performance-critical applications.
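For readers new to the term, a fully connected, feed-forward network simply passes its inputs through a stack of layers, where every neuron in one layer feeds every neuron in the next. The sketch below shows only the forward pass, in plain Python rather than Swift. The class name `FeedForwardNet`, the method `forward`, and the sigmoid activation are assumptions chosen for illustration; this is not Swift AI’s NeuralNet API.

```python
import math
import random


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardNet:
    """Hypothetical fully connected feed-forward network (forward pass only)."""

    def __init__(self, layer_sizes, seed=0):
        rng = random.Random(seed)
        # One weight matrix per pair of adjacent layers:
        # weights[l][j][i] connects neuron i of layer l to neuron j of layer l + 1.
        self.weights = [
            [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
        ]
        # One bias per neuron in every non-input layer.
        self.biases = [[0.0] * n_out for n_out in layer_sizes[1:]]

    def forward(self, inputs):
        """Propagate `inputs` through every layer and return the output activations."""
        activations = list(inputs)
        for layer_weights, layer_biases in zip(self.weights, self.biases):
            activations = [
                sigmoid(sum(w * a for w, a in zip(row, activations)) + bias)
                for row, bias in zip(layer_weights, layer_biases)
            ]
        return activations


# 3 inputs, one hidden layer of 4 neurons, 2 outputs.
net = FeedForwardNet([3, 4, 2])
outputs = net.forward([0.5, -0.2, 0.1])
print(outputs)
```

Training (adjusting the weights from data) is deliberately left out here; the point is only the layered, one-directional flow of values.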

5. GoLearn – ★ 4.7K

Rounding out our list is GoLearn, a ‘batteries included’ machine learning library for Go. Still in active development, this project is keen to hear back from users. GoLearn’s model for machine learning problems will be familiar if you’ve used SciPy, WEKA or R. Data is represented as a flat table, analogous to a spreadsheet, and used for training and prediction.
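The flat-table model is easy to picture with a small sketch. The example below is plain Python rather than Go, and it does not use GoLearn’s API: the table values, column names, and the `KNNClassifier` class are all made up for illustration. It shows rows of a spreadsheet-like table being used first for training and then for prediction, here with a simple k-nearest-neighbours vote.

```python
import math
from collections import Counter

# A flat table: each row is one observation, the last column is the class label.
# All values are invented purely for illustration.
HEADER = ["height", "width", "label"]
TABLE = [
    [5.1, 1.4, "small"],
    [4.9, 1.5, "small"],
    [5.0, 1.3, "small"],
    [6.4, 4.5, "large"],
    [6.0, 4.0, "large"],
    [5.9, 4.2, "large"],
]


def split_features_label(row):
    """Separate the numeric feature columns from the trailing label column."""
    return row[:-1], row[-1]


class KNNClassifier:
    """Hypothetical k-nearest-neighbours classifier over a flat table."""

    def __init__(self, k=3):
        self.k = k
        self.rows = []

    def fit(self, table):
        # "Training" for k-NN just means remembering the labelled rows.
        self.rows = [split_features_label(row) for row in table]
        return self

    def predict(self, features):
        # Take the k stored rows closest to `features` and return the majority label.
        nearest = sorted(
            self.rows, key=lambda pair: math.dist(pair[0], features)
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


model = KNNClassifier(k=3).fit(TABLE)
prediction = model.predict([5.0, 1.6])
print(prediction)
```

Note that `math.dist` needs Python 3.8 or later.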

As befits a relatively new project, the wish list is longer than the list of tools currently available. So, if you’re looking for a project to really make a difference in, GoLearn might be the one for you.


Whether you’re looking to join a well-known project or work on a newcomer, there’s an open source machine learning project on GitHub for you. It’s not just a boost to your resume, but a good deed for the whole community. So, head on over to GitHub today!

Jane Elizabeth
Jane Elizabeth is an assistant editor for
