GitHub shows the explosive growth of data science and machine learning in 2018

TensorFlow, Python, and Julia helped make 2018 the year of machine learning on GitHub

Jane Elizabeth
© Shutterstock / Jirsak

2018 was a banner year for machine learning on GitHub. Projects like TensorFlow and PyTorch ranked among the most popular on the site, while Python continued its dominance as a top programming language. It looks like the Octoverse is all about ML and we are 100% here for it.

GitHubbers couldn’t get enough of machine learning in 2018. The data doesn’t lie: the 2018 State of the Octoverse report showed that ML and data science projects ranked highly in all kinds of categories. From most popular to most contributed to, fastest growing, and more, machine learning was on your mind and on your forks last year.

That said, machine learning and data science are fairly broad topics. What did developers really care about last year? Based on contributions for the 2018 calendar year, GitHub has crunched the numbers for programming languages, packages, and projects.

Most popular languages

What’s the most popular programming language for machine learning? Python, obviously. It’s not much of a surprise that Python comes in first here; we’ve been banging this drum for ages.


Not a lot of surprises here. Source.

This list was calculated by taking the repos tagged with “machine-learning” and then ranking their primary languages from most to least common. Python is the clear winner here. However, that doesn’t mean developers only work in Python; other languages are serious contenders too.
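The ranking method described above can be sketched in a few lines of Python. This is only an illustration: GitHub has not published the code behind its list, and the repo names and counts below are invented sample data, not figures from the report.

```python
from collections import Counter

# Hypothetical sample: (repo name, primary language) pairs for repos
# tagged "machine-learning". These records are invented for illustration.
tagged_repos = [
    ("example/alpha", "Python"),
    ("example/beta", "Python"),
    ("example/gamma", "C++"),
    ("example/delta", "JavaScript"),
    ("example/epsilon", "Python"),
    ("example/zeta", "C++"),
]


def rank_primary_languages(repos):
    """Return (language, repo count) pairs, most common language first."""
    counts = Counter(language for _, language in repos)
    return counts.most_common()


ranking = rank_primary_languages(tagged_repos)
for language, count in ranking:
    print(f"{language}: {count}")
```

With real data, the list of pairs would come from whatever source supplies each tagged repo's primary language; the counting step stays the same.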

Programming languages like Java, JavaScript, C++, C#, Shell, and TypeScript all rank highly both as machine learning languages and for general programming purposes. However, languages like Julia, R, and Scala are more niche in their appeal. R is an academic and data science powerhouse, while Julia is gaining fans fast for its machine learning appeal. Scala is something of the odd language out, although it is commonly used for big data systems like Apache Spark.


Most popular ML and data science packages

Looking at dependencies, here are the top ten Python packages imported by popular ML projects.


What you need to make your ML projects work. Source.

Some surprises here: we’ve never covered NumPy in depth on JAXenter, but apparently it is crucial for scientific computing with Python and it was the most imported package. Nearly 75% of ML and data science projects rely on NumPy to support mathematical operations on multi-dimensional data.

Scientific computation was a big winner in this category, with packages like SciPy, pandas, and Matplotlib all showing up in over 40% of ML and data science projects. Scikit-learn also showed up in nearly 40% of projects.

Neural network packages proved less popular: TensorFlow was used in only a quarter of projects. The rest of the top 10 were basic utility packages for Python.
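For readers curious how percentages like these could be computed, here is a rough sketch that counts which packages each project imports and reports the share of projects using each one. It is an assumption about the general approach, not GitHub's actual method, and the project names and source snippets are made up for illustration.

```python
import ast
from collections import Counter

# Hypothetical project sources, invented for illustration.
projects = {
    "project_a": "import numpy as np\nimport pandas as pd\n",
    "project_b": "import numpy\nfrom scipy import stats\n",
    "project_c": "from numpy.linalg import norm\nimport tensorflow as tf\n",
    "project_d": "import os\nimport matplotlib.pyplot as plt\n",
}


def imported_packages(source):
    """Return the set of root package names imported anywhere in the source."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            packages.add(node.module.split(".")[0])
    return packages


def import_share(project_sources):
    """Map each package to the fraction of projects that import it."""
    counts = Counter()
    for source in project_sources.values():
        # A set is used so each project counts at most once per package.
        counts.update(imported_packages(source))
    total = len(project_sources)
    return {package: n / total for package, n in counts.most_common()}


shares = import_share(projects)
for package, share in shares.items():
    print(f"{package}: {share:.0%}")
```

In this toy sample, three of the four projects import NumPy, so it lands at 75%, which happens to mirror the figure quoted above but is not derived from the report.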


Most popular ML projects

Who had the most contributors in 2018?


What was everyone working on in 2018? Besides TensorFlow, I mean. Source.

TensorFlow, obviously. It was the most popular project, with more than five times as many contributors as scikit-learn. Plus, Julia’s source code gained an astonishing number of contributions in 2018.

Other projects, like explosion/spaCy and RasaHQ/rasa_nlu, focused on natural language processing problems. Image processing also had a generous showing, with CMU-Perceptual-Computing-Lab/openpose, thtrieu/darkflow, ageitgey/face_recognition, and tesseract-ocr/tesseract all making the top 10.


Where will machine learning go in 2019?

Only time will tell. We’re excited to see how things turn out in the coming year!

Jane Elizabeth
Jane Elizabeth is an assistant editor for
