TensorFlow, Python, and Julia helped make 2018 the year of machine learning on GitHub
2018 was a banner year for machine learning on GitHub. Projects like TensorFlow and PyTorch ranked among some of the most popular on the site, while Python carried on its dominance as a top programming language. It looks like the Octoverse is all about ML and we are 100% here for it.
GitHubbers couldn’t get enough of machine learning in 2018. The data doesn’t lie: the 2018 State of the Octoverse report showed that ML and data science projects ranked highly in all kinds of categories. From most popular to most contributed, fastest growing and more, machine learning was on your mind and on your forks last year.
That said, machine learning and data science are kind of broad topics. What did developers really care about last year? Based on contributions for the 2018 calendar year, GitHub has crunched the numbers for programming languages, packages, and projects.
Most popular languages
This list was calculated by going to repos tagged with “machine-learning” and then ranking the most common primary languages. Python is the clear winner and champion here. However, that doesn’t mean developers only work in Python; other languages have serious contenders.
Programming languages like Java, JavaScript, C++, C#, Shell, and TypeScript all rank highly both as popular machine learning languages and for general programming purposes. However, languages like Julia, R, and Scala are more niche in their appeal. R is an academic and data science powerhouse, while Julia is gaining fans fast for its machine learning appeal. Scala is something of the odd language out, although it is commonly used for big data systems like Apache Spark.
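To make the ranking method at the start of this section concrete, here is a minimal sketch in Python. It assumes we already have a list of repositories with their topics and primary language; the sample data, the repository names, and the function name rank_languages are all hypothetical and are not taken from GitHub's actual analysis.

```python
from collections import Counter

# Hypothetical sample data: (repository, topics, primary language).
# These are made-up entries for illustration, not real GitHub results.
REPOS = [
    ("example/alpha", {"machine-learning", "python"}, "Python"),
    ("example/beta", {"machine-learning"}, "Python"),
    ("example/gamma", {"machine-learning", "spark"}, "Scala"),
    ("example/delta", {"web"}, "JavaScript"),
    ("example/epsilon", {"machine-learning"}, "Julia"),
    ("example/zeta", {"machine-learning"}, "Python"),
    ("example/eta", {"machine-learning"}, "Scala"),
]


def rank_languages(repos, topic="machine-learning"):
    """Count primary languages among repos carrying the given topic,
    returning (language, count) pairs from most to least common."""
    counts = Counter(
        language for _name, topics, language in repos if topic in topics
    )
    return counts.most_common()


ranking = rank_languages(REPOS)
for position, (language, count) in enumerate(ranking, start=1):
    print(f"{position}. {language}: {count} repos")
```

Repositories without the topic (example/delta here) are simply skipped, so a language only shows up if at least one tagged repository uses it as its primary language.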
Most popular ML and data science packages
Looking at dependencies, here are the top ten Python packages imported by popular ML projects.
Some surprises here: we’ve never covered NumPy in depth on JAXenter, but apparently it is crucial for scientific computing with Python and it was the most imported package. Nearly 75% of ML and data science projects rely on NumPy to support mathematical operations on multi-dimensional data.
Scientific computation was a big winner in this category, with packages like SciPy, pandas, and Matplotlib all showing up in over 40% of ML and data science projects. Scikit-learn also showed up in nearly 40% of projects.
Neural network packages proved less popular: TensorFlow was used in only a quarter of projects. The rest of the top 10 were basic utility packages for Python.
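The percentages in this section are shares of projects that import a given package. A rough sketch of that calculation, assuming we have a mapping from each project to the set of packages it imports, might look like the following. The project names, the sample imports, and the helper package_share are hypothetical; the numbers it prints describe only this toy data, not the Octoverse figures.

```python
from collections import Counter

# Hypothetical mapping of project -> imported packages (illustrative only).
PROJECT_IMPORTS = {
    "example/alpha": {"numpy", "pandas", "matplotlib"},
    "example/beta": {"numpy", "scipy"},
    "example/gamma": {"numpy", "tensorflow"},
    "example/delta": {"requests"},
}


def package_share(project_imports):
    """Return the percentage of projects that import each package."""
    total = len(project_imports)
    if total == 0:
        return {}
    # Each project contributes at most one count per package,
    # because its imports are stored as a set.
    counts = Counter(
        package
        for packages in project_imports.values()
        for package in packages
    )
    return {package: 100 * n / total for package, n in counts.items()}


shares = package_share(PROJECT_IMPORTS)
# Print packages from most to least widely imported, ties broken by name.
for package, share in sorted(shares.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{package}: {share:.0f}% of projects")
```

Counting each package once per project, rather than once per import statement, keeps a single large codebase from inflating a package's share.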
Most popular ML projects
Who had the most contributors in 2018?
TensorFlow, obviously. It was the most popular project, with over five times as many contributors as scikit-learn. Plus, Julia’s source code gained an astonishing number of contributions in 2018.
Other projects, like explosion/spaCy and RasaHQ/rasa_nlu, focused on natural language processing problems. Image processing also had a generous showing, with CMU-Perceptual-Computing-Lab/openpose, thtrieu/darkflow, ageitgey/face_recognition, and tesseract-ocr/tesseract all making the top 10.
Where will machine learning go in 2019?
Only time will tell. We’re excited to see how things turn out in the coming year!