ML trends in Stack Overflow Developer Survey 2018
Every year, Stack Overflow surveys the state of the developer community. What trends, tools, and technologies did they find? Julia Silge, a data scientist at Stack Overflow, dives deep into the data to show the most loved technologies of 2018.
As a data scientist at Stack Overflow, I use machine learning in my day-to-day work to make our community the best place possible for developers to learn, share, and grow their careers. It has been amazing to see the increasing interest and investment of the software industry as a whole in machine learning over the past few years. Skilled data scientists can use analysis and predictive modeling to help decision makers understand where they are and where they can go. In March, Stack Overflow released its 2018 Developer Survey results, the eighth year we have surveyed the developer community; this year we had over 100,000 qualified respondents and it was clear that machine learning in software development is an important trend that’s here to stay. But what are the key tools and technologies to watch?
Our survey was about 30 minutes long and covered a diverse range of topics, from demographics to job priorities, but a large section focused on technology choices. We asked respondents what technologies they have done extensive development work in over the past year, and which they want to work with over the next year. Either question on its own tells us how popular a technology is. Combining the two tells us how loved or dreaded it is: of the developers who currently work with a technology, what proportion do or do not want to continue doing so?
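To make the loved/dreaded definition concrete, here is a minimal Python sketch. The field names (`worked_with`, `want_next_year`), the function `loved_and_dreaded`, and the toy responses are all hypothetical, invented for illustration; they are not the survey's actual schema, data, or analysis code.

```python
def loved_and_dreaded(responses, tech):
    """Return (loved, dreaded) shares among respondents who currently use `tech`.

    Hypothetical helper: "loved" is the share of current users who want to keep
    using the technology next year; "dreaded" is the remainder.
    """
    current_users = [r for r in responses if tech in r["worked_with"]]
    if not current_users:
        return None  # nobody uses it, so the shares are undefined
    staying = sum(1 for r in current_users if tech in r["want_next_year"])
    loved = staying / len(current_users)
    return loved, 1 - loved


# Made-up toy responses, not real survey data.
responses = [
    {"worked_with": {"TensorFlow", "Python"}, "want_next_year": {"TensorFlow", "Python"}},
    {"worked_with": {"TensorFlow"}, "want_next_year": {"TensorFlow"}},
    {"worked_with": {"TensorFlow", "PHP"}, "want_next_year": {"TensorFlow"}},
    {"worked_with": {"TensorFlow"}, "want_next_year": {"Python"}},
    {"worked_with": {"PHP"}, "want_next_year": {"Python"}},
]

print(loved_and_dreaded(responses, "TensorFlow"))
print(loved_and_dreaded(responses, "PHP"))
```

Note that the denominator is only the technology's current users, which is why a technology with a small user base can still rank as the most loved.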
The most loved technology this year among our list of frameworks, libraries, and tools is TensorFlow, a machine learning library released as open source by Google in 2015. TensorFlow won out this year over beloved, popular web frameworks like React and Node.js, last year’s winners. We didn’t ask about TensorFlow on last year’s survey because it had just started to gain wide popularity. TensorFlow’s emergence onto the scene has been so dramatic that it exhibits one of the highest year-over-year growth rates ever in questions asked on Stack Overflow.
TensorFlow is typically used for deep learning (a specific kind of machine learning usually based on neural networks) and its status in our survey is a demonstration of the rise of tools for machine learning. Notice that PyTorch is the third most loved framework; PyTorch is another open source deep learning framework, but one developed and released by researchers from Facebook.
How are technologies related?
As a data scientist at Stack Overflow, I spend a lot of time thinking about how technologies are related to each other, and we can specifically think about that in the context of machine learning technologies on the Developer Survey this year. If we look at all the technologies we asked about on the survey, from languages to databases to IDEs to platforms, which were used most often in the context of machine learning? For example, which technologies are most highly correlated with TensorFlow?
Here we see which technologies are most likely to be used by a developer who also uses TensorFlow, compared to those who do not. The most highly correlated technology is Torch/PyTorch; this is interesting because it is effectively a competing framework. Next comes the popular Jupyter Notebook IDE that is used by many data scientists and then the two big language players when it comes to machine learning, Python and R. Python has a larger user base, but I personally am an R developer. Most developers interact with TensorFlow via the Python API, but R has excellent support for TensorFlow as well. The other technologies here include other IDEs that focus on data science and/or Python work, like RStudio and PyCharm, and big data technologies such as Apache Spark, Apache Hadoop, and Google BigQuery.
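The article does not spell out how this relationship was measured. One simple way to make this kind of comparison, sketched below in Python, is to compare how often each technology appears among TensorFlow users versus everyone else. The function `usage_gap` and the toy responses are hypothetical and are not Stack Overflow's actual method or data.

```python
def usage_gap(responses, target):
    """For every other technology, return its usage share among `target` users
    minus its usage share among non-users, sorted from largest gap to smallest.

    Each response is a set of technology names (an assumed, simplified format).
    """
    users = [r for r in responses if target in r]
    others = [r for r in responses if target not in r]
    if not users or not others:
        return {}  # no comparison is possible without both groups
    technologies = set().union(*responses) - {target}
    gaps = {}
    for tech in technologies:
        share_users = sum(tech in r for r in users) / len(users)
        share_others = sum(tech in r for r in others) / len(others)
        gaps[tech] = share_users - share_others
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))


# Made-up toy responses, not real survey data.
responses = [
    {"TensorFlow", "Python", "Jupyter"},
    {"TensorFlow", "Python"},
    {"Python", "JavaScript"},
    {"JavaScript", "PHP"},
]

for tech, gap in usage_gap(responses, "TensorFlow").items():
    print(f"{tech}: {gap:+.2f}")
```

A positive gap means the technology is more common among TensorFlow users than among other respondents; a negative gap means the opposite.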
Fastest growing, most wanted
Python is the programming language most correlated with TensorFlow, and in fact, Python has a solid claim to being the fastest-growing major programming language. Python again climbed in the popularity ranks on our survey, passing C# this year much like it surpassed PHP last year. Software developers are aware of this, and this year we found that Python was the most wanted language: among developers who are not currently working with a given language, Python is the one that the highest percentage want to start using in the coming year.
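The "most wanted" measure uses a different denominator than "most loved": it looks only at developers who do not yet use a language. A minimal Python sketch follows, again with hypothetical field names (`worked_with`, `want_next_year`), a hypothetical function `wanted_share`, and made-up responses rather than the survey's real schema or data.

```python
def wanted_share(responses, tech):
    """Share of respondents NOT currently using `tech` who want to start using it."""
    non_users = [r for r in responses if tech not in r["worked_with"]]
    if not non_users:
        return None  # everyone already uses it, so the share is undefined
    wanting = sum(1 for r in non_users if tech in r["want_next_year"])
    return wanting / len(non_users)


# Made-up toy responses, not real survey data.
responses = [
    {"worked_with": {"C#"}, "want_next_year": {"Python", "C#"}},
    {"worked_with": {"PHP"}, "want_next_year": {"Python"}},
    {"worked_with": {"Python"}, "want_next_year": {"Python"}},
    {"worked_with": {"JavaScript"}, "want_next_year": {"JavaScript"}},
    {"worked_with": {"Java"}, "want_next_year": {"Python", "Java"}},
]

languages = ["Python", "C#", "PHP", "JavaScript", "Java"]
most_wanted = max(languages, key=lambda lang: wanted_share(responses, lang))
print(most_wanted, wanted_share(responses, most_wanted))
```

Respondents who already use a language are excluded from its count, so an existing user wanting to continue does not raise its "wanted" share.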
This plot shows the top 15 languages that were most wanted. What languages were least wanted this year? We find that VBA, Delphi/Object Pascal, Cobol, and Visual Basic 6 are the least attractive today. These languages certainly lack the name recognition of Python, but more substantively, they do not have the large, vibrant communities working on modern problems like machine learning. June 2017 was the first month that Python was the most visited tag on Stack Overflow in high-income countries like the United States and the United Kingdom. We find that the incredible growth of Python is driven largely by data science and machine learning, rather than web development or systems administration.
Machine learning offers organizations the opportunity to use their data to make good decisions; our own data at Stack Overflow demonstrates that our industry is embracing this possibility and that the use of machine learning is on the rise. If you are a developer interested in machine learning, TensorFlow and deep learning are likely not the best place to start. Instead, focus on gaining statistical competency and putting it into practice in your daily work.