Why are so many machine learning tools open source?
Open source and machine learning go together like peanut butter and jelly. But why? In this article, Kayla Matthews explores why many of the best machine learning tools are open source.
Machine learning is an extremely promising technology. Interestingly, many of the most widely used tools in the machine learning community are open source, such as Google’s TensorFlow and the tools released by OpenAI, which is partially funded by Elon Musk.
Because machine learning is such an exciting and potentially lucrative technology, people are often surprised that developers and companies build their offerings as open source, in effect giving them away for free. However, there are several reasons why that decision makes sense for machine learning.
Open source spurs innovation
Open source tools give developers the ability to tinker with them, which increases the chances of rapid improvements or experiments that expand a tool’s uses or features. Machine learning is a quickly evolving technology, and the more people who work on tools related to it, the more likely it is that visionary ideas become realities.
With any technology that captures so much interest from the tech sector and the public alike, being able to bring products to the market quickly and know they’ll work as intended is essential. Open source tools allow both those things to happen.
Faster problem solving
Facebook is one example of a company that makes a habit of releasing open source tools. One of them is Infer, which automatically scans the code of Facebook’s mobile apps to catch bugs before release.
Facebook’s representatives say that when more people work on a project with the intent of improving it, issues get resolved faster than they otherwise would. That’s a definite advantage for machine learning tools as well as other types of applications.
Google is another company that embraces open source software for machine learning, particularly with its DeepMind technologies. DeepMind is a company specializing in creating neural networks based on how humans learn.
Last spring, Google announced it was open sourcing Sonnet, an object-oriented neural network library from DeepMind. One of its reasons was to encourage ongoing research by developers, which could support Google’s internal best practices for research and provide material for future research papers. Additionally, open sourcing Sonnet lets people keep giving back to it as they use it in their own projects.
The shortage of openly available software to accompany machine learning research is a longstanding issue. A research paper published in 2007 raised it and argued that open source projects could reduce the problem. It suggested releasing the software described in a research paper under an open source license at the same time the paper itself is published.
In that same publication, the scientific team also mentioned how the nature of open source tools allows developers to take their projects with them even if they change employers. That characteristic makes people theoretically more likely to contribute to Sonnet or any other machine-learning platform without worrying that they might lose the tools they’ve been using and improving for months or even years.
Accelerating industry acceptance
Analysts say the Internet of Things (IoT) and its associated connected devices will generate 44 trillion gigabytes of data by 2020. Understandably, people wonder what the overall value of that data will be and how they can get the most out of it. The speed and accuracy that are possible with machine learning could make it easier to sort through that data and find the most relevant material within it.
One of the persistent barriers to adoption associated with machine learning is the lack of all-encompassing experience. Although some engineers may understand particular aspects of it, it’s still very hard to find highly qualified people who can grasp all or most of the main components of the sector.
Open source projects, however, offer a relatively small, standardized set of approaches that developers can start working with almost immediately, even if they aren’t yet familiar with every aspect of the technology. As a result, those approaches gain widespread acceptance in the industry, and people see that, thanks to open source technologies, developers can now accomplish in a few days what used to take months.
People who work with open source machine learning tools also find they have thriving online communities at their disposal that allow them to tap into collective thinking when they run into unexpected difficulties.
Those forums currently have hundreds of answers to common problems, and as machine learning tools become even more popular, the knowledge base will expand.
These are just some of the numerous reasons why many companies and developers recognize the value in open source machine learning offerings. Their decision to release these tools as open source could give sustained momentum to the technology at large.
This article is part of the JAX Magazine issue “The state of Machine Learning”.