GitHub Experiments provides a sneak peek into software development
GitHub is constantly researching how to make life easier for developers, and we reap the rewards! Introducing Experiments: a research effort from GitHub that offers a sneak peek into exciting projects.
Patience is a virtue, especially when it comes to research. We don’t always see the benefits right away, and sometimes research even gets tossed before it sees the light of day. At GitHub, this is also true. Research on machine learning, design, and infrastructure is constantly happening, but it is usually behind closed doors.
Put on your lab coat
To open the laboratory door a little wider, GitHub has launched Experiments. Experiments gives insight into ongoing research, showcases publicly available demos, and opens up discussion to the community. GitHub continues to pay attention to what the community thinks, as shown by its recent Project Paper Cuts announcement.
GitHub notes that it can’t share everything it works on with the public, no matter how innovative. However, with the launch of Experiments, the public will get regular tastes of ongoing research and can keep an eye on future innovations. Hopefully, feedback from the public will also be taken into consideration in order to make GitHub even better.
The first demo, powered by machine learning, is out for your perusal: semantic code search. GitHub hopes it will be the first of many experiments to be unveiled.
As an added bonus, GitHub has also shared an open source example, with code and data, that developers can use to reproduce the results. The same tutorial is also used in Kubeflow, the machine learning toolkit for Kubernetes.
Semantic code search is intended for targeted searches of code within specific repos, organizations, or users, in contrast to more generalized searches.
Straight from the demo itself:
“Semantic Code Search allows you to find code through meaning instead of keyword matching. That means the best search results don’t necessarily contain the words you searched for.”
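To make the difference concrete, here is a minimal sketch in Python. It is not GitHub's implementation: the demo relies on deep learning, while this toy stands in for a learned model with a small hand-written concept map. The names `CONCEPTS`, `AXES`, `embed`, `keyword_search`, and `semantic_search`, along with the sample snippets, are all hypothetical and exist only for illustration.

```python
import math
import re

# Hypothetical concept map: a hand-written stand-in for a learned embedding model.
CONCEPTS = {
    "ping": "network", "server": "network", "url": "network",
    "fetch": "network", "requests": "network",
    "load": "io", "open": "io", "file": "io",
    "sort": "order", "sorted": "order", "rank": "order",
}
AXES = ["network", "io", "order"]

SNIPPETS = [
    "def fetch(url): return requests.get(url).text",
    "def load_config(path): return json.load(open(path))",
    "def top_scores(xs): return sorted(xs, reverse=True)[:3]",
]


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Map text to a small vector that counts how often each concept appears."""
    vec = [0.0] * len(AXES)
    for word in tokenize(text):
        concept = CONCEPTS.get(word)
        if concept is not None:
            vec[AXES.index(concept)] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def keyword_search(query, snippets):
    """Return snippets that share at least one literal word with the query."""
    words = set(tokenize(query))
    return [s for s in snippets if words & set(tokenize(s))]


def semantic_search(query, snippets):
    """Rank snippets by similarity of meaning, most similar first."""
    q = embed(query)
    return sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)


query = "ping a server"
keyword_hits = keyword_search(query, SNIPPETS)
semantic_hits = semantic_search(query, SNIPPETS)
print("keyword matches:", keyword_hits)
print("best semantic match:", semantic_hits[0])
```

None of the snippets contains the words "ping" or "server", so the keyword search comes back empty, while the concept-based ranking still surfaces the function that makes a network request. A real system would replace the hand-written map with vectors learned from code and its documentation.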
This problem is no stranger to developers, who tirelessly search the web, trying out different keywords and phrases. Can machine learning fix it, and if so, how close is GitHub to getting it right?
The demo offers a few suggested searches to try, which give a feel for what the finished product will be like.
The GitHub engineering blog goes deep into the mechanics behind the project and how deep learning contributes.
Ironing out the wrinkles
Hold the champagne glasses for now. So far, the semantic code search has a few limitations that require tweaking. Currently, the demo has only been trained on a small batch of Python code. So, expect limited search results in terms of both quality and quantity while testing it out.
Eventually, the project aims to extend to other languages, both programming and natural.
Some readers may be wondering, “Why work on such a complex AI project instead of making smaller fixes to the existing search functionality?”
Well, a user on Hacker News voiced a similar opinion and received an answer from GitHub's head of platform engineering.
…GitHub search can’t even search for a literal string, let alone a regex. It can’t search a subdirectory. Ranking is indistinguishable from random. It’s been this way for years. How about building an actual, usable, basic code search and then getting all fancy with your machine learning?…
To this, head of platform engineering Sam Lambert replied: “All I can say is that we know this. We know it should be better. There’s definitely more to come.”
If this experiment becomes a mainstay of GitHub, improvements will certainly be made. For now, we look forward to seeing what other experiments are in store!