Machine learning at its finest

Facebook’s Getafix is a clever tool that learns how to fix bugs automatically

Eirini-Eleni Papadopoulou

Facebook is on a roll! The company recently announced that they would soon release some internal tools and they did not disappoint. The latest tool to be open sourced is Getafix, which learns from engineers’ past code fixes to recommend bug fixes. Getafix aims to let computers take care of the routine work under the watchful eye of a human. Let’s take a closer look.

According to the official blog post, Getafix automatically finds fixes for bugs and offers them to engineers for approval. This lets engineers work more effectively and promotes better overall code quality. And because Getafix learns from past code changes, the fixes it produces are easy for human engineers to understand.

The benefits of using Getafix include:

  • Allows engineers to work more effectively and promotes better overall code quality.
  • Learns from engineers’ past code fixes, so its recommendations are intuitive for engineers to review.
  • Uses a more powerful clustering algorithm and also analyzes the context around the particular lines of problematic code to find more appropriate fixes.
  • Offers significantly more general capability, remediating issues in cases where the fix is context-dependent.

The diagram below shows the three main components of the Getafix toolchain.

Let’s take a closer look at the functionality and challenges in each of the three main components:

Tree differencer identifies changes at a tree level – An abstract-syntax-tree-based differencer is first used to identify concrete edits made between a pair of source files, such as successive revisions of the same file. For example, when a method is both moved and modified, a line-based diffing tool would mark the method as fully removed and reinserted, whereas the tree differencer detects the move and can hence also report the insertion within the moved method as a concrete edit. A challenge in the tree differencer is to efficiently and precisely align parts of the “before” and “after” trees, so that the right concrete edits and their mappings from the before tree to the after tree are discovered.
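To make the contrast with line-based diffing more concrete, here is a minimal sketch of tree-level diffing. It is not Getafix's differencer: it uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`), and it aligns functions simply by name, which is an assumption made for brevity. The helper names `subtree_edits` and `function_edits` are hypothetical.

```python
import ast

def subtree_edits(before, after):
    """Recursively align two AST nodes and return the smallest
    differing subtrees as (before_source, after_source) pairs."""
    # ast.dump ignores line numbers by default, so moved code compares equal.
    if ast.dump(before) == ast.dump(after):
        return []
    b_kids = list(ast.iter_child_nodes(before))
    a_kids = list(ast.iter_child_nodes(after))
    # Same node type and shape: descend to find a more precise edit.
    if type(before) is type(after) and b_kids and len(b_kids) == len(a_kids):
        edits = [e for b, a in zip(b_kids, a_kids) for e in subtree_edits(b, a)]
        if edits:
            return edits
    # Otherwise this whole subtree is the concrete edit.
    return [(ast.unparse(before), ast.unparse(after))]

def function_edits(before_src, after_src):
    """Match top-level functions by name (a simplification), so a
    function that moved within the file is still aligned with itself."""
    def index(src):
        return {n.name: n for n in ast.parse(src).body
                if isinstance(n, ast.FunctionDef)}
    before, after = index(before_src), index(after_src)
    return {name: subtree_edits(before[name], after[name])
            for name in before.keys() & after.keys()}

BEFORE = """
def area(w, h):
    return w * h

def label(s):
    return s.strip()
"""

# 'area' was moved below 'label' and one expression inside it changed.
AFTER = """
def label(s):
    return s.strip()

def area(w, h):
    return abs(w) * h
"""

print(function_edits(BEFORE, AFTER))
```

A line-based diff of these two files would report the whole `area` function as deleted and re-added. The sketch instead reports a single edit inside `area` (`w` replaced by `abs(w)`) and no edits for `label`.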

Mining fix patterns – Getafix mines patterns using a new hierarchical clustering technique combined with anti-unification, an existing method of generalizing among different symbolic expressions. It builds a collection of possibly related tree differences and keeps the fix patterns that represent the most common program transformations in that collection. These patterns can be abstract, containing “holes” where the program transformations differ.
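Anti-unification itself is easy to illustrate on small trees. The sketch below is a toy version, not Getafix's implementation: trees are nested tuples whose first element is a node label, and the tuple encoding, the `?h0`-style hole names, and the `anti_unify` function are all assumptions made for this example. It does not cover the hierarchical clustering step.

```python
def anti_unify(a, b, holes=None):
    """Return the most specific pattern that generalizes trees a and b.
    Subtrees that differ are replaced by holes; the same pair of
    differing subtrees always maps to the same hole."""
    if holes is None:
        holes = {}
    if a == b:
        return a
    # Same label and same number of children: generalize child by child.
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        return (a[0],) + tuple(anti_unify(x, y, holes)
                               for x, y in zip(a[1:], b[1:]))
    # The trees disagree here: introduce (or reuse) a hole.
    key = (a, b)
    if key not in holes:
        holes[key] = f"?h{len(holes)}"
    return holes[key]

# Two hypothetical past fixes, each encoded as ("edit", before, after):
# a.foo()  ->  if (a != null) a.foo()
# b.bar()  ->  if (b != null) b.bar()
edit1 = ("edit", ("call", "a", "foo"),
         ("if", ("neq", "a", "null"), ("call", "a", "foo")))
edit2 = ("edit", ("call", "b", "bar"),
         ("if", ("neq", "b", "null"), ("call", "b", "bar")))

pattern = anti_unify(edit1, edit2)
print(pattern)
```

Both example edits wrap a call in a null check, so the result keeps that shared structure and puts holes where the receiver and method name differ: `?h0.?h1()` becomes `if (?h0 != null) ?h0.?h1()`.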

Creating patches – The final step takes buggy source code and fix patterns from the pattern mining step and produces patched versions of the source code. There are typically many fix patterns to choose from, so a challenge that needs to be addressed in this step is selecting the correct pattern to fix a particular bug. If the pattern applies in several locations, Getafix must also select the right match.
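Applying a fix pattern can be sketched as matching its “before” side against a piece of code, binding the holes, and then filling those bindings into the “after” side. The code below is a simplified illustration under stated assumptions: trees are nested tuples, holes are strings starting with `?`, and trying patterns in list order is only a stand-in for whatever ranking Getafix actually uses. The names `match`, `instantiate`, and `apply_first_match` are hypothetical.

```python
def match(pattern, tree, bindings):
    """Match pattern against tree, recording what each hole stands for.
    Returns True on success; a hole seen twice must bind the same subtree."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings[pattern] == tree
        bindings[pattern] = tree
        return True
    if (isinstance(pattern, tuple) and isinstance(tree, tuple)
            and len(pattern) == len(tree)):
        return all(match(p, t, bindings) for p, t in zip(pattern, tree))
    return pattern == tree

def instantiate(pattern, bindings):
    """Replace every hole in pattern with the subtree bound to it."""
    if isinstance(pattern, tuple):
        return tuple(instantiate(p, bindings) for p in pattern)
    return bindings.get(pattern, pattern)

def apply_first_match(patterns, tree):
    """Try (before, after) patterns in order and return the patched tree
    for the first one that matches, or None if none applies."""
    for before, after in patterns:
        bindings = {}
        if match(before, tree, bindings):
            return instantiate(after, bindings)
    return None

# One hypothetical mined pattern: wrap a call in a null check.
PATTERNS = [
    (("call", "?h0", "?h1"),
     ("if", ("neq", "?h0", "null"), ("call", "?h0", "?h1"))),
]

buggy = ("call", "user", "getName")
patched = apply_first_match(PATTERNS, buggy)
print(patched)
```

In this toy setting, the first matching pattern wins. A real tool has to rank competing patterns and candidate locations, which is exactly the selection challenge described above.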

For more detailed information on this very clever tool as well as some examples, head over to the official blog post.
