Fewer bugs in your code? New study examines programming errors at lowest level
Confusing code can (and will!) lead to bugs. This study examines misunderstandings in source code and tries to untangle the knots, aiming to show that “being able to reliably identify and remove code that can cause misunderstandings will also enhance productivity and reduce maintenance costs.”
A new study by researchers from New York University, the University of Colorado Colorado Springs, and Pennsylvania State University sets out to show that “many code patterns increase misunderstanding at a statistically significant rate versus equivalent code without the pattern” and that removing these code patterns has “a substantial impact on a programmer’s ability to understand larger code.”
The results provide evidence both for and against common coding recommendations and suggest a new method to expand on existing guidelines.
Understanding Misunderstandings in Source Code: Overview
“Source code serves a dual purpose. It communicates program instructions to machines and programmer intent to people.” So far so good. The problem begins when a person comes to a different conclusion about a piece of code’s behavior than a machine does when executing the program.
Such a difference of interpretation can naturally arise in some situations (such as those involving randomness, poorly understood APIs, or undefined behavior), but it can also occur in small, self-contained lines of code. These code patterns, which are easy to misinterpret, naturally lead to bugs.
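To make this concrete, here is a minimal sketch in C of the kind of small, self-contained pattern a reader can easily evaluate differently from the compiler. The snippets and function names are hypothetical illustrations chosen for this article, not examples taken from the study.

```c
/* Hypothetical examples of short C expressions that are easy to misread.
   None of these functions come from the study itself. */

/* A leading zero makes an integer literal octal, so this is 11, not 13. */
int octal_literal(void) {
    return 013;
}

/* Post-increment yields the OLD value of a, then increments a. */
int post_increment(void) {
    int a = 5;
    int b = a++;          /* b is 5, a is now 6 */
    return b * 10 + a;    /* 56 */
}

/* Integer division truncates and is evaluated left to right,
   so this is (3 / 2) * 2 = 1 * 2 = 2, not 3. */
int integer_division(void) {
    return 3 / 2 * 2;
}
```

Each function is well defined by the C language, yet a reader skimming the code might predict 13, 65, and 3 instead of 11, 56, and 2.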
These bugs can be expensive, and there are countless examples of companies that paid the price for bugs that could have been avoided. The consequences can include diminished productivity, faulty products, and higher costs.
Identifying and removing the program elements that cause confusion doesn’t just lead to fewer accidents, however. According to the study, “the ability to understand pre-existing source code is one of the most important elements of a continuously successful software project.” Removing confusing code should therefore also lead to enhanced productivity and reduced maintenance costs.
Confusing code affects comprehension, a concept central to all stages of software development, particularly maintenance and code review.
The researchers sought out and experimentally validated the smallest pieces of code that can routinely cause programmers to misunderstand code. These patterns are called ‘atoms of confusion’, or ‘atoms’ for short, and they can serve as an empirical and quantitative foundation for understanding what makes code confusing. The researchers proceeded as follows:
- They selected programs that are already acknowledged as confusing to humans (winners of the IOCCC, the International Obfuscated C Code Contest). From these programs they isolated small patterns of code, often contained within a single line, that were the underlying cause of programmer confusion.
- An empirical human subjects experiment with 73 participants was performed to find which of these code patterns caused a statistically significant amount of confusion (i.e., led programmers to believe the program containing this pattern behaves differently than the C language specification dictates).
- Next, they measured the impact of removing these atoms of confusion from larger obfuscated programs, also drawn from IOCCC winners. The IOCCC programs were simplified by applying behavior-preserving transformations that removed the identified atoms, and the simplified programs served as the basis for a second experiment.
- For this second experiment, 43 participants who had not taken part in the first one were recruited. The researchers determined, quantitatively, how much programmer error can be reduced by clarifying these atoms.
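A behavior-preserving transformation of the kind described in the steps above can be sketched as follows. This is an illustrative assumption about what such a rewrite might look like, not code from the study: both function names are invented, and the patterns removed here (post-increment inside an expression, a non-boolean value used as a condition, the conditional operator, omitted braces) are simply plausible candidates for confusing constructs.

```c
/* Hypothetical "before" version: sums the odd elements of v.
   Several terse constructs are packed into a single line. */
int sum_odd_confusing(const int *v, int n) {
    int i = 0, s = 0;
    while (i < n) s += v[i++] % 2 ? v[i - 1] : 0;
    return s;
}

/* Hypothetical "after" version: same behavior, with each
   terse construct replaced by an explicit equivalent. */
int sum_odd_clear(const int *v, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] % 2 != 0) {   /* explicit comparison instead of implicit truth value */
            s += v[i];
        }
    }
    return s;
}
```

Both functions return the same result for every input, which is what makes the rewrite behavior-preserving: only the reader's effort changes, not what the machine computes.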
Atoms of confusion caused considerable confusion among the sampled programmers. However, the study found that “subjects with more experience make fewer errors than subjects with less experience.”
Researchers used two experiments to evaluate small patterns in code that can produce confusion in programmers and “showed experimentally that many code patterns increase misunderstanding at a statistically significant rate versus equivalent code without the pattern.” Furthermore, removing these code patterns had an impact on a programmer’s ability to understand larger code.
Their results offer evidence both for and against common coding recommendations and suggest a new method to expand on existing guidelines.
The materials and data are available at https://atomsofconfusion.com.