Introducing software fuzzing – part of AI and ML in DevOps
Justin Reock, Chief Architect for OpenLogic at Perforce Software, describes what software fuzzing is and why it is needed. Justin is the author of a chapter about fuzzing in a new book from Perforce Software: “Accelerating Software Quality: Machine Learning & Artificial Intelligence in the Age of DevOps”.
The lines between the real world and the digital world have been blurring steadily for years, and software has flourished as a result. Some physicists have even hypothesized that information can be considered a fifth form of matter.
More and more, software is linked to the quality of our lives. That means the quality of our software will fundamentally direct the quality of our experience, so there’s never been a more important time to seek out ways to improve our DevOps. One of the tools that can help us do that is machine learning (ML).
This isn’t just about having to reboot your phone when an app freezes. While that’s certainly a software quality issue that causes you a minor annoyance, it doesn’t impact your safety (unless you happen to be driving and if so – shame on you!) or access to services. As our technology inevitably fades into the background, the software reacting to us will be our literal backdrop, the infrastructure that moves us around, helps us communicate, and lets us work and collaborate.
If we are living in a software world, and we want to live in a high-quality world, then we need high-quality software testing. We need that testing to stand up to the future, and that means greatly increasing the velocity of our testing frameworks. Although we can get far with human-driven testing, and augment that with things like static code analysis, at scale it becomes more and more difficult to eliminate tester bias from the pool of test cases.
What is the true purpose of software testing? At the most granular level, we are trying to take the software down as many code execution paths as possible and to monitor how the application behaves along those paths. Do we get the output we are expecting? Does the application crash? Can we manipulate the application to show us data that we shouldn’t be able to see?
Understanding that, the problem of tester bias becomes very clear. Human cognition will simply run into limits as we try to analyze source code to think about different ways to break the software. And even if we could think of all the different paths the logic can take, how much time would we need to invest doing that? At the rate at which the software that builds our world is growing, how can we expect to keep pace with this kind of analysis?
This is a well-known problem, and many approaches have grown out of the desire to eliminate as much of the human tester’s bias as possible. One of those approaches is software fuzzing. With fuzzing, the goal is to take the software down unexpected paths by hammering it with random, unexpected input. The state of the program is captured and analyzed, and if the software reacts in a way that the developer did not intend, the input is said to have triggered an “interesting state” (Section 2).
If this interesting state causes the program to behave in undesirable ways, such as crashing or exposing data that should stay hidden, then the code execution path can be analyzed further to determine whether we have found a bug or a vulnerability.
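To make the idea concrete, here is a minimal sketch of a purely random fuzzer in Python. Everything in it is an assumption made for illustration: `parse_record` is a hypothetical toy target with a deliberately planted bug, and "interesting state" is simplified to mean "the target raised an exception". Real fuzzers observe much richer program state than this.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical toy target: byte 0 is a declared length, the rest is payload."""
    if not data:
        return 0
    declared = data[0]
    payload = data[1:]
    # Planted bug: the declared length is trusted without checking the payload size.
    return payload[declared - 1] if declared else 0


def random_fuzz(target, trials=1000, max_len=8, seed=0):
    """Hammer the target with random byte strings and record inputs that make it fail."""
    rng = random.Random(seed)
    interesting = []
    for _ in range(trials):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except Exception as exc:  # any unhandled exception counts as an interesting state here
            interesting.append((data, type(exc).__name__))
    return interesting


findings = random_fuzz(parse_record)
print(len(findings), "inputs triggered an interesting state")
print(findings[:3])
```

Each recorded input can then be replayed against the target to study the code path it took and decide whether the failure is a genuine bug or vulnerability.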
As powerful as this tactic can be, it is also an overwhelming process. If you think about the sheer, limitless amount of random data that we can generate as a test corpus, or set of test data, it becomes evident just how many cycles we might waste testing input that doesn’t do anything new or unexpected.
It is in this area where deep learning is finding yet another useful application. There are several specific kinds of software fuzzers which lend themselves to creating feedback loops. For instance, if we find a piece of input that generates an interesting state, we can look at characteristics of that input to try and find other, potentially similar kinds of input that could generate even more interesting states.
By rewarding a network for creating input that generates an interesting state, we can start training models that are good at generating even more corpus data that can generate even more interesting states.
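The feedback loop itself can be sketched without any machine learning at all. The Python example below is not a deep learning model: simple mutation of previously successful inputs stands in for the learned generator, and keeping an input in the corpus stands in for the reward. The `target` function, its depth values, and the `FUZZ` trigger are all hypothetical and exist only to give the loop something to discover.

```python
import random


def target(data: bytes) -> int:
    """Hypothetical target: reports how deep the input got and fails at the deepest branch."""
    depth = 0
    if len(data) > 0 and data[0] == ord("F"):
        depth = 1
        if len(data) > 1 and data[1] == ord("U"):
            depth = 2
            if len(data) > 2 and data[2] == ord("Z"):
                depth = 3
                if len(data) > 3 and data[3] == ord("Z"):
                    raise RuntimeError("bug reached")
    return depth


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Derive a similar input by overwriting one random byte or appending a new one."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf) + 1)
    if pos == len(buf):
        buf.append(rng.randrange(256))
    else:
        buf[pos] = rng.randrange(256)
    return bytes(buf)


def feedback_fuzz(target, rounds=200_000, seed=0):
    """Keep inputs that reach a new state and mutate them to look for further new states."""
    rng = random.Random(seed)
    corpus = [b""]   # inputs that were "rewarded" by being kept
    seen = {0}       # states observed so far
    for i in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            state = target(candidate)
        except RuntimeError:
            return candidate, i + 1  # crashing input and the number of rounds used
        if state not in seen:        # new interesting state: keep the input for later mutation
            seen.add(state)
            corpus.append(candidate)
    return None, rounds


crash, rounds_used = feedback_fuzz(target)
print("crashing input:", crash, "found after", rounds_used, "rounds")
```

The design point is that each kept input narrows the search. A single random four-byte guess would hit this bug with probability 1 in 256^4, whereas the loop only has to discover one byte at a time. A learned model aims to take the same idea further by generalizing from the characteristics of rewarded inputs rather than mutating them blindly.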
That’s the idea, anyway, but this area of the industry is still truly nascent. Some of the most compelling work has been done inside the AI labs at Microsoft. Patrice Godefroid, a researcher at Microsoft best known for his work on the SAGE fuzzer, has seen promising initial results with his Learn & Fuzz project, which is being used to find vulnerabilities in the PDF parser component of the Microsoft Edge browser.
Seeing that there is progress being made, and lots of new territory to explore, is very good news and gives us an exciting new avenue to explore in the world of automated software testing. Deep learning combined with powerful fuzzing techniques can help focus our test corpus and ultimately find more bugs. In the coming world where code quality drives quality of life, fully automatic, bulletproof, and unbiased testing methodologies will be essential to continually improving the world around us.