
No Honeycomb Testing and Mocks? You’re Probably Getting App Testing Wrong

Nate Lee


Many organizations are learning the hard way: Traditional testing processes for virtual machine-based applications have largely proven to be unviable for Kubernetes environments.

In non-Kubernetes applications, end-to-end user interface (UI) testing is a crucial component of code testing. UI testing involves mimicking the actual UI changes and flows that end-users will experience as they use key features of the app, and ensures that the application provides the desired interaction and functionality to the user. This type of testing can be done both manually and through automated test suites.

However, a UI test does not typically provide a complete picture of how an application or microservice will perform once deployed in a Kubernetes environment. Additionally, UI testing often requires manually clicking through screens and verifying correctness, leading to slowdowns in production-release cadences.

Automated UI test suites exist, but QA and DevOps departments often find them hard to maintain and debug for microservices. DevOps teams are then relegated to an overreliance on canary testing, which saps productivity further: for many organizations, testing and debugging only really begin once the application is in production.

Fig. 1: Testing apps in a real-world environment with mocks (in blue)

An ideal testing solution should model and then replay all associated inbound and outbound traffic. Essentially, the traffic would be realistically “mocked” without interrupting the development team’s workflow, from the first lines of code through the entire development process. This approach also results in a honeycomb-shaped testing pattern, as we detail below. Testing in this way also automatically identifies, throughout the testing process, the dependencies in the Kubernetes production environment in which the app will be deployed. In other words, there is a viable shift left for testing in Kubernetes environments that does not slow down development and that also reduces the failure rate once applications are deployed.

Again, these benefits are achieved by testing with sophisticated mocks throughout the entire development and deployment process, which also results in a honeycomb-shaped testing pattern.

The State of Testing Today

Fig. 2: Proper integration testing

The definitions of unit, integration and user-interface testing vary from one organization to another. Unit testing has traditionally been defined as testing the smallest unit of functionality that can be independently tested, such as an individual method or API endpoint.

Integration tests should test how individual components — which have already been unit-tested — function together. These tests identify bugs between components, such as issues with the way components talk to each other and pass data back and forth.

UI testing, as described above, is testing from the perspective of the end-user. UI testing identifies the screens and flows that the user will interact with, and ensures they are both correct and provide a good experience. Unit testing and UI testing both remain essential for Kubernetes applications, as they ensure the correctness of microservices at the most fundamental level, as well as the best UX for the user.

However, the caveat of the unit-test definition is hidden in the clause “independently tested.” Modern microservices architectures often have high levels of coupling (dependencies) between components. This means components cannot be cleanly and independently tested, because their output depends too heavily on the output of other components. The line between a unit test and an integration test consequently becomes blurry. Rather than attempting to overcome this flaw, many testing evangelists now propose that engineering departments accept it as a natural evolution of software architecture. They shift the emphasis to better and more comprehensive integration testing or, more likely, testing-in-production.

Ice-Cream Cone Testing: The Reality

Fig. 3: Ice cream cone testing

Many organizations end up doing what’s called “ice cream cone testing” for applications running in Kubernetes environments: most of their testing is end-to-end testing of code through a UI. It’s critical to understand that this is a major flaw in testing, because a UI test does not provide a complete picture of how an application or microservice will perform once deployed in a Kubernetes environment. It’s like looking only at the part of an iceberg that is above water. Moreover, provisioning complete end-to-end environments with proper versions of the APIs and accurate test data is one of the most expensive and slowest parts of the testing process.

The remedy is to implement true application integration testing that replicates the production environment’s dependencies and potential failures or unusual behaviors among them.
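To make “potential failures or unusual behaviors” concrete, here is a minimal Python sketch of a mocked dependency that misbehaves on purpose. The names FlakyDependency and call_with_retry are hypothetical and exist only for this illustration; the retry logic stands in for whatever resilience code the real service has.

```python
class FlakyDependency:
    """Hypothetical mock of a downstream service.

    It raises a timeout for the first `failures` calls and then succeeds,
    which imitates a dependency that is briefly unavailable.
    """

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self, request):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("simulated downstream timeout")
        return {"status": "ok", "echo": request}


def call_with_retry(dependency, request, attempts=3):
    """Code under test: call the dependency, retrying on timeouts."""
    last_error = None
    for _ in range(attempts):
        try:
            return dependency(request)
        except TimeoutError as exc:
            last_error = exc
    raise RuntimeError("dependency unavailable") from last_error


# Two simulated timeouts, then success on the third attempt.
flaky = FlakyDependency(failures=2)
result = call_with_retry(flaky, "ping")
```

A test like this checks how the service reacts when a neighbor fails, which a UI-level test rarely exercises on demand.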

Enter the Honeycomb State of Testing

Fig. 4: Image source: Martin Fowler

Testing for Kubernetes environments can be restructured to emphasize integration testing by giving the process a honeycomb shape. In a honeycomb structure, integration testing in mock environments makes up the bulk of testing for Kubernetes. Spotify’s honeycomb-shaped testing methodology, along with the problems of testing semantics, is described in this recent Martin Fowler blog post:

“When I read advocates of honeycomb and similar shapes, I usually hear them criticize the excessive use of mocks and talk about the various problems that leads to. From this I infer that their definition of ‘unit test’ is specifically what I would call a solitary unit test.” — Martin Fowler, “On the Diverse And Fantastical Shapes of Testing.”


According to Fowler, the contrast to a “solitary test” in this context is the “sociable test.” Solitary tests isolate a single unit by replacing its collaborators with test doubles, which is where the overreliance on mocks criticized by honeycomb advocates comes from. Sociable tests, on the other hand, exercise the code together with its real collaborators and are the most reliable way to validate distributed microservice behavior. For microservices, this means the application code should be tested in a simulated production environment: the application runs during the testing phase as if it were deployed, with its neighboring microservices, both upstream and downstream, replaced by mocked pods, containers and APIs.
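As a rough sketch of what testing at the service boundary can look like, the Python example below runs the service code and its client code for real and replaces only the out-of-process API with a mock. CheckoutService, PricingClient, mock_pricing_api and the price data are all hypothetical names invented for this illustration.

```python
class PricingClient:
    """Real client code; only its transport (the network hop) is swapped out."""

    def __init__(self, transport):
        # transport: callable taking a URL path, returning (status, body)
        self.transport = transport

    def price_for(self, sku):
        status, body = self.transport(f"/prices/{sku}")
        if status != 200:
            raise LookupError(f"no price for {sku}")
        return body["cents"]


class CheckoutService:
    """Service under test; it uses the real PricingClient, not a stub of it."""

    def __init__(self, pricing):
        self.pricing = pricing

    def total_cents(self, cart):
        return sum(self.pricing.price_for(sku) * qty for sku, qty in cart.items())


def mock_pricing_api(path):
    """Mock of the downstream pricing API (hypothetical data)."""
    prices = {"/prices/apple": 50, "/prices/pear": 75}
    if path in prices:
        return 200, {"cents": prices[path]}
    return 404, {"error": "unknown sku"}


service = CheckoutService(PricingClient(mock_pricing_api))
total = service.total_cents({"apple": 2, "pear": 1})  # 2 * 50 + 1 * 75 = 175
```

The design choice is where the mock sits: at the network edge of the service rather than between its internal classes, so the test still covers how the internal pieces work together.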

The Honeycomb and Mock Testing: Fundamental To Engineering

Proper honeycomb testing leads to a faster mean time to resolution (MTTR) because it cleanly simulates the upstream and downstream dependencies. Identifying the root cause of a test failure in end-to-end testing is like searching for a needle in a haystack. Honeycomb testing, by contrast, gives a much clearer picture of the system under test by focusing explicitly on the interaction points between microservices, rather than on connections and implementation details at once. Honeycomb testing can also reveal application deficiencies, such as high latency or API incompatibility, before the canary release.
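One way to picture an API incompatibility check at an interaction point is the small Python sketch below. It compares a response against the fields a consumer expects; the function name contract_problems and the field names are assumptions made up for this example, not part of any particular tool.

```python
def contract_problems(response, expected):
    """Return a list of ways `response` breaks the consumer's expectations.

    `expected` maps field names to the Python type the consumer relies on.
    """
    problems = []
    for name, expected_type in expected.items():
        if name not in response:
            problems.append(f"{name}: missing")
        elif not isinstance(response[name], expected_type):
            actual = type(response[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    return problems


# What a hypothetical consumer of a payments API relies on.
expected_fields = {"id": str, "amount_cents": int}

old_response = {"id": "abc", "amount_cents": 1250}
new_response = {"id": "abc", "amount": "12.50"}  # field renamed upstream

old_problems = contract_problems(old_response, expected_fields)
new_problems = contract_problems(new_response, expected_fields)
```

Here the renamed field is reported as a missing `amount_cents`, which is the kind of break that is cheaper to catch before a canary release than during one.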

The canary release process will consequently be smoother, since the probability that the application or update will fail is reduced. In the event of a failure, remediations can start immediately once unexpected behavior is identified based on prior integration-testing baselines, such as total time to serve a request or to load a component on the page.
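A baseline comparison of this kind can be sketched in a few lines of Python. The function name, the 1.25 tolerance and the sample latencies are assumptions chosen for illustration; a real pipeline would pick its own statistic and threshold.

```python
import statistics


def exceeds_baseline(baseline_ms, observed_ms, tolerance=1.25):
    """Return True when the observed median latency is more than
    `tolerance` times the median recorded during integration testing."""
    return statistics.median(observed_ms) > tolerance * statistics.median(baseline_ms)


# Request times (ms) recorded during integration testing: median is 102.
baseline = [100, 110, 105, 98, 102]

# Request times (ms) seen during a canary release.
healthy_canary = [104, 99, 120, 101, 108]    # median 104
degraded_canary = [150, 160, 140, 155, 149]  # median 150

healthy_flagged = exceeds_baseline(baseline, healthy_canary)    # False
degraded_flagged = exceeds_baseline(baseline, degraded_canary)  # True
```

The median is used here so that a single slow request does not trigger remediation on its own.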

This approach provides substantial benefits, especially in accelerating development cadence, and can alleviate huge liabilities in mission-critical software. In sectors such as fin- or healthtech, even limited software failures can cause chaos for users and multi-million-dollar losses for companies. It is not viable, for example, for a user to get a 404 message after depositing an electronic check, or for a hospital to have no trace of an emergency medical scan transfer that timed out when submitted.

While the honeycomb paradigm was developed long ago, improvements in containerized platforms and service mesh have allowed modern testing suites to be seeded with tools that monitor real-life API traffic and create mocks based on real-world conditions. This is a significant change over the previous generation of simulators, which required API responses to be written by hand. The manual development of API responses slows down development cadences significantly, and the resulting test suites are incredibly brittle and vulnerable to change. Even a minor update to an API can cause tests to break and the entire development life cycle to grind to a halt while the corresponding mocks are updated by hand.
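The idea of building mocks from observed traffic, rather than writing responses by hand, can be sketched in Python as follows. This is a deliberately simplified, in-process illustration: RecordingProxy, make_replay_mock and live_inventory_service are hypothetical names, and real tools capture traffic at the network or service mesh level rather than by wrapping a function.

```python
class RecordingProxy:
    """Passes calls through to the real dependency and remembers the answers."""

    def __init__(self, real_call):
        self.real_call = real_call
        self.recording = {}

    def __call__(self, method, path):
        response = self.real_call(method, path)
        self.recording[(method, path)] = response
        return response


def make_replay_mock(recording):
    """Build a mock that answers only from previously recorded traffic."""

    def replay(method, path):
        if (method, path) not in recording:
            raise LookupError(f"no recorded response for {method} {path}")
        return recording[(method, path)]

    return replay


def live_inventory_service(method, path):
    """Stand-in for the real upstream service (hypothetical data)."""
    stock = {"/stock/widget": 12, "/stock/gadget": 0}
    if method == "GET" and path in stock:
        return {"status": 200, "body": {"in_stock": stock[path]}}
    return {"status": 404, "body": {"error": "not found"}}


# Step 1: observe real traffic.
proxy = RecordingProxy(live_inventory_service)
proxy("GET", "/stock/widget")
proxy("GET", "/stock/missing")

# Step 2: tests run against the replay mock; the live service is not needed.
replay = make_replay_mock(proxy.recording)
widget = replay("GET", "/stock/widget")
```

Because the mock is generated from what the dependency actually returned, an upstream API change is picked up by recording again instead of by editing responses manually.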

By contrast, modern tools like Speedscale’s Traffic Replay allow for observability and reproducibility in API responses by using real-world traffic to model how tests and mocks should behave. Speedscale’s mock suite also tokenizes data, such as unique IDs and timestamps, which can be replaced in real time as the configuration of the working production environment changes. This dramatically shortens the time it takes to build out mocks. QA engineers can build more thorough test suites in less time, leading to more accurate testing and faster development timelines.
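To illustrate the general idea of tokenizing recorded data, the Python sketch below replaces unique IDs and timestamps in a recorded response with placeholders and fills in fresh values at replay time. It is not a description of how Speedscale implements this; the placeholder syntax, the function names and the sample payload are assumptions made for the example.

```python
import re
import uuid
from datetime import datetime, timezone

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def tokenize(body):
    """Replace recorded UUIDs and UTC timestamps with placeholders."""
    body = UUID_RE.sub("{{uuid}}", body)
    return TIMESTAMP_RE.sub("{{timestamp}}", body)


def render(template, now=None):
    """Fill placeholders with fresh values when the mock is replayed."""
    now = now or datetime.now(timezone.utc)
    rendered = template.replace("{{timestamp}}", now.strftime("%Y-%m-%dT%H:%M:%SZ"))
    # Give every placeholder its own new ID.
    while "{{uuid}}" in rendered:
        rendered = rendered.replace("{{uuid}}", str(uuid.uuid4()), 1)
    return rendered


recorded = '{"id": "3f2b8c1e-9d4a-4b6f-8a2e-1c5d7e9f0a1b", "created": "2021-03-04T05:06:07Z"}'
template = tokenize(recorded)
# template == '{"id": "{{uuid}}", "created": "{{timestamp}}"}'

replayed = render(template, now=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
```

Without this step, a replayed response would carry stale IDs and dates, and tests that validate freshness or uniqueness would fail for reasons unrelated to the code under test.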

Ensuring thorough integration testing of Kubernetes-deployed applications and microservices, in this way, can help ensure software services stay healthy and robust.


Nate Lee is the founder of Speedscale. He is an API and automation expert with 12 years of experience in the DevOps testing space.
