API performance stress test

Taking the Varnish API Engine 2.0 for a test drive

Per Buer

What performance impact does the Varnish API have when used with fast web servers? CTO and founder Per Buer explains the execution and outcome of the Varnish performance test.

Having worked on improving website performance for a decade, I wasn't surprised when a customer with a large API deployment told me about severe performance issues they hit while trying to scale their APIs with their existing management tool. Most of the tools in use today were designed 10–15 years ago. The tool this customer was using had a peak performance of 200 API calls per second, per server. This is no longer enough for the needs of high-volume mobile, IoT and web applications.

Care should be taken when reading analyst reports on the subject. When we started looking into API management, I was surprised that most industry reports ignore performance completely. Measuring the performance of various API management tools is a complex task: most of them are not plug-and-play solutions and don't support the same configurations or semantics, so comparing one to another isn't trivial.

As industry analysts shy away from what is perhaps the most important property of any internet-connected piece of software, it is up to users to evaluate whether or not a product's performance is up to scratch. If you leave out all the API management “bells and whistles”, you can run a stress test that focuses purely on performance. So, in the course of developing Varnish API Engine 2.0, we set up such a test to at least measure our own performance and make it more transparent.


The scope of the test is to establish the performance impact of introducing the Varnish API Engine in front of very fast web servers in various scenarios using synthetic load. The synthetic load is selected to match each scenario, triggering typical API gateway features. The tests are deliberately done without caching and focus on the performance of the authentication, authorization and throttling features of the Varnish API Engine.

The Varnish API Engine environment in this test replicates our basic subscription, consisting of a three-node cluster and management nodes.


An initial reference test is performed without the Varnish API Engine. Then a set of tests is performed with the Varnish API Engine, and the results are compared to the reference test.


  • Boom is used to generate load. It is written in Go, and its feature set is similar to Apache Bench's. Two patches have been added to enable connections to be reused and to support a custom host header.
  • Dummy API is used as the web server. This is a simple and easy-to-install web server written in Go using net/http. It should be noted that Dummy API is unrealistically fast; any real-world API will most likely be much slower. In addition to running within a highly performant runtime, Dummy API has no external data sources, such as databases or other network-based APIs, behind it.
  • Varnish API Engine 2.0 with Varnish Cache Plus 4 is used as the API gateway.

Reference test

As a reference, we measure the performance of the web servers directly as seen from the consumer instances. The test is done by sending GET requests to a specific resource on the web server, to which the web server responds with the http status code 200 and a small response body.
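The behavior the reference test depends on can be sketched with a few lines of code. The following is an illustrative stand-in, not the actual Dummy API (which is a Go net/http server): a tiny HTTP server that answers every GET with status code 200 and a small body, plus a helper that issues a single request the way the reference test does. The resource path is hypothetical; the port 1337 matches the test environment.

```python
# Minimal stand-in for the Dummy API backend used in the reference test:
# it answers every GET with HTTP status code 200 and a small response body.
# (Illustrative only -- the real Dummy API is written in Go using net/http.)
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DummyAPIHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'{"status": "ok"}'              # small response body
        self.send_response(200)                  # HTTP status code 200
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):                # keep the console quiet
        pass

def measure_one_request(port=1337):
    """Send one GET to a specific resource and return (status, body)."""
    url = f"http://127.0.0.1:{port}/some/resource"   # path is illustrative
    with urllib.request.urlopen(url) as resp:
        return resp.status, resp.read()

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 1337), DummyAPIHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    status, body = measure_one_request()
    print(status, body)                          # -> 200 b'{"status": "ok"}'
    server.shutdown()
```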

Test scenarios

The reference test is compared to the following scenarios with the Varnish API Engine:

  • Test scenario 1: Authentication and authorization. The requests are accepted with http status code 200 sent from the backend. This should uncover the impact of proxying the API through the API Engine.
  • Test scenario 2: Authentication, authorization and one rate limit. The requests are accepted with http status code 200 sent from the backend. Here we expect some performance degradation: cluster-wide rate limiting adds synchronous reads and writes to memcached, and a couple of milliseconds of added latency per request is enough to show up in the results.
  • Test scenario 3: Authentication, authorization and two rate limits. The requests are accepted with http status code 200 sent from the backend.
  • Test scenario 4: Authentication, authorization, two rate limits and HTTP method filter. The requests are accepted with http status code 200 sent from the backend.
  • Test scenario 5: Authentication, authorization and one rate limit which allows only one request. The rest of the requests are throttled, meaning rejected with http status code 429 sent from the Varnish API Engine.
  • Test scenario 6: Authentication where the consumer tries to authenticate using an unknown api key. The requests are rejected with http status code 401 sent from the Varnish API Engine.
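Scenarios 2–5 exercise cluster-wide rate limiting, which the API Engine backs with synchronous reads and writes to memcached. The core logic can be sketched as a counter per (API key, time window), incremented on every request and compared to the limit. This is a simplified illustration under stated assumptions: a plain dict stands in for memcached, and the function and key names are invented for the example.

```python
# Sketch of windowed rate limiting as exercised in scenarios 2-5.
# A shared counter is incremented per request; once it exceeds the
# limit, the request is throttled with HTTP status code 429.
# (A local dict stands in for the cluster-wide memcached instance.)
import time

counters = {}  # stand-in for memcached: (api_key, window) -> count

def check_rate_limit(api_key, limit, window_seconds=1):
    """Return 200 if the request is allowed, 429 if throttled."""
    window = int(time.time() // window_seconds)       # current time window
    bucket = (api_key, window)
    counters[bucket] = counters.get(bucket, 0) + 1    # like memcached 'incr'
    return 200 if counters[bucket] <= limit else 429

if __name__ == "__main__":
    # Scenario 5: a rate limit that allows only one request --
    # the first call is accepted, the rest are throttled with 429.
    results = [check_rate_limit("consumer-1", limit=1, window_seconds=60)
               for _ in range(3)]
    print(results)  # -> [200, 429, 429]
```

In the real cluster the counter lives in memcached so all three Varnish instances share it, which is exactly why each rate limit adds a synchronous network round trip.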


Boom, which is used by the three consumer instances to generate load, is executed with the following arguments in all tests:

Figure 1: Arguments

The argument -H sets the custom host header, -n sets the number of requests, and -c sets the concurrency.

Boom is executed simultaneously on each of the three consumers. The consumers generate load towards each of the three Varnish API Engine instances (test scenarios) or the three web servers (reference test).

Figure 2: Example command from the reference test

Figure 3: Example command from one of the test scenarios
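Boom's -n/-c semantics (a fixed total number of requests issued with bounded concurrency) can be approximated in a few lines. This sketch is not Boom itself: it counts completed calls against a local stand-in function rather than issuing HTTP GETs to a live cluster, and the names are illustrative.

```python
# Sketch of Boom-style load generation: issue n requests in total with
# at most c in flight, then report the statuses and the elapsed time.
# (request_fn is a stand-in for an HTTP GET against a cluster node.)
import time
from concurrent.futures import ThreadPoolExecutor

def run_load(request_fn, n=1000, c=50):
    """Issue n calls to request_fn with concurrency c; return statuses."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=c) as pool:   # -c: concurrency
        statuses = list(pool.map(lambda _: request_fn(), range(n)))  # -n: total
    elapsed = time.monotonic() - start
    return statuses, elapsed

if __name__ == "__main__":
    statuses, elapsed = run_load(lambda: 200, n=1000, c=50)
    print(len(statuses), statuses.count(200))  # -> 1000 1000
```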

Test environment

The environment consists of 12 instances running in Amazon EC2 within the same availability zone (us-east-1c) and placement group. The instances are running the CentOS 7 x86_64 EBS HVM AMI on m4.xlarge instances with 4 vCPUs, 16 GiB memory, 8 GB SSD general purpose storage and network performance rated as high.

The instances are used as follows:

  • 1 instance running the Varnish API Engine management
  • 1 instance running Varnish Custom Statistics
  • 1 instance running Memcached for accounting
  • 3 instances running Varnish Cache Plus with Varnish API Engine
  • 3 instances running a web server acting as backends
  • 3 instances acting as consumers to generate load

The consumers communicate with the three Varnish instances directly, without using an Elastic Load Balancer (ELB). The consumers put load simultaneously on either the web servers (reference test) or the Varnish instances (test scenarios). Dummy API runs on the web server instances and listens on port 1337.

Installation and configuration

Ansible is used to install and configure the environment. The Ansible configuration is available on GitHub.

Results [1]

Figure 5: Requests per second

Figure 6: Average response time

The tests give us a baseline capacity of around 23K API calls per second on the three-node API Engine cluster. Mobile clients will drag this down slightly. That performance hit will most likely be mitigated by the Varnish API Engine's ability to cache some of the responses and to reject throttled and unauthorized requests. We believe that 23K calls per second is achievable in a real-world scenario and should be enough for those who take API call performance seriously and are looking to boost it.

[1] The detailed results and other aspects can be downloaded at:
Per Buer
Per Buer is the CTO and founder of Varnish Software, the company behind the open source project Varnish Cache. Buer is a former programmer turned sysadmin, then manager turned entrepreneur. He runs, cross-country skis and tries to keep his two boys from tearing down the house.
