
Tutorial – Getting Started with NoSQLBench

Jonathan Shook

Released in March 2020, NoSQLBench is the first testing tool that tries to cover all the bases that any serious testing tool should have for distributed systems work. At the same time, it aims to be usable by casual and serious users alike.

What is NoSQLBench?

Developers today want to build applications that can scale. This requires the use of distributed systems that can run across multiple locations, whether these are container images or services that run across public, private or hybrid cloud platforms. However, testing these applications is harder than it should be.

Performance testing tools for systems of scale have been limited. Released in March 2020, NoSQLBench is the first testing tool that tries to cover all the bases that any serious testing tool should have for distributed systems work. At the same time, it aims to be usable by casual and serious users alike.


NoSQLBench was built to solve testing challenges which other tools were simply not designed for. It allows users to model their access patterns in the native query language of a target system. It does not assume that all NoSQL databases are just different versions of the same idea. NoSQLBench doesn’t require you to be a developer to build a meaningful test, and it also doesn’t require you to ship around bulk data in order to have realistic testing data or operations.

Once you configure a workload with NoSQLBench, it’s ready to execute. If you need to change the access patterns or data used in operations, it’s a config change, and you are ready to test again. This is possible, even with datasets of arbitrary size. This gives users direct insight into what can be expected at production time for an equivalent workload.

NoSQLBench has features that you won’t find in other testing tools:

  • recipe-oriented procedural data generation — functions for copy and paste access to enormous virtual datasets
  • deterministic workload behavior — every cycle is specific and repeatable
  • modular protocol support — starting with cql and others
  • configuration language for statements and data — the language of operations and access patterns
  • scripting built-in — enabling automaton-driven analysis for advanced scenarios
  • cycle-specific operations and diagnostics — you can retry any specific cycle to learn more about it
  • a docker-metrics dashboard — support for automatically running and configuring a docker-based grafana stack for rich dashboarding
  • a consistent view of high-fidelity metrics with support for coordinated omission, with multiple output formats and reporting options

Individually, these capabilities are each significant in their own way. When put together, they create a powerful toolkit which makes performance testing much easier for everyone. NoSQLBench allows us to focus on testing requirements without making compromises on testing tools.

It also means we can avoid the costly and complicated problem of building one-off testing harnesses. In practice, these short-lived tools often have critical flaws. After all, building testing tools that work well in all the right ways is non-trivial. We want to break this pattern by offering something that handles all the hard parts without taking control away from the user.

It takes time for a performance testing tool to earn a place in a tester’s toolbox. Thankfully, NoSQLBench has already proven itself as an invaluable tool for DataStax and its customers alike.

Core Concepts

NoSQLBench workloads are organized in a YAML configuration file. Workloads are based on statements primarily, so the YAML format emphasizes this. You also add bindings to these statements to specify which data is used for operations. For organizing and selecting active statements, you can add tags. Statements can also have statement params which are statement-specific, like whether to mark an operation as idempotent, or whether to use prepared statements, and so on.

Hello World

This tutorial is a high-level introduction to a basic NoSQLBench workflow. You can use this as a basic template for testing at any level. If you need more specific details in your test, add them as you go. Let’s say that you wanted to write 1 billion records to a target system with CQL. For this, you will want to use the cql driver.

Suppose you start with a sketch of your statements like this:

# hello-world-dml.yaml
statements:
 - example: |
    insert into hello.world (cycle,name,sample) 
     values ({cycle},{cyclename},{sample});
   bindings:
    cycle: Identity()
    cyclename: NumberNameToString()
    sample: Normal(100.0D, 10.0D)

Now that we have a starting point, how do we know it will do what we want?

You can preview, cycle by cycle, what a statement will look like. Here is how the stdout driver would interpret this config to yield some statements:

nb run driver=stdout yaml=hello-world-dml.yaml cycles=5
# output
Logging to logs/scenario_20200403_101645_863.log
insert into hello.world (cycle,name,sample)
 values (0,zero,95.30390911280935);
insert into hello.world (cycle,name,sample)
 values (1,one,104.73915634900615);
insert into hello.world (cycle,name,sample)
 values (2,two,112.3236295086616);
insert into hello.world (cycle,name,sample)
 values (3,three,111.38872920562173);
insert into hello.world (cycle,name,sample)
 values (4,four,91.52878591168258);

Nice, we have something that looks useful already! What you are seeing is the result of executing an activity with the stdout driver. This driver doesn’t speak any wire protocols, but it does know how to print the rendered operations to your console for diagnostics. This is a very common way of getting familiar with data bindings. It’s also a quick way to sanity check what operations your workload would be executing if you were running it against an Apache Cassandra database. To do this, all you would need to do is change driver=stdout to driver=cql and provide a host to connect to.

This shows the power of having a common set of concepts and configuration primitives across testing scenarios. It doesn’t mean that different systems will magically speak each other’s statement forms and protocols, but it does mean that you can express them in an idiomatic way and tailor your test to the target system. It is the job of the NoSQLBench high-level driver, like cql, to adapt the statement templates into a native form for the protocol you are using. Any performance testing tool that doesn’t allow you to control the access patterns and operations in the native language of the target system is just not useful for serious testing.

Each line in the output demonstrates how the cycle number of each operation is specific, and how this is used as the basis for the data that is used in each operation.
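
To make the idea concrete, here is a minimal Python sketch of cycle-driven data generation. It is not how NoSQLBench implements its binding functions: the helper names (number_name, sample, render) and the cycle-seeded random generator are assumptions made purely for illustration, so the sample values will not match the output above. What it does show is that a cycle number alone can decide every field of an operation.

```python
import random

# Hypothetical stand-ins for the bindings in hello-world-dml.yaml.
_NAMES = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

TEMPLATE = ("insert into hello.world (cycle,name,sample)\n"
            " values ({cycle},{cyclename},{sample});")


def number_name(cycle: int) -> str:
    """Spell out a single-digit cycle number (enough for a 5-cycle preview)."""
    return _NAMES[cycle]


def sample(cycle: int) -> float:
    """Draw from Normal(100, 10) using a generator seeded only by the cycle."""
    return random.Random(cycle).gauss(100.0, 10.0)


def render(cycle: int) -> str:
    """Fill the statement template using nothing but the cycle number."""
    return TEMPLATE.format(cycle=cycle,
                           cyclename=number_name(cycle),
                           sample=sample(cycle))


if __name__ == "__main__":
    for c in range(5):
        print(render(c))
```

Because render depends only on its argument, rendering cycle 3 today and again next week yields the same statement, which is the property that makes a specific cycle repeatable.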

If you compare the amount of effort for getting to this point with any other testing tool, you will start to see why we went to the effort to build this toolkit. Keep reading, and the reasons for using NoSQLBench will start to pile up!

Next Level

So how do we take the sketch above and turn it into something that is easy for people to just take and run? To do that, we need to allow them to set up a whole test scenario, including DDL. Thankfully that is easy to do, as no statement form is sacred within NoSQLBench. You simply create the statements that define your keyspace and schema.

Here is our workload config for the schema:

# hello-world-ddl.yaml
statements:
  - create-keyspace: |
     create keyspace if not exists hello
     WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
    params:
     prepared: false
  - create-table: |
      create table if not exists hello.world (
        cycle  bigint,
        name   text,
        sample double,
        primary key(cycle)
      );
    params:
     prepared: false

This shows two statements with no bindings (they aren’t needed for the DDL), as well as a new YAML element, called a statement parameter. The prepared: false setting disables the automatic use of prepared statements, since you can’t do this with DDL.

But now we have two blocks of statements, and that is fine as long as we keep them in separate files. However, we can do much better than this. The NoSQLBench YAML format has been refined to support different kinds of test structure, including blocks, tags, and default values. It is still perfectly valid YAML, but NoSQLBench knows how to combine the layers together in a useful way:

# hello-world.yaml
bindings:
 cycle: Identity()
 cyclename: NumberNameToString()
 sample: Normal(100.0D, 10.0D)
 randomish_cycle: HashRangeScaled()

blocks:
 - tags:
    phase: schema
   params:
    prepared: false
   statements:
    - create-keyspace: |
       create keyspace if not exists hello
       WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
    - create-table: |
       create table if not exists hello.world (
        cycle  bigint,
        name   text,
        sample double,
        primary key(cycle)
       );
 - tags:
    phase: main
   statements:
    - insert-sample: |
       insert into hello.world (cycle,name,sample) 
       values ({cycle},{cyclename},{sample});
      ratio: 4
    - read-sample: |
       select * from hello.world where cycle={randomish_cycle}
      ratio: 1

The new elements in this version are as follows:

  • The bindings section isn’t directly attached to the insert-sample statement. Since the names in the insert-sample statement template reference the binding names that are provided in the global document scope, they are used. Any binding not referenced is just there as a recipe. It doesn’t get invoked if it isn’t used.
  • Statement blocks are introduced. This is a way of organizing your statements so that you can configure and reference them as a group. All the DDL statements in the first block have a statement parameter of prepared: false.
  • Statement tags are used. This allows us to select the active statements, as we will demonstrate in the next section. All of the statements defined in the first block will have the block-level configuration, with tag phase: schema. Both statements in the second block will have the tag phase: main.
  • A new statement has been added to illustrate reads. The function that feeds the randomish_cycle binding produces a value between 0 and the cycle of the current operation.
  • Some ratios have been added for demonstration purposes. With the above ratios, the insert will be used 4 out of every 5 operations, and the read will be the 5th. The statement selected for each cycle is also deterministic based on the ratios.
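
The ratio behavior described in the last bullet can be pictured with a short Python sketch. This is only one simple way to get a deterministic 4:1 mix, and the function names (build_sequence, statement_for) are hypothetical. NoSQLBench's own sequencing may order the statements within each group of five differently, so treat this as the principle rather than the implementation.

```python
# Statement names and ratios copied from the "main" block of hello-world.yaml.
RATIOS = [("insert-sample", 4), ("read-sample", 1)]


def build_sequence(ratios):
    """Expand ratios into a repeating plan, e.g. 4:1 -> four inserts, one read."""
    plan = []
    for name, ratio in ratios:
        plan.extend([name] * ratio)
    return plan


def statement_for(cycle: int, plan) -> str:
    """Pick the statement for a cycle; the same cycle always gets the same one."""
    return plan[cycle % len(plan)]


PLAN = build_sequence(RATIOS)

if __name__ == "__main__":
    for c in range(10):
        print(c, statement_for(c, PLAN))
```

With this plan, cycles 0 through 3 map to the insert and cycle 4 maps to the read, and the pattern repeats from cycle 5 onward.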

Now that we have organized the different kinds of statements into different parts of the testing workflow, we can call on them individually.
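
Conceptually, selecting statements by tag works like a filter over the blocks. The Python sketch below models only an exact key:value match on block-level tags. The BLOCKS structure and the select function are hypothetical, and the real tag filter in NoSQLBench may support more than this.

```python
# A plain-Python picture of the blocks in hello-world.yaml (statement bodies omitted).
BLOCKS = [
    {"tags": {"phase": "schema"},
     "statements": ["create-keyspace", "create-table"]},
    {"tags": {"phase": "main"},
     "statements": ["insert-sample", "read-sample"]},
]


def select(blocks, tag_filter: str):
    """Return statement names from blocks whose tags match 'key:value'."""
    key, value = tag_filter.split(":", 1)
    selected = []
    for block in blocks:
        if block["tags"].get(key) == value:
            selected.extend(block["statements"])
    return selected


if __name__ == "__main__":
    print(select(BLOCKS, "phase:schema"))
    print(select(BLOCKS, "phase:main"))
```

Filtering on phase:schema leaves only the two DDL statements active, while phase:main leaves the insert and the read.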

NOTE: It is important to observe proper indentation in your YAML files. First-time YAML users often struggle with this. Be sure to use spaces, not tabs, and to indent all child elements further than their parents. If you have never used YAML before, it is a good idea to familiarize yourself with it before building more advanced scenarios. We suggest YAML Tutorial: Everything You Need to Get Started in Minutes by Eric Goebelbecker.

Create Schema (with tags)

This section requires you to have a CQL system to connect to. If you don’t already have one, you can start an instance of DSE with this one-liner:

docker run -e DS_LICENSE=accept --name my-dse -p 9042:9042 -d datastax/dse-server:6.7.7

Let’s create the schema.

nb run driver=cql yaml=hello-world tags=phase:schema host=host
# output
Logging to logs/scenario_20200205_013213_767.log

Notice that there is no console output. This is because NoSQLBench assumes that you don’t want to be bothered with minutiae unless you ask for it. If you really want to see the details, you can always throw a -v to crank the console logging level up from warning to info, or -vv for debug.

Turn it up

We can now execute some DML operations as well:

nb run driver=cql yaml=hello-world tags=phase:main host=host cycles=1M
# output
01:53:23.719 [scenarios:001] WARN i.e.activityimpl.SimpleActivity - For testing at scale, it is highly recommended that you set threads to a value higher than the default of 1. hint: you can use threads=auto for reasonable default, or consult the topic on threads with `help threads` for more information.
^C (I hit control-C to interrupt it.)

NoSQLBench gently reminds us to turn the threads up. Very well. Let’s also get some grafana goodness into the mix, courtesy of a local docker integration with prometheus, grafana, and graphite exporter.

nb run driver=cql yaml=hello-world tags=phase:main host=host cycles=1B threads=20x --docker-metrics
# output
Logging to logs/scenario_20200205_015849_242.log
# every minute you'll see a progress indicator
hello-world: 0.36%/Running (details: min=0 cycle=3589475 max=1000000000)
hello-world: 0.67%/Running (details: min=0 cycle=6734460 max=1000000000)
hello-world: 0.98%/Running (details: min=0 cycle=9777225 max=1000000000)
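
The progress lines above carry enough information for a rough throughput and time-to-completion estimate. The Python sketch below does that arithmetic. It assumes the lines are exactly 60 seconds apart and that the rate stays constant, neither of which is guaranteed, and the parse and estimate helpers are hypothetical names.

```python
import re

# Progress lines copied from the output above; one is printed per minute.
LINES = [
    "hello-world: 0.36%/Running (details: min=0 cycle=3589475 max=1000000000)",
    "hello-world: 0.67%/Running (details: min=0 cycle=6734460 max=1000000000)",
    "hello-world: 0.98%/Running (details: min=0 cycle=9777225 max=1000000000)",
]
INTERVAL_SECONDS = 60  # assumption: lines are exactly one minute apart


def parse(line: str):
    """Extract (cycle, max) from one progress line."""
    m = re.search(r"cycle=(\d+) max=(\d+)", line)
    return int(m.group(1)), int(m.group(2))


def estimate(lines, interval=INTERVAL_SECONDS):
    """Return (ops_per_second, hours_remaining) from the first and last line."""
    (first, _), (last, total) = parse(lines[0]), parse(lines[-1])
    elapsed = interval * (len(lines) - 1)
    rate = (last - first) / elapsed
    return rate, (total - last) / rate / 3600


if __name__ == "__main__":
    rate, hours = estimate(LINES)
    print(f"{rate:,.0f} ops/s, about {hours:.1f} hours remaining")
```

For these three lines the estimate comes out at a little over 51,000 operations per second, which would put the remaining cycles at a bit more than five hours if nothing changed.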

With the --docker-metrics option, a docker stack is composed on the local system before the scenario starts, and all metrics are pre-wired to go to it automatically. This is a feature that we’re just starting to put the bells and whistles on, but it is very helpful already.

Here’s how it looks:

[Figure: Grafana dashboard] NoSQLBench also provides auto-configured Grafana dashboards using Docker.

You can get your results in another format if you need. If you view the output of nb help, you will find various ways to report metrics, including graphite, HDR logs, CSV and more.

Named Scenarios

If you want everyone to be able to run your workload, including schema setup and the main phase, with only a single command, you can do that. A newly added feature of NoSQLBench, called named scenarios, lets you embed your commands into the workload YAML in a form like this:

# add this to hello-world.yaml
scenarios:
  default:
    ddl: run driver=cql tags==phase:schema threads==1 cycles==2
    dml: run driver=cql tags==phase:main threads=auto cycles=1M

With this, you can run a command like this to run the test from beginning to end:

nb hello-world host=host

The net result is that both of the templated commands in the named scenario are used, with any of the options on the command line overriding them where they are not locked with “==”. You can still use this with --docker-metrics and other options, of course. You can also codify different named scenarios with names besides default, and select one by passing its name as a second selector after the workload’s YAML file. This is a relatively new feature, but it is already being used to bake a catalog of common workloads into NoSQLBench.
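
The override rule can be sketched in a few lines of Python. The parse_template and resolve functions are hypothetical and model only what is described here: "=" sets a default that the command line may replace, while "==" locks the value. How NoSQLBench itself reports an attempt to override a locked parameter is not modeled, and the sketch silently ignores it.

```python
# Command templates copied from the `default` scenario above.
SCENARIO = {
    "ddl": "run driver=cql tags==phase:schema threads==1 cycles==2",
    "dml": "run driver=cql tags==phase:main threads=auto cycles=1M",
}


def parse_template(template: str):
    """Split a template into (command, params, locked_names)."""
    command, *pairs = template.split()
    params, locked = {}, set()
    for pair in pairs:
        if "==" in pair:               # '==' marks a locked parameter
            name, value = pair.split("==", 1)
            locked.add(name)
        else:
            name, value = pair.split("=", 1)
        params[name] = value
    return command, params, locked


def resolve(template: str, overrides: dict):
    """Apply command-line overrides, leaving locked parameters untouched."""
    command, params, locked = parse_template(template)
    for name, value in overrides.items():
        if name not in locked:
            params[name] = value
    return command, params


if __name__ == "__main__":
    cli = {"host": "host", "threads": "8"}
    for step, template in SCENARIO.items():
        print(step, resolve(template, cli))
```

In this sketch, passing threads=8 on the command line changes the dml step but leaves the ddl step at its locked value of 1, while host=host is added to both.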


Jumping In

You can access NoSQLBench documentation at http://docs.nosqlbench.io/.

You can also access the documentation for NoSQLBench as part of the tool itself, with docserver mode. With the docs bundled with the tool, you’ll never wonder if the docs pertain to the version you are using. This mode is also how UI capabilities are expected to land in NoSQLBench.

You can start the docserver mode with a docker command like:

docker run --rm -a STDOUT --net=host --name nb-docs nosqlbench/nosqlbench docserver http://0.0.0.0:12345/
# output
Started documentation server at http://0.0.0.0:12345/
# or
nb docserver
# output
Started documentation server at http://localhost:12345/

You can then browse to it at a routable address on the system you started it on.

Having full control of a NoSQLBench scenario requires some knowledge of the activity parameters like threads. It is highly recommended that new users read about basic activity parameters in the guidebook.

Getting NoSQLBench

You can download the latest release of NoSQLBench at https://github.com/nosqlbench/nosqlbench/releases

The Future

NoSQLBench is always improving. The core machinery of NoSQLBench had to be built first, but the next era of improvement will be focused on empowering how users work with it. We’re looking forward to significant improvements as we go:

  • Improved documentation
  • Improved examples, built in
  • Additional drivers
  • User Interfaces to complement the guidebook

NoSQLBench allows you to see how your database performs with or without your application. It makes it possible to iterate on data models, measure baseline performance, and plan for scale.

We want to empower everyone to use a common set of concepts and tools to make testing easier, no matter what NoSQL database you are testing. With that aim, we want to make the toolset that NoSQLBench is built on the de-facto standard for NoSQL testing in the industry. By releasing NoSQLBench and the CQL driver as open source, we have taken the first step in delivering on that vision.

Happy Testing!

Author

Jonathan Shook

Jonathan Shook works at DataStax on the Vanguard team. He has been building distributed systems for over 15 years, with equal attention to operational and functional design aspects. At DataStax, he focuses on building methods and tools to empower the use of highly available and scalable systems.

