Tutorial

An introduction to building realtime apps with RethinkDB

Ryan Paul

Built for scalability across multiple machines, the JSON document store RethinkDB is a distributed database that uses an easy query language. Here’s how to get started.

RethinkDB is an open source database for building realtime web applications. Instead of polling for changes, the developer can turn a query into a live feed that continuously pushes updates to the application in realtime. RethinkDB’s streaming updates simplify realtime backend architecture, eliminating superfluous plumbing by making change propagation a native part of your application’s persistence layer.

In addition to offering unique features for realtime application development, RethinkDB also benefits from some useful characteristics that contribute to a pleasant developer experience. RethinkDB is a schemaless JSON document store that is designed for scalability and ease of use, with easy sharding, support for distributed joins, and an expressive query language.

This tutorial will demonstrate how to build a realtime web application with RethinkDB and Node.js. It will use Socket.io to convey live updates to the frontend. If you would like to follow along, you can install RethinkDB or run it in the cloud.

First steps with ReQL

The RethinkDB Query Language (ReQL) embeds itself in the programming language that you use to build your application. ReQL is designed as a fluent API, a set of functions that you can chain together to compose queries.

Before we start building an application, let’s take a few minutes to explore the query language. The easiest way to experiment with queries is to use RethinkDB’s administrative console, which typically runs on port 8080. You can type RethinkDB queries into the text field on the Data Explorer tab and run them to see the output. The Data Explorer provides auto-completion and syntax highlighting, which can be helpful while learning ReQL.

By default, RethinkDB creates a database named test. Let’s start by adding a table to the test database:

r.db("test").tableCreate("fellowship")

Now, let’s add a set of nine JSON documents to the table:

r.table("fellowship").insert([
  { name: "Frodo", species: "hobbit" },
  { name: "Sam", species: "hobbit" },
  { name: "Merry", species: "hobbit" },
  { name: "Pippin", species: "hobbit" },
  { name: "Gandalf", species: "istar" },
  { name: "Legolas", species: "elf" },
  { name: "Gimli", species: "dwarf" },
  { name: "Aragorn", species: "human" },
  { name: "Boromir", species: "human" }
])

When you run the command above, the database will output an array with the primary keys that it generated for all of the new documents. It will also tell you how many new records it successfully inserted. Now that we have some records in the database, let’s try using ReQL’s filter command to fetch the fellowship’s hobbits:

r.table("fellowship").filter({species:"hobbit"})

The filter command retrieves the documents that match the provided boolean expression. In this case, we specifically want documents in which the species property is equal to hobbit. You can chain additional commands to the query if you want to perform more operations. For example, you can use the following query to change the value of the species property for all hobbits:

r.table("fellowship").filter({species: "hobbit"})
                     .update({species: "halfling"})
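
To make the semantics of those two chained commands concrete, here is a plain JavaScript sketch that mimics them on an in-memory array. It is only an analogy: matchesPattern, filterDocs, and updateDocs are hypothetical helper names rather than driver functions, and the real commands execute inside the database, not in your application.

```javascript
// In-memory analogy for filter({...}) followed by update({...}).
// All helper names here are hypothetical; none of this is driver code.

// A document matches when every field in the pattern has an equal value.
function matchesPattern(doc, pattern) {
  return Object.keys(pattern).every(function(key) {
    return doc[key] === pattern[key];
  });
}

// Like filter: keep only the documents that match the pattern.
function filterDocs(docs, pattern) {
  return docs.filter(function(doc) { return matchesPattern(doc, pattern); });
}

// Like update: merge the given fields into each selected document.
function updateDocs(docs, fields) {
  docs.forEach(function(doc) {
    Object.keys(fields).forEach(function(key) { doc[key] = fields[key]; });
  });
  return docs.length; // number of documents touched
}

var fellowship = [
  { name: "Frodo", species: "hobbit" },
  { name: "Gandalf", species: "istar" },
  { name: "Sam", species: "hobbit" }
];

var replaced = updateDocs(filterDocs(fellowship, { species: "hobbit" }),
                          { species: "halfling" });
console.log(replaced, fellowship);
```

Only the documents selected by the filter step are modified; Gandalf’s record is left alone.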

ReQL even has a built-in http command that you can use to fetch data from public web APIs. In the following example, we use the http command to fetch the current posts from a popular subreddit. The full query retrieves the posts, orders them by score, and then displays several properties from the top five entries:

r.http("http://www.reddit.com/r/aww.json")("data")("children")("data")
 .orderBy(r.desc("score")).limit(5).pluck("score", "title", "url")

As you can see, ReQL is very useful for many kinds of ad hoc data analysis. You can use it to slice and dice complex JSON data structures in a number of interesting ways. If you’d like to learn more about ReQL, you can refer to the API reference documentation, the ReQL introduction on the RethinkDB website, or the RethinkDB cookbook.
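
If the http query above looks dense, it may help to read it as a pipeline of small steps. The sketch below walks through the same steps in plain JavaScript on an in-memory object. The sample listing is made up for illustration and only imitates the shape of the JSON being queried, and the limit is reduced to two because the sample holds just three posts.

```javascript
// In-memory analogy for the http query above. The sample listing is
// hypothetical data that only imitates the shape of the real response.
var listing = {
  data: {
    children: [
      { data: { score: 10, title: "First", url: "http://example.com/1", author: "a" } },
      { data: { score: 30, title: "Second", url: "http://example.com/2", author: "b" } },
      { data: { score: 20, title: "Third", url: "http://example.com/3", author: "c" } }
    ]
  }
};

// ("data")("children")("data"): drill into the object, then take the
// "data" field of every element in the children array.
var posts = listing.data.children.map(function(child) { return child.data; });

// orderBy(r.desc("score")): sort from highest to lowest score.
var sorted = posts.slice().sort(function(a, b) { return b.score - a.score; });

// limit(2) here rather than limit(5), because the sample has three posts.
var top = sorted.slice(0, 2);

// pluck("score", "title", "url"): keep only the named fields.
var plucked = top.map(function(post) {
  return { score: post.score, title: post.title, url: post.url };
});

console.log(plucked);
```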

Use RethinkDB in Node.js and Express

Now that you’re armed with a basic working knowledge of ReQL, it’s time to start building an application. We’re going to start by looking at how you can use Node.js and Express to make an API backend that serves the output of a ReQL query to your end user.

The rethinkdb module in npm provides RethinkDB’s official JavaScript client driver. You can use it in a Node.js application to compose and send queries. The following example shows how to perform a simple query and display the output:

var r = require("rethinkdb");

r.connect().then(function(conn) {
  return r.db("test").table("fellowship")
          .filter({species: "halfling"}).run(conn)
    // Read all results before the connection is closed
    .then(function(cursor) { return cursor.toArray(); })
    .finally(function() { conn.close(); });
})
.then(function(output) {
  console.log("Query output:", output);
})
.error(function(err) {
  console.log("Failed:", err);
});

The connect method establishes a connection to RethinkDB. It returns a connection handle, which you provide to the run command when you want to execute a query. The example above finds all of the halflings in the fellowship table and then displays their respective JSON documents in your console. It uses promises to handle the asynchronous flow of execution and to ensure that the connection is properly closed when the operation completes.

Let’s expand on the example above, adding an Express server with an API endpoint that lets the user fetch all of the fellowship members of the desired species:

var app = require("express")();
var r = require("rethinkdb");

app.listen(8090);
console.log("App listening on port 8090");

app.get("/fellowship/species/:species", function(req, res) {
  r.connect().then(function(conn) {
    return r.db("test").table("fellowship")
            .filter({species: req.params.species}).run(conn)
        // Read all results before the connection is closed
        .then(function(cursor) { return cursor.toArray(); })
        .finally(function() { conn.close(); });
  })
  .then(function(output) { res.json(output); })
  .error(function(err) { res.status(500).json({err: err}); });
});

If you have previously worked with Express, the code above should look fairly intuitive. The final path segment in the URL route represents a variable, which we pass to the filter command in the ReQL query in order to obtain just the desired documents. After the query completes, the application relays the JSON output to the user. If the query fails to complete, then the application will return status code 500 and provide the error.
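
Both examples above repeat the same sequence: open a connection, run some work, and close the connection whether or not the work succeeded. One way to factor that out is a small helper, sketched below. The names withConnection and fakeConnect are hypothetical. fakeConnect is a stand-in for r.connect so that the control flow can be observed without a running server, and the sketch uses standard promises rather than the driver’s own.

```javascript
// Hypothetical stand-in for a database connection. It only records whether
// close() was called, so no server is needed to run this sketch.
function fakeConnect() {
  var conn = {
    closed: false,
    close: function() { conn.closed = true; }
  };
  return Promise.resolve(conn);
}

// Open a connection, run work(conn), and always close the connection
// afterwards, whether work resolves, rejects, or throws.
function withConnection(connect, work) {
  return connect().then(function(conn) {
    return Promise.resolve()
      .then(function() { return work(conn); })
      .finally(function() { conn.close(); });
  });
}

withConnection(fakeConnect, function(conn) {
  return ["query output"];
}).then(function(output) {
  console.log("Query output:", output);
});
```

In a real application, the first argument would be a function that calls r.connect, and the work function would run the query and convert the cursor to an array before returning.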

Realtime updates with changefeeds

RethinkDB is designed for building realtime applications. You can get a live stream of continuous query updates by appending the changes command to the end of a ReQL query. The changes command creates a changefeed, which will give you a cursor that receives new records when the results of the query change. The following code demonstrates how to use a changefeed to display table updates:

r.connect().then(function(c) {
  return r.db("test").table("fellowship").changes().run(c);
})
.then(function(cursor) {
  cursor.each(function(err, item) {
    console.log(item);
  });
});

The cursor.each callback executes every time the data within the fellowship table changes. You can test it for yourself by making an arbitrary change. For example, we can remove Boromir from the fellowship after he is slain by orcs:

r.table("fellowship").filter({name:"Boromir"}).delete()

When the query removes Boromir from the fellowship, the demo application will display the following JSON data in stdout:

{
  new_val: null,
  old_val: {
    id: '362ae837-2e29-4695-adef-4fa415138f90',
    name: 'Boromir',
    species: 'human'
  }
}

When changefeeds provide update notifications, they tell you the previous value of the record and the new value of the record. You can compare the two in order to see what has changed. When existing records are deleted, the new value is null. Similarly, the old value is null when the table receives new records.
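
The rules in the previous paragraph translate directly into code. The following sketch classifies a notification by checking which of the two values is null, and lists the fields that differ. describeChange and changedFields are hypothetical helper names, and the comparison is shallow, which is enough for flat documents like the ones in this tutorial.

```javascript
// Classify a changefeed notification by inspecting old_val and new_val.
// describeChange is a hypothetical helper, not part of the driver.
function describeChange(change) {
  if (change.old_val === null && change.new_val !== null) return "insert";
  if (change.old_val !== null && change.new_val === null) return "delete";
  return "update";
}

// List the fields whose values differ between the old and new document.
// Uses a shallow comparison, which is enough for flat documents.
function changedFields(change) {
  var oldVal = change.old_val || {};
  var newVal = change.new_val || {};
  var keys = Object.keys(oldVal).concat(Object.keys(newVal));
  return keys.filter(function(key, index) {
    return keys.indexOf(key) === index && oldVal[key] !== newVal[key];
  });
}

var removal = {
  new_val: null,
  old_val: {
    id: "362ae837-2e29-4695-adef-4fa415138f90",
    name: "Boromir",
    species: "human"
  }
};
console.log(describeChange(removal));
```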

The changes command currently works with the following kinds of queries: get, between, filter, map, orderBy, min, and max. Support for additional kinds of queries, such as group operations, is planned for the future.

A realtime scoreboard

Let’s consider a more sophisticated example: a multiplayer game with a leaderboard. You want to display the top five users with the highest scores and update the list in realtime as it changes. RethinkDB changefeeds make that easy. You can attach a changefeed to a query that includes the orderBy and limit commands. Whenever the scores or overall composition of the list of top five users changes, the changefeed will give you an update.

Before we get into how you set up the changefeed, let’s start by using the Data Explorer to create a new table and populate it with some sample data:

r.db("test").tableCreate("scores")
r.table("scores").indexCreate("score")
r.table("scores").insert([
  {name: "Bill", score: 33},
  {name: "Janet", score: 42},
  {name: "Steve", score: 68}
  ...
])

Creating an index helps the database sort more efficiently on the specified property, which is score in this case. At the present time, you can only use the orderBy command with changefeeds if you order on an index.

To retrieve the current top five players and their scores, you can use the following ReQL expression:

r.db("test").table("scores").orderBy({index: r.desc("score")}).limit(5)

We can add the changes command to the end to get a stream of updates. To get those updates to the frontend, we will use Socket.io, a framework for implementing realtime messaging between server and client. It supports a number of transport methods, including WebSockets. The specifics of Socket.io usage are beyond the scope of this article, but you can learn more about it by visiting the official Socket.io documentation.

The following code uses sockets.emit to broadcast the updates from a changefeed to all connected Socket.io clients:

var sockio = require("socket.io");
var app = require("express")();
var r = require("rethinkdb");

var io = sockio.listen(app.listen(8090), {log: false});
console.log("App listening on port 8090");

r.connect().then(function(conn) {
  return r.table("scores").orderBy({index: r.desc("score")})
    .limit(5).changes().run(conn);
})
.then(function(cursor) {
  cursor.each(function(err, data) {
    io.sockets.emit("update", data);
  });
});

On the frontend, you can use the Socket.io client library to set up a handler that receives the update event:

var socket = io.connect();

socket.on("update", function(data) {
  console.log("Update:", data);
});

That’s a good start, but we need a way to populate the initial list values when the user first loads the page. To that end, let’s extend the server so that it broadcasts the current leaderboard over Socket.io when a user first connects:

var getLeaders = r.table("scores").orderBy({index: r.desc("score")}).limit(5);

r.connect().then(function(conn) {
  return getLeaders.changes().run(conn);
})
.then(function(cursor) {
  cursor.each(function(err, data) {
    io.sockets.emit("update", data);
  });
});

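io.on("connection", function(socket) {
  r.connect().then(function(conn) {
    return getLeaders.run(conn)
      // The query yields a cursor; convert it to an array before closing
      .then(function(cursor) { return cursor.toArray(); })
      .finally(function() { conn.close(); });
  })
  .then(function(output) { socket.emit("leaders", output); });
});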

The application uses the same underlying ReQL expression in both cases, so we can store it in a variable for easy reuse. ReQL’s method chaining makes it highly conducive to that kind of composability.
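
Reusing a stored expression this way is safe because each chained command produces a new query value instead of modifying the one it was called on. The toy builder below models that idea in plain JavaScript. It is an assumption-laden illustration, not how the RethinkDB driver is implemented: the query function and its steps array are hypothetical.

```javascript
// A toy chainable query description. Each method returns a new object and
// leaves the receiver untouched, so a stored base query can be extended in
// several places. This models the idea only; it is not driver code.
function query(steps) {
  return {
    steps: steps,
    orderBy: function(field) { return query(steps.concat(["orderBy:" + field])); },
    limit: function(n) { return query(steps.concat(["limit:" + n])); },
    changes: function() { return query(steps.concat(["changes"])); }
  };
}

var leaders = query(["table:scores"]).orderBy("score").limit(5);
var feed = leaders.changes();

console.log(leaders.steps);
console.log(feed.steps);
```

Adding changes to the stored query yields a separate value, so the base query can still be run on its own to fetch the initial leaderboard.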

To wrap up the demo, let’s build a complete frontend. To keep things simple, I’m going to use Polymer’s data binding system. Let’s start by defining the template:

<template id="scores" is="auto-binding">
  <ul>
    <template repeat="{{user in users}}">
      <li><strong>{{user.name}}:</strong> {{user.score}}</li>
    </template>
  </ul>
</template>

It uses the repeat attribute to insert one li tag for each user. The contents of the li tag display the user’s name and their current score. Next, let’s write the JavaScript code:

var scores = document.querySelector("#scores");
var socket = io.connect();

socket.on("leaders", function(data) {
  scores.users = data;
});

socket.on("update", function(data) {
  for (var i in scores.users)
    if (scores.users[i].id === data.old_val.id) {
      scores.users[i] = data.new_val;
      scores.users.sort(function(x,y) { return y.score - x.score });
      break;
    }
});

The handler for the leaders event simply takes the data from the server and assigns it to the template variable that stores the users. The update handler is a bit more complex. It finds the entry in the leaderboard that correlates with the old_val and then it replaces it with the new data.

When the score changes for a user that is already in the leaderboard, it’s just going to replace the old record with a new one that has the updated number. In cases where a user in the leaderboard is displaced by one who wasn’t there previously, it will replace one user’s record with that of another. The code above will properly handle both cases.

Of course, the changefeed updates don’t help us maintain the actual order of the users. To remedy that problem, we simply sort the user array after every update. Polymer’s data binding system will ensure that the actual DOM representation always reflects the desired order.
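
The replace-then-sort logic can also be written as a standalone function, which makes it easier to test and to extend. The sketch below is one possible variant, not the handler used in the demo. applyUpdate is a hypothetical helper that returns a new array, and it also tolerates notifications in which old_val or new_val is null, a case the handler above does not cover.

```javascript
// Return a new leaderboard array with one changefeed notification applied.
// applyUpdate is a hypothetical helper; it does not mutate its input.
function applyUpdate(users, change) {
  var oldId = change.old_val ? change.old_val.id : null;

  // Drop the entry that the notification replaces or removes, if any.
  var next = users.filter(function(user) { return user.id !== oldId; });

  // Add the new or updated entry, unless this is a pure removal.
  if (change.new_val) next.push(change.new_val);

  // Notifications do not carry ordering, so re-sort by descending score.
  return next.sort(function(x, y) { return y.score - x.score; });
}

var board = [
  { id: "a", name: "Steve", score: 68 },
  { id: "b", name: "Janet", score: 42 },
  { id: "c", name: "Bill", score: 33 }
];

// Bill gains 100 points and moves to the top.
var updated = applyUpdate(board, {
  old_val: { id: "c", name: "Bill", score: 33 },
  new_val: { id: "c", name: "Bill", score: 133 }
});
console.log(updated.map(function(user) { return user.name; }));
```

In the update handler, the result could be assigned back with scores.users = applyUpdate(scores.users, data), in the same way that the leaders handler assigns the initial list.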

Now that the demo application is complete, you can test it by running queries that change the scores of your users. In the Data Explorer, you can try running something like:

r.table("scores").filter({name: "Bill"})
 .update({score: r.row("score").add(100)})

When you change the value of the user’s score, you will see the leaderboard update to reflect the changes.

Next steps

Conventional databases are largely designed around a query/response workflow that maps well to the web’s traditional request/response model. But modern technologies like WebSockets make it possible to build applications that stream updates in realtime, without the latency or overhead of HTTP requests.

RethinkDB is the first open source database that is designed specifically for the realtime web. Changefeeds offer a way to build queries that continuously push out live updates, obviating the need for routine polling.

To learn more about RethinkDB, check out the official documentation. The introductory ten-minute guide is a good place to start. You can also check out some RethinkDB demo applications, which are published with complete source code.

Author
Ryan Paul
Ryan Paul is a developer evangelist at RethinkDB. He is also a Linux enthusiast and open source software developer. He was previously a contributing editor at Ars Technica, where he wrote articles about software development.
