Flux & Time to Awesome – InfluxData interview
I was lucky enough to sit down and have a chat with Tim Hall, Head of Products at InfluxData, about their platform, their ‘time to awesome’ philosophy, and a new data scripting and querying language, Flux. We also discuss the future of InfluxData and their plans for 2020. Let’s take a closer look.
One dark evening last year, I sat down for a chat with Tim Hall, Head of Products at InfluxData. Over in the US, his day was just getting started. Over the next half an hour we talked a lot about InfluxData and their products, as well as their open source initiatives and Flux, a new data scripting and querying language they created. To start with though, he gave me a quick overview of the company’s history.
Background of InfluxData
The platform was born of a desire to stop using stopgap solutions and instead have something purpose-built to handle time series data. Tim kindly explained to me what time series data is:
InfluxData is focused on building a platform to deal with time series data, which you can think of as metrics and events – anything with a timestamp.
At its base, InfluxDB is an open source platform that runs on a single node or machine, but there are two additional paid offerings: Enterprise and Cloud. The Enterprise version scales across multiple nodes and offers additional security features and more operational tooling. The Cloud version is actually one of the reasons why we sat down for a chat in the first place: InfluxDB Cloud is now a serverless, pay-as-you-go platform.
Open source in Europe
The other reason we were talking is that InfluxData recently opened a new office in London. As we talked further, Tim explained that Germany is the country where the open source base is growing fastest in Europe. He went on to describe how Europe has slightly over 50% of InfluxData’s open source users worldwide, so I asked him why he thinks that’s the case:
So I think there’s a large growth in IoT, the internet of things, and the delivery and deployment of sensors across a wide range of industries in Europe. And I think that’s one of the key things that’s driving the use of our platform. Influx is a platform for dealing with these metrics and events, and anything that’s a sensor essentially is a time series use case.
All of those sensors that exist essentially throw off this type of data. The way we think about it is there’s sort of two modes. There’s the instrumentation of the physical world, which is going on and Europe seems to be leading pretty far in front in many cases; whether it’s manufacturing, smart cities, or even agricultural businesses that are using this technology.
So whether we’re talking about an agricultural use case monitoring moisture levels in the soil, or tracking a video game engine’s telemetry to see where there’s still room for improvement, the ways InfluxDB is used can be hugely different. That said, they have some more common use cases, and these help them get new users set up fast.
Time to awesome
InfluxData have an amazing way of thinking that they call their time to awesome philosophy. Tim described it as follows, emphasizing that the goal is to get new users problem solving as fast as possible:
InfluxDB Cloud provides you with the fastest time to results. You can get up and running in minutes and easily scale your project as needed. It’s purpose-built and optimized for time to awesome. Powerful wizards walk you through setup and you’re connected with recommended dashboards for the most common use cases.
After we talked about the diversity of InfluxDB’s uses, I asked what the challenges are when building something with such a wide range of possible applications. Tim said, “the selling of it tends to be harder than the building of it, to be quite honest,” and the challenge really does seem to lie more in communicating the potential of their platform. He explained, “sometimes it’s about painting a picture that people can see themselves in as opposed to showing up with a prepackaged solution saying we do this and we do only this.”
In essence, the technical challenges come from dealing with the sheer volume of data arriving at a potentially very high frequency. Tim gave the example of a stock exchange that might use InfluxDB to monitor the latency of the time drift of the servers down to the nanosecond, because the servers performing trades can’t drift by a hundred nanoseconds. Plus, for compliance reasons, this data has to be kept for 10 years. So the hardest part about putting a platform like this together is making sure that every stage scales to the end user’s needs, no matter what those demands might be: landing, storing and compressing the data, making it available whenever it’s needed, and deleting it once it’s no longer required.
To ensure the project went as smoothly as possible, InfluxData actually created a new data scripting and query language called Flux. I asked why they wanted to create something new:
Obviously SQL is the lingua franca for most people working with data in general – it’s been around forever, right? It has a definite usefulness in terms of its familiarity for creating reports and extracting data. The challenge with it though is it’s very precise.
So there’s a precision to SQL in terms of your desire and working with the data because you want a specific record, right? So let’s say I’m going to talk with you – I need to know your specific phone number, your specific address, your specific post code. These things are important. But when you’re working with time series data, you don’t necessarily care about the specific point-in-time record; what you care about is an aggregation of data over time and looking at the change of that over time, and so that simply requires a different way of working with data.
In terms of working with the data, you typically want to look at a specific time range and then apply a certain mathematical formula or filter on top of that. And then I want to present the series that fall out of that. And so that sort of style and language is just super different from the way that SQL works. So we sort of took a step back and said, you know what? What are the kinds of goals and motivations in terms of working with time series data that we’d like to aim for?
And from that, Flux was born.
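The workflow Tim describes (pick a time range, filter, then aggregate over windows of time) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Flux syntax or any InfluxDB API; the sample points, the `windowed_mean` helper, and all of its parameters are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample points: (unix_seconds, series_name, value).
POINTS = [
    (0, "soil_moisture", 10.0),
    (30, "soil_moisture", 14.0),
    (60, "soil_moisture", 20.0),
    (90, "soil_moisture", 22.0),
    (45, "air_temp", 18.0),
    (130, "soil_moisture", 99.0),  # falls outside the range queried below
]


def windowed_mean(points, start, stop, every, series):
    """Mean of one series per fixed-size window inside [start, stop)."""
    buckets = defaultdict(list)
    for ts, name, value in points:
        # Step 1: keep only the requested series and time range.
        if name != series or not (start <= ts < stop):
            continue
        # Step 2: assign the point to the window that contains it.
        window_start = start + ((ts - start) // every) * every
        buckets[window_start].append(value)
    # Step 3: aggregate each window and return the results in time order.
    return [(w, mean(vals)) for w, vals in sorted(buckets.items())]


result = windowed_mean(POINTS, start=0, stop=120, every=60, series="soil_moisture")
print(result)  # [(0, 12.0), (60, 21.0)]
```

Note that no individual record is ever looked up by key: the query is defined entirely by a range, a filter, and an aggregate, which is the contrast with SQL that Tim draws.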
So, following InfluxData’s time to awesome principle, Flux lets developers include “from and to sources”. Previously, a developer would have to take time series data and metadata from two different sources and then write code to combine them into something humans can read.
Flux will allow for the inclusion of what we call from and to sources. And while the native one within the language is Influx, you can do an SQL from and to, you can do an MQTT from and to, and now you can do Bigtable, which is a technology from Google. And the idea here is to show enough examples to sort of ignite the community to then come and work with us and build from and to functions that effectively allow the developer to work with those data sources in the languages that they’re familiar with.
So if the thing supports SQL, you write your SQL statement in Flux and you ask Flux to send it to that source and it will bring back a columnar set of data that then you can join with your time series data. All within the query engine. And so again, what that does is that pushes the technical complexity, you know, down from code into a single query that can span multiple data sources.
It’s a super powerful concept. And they can continue to use all their normal tools. So it’s really, really powerful. It’s also an open and extensible approach, not only from the development perspective, but from the community perspective – and I think that’s one of the powerful things about open source, right? Getting our community contribution.
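To make the “before” picture concrete, here is a minimal Python sketch of the kind of hand-written glue code described above: metadata is fetched from an SQL source and then joined to time series points in application code. It is not Flux, and it does not use any InfluxDB API; the in-memory SQLite table, the sample readings, and the `join_with_metadata` helper are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical relational metadata, standing in for any SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (sensor_id TEXT PRIMARY KEY, location TEXT)")
conn.executemany(
    "INSERT INTO sensors VALUES (?, ?)",
    [("s1", "greenhouse"), ("s2", "field")],
)

# Hypothetical time series points: (unix_seconds, sensor_id, value).
READINGS = [
    (0, "s1", 41.5),
    (0, "s2", 37.0),
    (60, "s1", 42.0),
]


def join_with_metadata(readings, connection):
    """Attach each reading's location by joining on sensor_id."""
    rows = connection.execute("SELECT sensor_id, location FROM sensors")
    location_by_id = dict(rows.fetchall())
    return [
        {
            "time": ts,
            "sensor_id": sid,
            "location": location_by_id.get(sid),  # None if metadata is missing
            "value": value,
        }
        for ts, sid, value in readings
    ]


joined = join_with_metadata(READINGS, conn)
for row in joined:
    print(row)
```

The approach Tim describes moves this join out of application code and into a single query that spans both sources.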
How was Flux received by the community?
It seems like InfluxData were pretty nervous to start with, but they got a lot of positive and constructive feedback. That led to some pretty fast quality-of-life updates, such as code suggestions in VS Code, and also to some discoveries that their tech was being used in even more places than they originally realized. They’ve known from the start that InfluxData isn’t an island, but now they’ve started building bridges to islands they didn’t know about until specific use cases were brought to their attention by the community.
What’s next for InfluxData?
Finally, Tim explained that their next big milestone is becoming multicloud; right now they only use AWS, but last year they announced they will start offering Google Cloud Platform. So the priority in the early part of 2020 is to make good on that promise. But they’re also aware that many of their European customers want InfluxData’s platform available on Azure, so they are aiming to launch Azure support in Europe first (which makes a change from the US getting everything first).