“Rockset is on a mission to deliver fast and flexible real-time analytics”
What’s new at Rockset – the real-time analytics database that serves low latency applications? Venkat Venkataramani, Co-Founder and CEO at Rockset, spoke with us about how Rockset helps achieve real-time analytics, what data sources it supports, what’s under the hood, and more.
JAXenter: Thank you for taking the time to speak with us! Can you tell us more about Rockset and how it works? How does it help us achieve real-time analytics?
Venkat Venkataramani: Rockset is a real-time analytics database that serves low latency applications. Think real-time logistics tracking, personalized experiences, anomaly detection and more.
Rockset employs the same indexing approach used by the systems behind the Facebook News Feed and Google Search, which were built to make data retrieval instantaneous for millions of users across terabytes of data. It goes a step further by building a Converged Index: a search index, a columnar store and a row index on all data. This means sub-second search, aggregations and joins without any performance engineering.
You can point Rockset at any data ― structured, semi-structured and time series data ― and it will index the data in real-time and enable fast SQL analytics. This frees teams from time-consuming and inflexible data preparation. Teams can now onboard new datasets and run new experiments without being constrained by data operations. And, Rockset is fully-managed and cloud-native, making a massively distributed real-time data platform accessible to all.
JAXenter: What data sources does it currently support?
Venkat Venkataramani: Rockset has built-in data connectors to data streams, OLTP databases and data lakes. These connectors are all fully-managed and stay in sync with the latest data. That means you can run millisecond-latency SQL queries within 2 seconds of data being generated. Rockset has built-in connectors to Amazon DynamoDB, MongoDB, Apache Kafka, Amazon Kinesis, PostgreSQL, MySQL, Amazon S3 and Google Cloud Storage. Rockset also has a Write API to ingest and index data from other sources.
JAXenter: What’s new at Rockset and how will it continue to improve analytics for streaming data?
Venkat Venkataramani: We recently announced a series of product releases to make real-time analytics on streaming data affordable and accessible. With this launch, teams can use SQL to transform and pre-aggregate data in real-time from Apache Kafka, Amazon Kinesis and more.
This makes real-time analytics up to 100X more cost-effective on streaming data. And, we free engineering teams from needing to construct and manage complex data pipelines to onboard new streaming data and experiment on queries. Here’s what we’ve released:
- Continuously transform data during ingestion: Customers can use SQL to transform streaming data as it is ingested, eliminating the time and effort required to maintain complex real-time data pipelines.
- Roll up data during ingestion: Customers can use SQL to pre-aggregate streaming data as it is ingested, reducing the cost of storing and querying data by 10-100x.
- Set time-based partitioning and retention: Customers can set highly efficient data retention policies for time series and streaming data, enabling automatic deletion of aging data to reduce costs.
You can delve further into this release by watching a live Q&A with Tudor Bosman, Rockset’s Chief Architect. He explains how we support complex aggregations on rolled-up data and ensure accuracy even in the face of duplicates and late-arriving data.
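To make the rollup idea concrete, here is a minimal Python sketch of pre-aggregating events at ingest time. It is an illustration under stated assumptions, not Rockset’s implementation: Rockset expresses rollups in SQL, and the event shape `(timestamp, key, value)`, the `Rollup` class and the `rollup_events` function are hypothetical names chosen for this example. The sketch also ignores duplicates and late-arriving events.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rollup:
    """One pre-aggregated row: how many events were seen and their summed value."""
    count: int = 0
    total: float = 0.0


def rollup_events(events, bucket_seconds=60):
    """Collapse raw (timestamp, key, value) events into one row per (bucket, key).

    Hypothetical helper: only the aggregates are kept, the raw events are not.
    """
    table = defaultdict(Rollup)
    for ts, key, value in events:
        bucket = ts - (ts % bucket_seconds)  # start of the time bucket
        row = table[(bucket, key)]
        row.count += 1
        row.total += value
    return dict(table)


events = [
    (0, "checkout", 20.0),
    (15, "checkout", 5.0),
    (42, "search", 1.0),
    (61, "checkout", 10.0),
]
rolled = rollup_events(events)  # 4 raw events collapse into 3 rollup rows
```

The storage and query savings come from the ratio of raw events to rollup rows. With high-volume streams and coarse buckets, many events fold into each stored row.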
JAXenter: What are some common use cases for real-time data analytics? When is it useful to implement?
Venkat Venkataramani: You experience real-time analytics every day whether you realize it or not. The content displayed in Instagram newsfeeds, the personalized recommendations on Amazon and the promotional offers from Uber Eats are all examples of real-time analytics. Real-time analytics encourages users to take desired actions, from reading more content to adding items to their carts to using takeout and delivery services for more of their meals.
We think real-time analytics isn’t just useful to the big tech giants. It’s useful across all technology companies to drive faster time to insight and build engaging experiences. We’re seeing SaaS companies in the logistics space provide real-time visibility into the end-to-end supply chain, route shipments and predict ETAs. This ensures that materials arrive on schedule, even in the face of an increasingly complex supply chain. There are also marketing analytics software companies that need to unify data across a number of interaction points to create a single view of the customer. This view is then used for segmentation, personalization and automation of different actions to create more compelling customer experiences.
There’s a big misperception in the space that real-time analytics is too expensive and that it is only accessible to large tech companies. That’s just not true anymore. Cloud offerings, the availability of real-time data and changing resource economics are putting it within reach of any digital disrupter.
JAXenter: How is Rockset built under the hood?
Venkat Venkataramani: The Converged Index, mentioned previously, is the key component in enabling real-time analytics. Rockset stores all its data in the search, column-based and row-based index structures that are part of the Converged Index, and so we have to ensure that the underlying storage can handle both reads and writes efficiently. To meet this requirement, Rockset uses RocksDB as its embedded storage engine, with some modifications for use in the cloud. RocksDB enables Rockset to handle high write rates, leverage SSDs for optimal price-performance and support updates to any field.
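One way to picture how a single document can feed a row index, a columnar store and a search index inside one key-value store is the Python sketch below. The key layout (the `"R"`, `"C"` and `"S"` prefixes), the function names and the sample documents are assumptions made for illustration, not Rockset’s actual on-disk format. A plain dict with a linear filter stands in for the sorted key space and range reads that an engine such as RocksDB provides.

```python
def converged_index_entries(doc_id, doc):
    """Expand one flat document into entries for three index structures.

    Hypothetical key layout; field values must be hashable in this sketch.
    """
    entries = {}
    for field, value in doc.items():
        entries[("R", doc_id, field)] = value        # row index: all fields of one document
        entries[("C", field, doc_id)] = value        # columnar store: one field across documents
        entries[("S", field, value, doc_id)] = None  # search index: documents matching a value
    return entries


def prefix_scan(store, prefix):
    """Return every entry whose key starts with `prefix` (stand-in for a range read)."""
    return {k: v for k, v in store.items() if k[:len(prefix)] == prefix}


store = {}
store.update(converged_index_entries(1, {"city": "SF", "status": "late"}))
store.update(converged_index_entries(2, {"city": "NYC", "status": "late"}))

# Search: which documents have status == "late"?
late_ids = sorted(k[3] for k in prefix_scan(store, ("S", "status", "late")))
# Column scan: every value of "city", e.g. as input to an aggregation.
cities = sorted(prefix_scan(store, ("C", "city")).values())
# Row lookup: reassemble document 1.
row_1 = {k[2]: v for k, v in prefix_scan(store, ("R", 1)).items()}
```

Each field is written three times, which is why the storage engine has to absorb high write rates. In exchange, a query can pick whichever access path suits it.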
Another core part of Rockset’s design is its use of a disaggregated architecture to maximize resource efficiency. We use an Aggregator-Leaf-Tailer (ALT) architecture, common at companies like Facebook and LinkedIn, where resources for ingest compute, query compute and storage can be scaled independently of each other based on the workload in the system. This allows Rockset users to exploit cloud efficiencies to the full.
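The separation of roles in an Aggregator-Leaf-Tailer design can be sketched in a few lines of Python. This is a toy, in-process model under stated assumptions: the class names mirror the roles, but the methods, the modulo sharding rule and the sample data are hypothetical, and storage is not modeled as its own tier here.

```python
class Leaf:
    """Holds one shard of indexed data and answers partial queries over it."""

    def __init__(self):
        self.docs = []

    def write(self, doc):
        self.docs.append(doc)

    def partial_count(self, predicate):
        return sum(1 for doc in self.docs if predicate(doc))


class Tailer:
    """Ingest side: takes new documents from a source and routes them to leaves."""

    def __init__(self, leaves):
        self.leaves = leaves

    def ingest(self, doc):
        shard = doc["id"] % len(self.leaves)  # assumed sharding rule
        self.leaves[shard].write(doc)


class Aggregator:
    """Query side: fans a query out to every leaf and merges the partial results."""

    def __init__(self, leaves):
        self.leaves = leaves

    def count(self, predicate):
        return sum(leaf.partial_count(predicate) for leaf in self.leaves)


leaves = [Leaf(), Leaf()]
tailer = Tailer(leaves)
aggregator = Aggregator(leaves)

for i in range(5):
    tailer.ingest({"id": i, "status": "late" if i % 2 == 0 else "on_time"})

late = aggregator.count(lambda doc: doc["status"] == "late")
```

Because tailers and aggregators only talk to leaves, a burst of ingest can be met by adding tailers and a burst of queries by adding aggregators, without resizing the other tier.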
JAXenter: Personally, what are some of your favorite open source tools that you can’t do without?
Venkat Venkataramani: RocksDB! The team at Rockset built and open-sourced RocksDB at Facebook. It is a high-performance embedded storage engine used by other modern data stores like CockroachDB, Kafka and Flink. RocksDB began as a project at Facebook that abstracted access to local stable storage so that developers could focus their energies on building out other aspects of the system. It has been used at Facebook as the embedded storage for spam detection, graph search and message queuing. At Rockset, we’ve continued to contribute to the project and have released RocksDB-cloud to the community.
We are also fans of dbt and its community. dbt is an open-source tool that lets data teams collaborate on transforming data in their database to ship higher-quality data sets, faster. We share a similar outlook on the data space: we think data pipelines are challenging to build and maintain, respect SQL as the lingua franca of analytics and want to make it easy for data to be shared across an organization.
JAXenter: Can you share anything about Rockset’s future? What’s on the roadmap next, what features and/or improvements are being worked on?
Venkat Venkataramani: Rockset is on a mission to deliver fast and flexible real-time analytics, without the cost and complexity. Our product roadmap is geared towards enabling all digital disrupters to realize real-time analytics.
This requires taking steps to make real-time analytics more affordable and accessible than ever before. A first step towards affordability was the release of SQL-based rollups and transformations, which cut the cost of real-time analytics by up to 100X for streaming data. As part of our expansion initiative, we’re also bringing Rockset to users across the globe. Follow us as we continue to put real-time analytics within reach of all engineers.