Users can rely on the processing power of more Terrastore servers.
JAXenter speaks to Terrastore founder, Sergio Bossa, on the 0.8.0 release.
Sergio Bossa is a pragmatic programmer with a true passion for open source software and open communities. A long-time open source contributor, he has worked as a committer for Spring Framework Modules and Terracotta Forge, led the development of the open source clustered version of Atlassian Jira, and is now the founder of the Terrastore distributed document store. He currently works as a software engineer in the online gambling and casino industry for Gioco Digitale, where he’s focusing on building robust and scalable software for the company’s backend platform.
Terrastore 0.8.0 has just been released. In this interview, JAXenter catches up with Terrastore founder Sergio Bossa to find out what’s new in this release.
JAXenter: Terrastore 0.8.0 has just been announced. What benefits does the new map/reduce processing functionality bring to the product?
Sergio Bossa: The new map/reduce processing will enable users to leverage Terrastore’s distributed architecture to perform complex aggregations and queries over stored data.
Previously, as with any other product that has no distributed map/reduce or only scatter/gather capability, the only way users could perform an aggregation over stored data was to retrieve it on the client side and do the complex processing locally, which obviously doesn’t scale as the data size grows.
With the introduction of map/reduce, users can rely on the processing power of multiple Terrastore servers to perform the aggregation in parallel, taking less time and consuming almost no memory or CPU on the client side.
So let’s say you have stored documents with data about, among other things, people’s ages, and you want to find the median age. With map/reduce, it’s just a matter of writing a mapper function to extract the desired piece of data (the age) and a reduce function to compute the median value; Terrastore will take care of running the parallel computation and returning the desired result.
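To make the median-age example concrete, here is a minimal Python sketch of the mapper/reducer split described above. It does not use Terrastore’s actual API: the function names (`mapper`, `run_on_node`, `reducer`), the sample documents and the idea of modelling each server as a plain list are all assumptions made for illustration. Note that a median cannot be merged from per-node medians, so in this sketch the reducer collects every mapped age before computing the result.

```python
from statistics import median

def mapper(document):
    """Extract the one field we care about; skip documents without an age."""
    age = document.get("age")
    return [age] if age is not None else []

def run_on_node(documents):
    """Hypothetical per-server step: apply the mapper to local documents only."""
    ages = []
    for doc in documents:
        ages.extend(mapper(doc))
    return ages

def reducer(partial_results):
    """Combine the mapped values from every node and compute the median."""
    all_ages = [age for partial in partial_results for age in partial]
    return median(all_ages)

# Three "servers", each holding a slice of the stored documents (made-up data).
nodes = [
    [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 45}],
    [{"name": "Cid", "age": 27}],
    [{"name": "Dee", "age": 52}, {"name": "Eve", "age": 38}],
]

# In a real deployment the per-node step would run in parallel on each server;
# here it is a simple loop.
partials = [run_on_node(docs) for docs in nodes]
median_age = reducer(partials)
print(median_age)
```

The client only sees the final value; the per-document work happens where the data lives, which is the point made in the answer above.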
JAXenter: How has the events management infrastructure been enhanced in 0.8.0?
Sergio Bossa: First, event listeners now have access to both the old and the new version of a changed document, and to the old version in the case of a removal. This is very useful for performing actions that depend on what actually changed in the stored data. You can now also perform actions that modify stored data, which I refer to as “active listeners”. This opens up many use cases, mainly centered on updating dependent documents and/or creating processing chains that process and store intermediate document versions up to the final, desired one, with everything happening inside the store and no user intervention.
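The “active listener” idea can be sketched in a few lines of Python. This is a toy in-memory store, not Terrastore’s listener interface: `TinyStore`, the listener signature `(store, event, key, old, new)` and the `order:`/`stats` keys are hypothetical names chosen for the example. It shows a listener that receives both the old and the new version of a document and writes back to the store to keep a dependent document up to date.

```python
class TinyStore:
    """Toy key/value store that notifies listeners on every change (illustrative only)."""

    def __init__(self):
        self.docs = {}
        self.listeners = []

    def put(self, key, value):
        old = self.docs.get(key)
        self.docs[key] = value
        for listener in self.listeners:
            listener(self, "changed", key, old, value)

    def remove(self, key):
        old = self.docs.pop(key, None)
        if old is not None:
            for listener in self.listeners:
                # On removal only the old version exists.
                listener(self, "removed", key, old, None)


def update_total(store, event, key, old, new):
    """Active listener: keeps a dependent 'stats' document in sync with orders."""
    if not key.startswith("order:"):
        return  # ignore other documents, including our own 'stats' writes
    old_amount = old["amount"] if old else 0
    new_amount = new["amount"] if new else 0
    stats = store.docs.get("stats", {"total": 0})
    # Writing back into the store is what makes this listener "active".
    store.put("stats", {"total": stats["total"] + new_amount - old_amount})


store = TinyStore()
store.listeners.append(update_total)

store.put("order:1", {"amount": 10})
store.put("order:2", {"amount": 5})
store.put("order:1", {"amount": 7})   # update: listener sees old=10, new=7
store.remove("order:2")               # removal: listener sees old=5, new=None

print(store.docs["stats"])
```

Because the listener sees the previous version, it can apply only the difference instead of rescanning every order, and the key check keeps its own write from triggering it again.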
JAXenter: What is the ‘Adaptive Ensemble Scheduling’ that comes as part of the recent release?
Sergio Bossa: This question needs a little background. First, the ensemble is a Terrastore deployment mode that provides horizontal scalability by joining several clusters together and making them work as a whole, so that users can transparently access any node in any cluster and take advantage of the combined storage and processing capabilities.
All clusters in the ensemble gain access to each other by exchanging “cluster views”, and the “ensemble scheduler” triggers this view-exchange process. The new “Adaptive Ensemble Scheduling” mechanism implements a more efficient, more reliable algorithm for exchanging views and so keeping all clusters in the ensemble up to date. It is a dynamic algorithm that computes the optimal frequency of the exchanges from previously observed data, such as the number of joining/leaving nodes and how often they join or leave, rather than from absolute, fixed values.
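The interview does not describe the actual algorithm, so the following Python sketch only illustrates the general idea of an adaptive schedule: exchange views more often while membership is changing and back off while it is stable. The function `next_interval`, the halving/doubling rule and the interval bounds are assumptions invented for this example, not Terrastore’s implementation.

```python
def next_interval(current, changes_seen, min_interval=1.0, max_interval=60.0):
    """Return the next view-exchange interval in seconds (hypothetical rule).

    changes_seen is the number of nodes that joined or left since the last
    exchange. More churn shortens the interval; no churn doubles it.
    """
    if changes_seen > 0:
        proposed = current / (1 + changes_seen)
    else:
        proposed = current * 2
    # Clamp so the scheduler never floods the network nor goes silent.
    return max(min_interval, min(max_interval, proposed))


# Simulated rounds: membership changes observed since the previous exchange.
observed_changes = [0, 0, 3, 1, 0]

interval = 10.0
history = []
for changes in observed_changes:
    interval = next_interval(interval, changes)
    history.append(interval)

print(history)
```

A fixed schedule would use the same interval in every round; here the interval follows the observed churn, which is the contrast with “absolute, fixed” values drawn in the answer above.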
JAXenter: What’s planned for the 0.9.0 release?
Sergio Bossa: Prior to 0.9.0, we’ll probably go through a few 0.8.x releases, providing bug fixes and minor enhancements and features. Then, 0.9.0 will probably focus on enhancing Terrastore’s ensemble functionality, in particular data replication, and on optimizing some aspects of Terrastore’s performance, in particular memory consumption … unless users demand something different and more important!