Harnessing Big Data
The 3 approaches to Big Data Analytics - chat with Jaspersoft's Karl Van den Bergh
JAX: So firstly, why did Jaspersoft focus on Big Data?
Van den Bergh: Jaspersoft focused on Big Data because we saw
this as a direction that our customers and community were
gravitating towards. Our community of over a quarter million BI
Builders is of course very open-source focused. There is naturally
a big intersection with the community driving Big Data, which is
also largely composed of open source developers.
We see jumping into Big Data as a big opportunity for Jaspersoft because customers have a growing need to analyze the growing volume, variety and velocity of data. Our offering is broadening the impact and insight you can drive from this data – both structured and unstructured – by making Jaspersoft BI available on big data platforms.
What sort of areas does your Business Intelligence Software cover?
Our business intelligence suite enables better decision-making
through highly interactive web-based reports, dashboards and
analysis. Leveraging a commercial open source business model, the
Jaspersoft BI suite includes pixel-perfect enterprise reporting, ad
hoc query, dashboards, OLAP and in-memory analysis, and data integration.
What was the thinking behind creating the Jaspersoft Big Data Index (JBDI) and can you explain how it works?
We created the Big Data Index because we felt that the data we
had was valuable to share with others outside of our organization.
Everyone is trying to learn about what is going on in the Big Data
world and our data provides some interesting insights.
The Big Data Index is a ranking of connector downloads from our open source Forge of leading Big Data stores including Hadoop, MongoDB, Cassandra, Couchbase, Riak and others. By capturing download data from January 2011 to present, we can verify growth trends for Big Data analytics overall and rank demand by individual data store.
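The ranking described here boils down to counting connector downloads per store and comparing them year over year. A minimal Python sketch of that computation, using entirely hypothetical download records (the real Index is built from Jaspersoft's Forge logs, not this toy data):

```python
from collections import Counter

# Hypothetical download records: (store, year) pairs standing in for
# the per-connector download log the Index is computed from.
downloads = [
    ("Hadoop Hive", 2011), ("MongoDB", 2011), ("MongoDB", 2011),
    ("Cassandra", 2011), ("Hadoop HBase", 2011), ("MongoDB", 2010),
]

def rank_stores(records, year):
    """Rank Big Data stores by connector downloads in a given year."""
    counts = Counter(store for store, y in records if y == year)
    return counts.most_common()

def growth(records, store, year):
    """Year-over-year download growth for one store, as a percentage."""
    prev = sum(1 for s, y in records if s == store and y == year - 1)
    curr = sum(1 for s, y in records if s == store and y == year)
    return (curr - prev) / prev * 100 if prev else float("inf")

print(rank_stores(downloads, 2011))        # MongoDB tops this toy data
print(growth(downloads, "MongoDB", 2011))  # 100.0 (1 -> 2 downloads)
```

The same two functions, run over the full download log, yield both the overall ranking and the per-store growth figures quoted below.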
What sort of trends have you noted from the JBDI?
Some of our key findings in the 2012 Jaspersoft Big Data Index were:
- Over 15,000 Big Data connectors were downloaded in 2011;
- Demand for MongoDB, the document-oriented NoSQL database, saw the biggest spike with over 200 percent growth in 2011;
- Hadoop Hive, the SQL interface to Hadoop MapReduce, represented 60 percent of all Hadoop-based connectors;
- Hadoop HBase, the distributed columnar database that runs on HDFS, was the second most popular Hadoop-based connector;
- Cassandra, the high-availability NoSQL database, was among the top four most downloaded Big Data sources in 2011; and
- Over 27 percent of Big Data connector downloads were for Riak, Infinispan, Neo4J, Redis, CouchDB, VoltDB or others.
Do you feel that a lot of enterprises are being overwhelmed by choice in the Big Data market, and don't know which solution to pick?
There is a lot of confusion in the market because there is such
an array of choices folks have when it comes to Big Data. Each
store has its pros and cons, depending on what you want to
accomplish. For example, if you want to power a high-transaction
web application, MongoDB could be a good choice; whereas if you
want to collect and analyze huge volumes of log files, Hadoop Hive
is a great framework to consider.
What considerations should businesses make when making a choice for Big Data and also Big Data analytics?
There is safety in numbers: having a large community that
supports the data store you've chosen matters, so going with an
open source option makes a lot of sense. It's also helpful to have
a commercial vendor in place that can provide support for
mission-critical deployments.
Are there different considerations that enterprises should make when approaching Big Data reporting and analysis? If so, what are they?
We see three primary architectures for Big Data analytics:
Interactive Exploration: This architecture, which requires a native connector as well as in-memory capabilities, provides a low latency interface for data analysts and data scientists who want to discover real-time patterns as they emerge from their Big Data content. This type of architecture works with those Big Data stores that provide a low-latency interface like Hadoop HBase or MongoDB.
Direct Batch Reporting: This architecture, which can work with a native or SQL connector, provides a medium latency interface for executives and operational managers who want summarized, pre-built daily reports on Big Data content. This type of architecture works with those Big Data stores that provide a SQL interface like Hadoop Hive or Cassandra CQL.
Indirect Batch Analysis: This architecture, which incorporates an ETL engine and a relational data mart or data warehouse, is great for data analysts and operational managers who want to analyze historical trends based upon pre-defined questions in their Big Data content. This architecture is pretty open in terms of connector type and works for any Big Data sources.
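The Indirect Batch Analysis pattern can be sketched in a few lines: an ETL step pulls records out of a Big Data source, loads them into a relational data mart, and the pre-defined questions are then answered with ordinary SQL. A minimal illustration, using made-up event data and SQLite as a stand-in for the data mart:

```python
import sqlite3

# Hypothetical raw events, standing in for rows an ETL job would
# extract from a Big Data store.
events = [
    {"day": "2012-01-01", "store": "MongoDB",   "downloads": 120},
    {"day": "2012-01-01", "store": "Cassandra", "downloads": 80},
    {"day": "2012-01-02", "store": "MongoDB",   "downloads": 150},
]

# Load: populate the relational data mart (SQLite here as a stand-in).
mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE downloads (day TEXT, store TEXT, n INTEGER)")
mart.executemany(
    "INSERT INTO downloads VALUES (:day, :store, :downloads)", events
)

# Analyze: the pre-defined question becomes a plain SQL report.
report = mart.execute(
    "SELECT store, SUM(n) FROM downloads "
    "GROUP BY store ORDER BY SUM(n) DESC"
).fetchall()
print(report)  # [('MongoDB', 270), ('Cassandra', 80)]
```

Because the analysis runs against the mart rather than the Big Data store itself, this architecture is indifferent to the connector type used on the extract side, which is why it works with any source.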
Hadoop seems to have become very popular recently, particularly since the release of a stable version. Why do you think this is?
Hadoop is popular because it’s a cheap and easy dumping ground
for large volumes of unstructured or semi-structured content like
log files that can then be processed later. Unlike with a
traditional data warehouse, you don't have to do any work to
structure this data before you dump it, so getting started is pretty
easy. And of course, the parallel processing MapReduce framework is
ideal for handling massive volumes of data.
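The MapReduce model behind this can be sketched in plain Python (toy log lines, no Hadoop cluster involved): a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group.

```python
from itertools import groupby
from operator import itemgetter

# Toy web-server log lines; a real job would read these from HDFS.
logs = [
    "GET /index.html 200",
    "GET /missing 404",
    "GET /index.html 200",
]

def mapper(line):
    # Emit (status_code, 1) per request, like a MapReduce map task.
    yield line.split()[-1], 1

def reducer(key, values):
    # Sum the counts for one key, like a MapReduce reduce task.
    return key, sum(values)

# Shuffle: sort/group the intermediate pairs by key, then reduce.
pairs = sorted(kv for line in logs for kv in mapper(line))
result = dict(
    reducer(k, (v for _, v in group))
    for k, group in groupby(pairs, key=itemgetter(0))
)
print(result)  # {'200': 2, '404': 1}
```

In Hadoop the map and reduce tasks run in parallel across the cluster and the shuffle moves data between nodes, which is what makes the same pattern scale to massive volumes.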
Any Big Data solution that you feel could join the mainstream for the Enterprise? Any to look out for?
Our data suggests that Hadoop, MongoDB and Cassandra are the
leading candidates, and that seems to be consistent with other
data we've seen.
Any future plans for Jaspersoft when it comes to the world of Big Data?
Look for more connectors, with even more sophistication, built into the core of our products. Thanks to the depth and breadth of our offering, our goal is to become the preferred BI provider for Big Data, both in the open source developer world as well as for the enterprise.