Harnessing Big Data

The 3 approaches to Big Data Analytics – chat with Jaspersoft’s Karl Van den Bergh

Chris Mayer

The explosion of Big Data has caught many off guard, and some enterprises are unsure how to analyse the reams of data at their disposal. We talk to Jaspersoft about the Big Data Index and pick up some handy tips.

JAX: So firstly, why did Jaspersoft focus on Big Data?

Van den Bergh: Jaspersoft focused on Big Data because we saw
this as a direction that our customers and community were
gravitating towards. Our community of over a quarter of a million BI
builders is, of course, very open-source focused. There is naturally
a big intersection with the community driving Big Data, which is
also largely made up of open source developers.

We see jumping into Big Data as a big opportunity for Jaspersoft
because customers have a growing need to analyze the increasing
volume, variety and velocity of data. Our offering broadens the
impact and insight you can drive from this data, both structured
and unstructured, by making Jaspersoft BI available on Big Data
platforms.

What sort of areas does your Business Intelligence software cover?

Our business intelligence suite enables better decision-making
through highly interactive web-based reports, dashboards and
analysis. Leveraging a commercial open source business model, the
Jaspersoft BI suite includes pixel-perfect enterprise reporting, ad
hoc query, dashboards, OLAP and in-memory analysis, and data
integration.

What was the thinking behind creating the Jaspersoft Big
Data Index (JBDI) and can you explain how it works?

We created the Big Data Index because we felt that the data we
had was valuable to share with others outside of our organization.
Everyone is trying to learn about what is going on in the Big Data
world and our data provides some interesting insights.

The Big Data Index is a ranking of connector downloads from
our open source Forge of leading Big Data stores including Hadoop,
MongoDB, Cassandra, Couchbase, Riak and others. By capturing
download data from January 2011 to present, we can verify growth
trends for Big Data analytics overall and rank demand by individual
data store.
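At its core, a ranking like this comes down to counting downloads per data store and sorting. The sketch below is a minimal illustration, not Jaspersoft's actual pipeline: it assumes each download is logged as a hypothetical (data_store, month) pair, and the sample records are made up.

```python
from collections import Counter
from typing import Iterable

# Hypothetical record format: (data_store, "YYYY-MM") per connector download.
Download = tuple[str, str]


def rank_connectors(downloads: Iterable[Download]) -> list[tuple[str, int]]:
    """Rank data stores by connector downloads, most downloaded first.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(store for store, _month in downloads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def monthly_totals(downloads: Iterable[Download]) -> dict[str, int]:
    """Total downloads per month across all stores, in month order, to expose growth trends."""
    totals = Counter(month for _store, month in downloads)
    return dict(sorted(totals.items()))


# Made-up sample records, for illustration only (not JBDI data).
sample = [
    ("Hadoop Hive", "2011-01"),
    ("MongoDB", "2011-01"),
    ("MongoDB", "2011-02"),
    ("Cassandra", "2011-02"),
    ("MongoDB", "2011-02"),
]
print(rank_connectors(sample))  # [('MongoDB', 3), ('Cassandra', 1), ('Hadoop Hive', 1)]
print(monthly_totals(sample))   # {'2011-01': 2, '2011-02': 3}
```

Keeping the month on each record is what allows the same log to answer both questions the index is used for: the overall growth trend (monthly totals) and the per-store ranking.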

What sort of trends have you noted from the index?

Some of our key findings in the 2012 Jaspersoft Big Data Index include:

  • Over 15,000 Big Data connectors were downloaded in 2011;
  • Demand for MongoDB, the document-oriented NoSQL database, saw
    the biggest spike with over 200 percent growth in 2011;
  • Hadoop Hive, the SQL interface to Hadoop MapReduce, represented
    60 percent of all Hadoop-based connectors;
  • Hadoop HBase, the distributed columnar database that runs on
    HDFS, was the second most popular Hadoop-based connector;
  • Cassandra, the high-availability NoSQL database, was among the
    top four most downloaded Big Data sources in 2011; and
  • Over 27 percent of Big Data connector downloads were for Riak,
    Infinispan, Neo4j, Redis, CouchDB, VoltDB or others.
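The percentages in these findings are easier to interpret with the arithmetic written out. A short worked sketch, assuming growth is measured as the relative change in a store's downloads between two points in time (the index's exact baseline is not stated here) and that the starting count is positive:

```latex
% Relative growth of a store's downloads over a period (D_start > 0)
g = \frac{D_{\text{end}} - D_{\text{start}}}{D_{\text{start}}} \times 100\%

% "Over 200 percent growth" therefore means more than tripling:
g > 200\% \iff D_{\text{end}} - D_{\text{start}} > 2\,D_{\text{start}} \iff D_{\text{end}} > 3\,D_{\text{start}}

% Share of one connector within a family, e.g. Hive among Hadoop-based connectors
s_{\text{Hive}} = \frac{D_{\text{Hive}}}{\sum_{c \in \text{Hadoop}} D_c} = 0.60
```

So MongoDB's spike means its downloads at the end of the period were more than three times those at the start, and Hive's 60 percent share leaves 40 percent for all other Hadoop-based connectors combined, including HBase.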

Do you feel that a lot of enterprises are being
overwhelmed by choice in the Big Data market, and don’t know which
solution to pick? 

There is a lot of confusion in the market because folks have such
an array of choices when it comes to Big Data. Each store has its
pros and cons, and the right one depends on what you want to
accomplish. For example, if you want to power a high-transaction
web application, MongoDB could be a good choice; whereas if you
want to collect and analyze huge volumes of log files, Hadoop Hive
is a great framework to consider.

What should businesses consider when choosing a Big Data store
and a Big Data analytics tool?

There is safety in numbers: it pays to have a large community
supporting the data store you’ve chosen, so going with an open
source option makes a lot of sense. It’s also helpful to have a
commercial vendor in place that can provide support for
mission-critical deployments.

Are there different considerations that enterprises should
make when approaching Big Data reporting and analysis? If so, what
are they?

We see three primary architectures for Big Data analytics:

Interactive Exploration: This architecture, which
requires a native connector as well as in-memory capabilities,
provides a low latency interface for data analysts and data
scientists who want to discover real-time patterns as they emerge
from their Big Data content. This type of architecture works with
those Big Data stores that provide a low-latency interface like
Hadoop HBase or MongoDB.
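A rough sketch of the in-memory half of this architecture, assuming documents have already been fetched through a native connector (simulated here with a plain list of dicts). `InMemoryExplorer` and the sample fields are hypothetical and are not a Jaspersoft or MongoDB API. Once the data sits in memory, each follow-up question is answered without another round trip to the store, which keeps exploration fast.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

Document = dict[str, Any]


class InMemoryExplorer:
    """Hypothetical in-memory layer for interactive exploration.

    Documents are loaded once (e.g. from a native MongoDB or HBase connector)
    and every ad hoc question after that is answered from memory.
    """

    def __init__(self, documents: Iterable[Document]) -> None:
        self.documents = list(documents)

    def where(self, predicate: Callable[[Document], bool]) -> "InMemoryExplorer":
        # Narrow to matching documents; returns a new explorer so filters chain.
        return InMemoryExplorer(d for d in self.documents if predicate(d))

    def count_by(self, field: str) -> dict[Any, int]:
        # Group-and-count on any field; documents missing the field are grouped under None.
        counts: dict[Any, int] = defaultdict(int)
        for doc in self.documents:
            counts[doc.get(field)] += 1
        return dict(counts)


# Simulated result of a native-connector query (made-up clickstream documents).
events = [
    {"page": "/home", "country": "DE"},
    {"page": "/cart", "country": "US"},
    {"page": "/home", "country": "US"},
]
explorer = InMemoryExplorer(events)
print(explorer.count_by("country"))                                     # {'DE': 1, 'US': 2}
print(explorer.where(lambda d: d["country"] == "US").count_by("page"))  # {'/cart': 1, '/home': 1}
```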

Direct Batch Reporting: This architecture, which
can work with a native or SQL connector, provides a medium latency
interface for executives and operational managers who want
summarized, pre-built daily reports on Big Data content. This type
of architecture works with those Big Data stores that provide a SQL
interface like Hadoop Hive or Cassandra CQL.
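To make the batch-reporting pattern concrete, here is a minimal sketch of a pre-built daily report issued through a SQL interface. It uses Python's built-in `sqlite3` purely as a stand-in for a store's SQL layer such as Hive. The `web_logs` table, its columns and the sample rows are invented, and a real HiveQL or CQL query would differ in dialect and connection setup.

```python
import sqlite3

# The "pre-built" part: one canned summary query, parameterized by day.
DAILY_SUMMARY_SQL = """
    SELECT day, status, COUNT(*) AS requests
    FROM web_logs
    WHERE day = ?
    GROUP BY day, status
    ORDER BY status
"""


def daily_report(conn: sqlite3.Connection, day: str) -> list[tuple[str, int, int]]:
    """Run the canned summary for one day and return (day, status, requests) rows."""
    return conn.execute(DAILY_SUMMARY_SQL, (day,)).fetchall()


def build_demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for a big data store's SQL interface (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE web_logs (day TEXT, status INTEGER, path TEXT)")
    conn.executemany(
        "INSERT INTO web_logs VALUES (?, ?, ?)",
        [
            ("2012-03-01", 200, "/"),
            ("2012-03-01", 404, "/old-page"),
            ("2012-03-01", 200, "/cart"),
            ("2012-03-02", 500, "/api"),
        ],
    )
    return conn


conn = build_demo_connection()
for row in daily_report(conn, "2012-03-01"):
    print(row)  # ('2012-03-01', 200, 2) then ('2012-03-01', 404, 1)
```

Because the query is fixed and only the date changes, it can be scheduled once a day. The store does the heavy aggregation, and the report consumer only ever sees the summarized rows.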

Indirect Batch Analysis: This architecture,
which incorporates an ETL engine and a relational data mart or data
warehouse, is great for data analysts and operational managers who
want to analyze historical trends based upon pre-defined questions
in their Big Data content. This architecture is pretty open in
terms of connector type and works with any Big Data source.
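The extract, transform and load steps can be sketched in a few functions. This is a minimal illustration under stated assumptions: the raw input is a hypothetical space-separated log format (`date path status`), `sqlite3` stands in for the relational data mart, and the monthly server-error count is just one example of a pre-defined question. A production setup would use a dedicated ETL engine rather than hand-written functions.

```python
import sqlite3
from typing import Iterable, Iterator


def extract(raw_lines: Iterable[str]) -> Iterator[list[str]]:
    # Extract: split raw, semi-structured log lines into fields.
    for line in raw_lines:
        yield line.split()


def transform(rows: Iterable[list[str]]) -> Iterator[tuple[str, str, int]]:
    # Transform: drop malformed rows and normalize the rest to (month, path, status).
    for fields in rows:
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # skip bad lines rather than failing the whole load
        date, path, status = fields
        yield date[:7], path, int(status)


def load(conn: sqlite3.Connection, rows: Iterable[tuple[str, str, int]]) -> None:
    # Load: write the cleaned rows into a fact table in the relational data mart.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_facts (month TEXT, path TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO request_facts VALUES (?, ?, ?)", rows)
    conn.commit()


def server_errors_per_month(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    # A pre-defined historical question answered from the data mart.
    return conn.execute(
        "SELECT month, COUNT(*) FROM request_facts "
        "WHERE status >= 500 GROUP BY month ORDER BY month"
    ).fetchall()


raw_logs = [
    "2012-01-15 /api 500",
    "2012-01-20 /home 200",
    "not a valid line",
    "2012-02-03 /api 503",
    "2012-02-04 /search 502",
]
mart = sqlite3.connect(":memory:")
load(mart, transform(extract(raw_logs)))
print(server_errors_per_month(mart))  # [('2012-01', 1), ('2012-02', 2)]
```

The transform step is where the "any Big Data source" flexibility comes from: whatever the source format, it is reshaped into one relational schema before the questions are asked.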

Since the release of a stable version, Hadoop seems to have
become very popular. Why do you think this is?

Hadoop is popular because it’s a cheap and easy dumping ground
for large volumes of unstructured or semi-structured content like
log files that can then be processed later. Unlike with a
traditional data warehouse, you don’t have to do any work to
structure this data before you dump it, so getting started is
pretty easy. And of course, the parallel-processing MapReduce
framework is ideal for handling massive volumes of data.
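The MapReduce model can be illustrated with a single-process sketch that counts HTTP status codes in raw log lines. It is not Hadoop's Java API. It only mimics the map, shuffle and reduce phases that Hadoop would distribute across a cluster, and the log format (status code as the last field) is an assumption. Structure is imposed only at processing time, inside the mapper, which is why raw logs can be dumped as-is.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    # Map: parse one raw line on the fly and emit (status_code, 1).
    fields = line.split()
    if fields:
        yield fields[-1], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group emitted values by key (Hadoop does this between map and reduce).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key.
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    # In Hadoop, mappers and reducers run in parallel on many nodes; here they run in sequence.
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, grouped[key]) for key in sorted(grouped))


logs = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "10.0.0.3 GET /index.html 200",
    "",
]
print(map_reduce(logs))  # {'200': 2, '404': 1}
```

Each mapper call depends only on its own line and each reducer call only on one key's values, which is what lets the framework split the work across machines.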

Any Big Data solution that you feel could join the
mainstream for the Enterprise? Any to look out for?

Our data suggest that Hadoop, MongoDB and Cassandra are the
leading candidates, and that seems to be consistent with other data
out there.

Any future plans for Jaspersoft when it comes to the world
of Big Data?

Look for more connectors, with even more sophistication, built
into the core of our products. Building on the depth and breadth of
our offering, our goal is to become the preferred BI provider for
Big Data, both in the open source developer world and in the
enterprise.
