Big Data is boiling away

Pentaho’s new tactic – open Kettle

Chris Mayer

Pentaho wants to be part of the Big Data revolution and puts its chips down: what better way to start than opening up Kettle to the community to gain mass adoption?

Pentaho Corporation has announced that it has open sourced all
of its big data capabilities in the new
Kettle 4.3 release, and has moved the entire Pentaho
Kettle project to the Apache License, Version 2.0.

This expertly timed move from Pentaho seems to be an attempt
to further accelerate the rapid adoption of Kettle for Big Data
by developers, analysts and data
scientists as the go-to tool for operationalising big data. With an
Apache License, the link-up with the Hadoop community seems
inevitable and would organically grow Kettle’s

Pentaho Kettle is an ETL (Extract, Transform and Load) engine
that can execute ETL transformations outside the Hadoop cluster
or within the nodes of the cluster, taking advantage of Hadoop’s
distributed processing and reliability.

Kettle is also the name of the wider project, in which the ETL
engine forms the core component. Big data capabilities available
under open source Pentaho Kettle 4.3 include the ability to input,
output, manipulate and report on data using the following Hadoop
and NoSQL stores: Cassandra, Hadoop HDFS, Hadoop MapReduce, Hadapt,
HBase, Hive, HPCC Systems and MongoDB.

With regard to Hadoop, Pentaho Kettle makes available job
orchestration steps for Hadoop, Amazon Elastic MapReduce, Pentaho
MapReduce, HDFS File Operations, and Pig scripts. All major Hadoop
distributions are supported, including Amazon Elastic MapReduce,
Apache Hadoop, Cloudera’s Distribution including Apache Hadoop
(CDH), Cloudera Enterprise, EMC Greenplum HD, HortonWorks Data
Platform powered by Apache Hadoop, and MapR’s M3 Free and M5

The benefits to developers from this open sourcing of Kettle
appear to be huge. Pentaho promises up to a 10x increase in productivity
through visual tools that eliminate the need to write code
such as Hadoop MapReduce Java programs, Pig scripts, Hive queries,
or NoSQL database queries and scripts. Making it easier for novices
to use Big Data platforms can only be a good thing after all.

Matt Casters, Founder and Chief Architect of the
Pentaho Kettle Project, spoke about the decision to open source
under an Apache license:

In order to obtain broader market
adoption of big data technology including Hadoop and NoSQL, Pentaho
is open sourcing its data integration product under the free Apache
license. This will foster success and productivity for developers,
analysts and data scientists giving them one tool for data
integration and access to discovery and

The link-up with Hadoop specialist Cloudera seems like a natural
fit too, as Ed Albanese, Head of Business Development at
Cloudera, notes:

“The Pentaho and Cloudera partnership allows our joint
customers to more quickly integrate Hadoop within their enterprise
data environments while also providing exceptional analytical
capabilities to a wider set of business users. We applaud Pentaho’s
decision to open source its big data capabilities under the Apache
License; the technology they are contributing is substantial and is
a big step forward in helping to accelerate adoption and make it
easier to use Hadoop for data transformation.”

It was only a matter of time before Pentaho unleashed
Kettle into the wild, and with Big Data set to be the rising star
of the year, they’ve timed this move to perfection. We look forward
to seeing Kettle’s powerful business analysis tools utilised in an
array of Big Data products.

You can download Kettle now and, if you’re a beginner to
the Big Data field, this video shows you how to create a MapReduce job
using Pentaho Kettle.
