Custom algorithms are in

Native code in Hadoop via MapReduce for C

Natali Vlatko

Google have released an open source MapReduce framework for C, called MR4C, that allows developers to run native code in the Hadoop framework. Community contributions to the project are welcome.

If Hadoop is something you’d like to get more involved in, Google’s recent announcement about the open source MapReduce framework for C, called MR4C, might be just the thing to get you into gear.

C/C++ injection

Originally developed by Skybox Imaging for large-scale satellite image processing and geospatial data science, MR4C is described by Ty Kennedy-Bowdoin from the Skybox team as “developed around a few simple concepts that facilitate moving your native code to Hadoop”.

MR4C lets you run your native C and C++ code on Hadoop without needing to write your own special libraries. As Kennedy-Bowdoin explains, the team believed the capabilities of Hadoop to be a good fit for scalable data handling, but they also wanted to leverage the powerful ecosystem of proven image processing libraries developed in C and C++.

The goal of the project is to abstract away the details of the MapReduce framework so that users can focus on developing worthwhile algorithms. Feedback on the project’s progress is welcome via its Google Group.

The framework works by storing algorithms in native shared objects that access data from the local filesystem or any URI, while input/output datasets, runtime parameters and external libraries are configured using JSON files. Splitting mappers and allocating resources can be configured with Hadoop YARN-based tools, or at the cluster level for MapReduce version 1.
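To make the idea concrete, here is a minimal C++ sketch of an algorithm that is looked up by name and driven by runtime parameters, loosely mirroring the arrangement described above. Every name in it (Dataset, Params, registry, threshold, run) is hypothetical and chosen for illustration; none of it is MR4C's actual API, and a real configuration would come from JSON files rather than an in-memory map.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; these are NOT MR4C's API.
using Bytes = std::vector<std::uint8_t>;
using Dataset = std::map<std::string, Bytes>;       // file name -> contents
using Params = std::map<std::string, std::string>;  // runtime parameters
using Algorithm =
    std::function<void(const Dataset&, Dataset&, const Params&)>;

// Maps an algorithm name (as a config file might reference it)
// to the native function that implements it.
std::map<std::string, Algorithm>& registry() {
    static std::map<std::string, Algorithm> algorithms;
    return algorithms;
}

// Example algorithm: binarize every byte against a "cutoff" parameter.
void threshold(const Dataset& input, Dataset& output, const Params& params) {
    auto it = params.find("cutoff");
    assert(it != params.end() && "threshold requires a 'cutoff' parameter");
    const int cutoff = std::stoi(it->second);
    for (const auto& entry : input) {
        Bytes result;
        result.reserve(entry.second.size());
        for (std::uint8_t value : entry.second) {
            result.push_back(static_cast<std::uint8_t>(value >= cutoff ? 255 : 0));
        }
        output[entry.first] = std::move(result);
    }
}

// Look up an algorithm by name and run it on the given input dataset.
Dataset run(const std::string& name, const Dataset& input, const Params& params) {
    Dataset output;
    registry().at(name)(input, output, params);
    return output;
}
```

A caller would register the function once, for example with registry()["threshold"] = threshold, and then invoke run("threshold", input, params). Keeping the algorithm behind a name and a parameter map is what lets the same native code be driven by configuration instead of being recompiled for each job.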

Workflows of multiple algorithms can be strung together using an automatically generated configuration, with callbacks in place for logging and progress reporting; these are viewed using the Hadoop JobTracker interface. Workflows can be built and tested on a local machine using the same interface employed on the target cluster.
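The chaining-plus-callbacks pattern described above can be sketched in a few lines of C++. This is only an illustration of the general technique, under the assumption that each step consumes the previous step's output; the names NamedStep, ProgressCallback and run_workflow are hypothetical and are not taken from MR4C.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names for illustration only; these are NOT MR4C's API.
using Data = std::vector<int>;
using Step = std::function<Data(const Data&)>;
using ProgressCallback =
    std::function<void(const std::string& step, double fraction)>;

struct NamedStep {
    std::string name;  // label reported to the progress callback
    Step apply;        // the algorithm itself
};

// Feed each step the previous step's output and report progress
// (step name plus fraction of steps completed) after every step.
Data run_workflow(const std::vector<NamedStep>& steps, Data data,
                  const ProgressCallback& on_progress) {
    assert(on_progress && "a progress callback is required");
    for (std::size_t i = 0; i < steps.size(); ++i) {
        data = steps[i].apply(data);
        on_progress(steps[i].name,
                    static_cast<double>(i + 1) / static_cast<double>(steps.size()));
    }
    return data;
}
```

Because the callback is passed in, the same workflow could print to a terminal on a local machine and report to a cluster's job-tracking interface in production without any change to the steps themselves.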

To get started, MR4C documentation and source code are available via the project’s GitHub page.

Natali Vlatko
An Australian who calls Berlin home, via a two year love affair with Singapore. Natali was an Editorial Assistant for (S&S Media Group).
