Concurrent introduce Hadoop machine learning tool Pattern
Following March’s $4m investment, Cascading company Concurrent have made their latest Big Data move, pushing a new free Hadoop workflow project out into the open.
One of Hadoop’s main enterprise pain points is the difficulty of integrating with analytics systems. Without comprehensive links to established analytics systems such as R or SAS, and without simplified methods of gleaning insights, Hadoop will struggle to be accepted as the data processing gold standard.
Scoring engine Pattern allows users to quickly deploy machine learning models in Hadoop clusters, making it far easier, in theory, to perform statistical analysis on applications. Data scientists can export models either through the Pattern Java API or via the popular XML-based Predictive Model Markup Language (PMML).
It is the second project Concurrent have launched to complement Cascading. Back in February, the company launched Lingual, a tool designed to help SQL get up to speed with Hadoop. Over the last few months, many Hadoop vendors have been thinking along the same lines. Application framework Cascading, however, has been around for some time and is already in the production environments of Twitter, eBay and Etsy.
CTO Chris Wensel admitted to GigaOM that Pattern alone “isn’t the real takeaway,” but that the three projects in unison are the real proposition.
“When combined, Cascading, Lingual and Pattern close the modeling, development and production loop for all data oriented applications. The combination of the three is the application ensemble for further enabling enterprises to drive differentiation through data,” he explained in a press release.
Pattern’s nearest competitor is Apache Mahout, a fellow scalable machine learning library which arrived in 2009. However, Concurrent are keen to point out the differences, believing that Apache Mahout is merely a set of HDFS-focused algorithms while Pattern “can leverage resources beyond Hadoop while complying best practice for Enterprise IT”.
Wensel points out that while all three will remain open source, the company plan to create “a suite” of products that centre on Cascading, as promised in March. Concurrent’s best chance of generating noise and cash lies with the application framework as the main layer to create and deploy Hadoop applications, with Lingual and Pattern dovetailing with it as enterprise sweeteners.
Image courtesy of Frédéric BISSON