Analyze this!

One To Watch – Big Data platform, Precog

Chris Mayer

We talked to the man behind the latest Big Data project set to simplify the entire process of getting analytics from your data.

Over the past few years, the data
landscape has radically changed, thanks in no small part to the
success of Hadoop. The Apache project has become the gold standard
for data warehousing, and the long list of clients using it proves
its maturity and stability.

With so many businesses preaching the power of Hadoop,
it’s quite easy to be drawn in by its star power, when you actually
might be searching for something quite different – be it a graph
database, a NoSQL datastore or otherwise. If you do opt for Hadoop,
you could end up in a world of trouble should you try to make it
something it’s not.

Over the past year, we’ve seen plenty of ‘Big Data’ style
projects appear, but never one that tries to encompass as much as
Precog in simplifying the entire process. Emerging last month into
public beta to much acclaim, the big data infrastructure platform
aims to bridge the gaps between the array of data assets available
to companies out there.

“I describe Precog as a data science platform that helps
companies leverage their data assets to build new data projects,
and data-driven features to existing products,” says Precog’s
founder John De Goes, who also acts as CEO and CTO. He continues:
“If you want to think of other technologies out there, Precog is a
kind of database, but it’s focused on a very specific use
case.”

That use case is measured data and the world of data science.
From his time as VP of Engineering at SocialMedia.com, a social
media advertising platform, De Goes grew tired of the “colossal
undertaking” of building analytical features reliant on low-level
open source tools, and decided to do something about it. He took a
motley crew of developers with him to Denver Startup Weekend, where
the seeds of Precog were sown. From there, the company was accepted
into the TechStars accelerator program in May of last year,
allowing them to pursue their idea further.

“Precog is quite different [to other databases] because we
focus on storing and warehousing measured data,” says De Goes.
“This is often behavioural data, like people clicking and buying
stuff, so transactional data, historical data, event-oriented data.
That’s the kind of data we focus on.”

“We don’t focus on giving you facilities to get and store
that data, we focus on giving you deep data science tools to
analyse that data at a very deep-level and do arbitrary analytics,
statistics and machine learning across that data
center.”

So essentially, rather than forcing businesses through the
laborious process of learning the ins and outs of Hadoop (of which
there are many) or non-relational databases like MongoDB, Precog
acts as the facilitator, letting them glean important insights from
their data and then ‘productize’ their efforts.

A good example of Precog’s ability to mash up different
sources was shown this week, with their ‘Real-Time Twitter Election
Analysis’ dashboard. Alongside their partner AlchemyAPI, Precog
demonstrated the power and potential of their platform with
state-by-state sentiment analysis of tweets.

“Honestly, it was dead simple to put together,” says De
Goes. “[We] just plug in data from Twitter into Precog, [do] our
Quirrel analysis in Labcoat, export that as code, slap it into an
HTML document and boom, we have real-time sentiment analysis for
the Twitter data.”
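
To make the idea concrete, here is a minimal sketch of the kind of
state-by-state roll-up such a dashboard depends on. It is plain
Python, not Quirrel and not Precog's API, and everything in it is an
assumption for illustration: the function name, the record layout and
the made-up sample scores, which stand in for the output of a
sentiment service.

```python
from collections import defaultdict


def sentiment_by_state(tweets):
    """Return {state: mean sentiment score} for already-scored tweets.

    Each tweet is a dict with a 'state' code and a numeric 'score'
    (hypothetical layout; a real feed would look different).
    """
    totals = defaultdict(lambda: [0.0, 0])  # state -> [score sum, tweet count]
    for tweet in tweets:
        entry = totals[tweet["state"]]
        entry[0] += tweet["score"]
        entry[1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


# Made-up sample records, not real election data.
sample_tweets = [
    {"state": "CO", "score": 0.5},
    {"state": "CO", "score": -0.25},
    {"state": "OH", "score": 0.75},
]

averages = sentiment_by_state(sample_tweets)
print(averages)
```

The real dashboard presumably does this continuously over a live
feed; the sketch only shows the grouping-and-averaging step on a
fixed list.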

De Goes believes that not enough enterprises are seeing the
true capabilities of their data. “I think in this day and age,
companies are becoming increasingly comfortable with storing and
consolidating the masses of data they have,” he says.

“Right now, everyone has big data. Ok so what? The next
interesting step is figuring what to do with that big data. That’s
the really hard part. Anyone can buy a data warehouse or a massive
Hadoop cluster and start dumping data in those things. It’s how you
move from having massive amounts of data to actually making more
money, based on that data — it’s the next logical
step.”

Judging by their private beta, Precog
seem to have thought through every avenue. Through their set of
JSON-supporting REST APIs (Accounts, Ingest, Metadata, Analytics
and Security), users can set to work creating their own solution,
or add something onto an existing service. The platform embraces
the mainstream languages and platforms, with client libraries in
.NET, Ruby, Python, PHP, JavaScript and of course Java, which De
Goes says cover 90% of the developers they want to target. Two
other products play a huge part in the Precog platform – the
Labcoat IDE designed for the data scientists among us and
ReportGrid for visualising all the data at your disposal.


In a recent Gigaom piece, De Goes was quoted as saying “Hadoop is
stupid”. His target was not the technology itself but the mentality
of those who treat it as the solution to everything, which he
described as “naive”.

“I obviously don’t think Hadoop the technology is stupid,”
explains De Goes. “It’s more how enterprises and large companies
have unfortunately used Hadoop as the panacea solution to all their
data problems. That kind of mentality is very stupid and the reason
for that is that in the world of Big Data, it’s all about
compromises. You’re gonna compromise something when you have the
ability to store terabytes or petabytes of data – you will be
compromising some things. The particular technology you choose
dictates exactly what you’re compromising.”

“There’s just a lot of misinformation and a lot of vendors
out there are trying to cram Hadoop down people’s throats, or
rather their bandaid for Hadoop down people’s throats. My thinking
is use the right tool for the right job.”

It’s a simple concept really, yet not one fully understood by
those with the cash in this world. Could Precog itself become the
driver for this change in thinking? If this week’s signs are
anything to go by, quite possibly. Just yesterday the company
announced a MongoDB implementation, allowing users to run deeper
analysis on top of their MongoDB database without any compromises
(custom code or extra ETL).

As Precog heads towards a proper release, these tie-ins and
partnerships with the companies behind the data sources are pivotal
in its mission to break down the complexity of Big Data. By the
looks of it, they’re going the right way about it, and might just
lead the next generation of Big Data app development.
