One To Watch – Big Data platform, Precog
We talked to the man behind the latest Big Data project set to simplify the entire process of getting analytics from your data.
Over the past few years, the data landscape has radically changed, thanks in no small part to the success of Hadoop. The Apache project has become the gold standard for data warehousing, and the long list of clients using it proves its maturity and stability.
With so many businesses preaching the power of Hadoop, it’s quite easy to be drawn in by its star power, when you actually might be searching for something quite different – be it a graph database, a NoSQL datastore or otherwise. If you do opt for Hadoop, you could end up in a world of trouble should you try to make it something it’s not.
Over the past year, we’ve seen plenty of ‘Big Data’ style projects appear, but never one that tries to encompass as much as Precog in simplifying the entire process. Having emerged into public beta last month to much acclaim, the big data infrastructure platform aims to bridge the gaps between the array of data assets available to companies.
“I describe Precog as a data science platform that helps companies leverage their data assets to build new data projects, and data-driven features to existing products,” says Precog’s founder John De Goes, who also acts as CEO and CTO. He continues: “If you want to think of other technologies out there, Precog is a kind of database, but it’s focused on a very specific use case.”
That use case is measured data and the world of data science. From his time as VP of Engineering at SocialMedia.com, a social media advertising platform, De Goes grew tired of the “colossal undertaking” of building analytical features reliant on low-level open source tools, and decided to do something about it. He took a motley crew of developers with him to Denver Startup Weekend, where the seeds of Precog were sown. From there, the company was accepted into the TechStars accelerator program in May of last year, allowing them to pursue their idea further.
“Precog is quite different [to other databases] because we focus on storing and warehousing measured data,” says De Goes. “This is often behavioural data, like people clicking and buying stuff, so transactional data, historical data, event-oriented data. That’s the kind of data we focus on.”
“We don’t focus on giving you facilities to get and store that data, we focus on giving you deep data science tools to analyse that data at a very deep-level and do arbitrary analytics, statistics and machine learning across that data center.”
So essentially, rather than forcing businesses through the laborious process of learning the ins and outs of Hadoop (of which there are many) or non-relational databases like MongoDB, Precog acts as the facilitator for them to glean important insights from their data and then ‘productize’ their efforts.
A good example of Precog’s ability to mash up different sources came this week, with their ‘Real-Time Twitter Election Analysis’ dashboard. Alongside their partner AlchemyAPI, Precog demonstrated the power and potential of their platform with state-by-state sentiment analysis of tweets.
“Honestly, it was dead simple to put together,” says De Goes. “[We] just plug in data from Twitter into Precog, [do] our Quirrel analysis in Labcoat, export that as code, slap it into a HTML document and boom, we have real-time sentiment analysis for the Twitter data.”
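For readers wondering what a state-by-state sentiment roll-up involves, here is a minimal sketch in plain Python. It is not Quirrel, and it does not use Precog’s or AlchemyAPI’s actual APIs. It assumes each tweet has already been tagged with a state, a candidate and a sentiment score between -1 and 1; the sample records, field names and the `mean_sentiment_by_state` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records (made up for illustration). Each tweet is
# assumed to arrive already tagged with a US state, a candidate label and
# a sentiment score in [-1, 1] from some sentiment service.
tweets = [
    {"state": "CO", "candidate": "A", "sentiment": 0.5},
    {"state": "CO", "candidate": "A", "sentiment": 0.25},
    {"state": "CO", "candidate": "B", "sentiment": -0.25},
    {"state": "OH", "candidate": "A", "sentiment": 0.75},
    {"state": "OH", "candidate": "A", "sentiment": -0.25},
]


def mean_sentiment_by_state(records):
    """Return the average sentiment for each (state, candidate) pair."""
    # Each entry holds [running sum of scores, number of tweets seen].
    totals = defaultdict(lambda: [0.0, 0])
    for record in records:
        key = (record["state"], record["candidate"])
        totals[key][0] += record["sentiment"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


averages = mean_sentiment_by_state(tweets)

# Print one line per state and candidate, sorted for stable output.
for (state, candidate), score in sorted(averages.items()):
    print(f"{state} candidate {candidate}: {score:+.3f}")
```

In a real-time dashboard the same grouping would be recomputed as new tweets arrive; the sketch only shows the aggregation step on a fixed batch.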
De Goes believes that not enough enterprises are seeing the true capabilities of their data. “I think in this day and age, companies are becoming increasingly comfortable with storing and consolidating the masses of data they have,” he says.
“Right now, everyone has big data. Ok so what? The next interesting step is figuring what to do with that big data. That’s the really hard part. Anyone can buy a data warehouse or a massive Hadoop cluster and start dumping data in those things. It’s how you move from having massive amounts of data to actually making more money, based on that data — it’s the next logical step.”
From their private beta, Precog seem to have thought through every avenue. Through their set of JSON-supporting REST APIs (Accounts, Ingest, Metadata, Analytics and Security), users can set to work creating their own solution, or add something onto an existing service. The platform embraces the core programming languages, with client libraries in .NET, Ruby, Python, PHP, [...] of whom they want to target. Two other products play a huge part in the Precog platform – the Labcoat IDE designed for the data scientists among us and ReportGrid for visualising all the data at [...]
In a recent Gigaom piece, De Goes was quoted as saying “Hadoop is stupid”. His target was not the technology itself but the mentality of those who use it to solve everything, which he called “naive”.
“I obviously don’t think Hadoop the technology is stupid,” explains De Goes. “It’s more how enterprises and large companies have unfortunately used Hadoop as the panacea solution to all their data problems. That kind of mentality is very stupid and the reason for that is that in the world of Big Data, it’s all about compromises. You’re gonna compromise something when you have the ability to store terabytes or petabytes of data – you will be compromising some things. The particular technology you choose dictates exactly what you’re compromising.”
“There’s just a lot of misinformation and a lot of vendors out there are trying to cram Hadoop down people’s throats, or rather their bandaid for Hadoop down people’s throats. My thinking is use the right tool for the right job.”
It’s a simple concept, really, yet not one fully understood by those holding the purse strings. Could Precog itself become the driver for this change in thinking? If this week’s signs are anything to go by, quite possibly. Just yesterday the company announced a MongoDB implementation, allowing users to run deeper analysis on top of their MongoDB database without compromises such as custom code or extra ETL.
As Precog heads towards a proper release, these tie-ins and partnerships with the companies behind the data sources are pivotal in its mission to break down the complexity of Big Data. By the looks of it, they’re going the right way about it, and might just lead the next generation of Big Data app development.