Clockwork Raven uses humans to crunch your Big Data

Elliot Bentley

More Twitter news today, as the company open-sources its batch-upload tool for Amazon Mechanical Turk.

Number crunching with Big Data techniques is great, but sometimes a computer algorithm just can’t cut it. What if you want to know whether a tweet is referring to cats or cats or cats? Or maybe you need to filter uploaded images for pictures of dogs in coats. If so, most computers still aren’t smart enough to do the job.

Amazon’s ‘Human Intelligence Task’ marketplace, Mechanical Turk, is perfectly suited for outsourcing this kind of work, allowing any task, from transcribing text to classifying adult images, to be handed to an army of workers. But when you have thousands of individual tasks, inputting each one by hand can take some time.

Enter Clockwork Raven, Twitter’s own solution to batch-uploading to Mechanical Turk. Open-sourced today – and largely overshadowed by API news – the application is now available for anyone to download from GitHub.

The Ruby-based application is designed for those with almost zero technical knowledge. Questions for your human analysts are constructed using Google Docs-style forms, and the answers are delivered back for detailed breakdown. As it started out as an internal Twitter tool, it also makes embedding tweets and Twitter users easy.

Of course, each analysis will cost a few cents – those human workers need remuneration, after all – so it won’t ever be as cost-effective as an algorithm. But it’s a unique and rather clever option in the Big Data toolset.
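For the curious, the general idea behind batch-uploading can be sketched in a few lines of Python: one question template is expanded into a separate task for every row of data, and the bill scales with the number of tasks. This is only an illustration of the concept, not Clockwork Raven's actual code. The template wording, the field names, the `build_tasks` and `total_cost_cents` helpers and the five-cent reward are all assumptions made up for the example, and nothing here talks to Mechanical Turk.

```python
import csv
import io
from string import Template

# Hypothetical question template; $tweet_text is filled in once per row.
QUESTION = Template("Is this tweet about a household pet? Tweet: $tweet_text")


def build_tasks(csv_text, reward_cents=5):
    """Expand one question template into one task per CSV row.

    The reward of 5 cents is an assumed figure for illustration only.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "task_id": row["id"],
            "question": QUESTION.substitute(tweet_text=row["tweet_text"]),
            "reward_cents": reward_cents,
        }
        for row in rows
    ]


def total_cost_cents(tasks, judgments_per_task=1):
    """Total worker payments if every task is answered judgments_per_task times."""
    return sum(task["reward_cents"] for task in tasks) * judgments_per_task


# Two made-up rows standing in for a spreadsheet of thousands.
SAMPLE = (
    "id,tweet_text\n"
    "1,My cat just knocked over the tree\n"
    "2,Saw Cats at the theatre last night\n"
)

tasks = build_tasks(SAMPLE)
for task in tasks:
    print(task["task_id"], task["question"])
print("Total cost in cents:", total_cost_cents(tasks, judgments_per_task=3))
```

Asking several workers the same question, as the `judgments_per_task` parameter does here, is a common way to cross-check answers, but it multiplies the cost accordingly.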

