Powered by an army of twits

Clockwork Raven uses humans to crunch your Big Data

Elliot Bentley

More Twitter news today, as the company open-sources its batch-upload tool for Amazon Mechanical Turk.

Number crunching with Big Data techniques is great, but
sometimes a computer algorithm just can’t cut it. What if you want
to know whether a tweet is referring to cats? Or maybe you
need to filter uploaded images for pictures of dogs in coats. If
so, most computers still aren’t smart enough to do the job.

Amazon’s ‘Human Intelligence Task’ marketplace, Mechanical Turk, is
perfectly suited to outsourcing this kind of work, allowing any
task, from transcribing text to classifying adult images, to be
farmed out to an army of workers. But when you have thousands of
individual tasks, inputting each one by hand can take some time.
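
To give a rough sense of what batching means in practice, here is a
minimal Python sketch that turns a list of tweets into a CSV with one
row per task. The column names, the question text and the helper
functions are all hypothetical, chosen for illustration only; this is
not the input format of Clockwork Raven or of Mechanical Turk itself.

```python
import csv
import io

# Hypothetical column names; a real tool would define its own.
FIELDNAMES = ["task_id", "tweet_text", "question"]


def build_task_rows(tweets, question):
    """Return one dict per task: a 1-based ID, the tweet text and the question."""
    return [
        {"task_id": i, "tweet_text": text, "question": question}
        for i, text in enumerate(tweets, start=1)
    ]


def rows_to_csv(rows):
    """Serialise task rows to a CSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    tweets = [
        "My cat just knocked over the lamp",
        "Look at this dog in a coat",
    ]
    rows = build_task_rows(tweets, "Is this tweet about a cat?")
    print(rows_to_csv(rows))
```

The point of the sketch is simply that the same question is asked of
every row, so thousands of tasks can be generated from one template
rather than typed in one at a time.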

Enter Clockwork Raven, Twitter’s own solution to batch-uploading to
Mechanical Turk. Open-sourced today – and largely overshadowed by
API news – the application is now available for anyone to
download from GitHub.

The Ruby-based application is designed for those with almost zero
technical knowledge. Questions for your human analysts are
constructed using Google Docs-style forms, and the answers are
delivered back for detailed breakdown. Because it began life as an
internal Twitter tool, it also makes embedding tweets and Twitter
users easy.

Of course, each analysis will cost a few cents – those human
workers need remuneration, after all – so it won’t ever be as
cost-effective as an algorithm. But it’s a unique and rather clever
option in the Big Data toolset.
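
For a back-of-the-envelope feel for how "a few cents" adds up, the
arithmetic can be sketched in a few lines of Python. The function name
and all of the figures below (task count, reward, number of workers per
task) are illustrative assumptions rather than numbers from Twitter or
Amazon, and any marketplace commission is left out.

```python
def estimate_cost_cents(num_tasks, reward_cents, judgments_per_task=1):
    """Total reward paid out, in whole cents.

    Working in integer cents avoids floating-point rounding surprises.
    Marketplace fees are deliberately not modelled here.
    """
    return num_tasks * judgments_per_task * reward_cents


if __name__ == "__main__":
    # Assumed example: 10,000 tweets, 3 cents per answer,
    # each tweet shown to 3 different workers.
    total_cents = estimate_cost_cents(10_000, 3, judgments_per_task=3)
    print(f"${total_cents / 100:.2f}")  # prints "$900.00"
```

Under those assumed figures a single batch already runs to hundreds of
dollars, which is the trade-off against an algorithm that costs nothing
extra per item.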
