S4 Distributed Stream Platform

Yahoo! Open Source S4 Computing Platform

Jessica Thornsby

Platform originally developed for personalising search ads gets open sourced.

Yahoo! have open sourced S4, a distributed stream computing platform for building applications that process continuous, unbounded streams of data. S4 was originally developed to personalise search advertising products at Yahoo!, where it processed recent queries, clicks and timing information.

S4 routes keyed data events with affinity to Processing Elements, which consume the events and either emit events that may be consumed by other Processing Elements or publish the results. The nodes are symmetric, with no centralised service and no single point of failure, and a cluster management layer based on ZooKeeper automatically re-routes events to other servers. The S4 team are currently encouraging those interested in stream processing to get involved in the project.
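
To make the idea of keyed routing with affinity more concrete, here is a minimal Python sketch. It is not the S4 API: the names `CountingPE`, `Node` and `route`, the event layout, and the choice of a CRC32 hash are all assumptions made purely for illustration. It shows events that share a key landing on the same node and the same Processing Element instance, which consumes each event and emits a derived one.

```python
from zlib import crc32


class CountingPE:
    """Hypothetical Processing Element: counts the events seen for one key."""

    def __init__(self, key):
        self.key = key
        self.count = 0

    def process(self, event):
        # Consume the event, then emit a derived event that another
        # Processing Element could consume in turn.
        self.count += 1
        return {"key": self.key, "count": self.count}


class Node:
    """Hypothetical node that holds one PE instance per key routed to it."""

    def __init__(self, name):
        self.name = name
        self.pes = {}

    def handle(self, event):
        key = event["key"]
        if key not in self.pes:
            self.pes[key] = CountingPE(key)
        return self.pes[key].process(event)


def route(event, nodes):
    """Pick a node by hashing the event key, so equal keys share a node."""
    index = crc32(event["key"].encode("utf-8")) % len(nodes)
    return nodes[index]


# Illustrative data only: three symmetric nodes and three keyed events.
nodes = [Node("node-0"), Node("node-1"), Node("node-2")]
events = [{"key": "query:shoes"}, {"key": "query:hats"}, {"key": "query:shoes"}]

outputs = []
for event in events:
    node = route(event, nodes)
    outputs.append((node.name, node.handle(event)))
```

In this sketch both `query:shoes` events reach the same node, so the second one sees a count of 2. Re-routing after a server failure is not modelled here; in S4 that is the job of the ZooKeeper-based cluster management layer.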