The wild world of machine learning

Machine learning with Oryx: Run wild with real-time ML

Sarah Schlothauer

The open source machine learning frameworks just keep on coming. Oryx 2 is focused on real-time, large scale machine learning and uses the power of 3 tiers. Grab it by the horns and create custom applications.

Add another machine learning framework to your radar. Meet Oryx 2: an open source framework created “for real-time large scale machine learning”.

Large scale machine learning

First off: what is Oryx? (No, not the animal. Not the novel either.)

According to the project website, Oryx is a “realization of the lambda architecture built on Apache Spark and Apache Kafka”. Apache Spark's main draw is fast in-memory processing, which often lets it beat out Hadoop MapReduce in the race. (Of course, Apache Spark and Hadoop can be used for different things, so your preference between the two may depend on more than speed alone. As you will see under the requirements, Oryx requires both.) Meanwhile, Apache Kafka is a distributed streaming platform used to build real-time streaming applications and data pipelines.

If it sounds familiar to you, it should! Oryx 2 is the successor to the original Oryx project. The updated version uses a new architecture consisting of three tiers that can be used together or independently of one another.

The project’s GitHub invites programmers to “deploy a ready-made, end-to-end applications for collaborative filtering, classification, regression and clustering”.

SEE ALSO: Top 5 machine learning frameworks for Java and Python

Oryx’s main focus is real-time large scale machine learning. It is a workhorse of a framework. (A workoryx?)

Three-tiered cake

The three-tier system is the bread and butter of Oryx, and knowing how to use the tiers is the key to making great apps.

From the documentation:

A generic lambda architecture tier, providing batch/speed/serving layers, which is not specific to machine learning

A specialization on top providing ML abstractions for hyperparameter selection, etc.

An end-to-end implementation of the same standard ML algorithms as an application (ALS, random decision forests, k-means) on top
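To make the "hyperparameter selection" idea in the second tier a little more concrete, here is a minimal Python sketch. It is not Oryx code and does not use any Oryx API: the function names, the one-feature ridge model and the toy data are all assumptions made up for illustration. It only shows the general pattern of building one model per candidate value and keeping the one that scores best on held-out data.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pair; toy data, not an Oryx type


def fit_ridge_slope(train: Sequence[Point], lam: float) -> float:
    """Fit y ~ w * x with an L2 penalty `lam` (closed form for one feature)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def mean_squared_error(w: float, test: Sequence[Point]) -> float:
    """Average squared prediction error of slope `w` on held-out points."""
    return sum((y - w * x) ** 2 for x, y in test) / len(test)


def select_hyperparameter(
    train: Sequence[Point], test: Sequence[Point], candidates: Iterable[float]
) -> Tuple[float, float, float]:
    """Build one model per candidate and keep the lowest held-out error.

    Returns (best_lam, fitted_slope, held_out_error).
    """
    best = None
    for lam in candidates:
        w = fit_ridge_slope(train, lam)
        err = mean_squared_error(w, test)
        if best is None or err < best[2]:
            best = (lam, w, err)
    if best is None:
        raise ValueError("no candidates given")
    return best


# Hypothetical data that follows y = 2x exactly, so no penalty fits best.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
test = [(4.0, 8.0), (5.0, 10.0)]
best_lam, best_w, best_err = select_hyperparameter(train, test, [0.0, 1.0, 10.0])
print(best_lam, best_w, best_err)  # 0.0 2.0 0.0
```

A framework tier that handles this for you mainly saves you from rewriting the loop, the data split and the bookkeeping for every new model type.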

It’s all about mixing and matching the tiers. While you don’t have to use them all, they can work together. Again, let’s take it straight from the oryx’s mouth and learn more about each layer of the lambda architecture:

A Batch Layer, which computes a new “result” (think model, but, could be anything) as a function of all historical data, and the previous result. This may be a long-running operation which takes hours, and runs a few times a day for example.

A Speed Layer, which produces and publishes incremental model updates from a stream of new data. These updates are intended to happen on the order of seconds.

A Serving Layer, which receives models and updates and implements a synchronous API exposing query operations on the result.

A data transport layer, which moves data between layers and receives input from external sources
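The division of labor between the layers described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Oryx's implementation: the "model" is just a per-item average rating, and every name here (batch_layer, speed_layer, ServingLayer, the event format) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[str, float]        # (item, rating): a made-up input record
Stats = Tuple[float, int]        # (sum of ratings, number of ratings)
Update = Tuple[str, float, int]  # (item, delta sum, delta count)


def batch_layer(history: Iterable[Event]) -> Dict[str, Stats]:
    """Recompute the whole "result" from all historical data."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for item, rating in history:
        totals[item] += rating
        counts[item] += 1
    return {item: (totals[item], counts[item]) for item in totals}


def speed_layer(new_events: Iterable[Event]) -> List[Update]:
    """Turn a small batch of fresh events into incremental model updates."""
    deltas = batch_layer(new_events)
    return [(item, total, count) for item, (total, count) in deltas.items()]


class ServingLayer:
    """Holds the current model in memory and answers queries synchronously."""

    def __init__(self) -> None:
        self.model: Dict[str, Stats] = {}

    def load_model(self, model: Dict[str, Stats]) -> None:
        # A new batch result replaces whatever was there before.
        self.model = dict(model)

    def apply_update(self, update: Update) -> None:
        # An incremental update is merged into the existing state.
        item, total, count = update
        old_total, old_count = self.model.get(item, (0.0, 0))
        self.model[item] = (old_total + total, old_count + count)

    def average_rating(self, item: str) -> Optional[float]:
        if item not in self.model:
            return None
        total, count = self.model[item]
        return total / count


history = [("a", 4.0), ("a", 2.0), ("b", 5.0)]
serving = ServingLayer()
serving.load_model(batch_layer(history))      # slow, runs occasionally
for update in speed_layer([("a", 3.0), ("c", 1.0)]):
    serving.apply_update(update)              # fast, runs continuously

print(serving.average_rating("a"))  # 3.0
print(serving.average_rating("c"))  # 1.0
```

In this sketch the batch result is authoritative and the speed layer only patches it until the next batch run arrives, which is the usual trade-off in a lambda architecture.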

The Batch and Speed layers are implemented as Spark Streaming processes, so they each run on a Hadoop cluster. Meanwhile, the data transport layer is an Apache Kafka topic and the serving layer helps maintain the model state in memory.
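One reason a Kafka topic works well as the transport is that it behaves like an append-only log that several consumers can read at their own pace. The Python sketch below imitates that behavior in memory. It is an assumption-laden stand-in, not the Kafka client API: the Topic class, its methods and the consumer names are invented for illustration.

```python
from typing import Dict, List


class Topic:
    """In-memory stand-in for a topic: an append-only log with per-consumer offsets."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._offsets: Dict[str, int] = {}

    def publish(self, message: str) -> None:
        # Messages are only ever appended, never removed by a read.
        self._log.append(message)

    def poll(self, consumer: str) -> List[str]:
        # Each consumer sees everything published since its own last poll.
        start = self._offsets.get(consumer, 0)
        messages = self._log[start:]
        self._offsets[consumer] = len(self._log)
        return messages


input_topic = Topic()
input_topic.publish("user1,itemA,4.0")
input_topic.publish("user2,itemB,5.0")

first_speed_read = input_topic.poll("speed")    # reads both messages right away
input_topic.publish("user1,itemC,1.0")
second_speed_read = input_topic.poll("speed")   # reads only the new message
batch_read = input_topic.poll("batch")          # reads all three, much later

print(len(first_speed_read), len(second_speed_read), len(batch_read))  # 2 1 3
```

Because reading does not consume the log, the speed layer can react within seconds while the batch layer catches up on the same input hours later.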

The project’s GitHub page provides an architecture diagram that will help you master this system.

Adding Oryx to your zoo

SEE ALSO: How well do you know your Apache Spark trivia?

Do you have any burning use cases for Oryx? Check out their page about making an app with the framework.

In order to build an app with Oryx version 2.7.0, you will need:

See the Javadoc for the nitty-gritty details.

Is this the big data framework for you? Give it a go and maybe you’ll be adding a new member to your machine learning zoo.

What kind of applications will you build with Oryx?

Sarah Schlothauer is the editor for She received her Bachelor's degree from Monmouth University, West Long Branch, New Jersey. She currently lives in Frankfurt, Germany with her husband and cat where she enjoys reading, writing, and medieval reenactment. She is also the editor for Conditio Humana, an online magazine about ethics, AI, and technology.

1 Comment
Sean Owen
3 years ago

Author here. Thanks for the write-up! And for knowing about the Margaret Atwood book that isn’t Handmaid’s Tale.