MLflow: The open source ML platform for everyone
Can’t keep your ML models straight? The new open source platform MLflow has you covered for the entire machine learning life cycle with its Tracking API, Projects, and Models.
Getting into machine learning might be more profitable than ever, and as the technology’s profile has risen, the number of tools available online has proliferated accordingly. What’s a developer to do when faced with too many tools, poor experiment tracking, and problems with reproducibility and deployment? MLflow is here to help.
Brought to you by the Databricks team, MLflow is a new open source platform for machine learning. Instead of being tied to a single enterprise’s internal ML platform, developers can easily leverage new ML libraries with a wider community.
MLflow’s main philosophy is openness: an open source project with an open interface.
As an open source project, MLflow is welcoming contributions from the wider world. ML libraries don’t have to languish in company-specific silos. In particular, the open format allows developers to share everything from workflow steps to models across organizations if they want.
This openness is supported by MLflow’s open interface, which is designed to work with any preexisting ML library, algorithm, deployment tool, or language. Since it’s built around REST APIs and simple data formats, developers can easily add it to their existing ML code, making it possible to share code across ML libraries.
With MLflow’s Tracking API, developers can track parameters, metrics, and artifacts. This makes it easier to keep a record of experiment results and visualize them later on. MLflow Tracking can be used in any environment, from a standalone script to a notebook. The logged results can be saved and compared across multiple runs or multiple users with the web UI.
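As a rough illustration of what logging a run can look like, here is a minimal sketch in Python. The helper name log_training_run, the parameter and metric names, and the fake "training" loop are assumptions made up for this example; only start_run, log_param, log_metric, and log_artifact are MLflow calls, and their behavior may differ between releases.

```python
import os
import tempfile


def log_training_run(tracker, learning_rate, epochs):
    """Record one run's parameters, a per-epoch metric, and an output file.

    `tracker` is anything exposing log_param, log_metric, and log_artifact,
    such as the `mlflow` module itself (hypothetical helper, not part of MLflow).
    """
    tracker.log_param("learning_rate", learning_rate)
    tracker.log_param("epochs", epochs)

    loss = 1.0
    for _ in range(epochs):
        loss *= 1.0 - learning_rate  # stand-in for a real training step
        tracker.log_metric("loss", loss)

    # Artifacts are arbitrary output files, for example a model or a plot.
    out_dir = tempfile.mkdtemp()
    summary_path = os.path.join(out_dir, "summary.txt")
    with open(summary_path, "w") as f:
        f.write(f"final loss: {loss:.4f}\n")
    tracker.log_artifact(summary_path)
    return loss


def main():
    import mlflow  # assumes MLflow is installed, e.g. via `pip install mlflow`

    # Everything logged inside the block is grouped into a single run.
    with mlflow.start_run():
        log_training_run(mlflow, learning_rate=0.1, epochs=5)
```

Calling main() from a script or notebook records one run. Running the `mlflow ui` command in the same directory should then let you browse and compare the logged runs.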
MLflow Projects gives developers a standardized format to package reusable code for data science. Keeping it simple, each project has a directory with code or a Git repo. A descriptor file specifies the dependencies and how the code needs to be run. MLflow automatically sets up the right environment for the project.
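To make the descriptor file more concrete, the sketch below writes a small MLproject file from Python and shows how such a project might be launched. The project name, the train.py script, the conda.yaml file, and the alpha parameter are all hypothetical, and the descriptor layout reflects the MLproject format as commonly documented, so check it against the release you are using.

```python
import os
import tempfile

# Hypothetical descriptor: the project name, script, and parameter are
# illustrative and do not come from a real project.
MLPROJECT = """\
name: demo_project
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py --alpha {alpha}"
"""


def write_descriptor(project_dir):
    """Write the MLproject descriptor into project_dir and return its path."""
    path = os.path.join(project_dir, "MLproject")
    with open(path, "w") as f:
        f.write(MLPROJECT)
    return path


def run_project(project_dir, alpha):
    """Ask MLflow to prepare the environment and run the 'main' entry point."""
    import mlflow.projects  # assumes MLflow is installed

    return mlflow.projects.run(
        project_dir, entry_point="main", parameters={"alpha": alpha}
    )
```

A real project directory would also need the train.py script and the conda.yaml environment file that the descriptor refers to; neither is shown here. The first argument to run_project can be a local directory or, as far as we know, a Git repository URL.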
If developers use Projects and Tracking in conjunction, MLflow automatically remembers the project version executed along with any parameters. That means developers can easily rerun the exact same code for improved reproducibility, extensibility, and experimentation.
Additionally, MLflow offers a convention for packaging ML models in multiple formats called “flavors”. MLflow Models comes with a number of tools for deploying different flavors of your ML models, making it easy to deploy across diverse platforms. Again, if used in conjunction with the Tracking API, developers can keep track of which project each flavor of a particular model came from.
Getting started with MLflow
Fair warning: MLflow is still in its alpha phase, which means many features are a work in progress. The Databricks team plans on introducing new components, library integrations, and future extensions like more support for environment types. So, do be aware that the situation is fluid and things will be changing.