Meet Manifold: Uber’s machine learning model debugging tool goes open source
Manifold, Uber’s model-agnostic visual debugging tool for machine learning, is now open source and available as a demo version and a GitHub repository. Manifold is built with TensorFlow.js, React, and Redux and is part of the Michelangelo machine learning platform. The open source version includes a few new features that will make for an easier user experience.
Ride-sharing giant Uber has open sourced its visual debugging tool for machine learning. Manifold is a model-agnostic tool that helps identify issues in machine learning models. After Uber first highlighted the Manifold project, community feedback alerted the company to its general-purpose potential as a standalone tool.
Manifold is used internally by Uber in a number of use cases, including evaluating the arrival time of Uber Eats deliveries, determining efficient routing, and identifying potentially unsafe trips.
The repository is now available on GitHub. According to its README, it is currently stable and being incubated for long-term release.
Under the hood, Manifold is built with TensorFlow.js, React, and Redux. It is part of the Michelangelo machine learning platform and helps the team at Uber diagnose problems and find the cause of performance issues.
From its GitHub README:
As a visual analytics tool, Manifold allows ML practitioners to look beyond overall summary metrics to detect which subset of data a model is inaccurately predicting. Manifold also explains the potential cause of poor model performance by surfacing the feature distribution difference between better and worse-performing subsets of data.
The open source version includes a few new features that will make for an easier user experience. New features include:
- Visualization support: View geo-spatial features for each data slice and find correlation points between geo-location data points.
- Jupyter Notebook integration: Input data as Pandas DataFrame objects and render a visualization with Jupyter.
- Interactive data slicing and comparisons: Slice data and compare query data based on different factors such as prediction loss.
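To make the idea of slicing by prediction loss concrete, here is a minimal sketch in plain Python. It is not Manifold's code or API: the `log_loss` and `slice_by_loss` helpers, the row fields, and the threshold are all hypothetical, chosen only to show how rows might be split into a lower-loss and a higher-loss slice.

```python
import math

def log_loss(y_true, p_pred, eps=1e-12):
    """Per-instance binary log loss; lower means a better prediction."""
    p = min(max(p_pred, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def slice_by_loss(rows, threshold):
    """Split rows into (low_loss, high_loss) slices at a loss threshold."""
    low, high = [], []
    for row in rows:
        loss = log_loss(row["label"], row["prediction"])
        target = high if loss > threshold else low
        target.append({**row, "loss": loss})
    return low, high

# Hypothetical toy data: a true label and a model's predicted probability.
rows = [
    {"id": 1, "label": 1, "prediction": 0.9},
    {"id": 2, "label": 0, "prediction": 0.2},
    {"id": 3, "label": 1, "prediction": 0.3},
    {"id": 4, "label": 0, "prediction": 0.8},
]

low, high = slice_by_loss(rows, threshold=0.5)
print("low-loss ids:", [r["id"] for r in low])
print("high-loss ids:", [r["id"] for r in high])
```

With this toy data, rows 1 and 2 land in the low-loss slice and rows 3 and 4 in the high-loss slice. In practice the threshold would likely be chosen from the loss distribution rather than fixed by hand.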
Manifold includes both a performance comparison view and a feature attribution view.
The performance comparison view gives you an overview of a model’s performance across different data segments. Meanwhile, the feature attribution view shows feature distributions of data subsets.
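As a rough illustration of what comparing feature distributions between two data subsets can look like, the sketch below bins each feature for a better-performing and a worse-performing subset and scores the gap with KL divergence. The choice of KL divergence, the binning scheme, and the feature names (`distance_km`, `hour`) are assumptions made for this example, not a description of Manifold's implementation.

```python
import math

def histogram(values, lo, hi, bins):
    """Normalized histogram with add-one smoothing so no bin is empty."""
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [1.0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def rank_features(good, bad, features, bins=4):
    """Rank features by how much their distribution differs between subsets."""
    scores = []
    for name in features:
        a = [row[name] for row in good]
        b = [row[name] for row in bad]
        lo, hi = min(a + b), max(a + b)
        p = histogram(a, lo, hi, bins)
        q = histogram(b, lo, hi, bins)
        scores.append((name, kl_divergence(p, q)))
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical subsets: rows the model predicts well vs. poorly.
good = [
    {"distance_km": 1.0, "hour": 9}, {"distance_km": 2.0, "hour": 18},
    {"distance_km": 1.5, "hour": 12}, {"distance_km": 2.5, "hour": 21},
]
bad = [
    {"distance_km": 9.0, "hour": 10}, {"distance_km": 8.0, "hour": 17},
    {"distance_km": 9.5, "hour": 13}, {"distance_km": 7.5, "hour": 20},
]

ranked = rank_features(good, bad, ["distance_km", "hour"])
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy data the poorly predicted rows are all long-distance, so `distance_km` ranks above `hour`. A feature that separates the two subsets this sharply is a natural first candidate to investigate when a model underperforms on one segment.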
In a blog post highlighting Manifold, software engineers Lezhi Li and Yang Wang write that “Manifold empowers data scientists to discover insights that guide them through the model iteration process.” By using data analytics, they can identify useful features in machine learning models and eliminate false negatives in model results.
“Manifold’s three primary benefits include: model-agnosticism, visual analytics for model performance evaluation that look beyond model performance summary statistics for inaccuracies, and the ability to separate visual analytics system and standard model training computations to facilitate faster and more flexible model development.”
Lezhi Li and Yang Wang, Uber Engineering blog
Read more about its visualization design and algorithm on the Uber Engineering blog.
Joining open source
View the demo version here and begin loading your data files. Or, try a sample data set first. The open source version includes an npm package version and a Python package version.
Manifold is one of several internal tools that Uber has released to the public as free and open source software (FOSS). Other open source projects include a Python package for unit testing, a mock app generator, and a peer-to-peer Docker registry.
View all of their projects.