Pythia: Facebook’s deep learning framework for the vision and language domain
The latest open-source tool from Facebook AI Research is Pythia, a deep learning framework designed to help with Visual Question Answering. Built on top of PyTorch, it offers a modular design for building AI models. Take a peek at the research involved.
Can you teach AI how to read? Facebook AI Research has yet another open source deep learning offering. Built upon PyTorch, Pythia is a modular framework for deep learning.
It was designed to help with Visual Question Answering (VQA): the AI “reads” a photo and answers questions based on the visual data available. This research can be used, for instance, to automate image captioning by reading text directly from a photograph.
Reading with deep learning
From the Facebook AI Research announcement regarding open sourcing Pythia:
Features include reference implementations to show how previous state-of-the-art models achieved related benchmark results and to quickly gauge the performance of new models. In addition to multitasking, Pythia also supports distributed training and a variety of datasets, as well as custom losses, metrics, scheduling, and optimizers…Pythia smooths the process of entering the growing subfield of vision and language and frees researchers to focus on faster prototyping and experimentation. Our goal is to accelerate progress by increasing the reproducibility of these models and results. This will make it easier for the community to build on, and benchmark against, successful systems.
Interested in the research on this topic of Visual Question Answering?
Read the relevant paper, Towards VQA Models That Can Read, by Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, and Marcus Rohrbach. You can also explore their research to see Pythia-powered examples of how deep learning “reads” text.
From the documentation, Pythia’s features include:
- Model Zoo: Reference implementations for VQA using LoRRA, the Pythia model, and BAN
- Distributed: Supports DataParallel and DistributedDataParallel
- Multi-tasking: Save time by training on multiple datasets simultaneously
- Customizable: Custom losses, metrics, scheduling, optimizers, and TensorBoard logging
- Unopinionated: Unopinionated dataset and model implementations
- Modules: Implementations for commonly used layers in the vision and language domain
Pythia can also be used as a starter codebase or to bootstrap a VQA project.
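To make the VQA idea concrete, here is a toy sketch of the pipeline such a project implements: embed the question, fuse it with image features, and score candidate answers. Everything below is illustrative only; it is not Pythia’s API, and real models replace these hand-written functions with learned PyTorch modules.

```python
# Conceptual VQA pipeline sketch. All names and numbers here are
# illustrative assumptions, not part of Pythia itself.

def embed_question(question, vocab):
    """Toy bag-of-words question embedding: one slot per vocab word."""
    words = question.lower().rstrip("?").split()
    return [1.0 if w in words else 0.0 for w in vocab]

def fuse(image_features, question_features):
    """Element-wise fusion of image and question feature vectors."""
    return [i * q for i, q in zip(image_features, question_features)]

def answer(fused, candidates):
    """Score each candidate answer against the fused features
    and return the highest-scoring one."""
    scores = {a: sum(f * w for f, w in zip(fused, weights))
              for a, weights in candidates.items()}
    return max(scores, key=scores.get)

vocab = ["what", "color", "is", "the", "stop", "sign"]
# Pretend image features: the detector "saw" a red octagonal sign.
image_features = [0.1, 0.9, 0.2, 0.3, 0.8, 0.7]
q = embed_question("What color is the stop sign?", vocab)
fused = fuse(image_features, q)
candidates = {"red":  [0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
              "blue": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
print(answer(fused, candidates))  # → red
```

In a real system the question encoder, fusion step, and answer classifier are all trained jointly; approaches like LoRRA additionally attend over text detected in the image, which is what lets these models “read.”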
Vision & language
Check out the full documentation here for a quickstart guide and available libraries.
A demo of the Pythia model is also available on Colab. (You will need to download the necessary data before heading into the playground.)