Maintain your workflow with Apache Airflow
Apache Airflow has officially been granted Top-Level Project status by The Apache Software Foundation! Over 200 companies manage their workflows with Airflow, so what makes it so usable? Have a beginner’s look at what you can gain from giving Airflow a try.
Apache Airflow has been making workflows lighter than air, and now it soars a little higher: it is the latest project to graduate to Top-Level Project status at The Apache Software Foundation.
Top-level projects have proven themselves mature and meet the foundation’s set standards. Originally the brainchild of Maxime Beauchemin at Airbnb, Airflow is now an open source project; it joined the Apache Software Foundation’s incubator in 2016.
In the announcement, Vice President of Apache Airflow Bolke de Bruin stated: “Since its inception, Apache Airflow has quickly become the de-facto standard for workflow orchestration.”
Let’s take a quick look at why Apache Airflow is used by over 200 organizations, including big names such as PayPal, Reddit, Square, Etsy, Quora, and Groupon.
Features & usage
The Apache Airflow documentation describes the project as follows:
Airflow is a platform to programmatically author, schedule and monitor workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
For teams, the ability to visualize pipelines and troubleshoot issues in one place is a major draw.
First, one of the major benefits is its use of Directed Acyclic Graphs (DAGs). According to the project’s documentation, a DAG is a “collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies”. From there, you can view all of your DAGs in tree view and graph view. This rich user interface makes Airflow easy to understand and use across your team.
Other features include:
- Modular architecture
- Uses the Jinja template engine for Python
- Error handling
- Database and dependency management
- Smart scheduling
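To make the DAG idea described above more concrete, here is a minimal sketch in plain Python. It deliberately does not use Airflow’s own API: it relies only on the standard library’s graphlib module (Python 3.9+), and the task names (extract, clean, aggregate, validate, publish) are hypothetical. It shows the two properties that matter: every task runs after the tasks it depends on, and a graph containing a cycle is rejected.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "validate": {"clean"},
    "publish": {"aggregate", "validate"},
}


def execution_order(dependencies):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def is_acyclic(dependencies):
    """Return True if the dependency graph contains no cycle."""
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    # Two tasks that depend on each other do not form a valid DAG.
    print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

In this sketch, aggregate and validate have no dependency on each other, so either may run first (or both in parallel); a scheduler such as Airflow’s can exploit exactly that freedom when it distributes tasks across workers.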
Why people love Airflow
So, what do the tech giants have to say about this project?
An article by Tao Feng, Andrew Stahlman, and Junda Yang explains how the ride-sharing app Lyft uses Airflow for monitoring and alerting. They call Apache Airflow “reliable, efficient, and trustworthy”. (You certainly need all those qualities in order to keep all those passengers and drivers in order!)
PayPal also relies on Airflow to keep things in check. Sid Anand, Chief Data Engineer at PayPal, said:
With over 250 PB of data under management, PayPal relies on workflow schedulers such as Apache Airflow to manage its data movement needs reliably. Additionally, Airflow is used for a range of system orchestration needs across many of our distributed systems: needs include self-healing, autoscaling, and reliable [re-]provisioning.
Starting up is easy and there’s no shortage of tutorials to help guide users.
To begin, install Airflow. Extra packages are available if needed.
Looking for help? Here are a few helpful resources: