Based on Python
ETL/ELT pipelines are described in Python code, so anyone familiar with the language will find Airflow easy to pick up.
A small but full-fledged toolkit
Well suited to creating and managing data pipelines. You can work with Airflow through the CLI, the REST API, or a web interface built on the Flask Python framework.
Integration
Airflow integrates with many databases (MySQL, PostgreSQL, DynamoDB, Hive), big data storage systems (HDFS, Amazon S3), and cloud platforms (Google Cloud Platform, Amazon Web Services, Microsoft Azure).
An extensible REST API
The REST API makes it relatively easy to integrate Airflow into an existing enterprise IT landscape and to customize data pipelines flexibly.
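For example, the stable REST API exposes a `POST /api/v1/dags/{dag_id}/dagRuns` endpoint for triggering DAG runs. A stdlib-only sketch that builds such a request (the host, DAG id, and auth setup are assumptions; in practice you would add authentication headers before sending):

```python
import json
import urllib.request

# Hypothetical endpoint; the stable REST API is served under /api/v1.
AIRFLOW_API = "http://localhost:8080/api/v1"

def build_trigger_request(dag_id, conf=None):
    """Build (but do not send) a POST request that triggers a new DAG run."""
    payload = json.dumps({"conf": conf or {}}).encode("utf-8")
    return urllib.request.Request(
        f"{AIRFLOW_API}/dags/{dag_id}/dagRuns",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it would then be: urllib.request.urlopen(build_trigger_request("example_etl"))
```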
Monitoring and alerting
Integration with StatsD and Fluentd is supported for collecting and shipping metrics and logs.
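StatsD metrics are switched on in the `[metrics]` section of `airflow.cfg`; a sketch with typical values (host, port, and prefix are assumptions for a local StatsD daemon):

```ini
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
```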
Role-based access
Airflow ships with five default roles with different access levels: Admin, Op, User, Viewer, and Public. Integration with Active Directory and fine-grained access configuration via RBAC are also possible.
Testing support
Pipelines and the individual tasks within them can be covered with ordinary unit tests.
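One common pattern is to keep task logic in plain Python functions and unit-test those directly, without running Airflow at all. A hypothetical sketch (`normalize_amounts` is an invented transform step):

```python
# Keeping task logic in plain functions makes it testable without a scheduler.
def normalize_amounts(rows):
    """Drop non-positive amounts and convert cents to whole units."""
    return [row["amount_cents"] / 100 for row in rows if row["amount_cents"] > 0]

def test_normalize_amounts():
    rows = [{"amount_cents": 250}, {"amount_cents": -10}, {"amount_cents": 99}]
    assert normalize_amounts(rows) == [2.5, 0.99]

test_normalize_amounts()
```

The same function is then passed to a PythonOperator in the DAG, so the tested code and the scheduled code are identical.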
Scalability
Airflow scales thanks to its modular architecture and a message queue that lets workers process an arbitrary number of DAGs.
Open source
Airflow is actively maintained by the community and has thorough documentation.
Luigi
What is Luigi?
Luigi is a Python framework for building complex sequences of dependent tasks. Much of the framework is devoted to moving data between various sources (MySQL, MongoDB, Redis) and working with different tools (from launching a local process to running jobs of various types on a Hadoop cluster).
Dagster
What is Dagster?
Dagster is an orchestrator that’s designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.