Research Paper | Computer Technology | India | Volume 12 Issue 5, May 2023
Streamlining Enterprise Data Pipelines with an Automated DAG Factory for Airflow Orchestration in Cloud Environments using YAML Templates and JSON-Serialized Variables
Ramamurthy Valavandan | Balakrishnan Gothandapani | Savitha Ramamurthy
Abstract: Airflow is an open-source platform for creating, scheduling, and monitoring data pipelines. Its Directed Acyclic Graph (DAG) factory provides a mechanism for creating and managing DAGs in a programmatic way. However, the current implementation of the DAG factory in Airflow requires writing Python code, which can be time-consuming and error-prone. In this research paper, we propose a YAML-based DAG factory automation framework for Airflow, which provides a simple and intuitive way to define DAGs in YAML format. We describe the design and implementation of the framework and provide examples of how it can be used to automate the creation and management of DAGs in a cloud environment. We also evaluate the performance and scalability of the framework using real-world datasets and compare it to the existing Python-based DAG factory in Airflow. Our results demonstrate that the YAML-based DAG factory automation framework provides a more efficient and flexible way to create and manage DAGs in Airflow, especially in large-scale data processing scenarios.
Keywords: Airflow, Directed Acyclic Graph, DAG factory, YAML, automation, Python, CLI tool, schema file, GCP, Composer, JSON, dictionary, task status, DAG tasks, template generation, variable
Edition: Volume 12 Issue 5, May 2023
Pages: 656-673
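
Illustration: the abstract describes defining DAGs declaratively instead of writing Python by hand. The sketch below is not the authors' framework; it only illustrates the general idea under stated assumptions. The dictionary stands in for what a YAML loader might return for one DAG file, and every key and name in it (`dag_id`, `tasks`, `depends_on`, `plan_dag`) is hypothetical. It uses only the Python standard library (`graphlib`, Python 3.9+) and checks that the declared tasks form an acyclic graph before any Airflow objects would be built.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical spec: the dictionary a YAML loader might return for one DAG file.
# All keys and values are illustrative assumptions, not the paper's schema.
EXAMPLE_SPEC = {
    "dag_id": "example_daily_load",
    "schedule": "@daily",
    "tasks": {
        "extract": {"operator": "bash", "depends_on": []},
        "transform": {"operator": "python", "depends_on": ["extract"]},
        "load": {"operator": "bash", "depends_on": ["transform"]},
    },
}


def plan_dag(spec):
    """Validate a DAG spec and return its task ids with upstream tasks first."""
    tasks = spec["tasks"]

    # Reject dependencies on tasks that the spec never defines.
    for task_id, cfg in tasks.items():
        for upstream in cfg.get("depends_on", []):
            if upstream not in tasks:
                raise ValueError(
                    f"{task_id!r} depends on unknown task {upstream!r}"
                )

    # Map each task to the set of tasks that must run before it.
    graph = {
        task_id: set(cfg.get("depends_on", []))
        for task_id, cfg in tasks.items()
    }

    # static_order() yields predecessors before dependents and raises
    # CycleError if the declared dependencies are not acyclic.
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"spec {spec['dag_id']!r} is not acyclic") from exc


if __name__ == "__main__":
    print(plan_dag(EXAMPLE_SPEC))
```

A factory in this style would typically walk the returned order, instantiate one operator per task, and wire the dependencies; validating the spec first keeps malformed files from reaching the scheduler.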