Data Logistics Service

The Data Logistics Service is responsible for the data movements that are part of the workflows developed in the project.

The service is based on Apache Airflow. The project-specific extensions and the data pipelines formalizing the data movements can be found in the project repository.

From the user perspective, the most important part of the service is the definitions of the data movements (pipelines). Some examples of those (e.g. a minimal workflow) are provided in the repository. A good starting point for defining your own pipelines is the original Airflow documentation. Please note that the pipelines are defined in the Python programming language and can execute shell scripts. This means that users who already have their own solution for data movements based on shell scripts or Python programs can easily move it to the Data Logistics Service and obtain a running environment with monitoring, retries upon failure, etc. A minimal sketch of such a pipeline is shown below.
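
The following is a minimal sketch of what wrapping an existing shell script in a pipeline could look like, assuming Airflow 2.x; the DAG id, schedule, and script path are purely illustrative and not part of the project repository.

```python
# Minimal sketch of a data-movement pipeline (Airflow DAG), assuming Airflow 2.x.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_data_movement",   # hypothetical pipeline name
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,           # run only when triggered manually
    catchup=False,
) as dag:
    # Reuse an existing transfer script as a task; the service then provides
    # monitoring and retries upon failure for it.
    move_data = BashOperator(
        task_id="move_data",
        bash_command="bash /path/to/existing_transfer_script.sh",  # hypothetical script
        retries=2,
    )
```

Tasks defined this way appear in the Airflow web interface, where their runs, logs, and retries can be inspected.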

A testing instance of the Data Logistics Service is hosted in the HDF cloud and can be accessed there.