Many scientific applications, particularly in the geosciences, are built by interconnecting several functionally independent components within a common data and workflow scenario. Such applications must coordinate the execution of these components and organize the data exchange between them (see Figure 1). These workflow issues become even harder in ChEESE, where the components have external data dependencies and are deployed on a distributed, heterogeneous infrastructure.
Figure 1. Simple workflow deployment on distributed infrastructure with external data dependencies.
Automating the execution of applications whose data-interconnected software components are linked by distributed, far-reaching paths requires a dedicated workflow management system (WMS). Ideally, a WMS is simple to use while offering powerful functionality for managing complex workflow-based applications. Widespread WMS, however, either rely on a heavy middleware stack that requires special infrastructure permissions (e.g., Pegasus and Copernicus) or are too problem-specific (e.g., Makeflow, Parsl, and FireWorks). This makes them difficult to integrate with production infrastructures such as HPC systems and to use for general-purpose application design. The ChEESE project therefore developed a complementary WMS that allows workflows to be designed and developed with minimal implementation effort.
The WMS developed in ChEESE, called WMS-light (see Figure 2), aims to provide a flexible framework of lightweight, portable software tools and services that simplify the development and execution of workflow-based applications while adding as little overhead as possible to the system software stack of the infrastructure resources. It achieves this through a high degree of decentralization of the major workflow management components, their service orientation, and a unified data management platform.
The main features of WMS-light are:
- intuitive design of workflow scenarios by the users via its workflow specification API.
- retrospective tracking of the properties of all past executions, provided by WMS-light's persistent data layer.
- basic analytics functionality by means of a rich query system.
- easy adaptability of WMS-light applications by reusing components of already existing WMS-light applications.
- automatic data placement for application components that ensures the locality of their input and output data. This allows existing scripts to be turned into WMS-light workflows with minimally invasive effort and makes it easy to couple existing workflow components.
- the basic executable unit of WMS-light can be any binary application or script.
- extensibility of application components with additional monitoring metrics (alongside the automatically gathered hardware data) via an API¹.
- open architecture and modular design for straightforward extension of the basic WMS functionality or even integration with other WMS.
Figure 2. Architecture of ChEESE WMS-light workflow management system.
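The workflow specification API and the persistent data layer listed above are not documented in this section, so the following is only a generic sketch of the underlying ideas, not WMS-light's actual interface. Assuming Python 3.9+ and only its standard library, it declares a hypothetical three-step workflow in which each step is an arbitrary command (short Python one-liners stand in for real binaries or scripts). It runs the steps in dependency order with `graphlib.TopologicalSorter` and records every execution in an SQLite table so that past runs can be queried afterwards. The step names, the `STEPS` layout, the `run_workflow` function and the `runs` table schema are all illustrative assumptions.

```python
import sqlite3
import subprocess
import sys
import time
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow description (not part of WMS-light): each step is an
# arbitrary command plus the names of the steps whose outputs it consumes.
STEPS = {
    "preprocess": {"cmd": [sys.executable, "-c", "print('mesh ready')"], "after": []},
    "simulate": {"cmd": [sys.executable, "-c", "print('run done')"], "after": ["preprocess"]},
    "postprocess": {"cmd": [sys.executable, "-c", "print('maps written')"], "after": ["simulate"]},
}


def run_workflow(steps, db):
    """Run steps in dependency order and record each execution in SQLite."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS runs "
        "(step TEXT, returncode INTEGER, seconds REAL, stdout TEXT)"
    )
    # TopologicalSorter expects a mapping: node -> its predecessors.
    graph = {name: step["after"] for name, step in steps.items()}
    for name in TopologicalSorter(graph).static_order():
        start = time.perf_counter()
        result = subprocess.run(steps[name]["cmd"], capture_output=True, text=True)
        elapsed = time.perf_counter() - start
        # Persist one record per execution so past runs can be inspected later.
        db.execute(
            "INSERT INTO runs VALUES (?, ?, ?, ?)",
            (name, result.returncode, elapsed, result.stdout.strip()),
        )
        db.commit()
        if result.returncode != 0:
            raise RuntimeError(f"step {name!r} failed")
    return db


if __name__ == "__main__":
    db = run_workflow(STEPS, sqlite3.connect(":memory:"))
    for row in db.execute("SELECT step, returncode, stdout FROM runs ORDER BY rowid"):
        print(row)
```

Because the execution records live in an ordinary database table, simple analytics reduce to queries, for example `SELECT step, AVG(seconds) FROM runs GROUP BY step` to compare step run times across executions. A real setup would use a database file instead of `:memory:`, and this sketch leaves out WMS-light features such as automatic data placement and distributed execution.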
The prototype release is fully open source and can be downloaded from: https://fs.hlrs.de/projects/cheese/wms-light_v01.tgz
For questions about using the WMS-light system, please contact Christoph Niethammer (email@example.com) or Alexey Cheptsov (firstname.lastname@example.org).