Publications by Cirad staff

Distributed caching of scientific workflows in multisite cloud

Heidsieck G., De Oliveira D., Pacitti E., Pradal C., Tardieu F., Valduriez P. 2020. In: Hartmann Sven (ed.), Küng Josef (ed.), Kotsis Gabriele (ed.), Tjoa A. Min (ed.), Khalil Ismail (ed.). Database and expert systems applications. DEXA 2020 (Part II). Cham: Springer, p. 51-65. (Lecture Notes in Computer Science, 12392). International Conference on Database and Expert Systems Application (DEXA 2020). 31, 2020-09-14/2020-09-17, Bratislava (Slovakia).

DOI: 10.1007/978-3-030-59051-2_4

Many scientific experiments are performed using scientific workflows, which are becoming more and more data-intensive. We consider the efficient execution of such workflows in the cloud, leveraging the heterogeneous resources available at multiple cloud sites (geo-distributed data centers). Since it is common for workflow users to reuse code or data from other workflows, a promising approach for efficient workflow execution is to cache intermediate data in order to avoid re-executing entire workflows. In this paper, we propose a solution for distributed caching of scientific workflows in a multisite cloud. We implemented our solution in the OpenAlea workflow system, together with cache-aware distributed scheduling algorithms. Our experimental evaluation on a three-site cloud with a data-intensive application in plant phenotyping shows that our solution can yield major performance gains, reducing total time by up to 42% when 60% of the input data is the same for each new execution.
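
The general idea of caching intermediate data can be illustrated with a minimal single-process sketch. It is not the paper's distributed solution, its scheduling algorithms, or the OpenAlea API: the class `IntermediateCache`, the toy tasks `segment` and `measure`, and the image identifiers are all hypothetical names chosen for illustration. Each task result is stored under a hash of the task name and its inputs, so a later execution that shares inputs with an earlier one reuses the stored results instead of recomputing them.

```python
import hashlib
import json


class IntermediateCache:
    """In-memory stand-in for a cache of intermediate workflow data (hypothetical)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(task_name, inputs):
        # Identify an intermediate result by the task name and its inputs.
        payload = json.dumps([task_name, inputs], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, task_name, func, inputs):
        # Return the cached result if present; otherwise compute and store it.
        k = self.key(task_name, inputs)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = func(*inputs)
        self._store[k] = result
        return result


# Toy tasks standing in for real workflow activities (assumptions).
def segment(image_id):
    return "mask-of-" + image_id


def measure(mask):
    return len(mask)


def run_workflow(cache, image_ids):
    # Two-step workflow: segment each image, then measure the resulting mask.
    results = []
    for image_id in image_ids:
        mask = cache.run("segment", segment, [image_id])
        results.append(cache.run("measure", measure, [mask]))
    return results


cache = IntermediateCache()
first = run_workflow(cache, ["img1", "img2", "img3", "img4", "img5"])
# The second execution shares 3 of its 5 inputs (60%) with the first.
second = run_workflow(cache, ["img1", "img2", "img3", "img6", "img7"])
```

In this toy run, only the tasks for the two new inputs are executed during the second execution; the rest are served from the cache. A multisite setting would additionally have to decide at which site to store and look up each entry, which this sketch does not attempt.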

Related documents

Conference paper

Cirad staff who authored this publication: