Title
Pingo: A Framework for the Management of Storage of Intermediate Outputs of Computational Workflows
Description
Scientific workflows allow scientists to model and express an entire data-processing pipeline, typically as a directed acyclic graph (DAG). A workflow consists of a collection of tasks that often take a long time to compute and that produce a considerable number of intermediate datasets. Because scientific exploration is iterative, a workflow may be modified and re-run many times, and new workflows may be created that reuse past intermediate datasets. Storing intermediate datasets can therefore save computation time. Since storage is limited, a central problem is deciding, at creation time, which intermediate datasets to keep so as to minimize the computation time of workflows run in the future. This thesis proposes the design and implementation of Pingo, a system that manages the computations of scientific workflows as well as the storage, provenance, and deletion of intermediate datasets. Pingo uses the history of workflows submitted to the system to predict which datasets are most likely to be needed in the future, and bases its decisions about dataset deletion on minimizing the computation time of future workflows.
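The description does not specify Pingo's prediction model or deletion policy. As a rough illustration of the kind of decision it describes, here is a minimal sketch that is hypothetical and not taken from the thesis. It assumes reuse probability can be estimated as the fraction of past workflows that used a dataset, and it greedily keeps the datasets with the highest expected recomputation time saved per GB until a storage budget is filled. The names `Dataset`, `reuse_probability`, `choose_datasets_to_keep`, and the example dataset names are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical model of an intermediate dataset (not Pingo's data model)."""
    name: str
    size_gb: float          # storage the dataset occupies; assumed > 0
    recompute_hours: float  # time to regenerate it if it is deleted


def reuse_probability(history: list[set[str]]) -> dict[str, float]:
    """Assumed estimator: fraction of past workflows that used each dataset."""
    total = len(history)
    if total == 0:
        return {}
    counts = Counter(name for workflow in history for name in workflow)
    return {name: count / total for name, count in counts.items()}


def choose_datasets_to_keep(datasets: list[Dataset],
                            history: list[set[str]],
                            budget_gb: float) -> list[str]:
    """Greedy heuristic: keep datasets with the best expected hours saved per GB."""
    prob = reuse_probability(history)

    def value_density(d: Dataset) -> float:
        # Expected recomputation time avoided by storing d, per GB of storage.
        return prob.get(d.name, 0.0) * d.recompute_hours / d.size_gb

    kept: list[str] = []
    used_gb = 0.0
    for d in sorted(datasets, key=value_density, reverse=True):
        if prob.get(d.name, 0.0) == 0.0:
            continue  # never reused in the history, so no expected saving
        if used_gb + d.size_gb <= budget_gb:
            kept.append(d.name)
            used_gb += d.size_gb
    return kept


# Illustrative history: each set lists the intermediate datasets a past workflow used.
history = [
    {"aligned_reads", "variants"},
    {"aligned_reads"},
    {"aligned_reads", "qc_report"},
    {"variants"},
]
datasets = [
    Dataset("aligned_reads", size_gb=40.0, recompute_hours=10.0),
    Dataset("variants", size_gb=5.0, recompute_hours=2.0),
    Dataset("qc_report", size_gb=1.0, recompute_hours=0.1),
    Dataset("tmp_index", size_gb=2.0, recompute_hours=3.0),
]

print(choose_datasets_to_keep(datasets, history, budget_gb=45.0))  # ['variants', 'aligned_reads']
```

Ranking by value per GB is the standard greedy heuristic for a knapsack-style selection and is not guaranteed to be optimal. A system such as the one described could substitute a richer predictor or an exact optimizer for either step.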
Date Created
2017
Contributors
- de Armas, Jadiel (Author)
- Bazzi, Rida (Thesis advisor)
- Huang, Dijiang (Committee member)
- Syrotiuk, Violet (Committee member)
- Arizona State University (Publisher)
Extent
78 pages
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.44289
Level of coding
minimal
Note
Masters Thesis Computer Science 2017
System Created
- 2017-06-01 02:06:42
System Modified
- 2021-08-26 09:47:01