Scientific workflow scheduling in distributed systems using structure learning

Samandi, Vahab ORCID: https://orcid.org/0000-0002-5774-1550 (2024). Scientific workflow scheduling in distributed systems using structure learning. University of Birmingham. Ph.D.

Samandi2024PhD.pdf (2MB)
Text - Accepted Version
Restricted to repository staff only
Available under licence: All rights reserved

Abstract

This research investigates workflow scheduling on distributed resources (e.g., Cloud Computing). Specifically, it addresses several workflow scheduling challenges: handling complex workflow structures, minimising the time complexity of scheduling algorithms, optimising the execution time of workflows (makespan), and estimating performance accurately. First, we proposed a novel Bottom-Up Top-Down Recursive Neural Network (BUTD RecNN) model, a structure learning algorithm designed to handle complex workflow structures. We then developed a task duplication scheduling algorithm that uses the proposed BUTD RecNN model. Task duplication scheduling algorithms aim to reduce the high communication costs between data-dependent tasks. The BUTD RecNN model learns from historical duplication decisions on workflows (represented as directed acyclic graphs, DAGs) to produce duplication recommendations efficiently for new, unseen workflows. This approach is evaluated on collections of Montage workflows.

The second focus of this research is scheduling in dynamic environments where scheduling information is incomplete or only partially available. We investigated whether explicitly incorporating the workflow structure leads to more accurate estimates of task execution requirements. By exploiting the recursive nature of the network, the model captures the hierarchical dependencies and relationships within complex workflows, yielding more precise task scheduling predictions. We compare the estimation accuracy of a graph learning neural network, the Recursive Neural Network (RecNN), with that of two standard prediction models that do not consider the workflow structure: linear regression and non-linear regression. The prediction models were trained on two scientific workflows, Montage and LIGO.
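The bottom-up/top-down idea behind a recursive network over a workflow DAG can be illustrated with a minimal sketch. It assumes the workflow is given as a successor map, that each task has a feature vector, and that both passes are plain tanh recurrences with random, untrained weights. The weight names (`W_up`, `U_up`, `W_down`, `U_down`), the summation of neighbour states, and the logistic "duplication score" readout are all hypothetical and are not the thesis's BUTD RecNN architecture. The bottom-up pass summarises each task's downstream sub-workflow; the top-down pass combines that summary with context flowing from upstream tasks.

```python
import numpy as np


def topological_order(succ):
    """Kahn's algorithm: return tasks so that every task precedes its successors."""
    indegree = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            indegree[w] += 1
    ready = [v for v in succ if indegree[v] == 0]
    order = []
    while ready:
        v = ready.pop()
        order.append(v)
        for w in succ[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(succ):
        raise ValueError("workflow graph contains a cycle")
    return order


def butd_states(succ, features, params):
    """Two recursive passes over a DAG (hypothetical BUTD-style encoder)."""
    W_up, U_up, W_down, U_down = (params[k] for k in ("W_up", "U_up", "W_down", "U_down"))
    hidden = U_up.shape[0]
    pred = {v: [] for v in succ}
    for v in succ:
        for w in succ[v]:
            pred[w].append(v)
    order = topological_order(succ)

    # Bottom-up: exit tasks first, so each task can sum its successors' states.
    up = {}
    for v in reversed(order):
        below = sum((up[w] for w in succ[v]), np.zeros(hidden))
        up[v] = np.tanh(W_up @ features[v] + U_up @ below)

    # Top-down: entry tasks first, so each task can sum its predecessors' states.
    down = {}
    for v in order:
        above = sum((down[u] for u in pred[v]), np.zeros(hidden))
        down[v] = np.tanh(W_down @ up[v] + U_down @ above)
    return down


def duplication_scores(states, w_out):
    """Hypothetical logistic readout: one score in (0, 1) per task."""
    return {v: float(1.0 / (1.0 + np.exp(-(w_out @ h)))) for v, h in states.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_features, hidden = 3, 4
    # Toy diamond-shaped workflow a -> {b, c} -> d (illustrative only).
    succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    features = {v: rng.normal(size=n_features) for v in succ}
    params = {
        "W_up": rng.normal(size=(hidden, n_features)),
        "U_up": rng.normal(size=(hidden, hidden)),
        "W_down": rng.normal(size=(hidden, hidden)),
        "U_down": rng.normal(size=(hidden, hidden)),
    }
    states = butd_states(succ, features, params)
    scores = duplication_scores(states, rng.normal(size=hidden))
    for task in topological_order(succ):
        print(task, round(scores[task], 3))
```

In a trained model the weights would be fitted to historical duplication decisions; here they are random, so the printed scores only demonstrate the data flow, not meaningful recommendations.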
Comparing the makespan of workflows generated from the estimated task values with that of the original workflows shows that RecNN estimates task information more accurately than the linear and non-linear regression models: the makespan of the workflows generated from the RecNN estimates is closer to that of the original workflows. The results show that explicitly considering the workflow structure through structure learning models can considerably improve workflow scheduling in distributed systems such as the Cloud. Finally, we proposed a framework for integrating structure learning into existing workflow management systems, with the aim of improving the efficiency and accuracy of task scheduling and execution.
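The makespan comparison can be made concrete with a small sketch. It assumes every task runs on its own VM as soon as all of its inputs have arrived, so the makespan is the longest path through the DAG counting both task runtimes and data-transfer times. This is a simplification, not the scheduler or simulator used in the thesis; the function names and the toy runtimes and transfer times are hypothetical and illustrative only.

```python
from collections import deque


def makespan(succ, runtime, comm):
    """Finish time of the last task, assuming unlimited VMs (one per task).

    succ:    task -> list of successor tasks (must form a DAG)
    runtime: task -> execution time
    comm:    (parent, child) -> data-transfer time; missing edges cost 0
    """
    pred = {v: [] for v in succ}
    indegree = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            pred[w].append(v)
            indegree[w] += 1
    queue = deque(v for v in succ if indegree[v] == 0)
    finish = {}
    while queue:
        v = queue.popleft()
        # A task starts once every parent has finished and its output has arrived.
        start = max((finish[u] + comm.get((u, v), 0.0) for u in pred[v]), default=0.0)
        finish[v] = start + runtime[v]
        for w in succ[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(finish) != len(succ):
        raise ValueError("workflow graph contains a cycle")
    return max(finish.values())


def relative_makespan_error(succ, actual, estimated, comm):
    """Relative gap between makespans computed from estimated and actual runtimes."""
    m_true = makespan(succ, actual, comm)
    return abs(makespan(succ, estimated, comm) - m_true) / m_true


if __name__ == "__main__":
    succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    comm = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "d"): 2.0, ("c", "d"): 0.5}
    actual = {"a": 2.0, "b": 3.0, "c": 5.0, "d": 1.0}
    estimated = {"a": 2.0, "b": 3.0, "c": 4.0, "d": 1.0}
    print(makespan(succ, actual, comm))     # 9.5
    print(makespan(succ, estimated, comm))  # 9.0
```

In this toy example, underestimating task `c` by one time unit moves the critical path from `a → c → d` to `a → b → d`, so the makespan error depends on where in the structure an estimate is wrong, not only on how large the error is.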

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Supervisor(s): Tino, Peter; Bahsoon, Rami (email and ORCID unspecified)
Licence: All rights reserved
College/Faculty: Colleges > College of Engineering & Physical Sciences
School or Department: School of Computer Science
Funders: None/not applicable
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
URI: http://etheses.bham.ac.uk/id/eprint/15507
