
A Framework for Data-Intensive Workflow Execution on Multiple Execution Sites

(in progress)

Author: Orachun Udokasemsub 
Computer Engineering Department, King Mongkut's University of Technology Thonburi (KMUTT), Thailand
Advisors:
1. Dr. Tiranee Achalakul
Computer Engineering Department, King Mongkut's University of Technology Thonburi (KMUTT), Thailand
2. Dr. Xiaorong Li
Institute of High Performance Computing (IHPC), Singapore

Abstract

Cloud computing is an emerging technology that aggregates large amounts of computing resources into a virtual pool in order to provide on-demand computing facilities to users. Scientific simulations are applications consisting of a large number of complex computational tasks, typically described as a workflow using a directed acyclic graph (DAG). Such a workflow can be submitted to a cloud system for execution on a large pool of computing resources. To optimize the performance of resource provisioning on the cloud, a workflow scheduling algorithm is needed. In addition, global data volumes are growing rapidly, so the data management challenges of scientific workflow execution must also be taken into account.
In this research, a framework based on the Artificial Bee Colony (ABC) algorithm for executing data-intensive scientific workflow applications is proposed. The framework includes a partitioning algorithm, a scheduling algorithm, and file management techniques. A submitted workflow is partitioned across multiple execution sites and executed on them in parallel. Moreover, data transfer across execution sites is managed by a technique in the proposed framework in order to reduce the additional time that such transfers introduce. Finally, two experiments were designed to measure the performance gained by the proposed framework. The proposed framework is expected to execute a submitted workflow with a shorter makespan.
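A workflow of this kind can be modeled as a DAG of tasks with data dependencies between them. The sketch below is illustrative only (the class and method names are not taken from this framework); it shows one minimal way to represent such a workflow in Java and derive a dependency-respecting execution order with Kahn's algorithm.

```java
import java.util.*;

// Minimal DAG workflow model (illustrative, not the framework's API).
public class WorkflowDag {
    final Map<String, List<String>> children = new HashMap<>();
    final Map<String, Integer> indegree = new HashMap<>();

    public void addTask(String id) {
        children.putIfAbsent(id, new ArrayList<>());
        indegree.putIfAbsent(id, 0);
    }

    // Declare that task 'to' consumes data produced by task 'from'.
    public void addDependency(String from, String to) {
        addTask(from);
        addTask(to);
        children.get(from).add(to);
        indegree.merge(to, 1, Integer::sum);
    }

    // Kahn's algorithm: a valid execution order respecting all edges.
    public List<String> executionOrder() {
        Map<String, Integer> deg = new HashMap<>(indegree);
        Deque<String> ready = new ArrayDeque<>();
        for (var e : deg.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String t = ready.poll();
            order.add(t);
            for (String c : children.get(t))
                if (deg.merge(c, -1, Integer::sum) == 0) ready.add(c);
        }
        if (order.size() != children.size())
            throw new IllegalStateException("cycle detected: not a DAG");
        return order;
    }

    public static void main(String[] args) {
        WorkflowDag dag = new WorkflowDag();
        dag.addDependency("preprocess", "simulate");
        dag.addDependency("simulate", "analyze");
        dag.addDependency("preprocess", "analyze");
        System.out.println(dag.executionOrder());
        // prints [preprocess, simulate, analyze]
    }
}
```

A real scheduler would additionally weigh task runtimes and data-transfer costs when ordering and placing tasks; this sketch only captures the dependency structure.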

Dependencies

Details

The framework consists of two server roles: Site Manager and Worker. A Site Manager manages an execution site by scheduling submitted workflows and dispatching tasks to its workers, which can be either other site managers or workers. A Worker receives the tasks dispatched by its site manager, schedules them, and executes them on the local processor. The intermediate files of submitted workflows are transferred by a purpose-built peer-to-peer method in order to utilize all available network bandwidth within the execution site.
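The two roles form a hierarchy: a site manager's workers may themselves be site managers of nested sites. The sketch below is a hypothetical illustration of that structure, not the framework's actual API; the round-robin split stands in for the real partitioning and scheduling algorithms.

```java
import java.util.*;

// A Node is anything a Site Manager can dispatch tasks to:
// either a leaf Worker or another SiteManager (nested site).
interface Node {
    void submit(List<String> tasks);
}

// A Worker schedules received tasks and runs them locally
// (local scheduling is simplified to FIFO here).
class Worker implements Node {
    final String name;
    final List<String> executed = new ArrayList<>();
    Worker(String name) { this.name = name; }

    @Override
    public void submit(List<String> tasks) {
        for (String task : tasks) {
            executed.add(task);
            System.out.println(name + " executing " + task);
        }
    }
}

// A SiteManager partitions a submitted workflow across its nodes.
class SiteManager implements Node {
    final List<Node> nodes;
    SiteManager(List<Node> nodes) { this.nodes = nodes; }

    @Override
    public void submit(List<String> tasks) {
        // Round-robin partitioning stands in for the real scheduler.
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) parts.add(new ArrayList<>());
        for (int i = 0; i < tasks.size(); i++)
            parts.get(i % nodes.size()).add(tasks.get(i));
        for (int i = 0; i < nodes.size(); i++)
            nodes.get(i).submit(parts.get(i));
    }
}

public class Demo {
    public static void main(String[] args) {
        // A site whose second "worker" is itself a nested site.
        Node site = new SiteManager(List.of(
                new Worker("worker-1"),
                new SiteManager(List.of(new Worker("worker-2")))));
        site.submit(List.of("t1", "t2", "t3", "t4"));
    }
}
```

Because site managers and workers share the same interface, a site manager never needs to know whether it is dispatching to a local processor or to an entire nested site.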
