In this research, we introduce two probabilistic mechanisms for
certifying parallel applications on distributed architectures, under
the assumption that no oracle is available to rely on for
certification. We also introduce a cost model for the two mechanisms
and compare them.
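To illustrate how certification can work without an oracle, the sketch below re-executes a random sample of tasks on a trusted resource and compares the results. It is a generic Monte Carlo test assumed here for illustration, not necessarily either of the two mechanisms introduced above. The names `detection_probability`, `checks_needed`, and `certify` are hypothetical. The sketch assumes that a fraction q of the task results is forged and that checked tasks are drawn uniformly and independently.

```python
import math
import random

def detection_probability(q, n):
    """Probability that n independent uniform checks hit at least one
    forged task when a fraction q of the tasks is forged."""
    return 1.0 - (1.0 - q) ** n

def checks_needed(q, epsilon):
    """Smallest n for which the miss probability (1 - q)**n is at most epsilon."""
    return math.ceil(math.log(epsilon) / math.log(1.0 - q))

def certify(results, recompute, n, rng):
    """Re-execute n randomly chosen tasks on a trusted resource.

    results:   task id -> result reported by the untrusted execution
    recompute: trusted function, task id -> correct result
    Returns False as soon as one re-executed result differs.
    """
    task_ids = list(results)
    for _ in range(n):
        t = rng.choice(task_ids)
        if recompute(t) != results[t]:
            return False
    return True
```

Under these assumptions the number of checks depends only on q and on the accepted error probability, not on the total number of tasks, which is why such a test can be much cheaper than full re-execution.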
We are interested in parallel applications that are represented by a
data-flow graph built dynamically during execution and that run in a
large-scale, heterogeneous, and dynamic distributed environment. These
applications use the principle of work stealing to distribute tasks
among the processors.
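The work-stealing principle mentioned above can be sketched as a small round-based simulation. This is a minimal illustration under stated assumptions, not the runtime used in this research: each worker owns a double-ended queue, takes its own newest task, and an idle worker steals the oldest task of a randomly chosen victim. The names `run_work_stealing` and `range_sum_task` are hypothetical, and a task is modelled as a callable that returns a value together with the child tasks it creates.

```python
import random
from collections import deque

def run_work_stealing(root_task, num_workers, seed=0):
    """Simulate work stealing in rounds.

    A task is a callable returning (value, child_tasks).
    Returns the sum of all values and the number of tasks each worker ran.
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root_task)          # all work starts on worker 0
    executed = [0] * num_workers
    total = 0
    while any(queues):
        for w in range(num_workers):
            if queues[w]:
                task = queues[w].pop()   # owner takes its newest task
            else:
                victims = [v for v in range(num_workers)
                           if v != w and queues[v]]
                if not victims:
                    continue             # nothing to steal in this round
                # An idle worker steals the oldest task of a random victim.
                task = queues[rng.choice(victims)].popleft()
            value, children = task()
            total += value
            executed[w] += 1
            queues[w].extend(children)   # tasks are created dynamically
    return total, executed

def range_sum_task(lo, hi):
    """Hypothetical task: sum the integers in [lo, hi), splitting long ranges."""
    def task():
        if hi - lo <= 4:
            return sum(range(lo, hi)), []
        mid = (lo + hi) // 2
        return 0, [range_sum_task(lo, mid), range_sum_task(mid, hi)]
    return task
```

For example, `run_work_stealing(range_sum_task(0, 100), 4)` starts with a single task on worker 0, and the other workers obtain work only by stealing, so the task graph unfolds during execution rather than being known in advance.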
This study investigates fault tolerance in large distributed
environments, such as grids and clusters of computers, in order to
find the most effective ways of dealing with errors caused by the
crash of a machine in the environment or by a network disconnection,
so that the application can continue to run in the presence of faults.
In this paper, we study a model of the distributed environment and of
the parallel applications that run within it. We then provide a
checkpoint mechanism that ensures continuity of the work. It uses a
virtual representation of the application (a macro data-flow graph)
and is suited to applications that use a work-stealing algorithm to
distribute tasks and that run in a heterogeneous and dynamic
environment.
This mechanism adds a small overhead to the cost of parallel
execution, because part of the work is saved during fault-free
execution. The study also provides a mathematical model for
calculating the time complexity, i.e. the cost, of the proposed
mechanism.
We introduce an auto-adaptive strategy that makes it possible to write
a parallel algorithm that adapts to the number of resources available
in the parallel environment allocated to execute the program. The
parallel applications we study are represented by a data-flow graph
that is built dynamically during execution. The proposed strategy is
based on coupling a sequential algorithm with a parallel one and
relies on the principle of work stealing for task scheduling. We
study the complexity of the adaptive algorithm, analyze its
performance on processors, and compare it with the performance of a
classic parallel algorithm.
This work aims to benefit from the availability of multi-CPU and
multi-GPU systems by exploiting the computations that multiple GPUs
can perform. It aims to devise a mechanism for scheduling a directed
acyclic graph (DAG) that reduces communication between resources and
schedules interdependent tasks as well as possible.
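As an illustration of scheduling a DAG on heterogeneous resources while accounting for communication, the sketch below places each task on the resource that gives it the earliest finish time. It is a generic greedy heuristic assumed for illustration, not the mechanism developed in this work. The function name `schedule_dag`, the uniform transfer cost `comm`, and the resource names are hypothetical.

```python
from graphlib import TopologicalSorter

def schedule_dag(deps, cost, comm, resources):
    """Greedy earliest-finish-time placement of a task DAG.

    deps:      task -> set of predecessor tasks
    cost:      task -> {resource: execution time on that resource}
    comm:      time to move one result between two different resources
    resources: list of resource names
    Returns task -> (resource, start, finish).
    """
    ready_at = {r: 0.0 for r in resources}   # when each resource is free
    placed = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(deps).static_order():
        best = None
        for r in resources:
            # An input produced on another resource arrives comm later.
            data_ready = max(
                (placed[p][2] + (comm if placed[p][0] != r else 0.0)
                 for p in deps.get(task, ())),
                default=0.0,
            )
            start = max(ready_at[r], data_ready)
            finish = start + cost[task][r]
            if best is None or finish < best[2]:
                best = (r, start, finish)
        placed[task] = best
        ready_at[best[0]] = best[2]
    return placed
```

Because the transfer cost is added only when a predecessor ran on a different resource, the heuristic tends to keep dependent tasks together unless moving one of them shortens its finish time by more than the transfer costs.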