
An Improved Multiple Faults Reassignment based Recovery in Cluster Computing

Published by William Jackson
Publication date: 2011
Research field: Informatics Engineering
Language: English





Performance degrades far more under multiple node failures than under a single node failure. Node failures in cluster computing can be tolerated through multiple-fault-tolerant computing, but existing recovery schemes are efficient for single faults, not for multiple faults. The recovery scheme proposed in this paper has two phases: a sequential phase and a concurrent phase. In the sequential phase, the loads of all working nodes are distributed uniformly by the proposed dynamic rank-based load distribution algorithm. In the concurrent phase, the loads of all failed nodes, as well as newly arriving jobs, are assigned equally to all available nodes by the failure-node job allocation algorithm, which simply finds the least loaded node among the candidates. Executing these algorithms sequentially and concurrently improves both performance and resource utilization. The dynamic rank-based algorithm for load redistribution acts as a sequential restoration algorithm, while the algorithm that reassigns the failed nodes' loads to the least loaded computing nodes acts as a concurrent recovery reassignment algorithm. Because load is distributed evenly among all available working nodes with fewer iterations, lower iteration time, and lower communication overhead, performance improves. The dynamic ranking algorithm is a low-overhead, fast-converging algorithm for reassigning tasks uniformly among all available nodes, and the failed nodes' loads are reassigned by a low-overhead failure job allocation algorithm. Test results demonstrating the effectiveness of the proposed scheme are presented.
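The two recovery phases described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithms: loads are integer task counts, the "rank" is simply a node's position when sorted by load, and the function names `rank_based_balance`, `allocate_to_least_loaded`, and `recover` are hypothetical. The sketch also runs both phases one after the other in a single process, whereas the paper executes the reassignment phase concurrently.

```python
from __future__ import annotations

import heapq


def rank_based_balance(loads: dict[str, int]) -> dict[str, int]:
    """Sequential phase (hypothetical): even out integer task counts.

    Nodes are ranked by current load; each iteration moves half of the gap
    between the most loaded and the least loaded node, until no two nodes
    differ by more than one task.
    """
    balanced = dict(loads)
    while True:
        ranked = sorted(balanced, key=lambda node: (balanced[node], node))
        lightest, heaviest = ranked[0], ranked[-1]
        gap = balanced[heaviest] - balanced[lightest]
        if gap <= 1:
            return balanced
        moved = gap // 2
        balanced[heaviest] -= moved
        balanced[lightest] += moved


def allocate_to_least_loaded(loads: dict[str, int], tasks: int) -> dict[str, int]:
    """Reassignment phase (hypothetical): give each orphaned or newly
    arrived unit task to whichever node is currently least loaded."""
    result = dict(loads)
    heap = [(load, node) for node, load in result.items()]
    heapq.heapify(heap)  # min-heap keyed on current load
    for _ in range(tasks):
        load, node = heapq.heappop(heap)
        result[node] = load + 1
        heapq.heappush(heap, (load + 1, node))
    return result


def recover(loads: dict[str, int], failed: set[str], new_jobs: int = 0) -> dict[str, int]:
    """Drop failed nodes, rebalance the survivors, then reassign the failed
    nodes' tasks plus any newly arrived jobs."""
    survivors = {n: l for n, l in loads.items() if n not in failed}
    if not survivors:
        raise ValueError("no working nodes left to recover onto")
    orphaned = sum(loads[n] for n in failed) + new_jobs
    return allocate_to_least_loaded(rank_based_balance(survivors), orphaned)


print(recover({"n1": 10, "n2": 2, "n3": 6, "n4": 4}, failed={"n4"}))
# {'n1': 8, 'n2': 7, 'n3': 7}
```

A binary heap keeps each "find the least loaded node" step at O(log n), which matches the spirit of a low-overhead allocation step; the total task count is preserved across both phases.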



Read also

Rapid technological advances in the Internet of Things (IoT) make the blueprint of Smart Cities feasible by integrating heterogeneous cloud/fog/edge computing paradigms that collaboratively provide diverse smart services in our cities and communities. Thanks to attractive features such as fine granularity and loose coupling, the microservices architecture has been proposed to provide scalable and extensible services in large-scale distributed IoT systems. Recent studies have evaluated and analyzed the performance interference between microservices in cloud computing scenarios. However, these studies do not fully cover IoT applications, given the constraints of edge devices such as limited computation and network capacity. This paper investigates multiple microservice deployment policies on an edge computing platform. The microservices are developed as Docker containers, and comprehensive experimental results demonstrate the performance and interference of microservices running on benchmark scenarios.
This paper focuses on improving the multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. The paper shows that, for three different test cases, 16 P100 GPUs and 16 V100 GPUs can be 30× and 70× faster, respectively, than 16 Xeon CPU E5-2680v4 cores. A series of performance issues related to scaling the multi-block CFD code are addressed by applying various optimizations. Optimizations such as the pack/unpack message method, removing temporary arrays as arguments to procedure calls, allocating global memory for limiters and connected boundary data, reordering non-blocking MPI I_send/I_recv and Wait calls, reducing unnecessary implicit movement of derived-type member data between the host and the device, and using GPUDirect can improve compute utilization, memory throughput, and asynchronous progression in the multi-block CFD code using modern programming features.
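The pack/unpack message method mentioned above gathers non-contiguous boundary data into one contiguous buffer, so that a single message replaces many small strided transfers. The following is a minimal pure-Python sketch, assuming a row-major 2D block with one ghost column on each side; the function names are hypothetical, and the `message` list only stands in for the buffer that real code would hand to a non-blocking MPI send. No MPI or GPU code is shown.

```python
from __future__ import annotations


def pack_column(grid: list[float], nx: int, ny: int, j: int) -> list[float]:
    """Copy strided column j of a row-major block (ny rows, nx columns)
    into a contiguous buffer."""
    return [grid[i * nx + j] for i in range(ny)]


def unpack_column(grid: list[float], nx: int, ny: int, j: int, buf: list[float]) -> None:
    """Scatter a contiguous buffer back into strided column j (e.g. a ghost column)."""
    for i in range(ny):
        grid[i * nx + j] = buf[i]


# Two neighbouring blocks, each 3 rows by 4 columns; columns 0 and 3 are
# ghost columns, columns 1 and 2 are interior cells.
nx, ny = 4, 3
left = [float(v) for v in range(nx * ny)]
right = [0.0] * (nx * ny)

# The left block's last interior column (j = 2) fills the right block's ghost column (j = 0).
message = pack_column(left, nx, ny, j=2)  # one contiguous "message"
unpack_column(right, nx, ny, j=0, buf=message)
print(right[0::nx])  # [2.0, 6.0, 10.0]
```

On a GPU the packing loop would typically run as a device kernel, so only one contiguous buffer has to cross the host/device or network boundary.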
Mobile edge computing (MEC) has become a promising way to use distributed computing resources to support computation-intensive vehicular applications in dynamic driving environments. Onsite resource trading is a critical enabler of this paradigm. However, dynamic communication and resource conditions can lead to unpredictable trading latency, trading failures, and unfair pricing in the conventional resource trading process. To overcome these challenges, we introduce a novel futures-based resource trading approach for edge computing-enabled Internet of Vehicles (IoV), in which a forward contract facilitates resource trading negotiations between an MEC server (seller) and a vehicle (buyer) for a given future term. By estimating historical statistics of future resource supply and network conditions, we formulate futures-based resource trading as an optimization problem that maximizes the seller's and the buyer's expected utilities, while applying risk evaluations to mitigate possible losses caused by uncertainties in the system. To solve this problem, we propose an efficient bilateral negotiation approach that helps the participants reach a consensus. Extensive simulations demonstrate that the proposed futures-based resource trading brings considerable utility to both participants while significantly outperforming the baseline methods on critical factors, e.g., trading failures and fairness, negotiation latency, and cost.
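The abstract does not spell out its bilateral negotiation protocol. As a rough illustration of how a seller and a buyer can converge on a price, here is a generic linear concession sketch; it is not the paper's utility- and risk-based method, and the opening offers, reservation prices, round limit, and midpoint split are all assumptions.

```python
from __future__ import annotations


def negotiate(seller_start: float, seller_reserve: float,
              buyer_start: float, buyer_reserve: float,
              rounds: int = 10) -> float | None:
    """Linear time-dependent concession between one seller and one buyer.

    In round t both parties move a fraction t/rounds of the way from their
    opening offer toward their reservation price. The first round in which
    the buyer's bid meets the seller's ask ends the negotiation, and the
    price is split at the midpoint. Returns None if the deadline passes
    without agreement.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    for t in range(rounds + 1):
        frac = t / rounds
        ask = seller_start - (seller_start - seller_reserve) * frac
        bid = buyer_start + (buyer_reserve - buyer_start) * frac
        if bid >= ask:
            return (ask + bid) / 2
    return None


print(negotiate(10.0, 6.0, 2.0, 9.0))  # agreement in round 8, price close to 7.2
print(negotiate(10.0, 8.0, 2.0, 5.0))  # None: reservation prices never overlap
```

Agreement is only possible when the buyer's reservation price is at least the seller's; a futures-based scheme would additionally choose these reservation prices from estimated supply, network conditions, and risk.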
Chung-Hao Huang, 2012
Our goal is to achieve a high degree of fault tolerance through the control of safety-critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller that tries to establish correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. To be k-resilient, a system must be able to recover rapidly from a small number (up to k) of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, yet much more refined than the traditional objective of maximizing the number of local faults. We argue why we believe this is the right level of abstraction for safety-critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low, and we demonstrate feasibility through an implementation and experimental results.
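The k-resilience objective above is defined relative to an assumption on how faults arrive. The sketch below only checks that assumption on a finite fault trace; it does not synthesize a controller, which is the game-solving part of the paper. The recovery-period length `r`, and the rule that faults fewer than `r` fault-free steps apart belong to the same block, are an illustrative formalization of our own, not the authors' exact definition.

```python
from __future__ import annotations

from collections.abc import Sequence


def respects_fault_blocks(trace: Sequence[bool], k: int, r: int) -> bool:
    """Check the environment assumption behind k-resilience on a finite trace.

    trace[t] is True when a local fault occurs at step t. Faults belong to
    the same block unless they are separated by at least r fault-free steps
    (the recovery period); the assumption holds if no block has more than
    k faults.
    """
    faults_in_block = 0
    quiet_steps = r  # treat the start of the trace as fully recovered
    for is_fault in trace:
        if not is_fault:
            quiet_steps += 1
            continue
        if quiet_steps >= r:
            faults_in_block = 0  # recovery period elapsed: a new block starts
        faults_in_block += 1
        if faults_in_block > k:
            return False
        quiet_steps = 0
    return True


trace = [step == "F" for step in "F..F....FF"]
print(respects_fault_blocks(trace, k=2, r=3))  # True
print(respects_fault_blocks(trace, k=1, r=3))  # False: the first block has 2 faults
```

A controller is k-resilient when it restores correct behavior on every trace for which this check holds, however long the trace runs.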
Lei Ni, Aaron Harwood, 2007
Volunteer Computing, sometimes called Public Resource Computing, is an emerging computational model that is well suited to work-pooled parallel processing. As more complex grid applications use workflows in their design and deployment, it is reasonable to consider the impact of deploying workflows over a Volunteer Computing infrastructure. In this case, inter-workflow I/O can significantly increase I/O demands at the work pool server. A possible solution is a Peer-to-Peer based parallel computing architecture that offloads this I/O demand to the workers, which can then take on some aspects of workflow coordination, I/O checking, and similar tasks. However, achieving robustness in such a large-scale system is a challenging hurdle for the decentralized execution of workflows and general parallel processes. To increase robustness, we propose, and show the merits of, an adaptive checkpoint scheme that efficiently checkpoints the status of the parallel processes according to estimates of relevant network and peer parameters. Our scheme uses statistical data observed at runtime to make checkpoint decisions dynamically and in a completely decentralized manner. Simulation results support the proposed approach, showing a reduced required runtime.
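The abstract does not give the scheme's decision rule. As a point of reference, a classic way to adapt a checkpoint interval to runtime statistics is Young's first-order approximation, T ≈ √(2CM), where C is the cost of one checkpoint and M is the mean time between failures. The sketch below swaps in that textbook formula and estimates M from a sliding window of observed failure gaps; it is not the authors' decentralized scheme, and the class and parameter names are hypothetical.

```python
from __future__ import annotations

import math
from collections import deque
from statistics import mean


class AdaptiveCheckpointer:
    """Pick a checkpoint interval from failure statistics observed at runtime.

    Uses Young's approximation T = sqrt(2 * C * M), where C is the cost of
    taking one checkpoint and M is the mean time between failures, estimated
    here from a sliding window of observed failure gaps.
    """

    def __init__(self, checkpoint_cost: float, window: int = 50) -> None:
        self.checkpoint_cost = checkpoint_cost
        self.gaps: deque[float] = deque(maxlen=window)  # oldest gaps drop out

    def record_failure_gap(self, seconds: float) -> None:
        """Record the time elapsed between two consecutive peer failures."""
        self.gaps.append(seconds)

    def interval(self) -> float | None:
        """Suggested time between checkpoints, or None before any failure is seen."""
        if not self.gaps:
            return None
        mtbf = mean(self.gaps)
        return math.sqrt(2 * self.checkpoint_cost * mtbf)


cp = AdaptiveCheckpointer(checkpoint_cost=8.0)
for gap in (900.0, 1100.0, 1000.0):
    cp.record_failure_gap(gap)
print(round(cp.interval(), 1))  # 126.5, i.e. sqrt(2 * 8 * 1000)
```

Because each peer keeps its own window, every peer can compute an interval locally without a coordinator; peers that observe frequent failures checkpoint more often.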