Straggler task detection is one of the main challenges in applying MapReduce to parallelize and distribute large-scale data processing. It means identifying running tasks that are executing on weak nodes. The Map phase has two stages (copy and combine) and the Reduce phase has three (shuffle, sort, and reduce), so the total execution time is the sum of the execution times of these five stages. The primary purpose of this paper is to estimate the execution time of each stage correctly, which in turn yields a correct total execution time. The proposed method applies a backpropagation neural network (NN) on Hadoop to estimate the remaining execution time of tasks, which is essential for straggler task detection. The results were compared with popular algorithms in this domain, such as LATE and ESAMR, and with the real remaining time on the WordCount and Sort benchmarks; the method is shown to detect straggler tasks and to estimate execution time accurately. It also helps reduce task execution time.
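The stage-based estimate described above can be illustrated with a minimal sketch. It is not the paper's method: the stage weights here are passed in as plain numbers, whereas the paper estimates them with a backpropagation NN. The remaining-time formula follows the general LATE-style heuristic (remaining work divided by observed progress rate). The function names, the median-based straggler rule, and the `threshold` value are assumptions made for illustration only.

```python
from statistics import median

# Stage names as given in the abstract.
MAP_STAGES = ("copy", "combine")
REDUCE_STAGES = ("shuffle", "sort", "reduce")


def progress_score(weights, stage_index, stage_progress):
    """Overall task progress in [0, 1].

    weights        -- fraction of total task time per stage (should sum to 1);
                      assumed given here, estimated by the NN in the paper
    stage_index    -- index of the stage currently running
    stage_progress -- progress inside the current stage, in [0, 1]
    """
    finished = sum(weights[:stage_index])
    return finished + weights[stage_index] * stage_progress


def remaining_time(weights, stage_index, stage_progress, elapsed):
    """LATE-style estimate: remaining work / observed progress rate."""
    score = progress_score(weights, stage_index, stage_progress)
    if score <= 0 or elapsed <= 0:
        return float("inf")  # no progress observed yet, cannot estimate
    rate = score / elapsed
    return (1.0 - score) / rate


def find_stragglers(estimates, threshold=1.5):
    """Flag tasks whose estimated remaining time exceeds
    threshold * median (a hypothetical rule, not from the paper)."""
    cutoff = threshold * median(estimates.values())
    return sorted(t for t, est in estimates.items() if est > cutoff)


if __name__ == "__main__":
    # A reduce task halfway through "sort" after 30 s, with assumed weights.
    est = remaining_time((0.25, 0.25, 0.5), stage_index=1,
                         stage_progress=0.5, elapsed=30.0)
    print(f"estimated remaining time: {est:.1f} s")
    print(find_stragglers({"t1": 10.0, "t2": 12.0, "t3": est}))
```

The quality of the estimate depends entirely on the stage weights: with fixed weights the sketch behaves like a simple heuristic, while weights predicted per job and per node are what a learned model would contribute.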
Faced with the continuously increasing scale of data, the original back-propagation neural network based machine learning algorithm presents two non-trivial challenges: the huge amount of data makes it difficult to maintain both efficiency and accuracy; redundan
The ever-increasing volumes of scientific data present new challenges for distributed computing and Grid technologies. The emerging Big Data revolution drives exploration in scientific fields including nanotechnology, astrophysics, high-energy physic
With the explosive increase of big data in industry and academic fields, it is necessary to apply large-scale data processing systems to analyze Big Data. Arguably, Spark is state of the art in large-scale data computing systems nowadays, due to its
Active network management (ANM) of electricity distribution networks includes many complex stochastic sequential optimization problems. These problems need to be solved for integrating renewable energies and distributed storage into future electrical
Optimization in distributed networks plays a central role in almost all distributed machine learning problems. In principle, the use of distributed task allocation has reduced the computational time, allowing better response rates and higher data rel