ترغب بنشر مسار تعليمي؟ اضغط هنا

MXDAG: A Hybrid Abstraction for Cluster Applications

133   0   0.0 ( 0 )
 نشر من قبل Weitao Wang
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Distributed applications, such as database queries and distributed training, consist of both compute and network tasks. DAG-based abstraction primarily targets compute tasks and has no explicit network-level scheduling. In contrast, Coflow abstraction collectively schedules network flows among compute tasks but lacks the end-to-end view of the application DAG. Because of the dependencies and interactions between these two types of tasks, it is sub-optimal to only consider one of them. We argue that co-scheduling of both compute and network tasks can help applications towards the globally optimal end-to-end performance. However, none of the existing abstractions can provide fine-grained information for co-scheduling. We propose MXDAG, an abstraction to treat both compute and network tasks explicitly. It can capture the dependencies and interactions of both compute and network tasks leading to improved application performance.

قيم البحث

اقرأ أيضاً

Ad-hoc networks, a promising trend in wireless technology, fail to work properly in a global setting. In most cases, self-organization and cost-free local communication cannot compensate the need for being connected, gathering urgent information just -in-time. Equipping mobile devices additionally with GSM or UMTS adapters in order to communicate with arbitrary remote devices or even a fixed network infrastructure provides an opportunity. Devices that operate as intermediate nodes between the ad-hoc network and a reliable backbone network are potential injection points. They allow disseminating received information within the local neighborhood. The effectiveness of different devices to serve as injection point differs substantially. For practical reasons the determination of injection points should be done locally, within the ad-hoc network partitions. We analyze different localized algorithms using at most 2-hop neighboring information. Results show that devices selected this way spread information more efficiently through the ad-hoc network. Our results can also be applied in order to support the election process for clusterheads in the field of clustering mechanisms.
97 - Z. Akbar , L.T. Handoko 2009
An architecture to enable some blocks consisting of several nodes in a public cluster connected to different grid collaborations is introduced. It is realized by inserting a web-service in addition to the standard Globus Toolkit. The new web-service performs two main tasks : authenticate the digital certificate contained in an incoming requests and forward it to the designated block. The appropriate block is mapped with the username of the blocks owner contained in the digital certificate. It is argued that this algorithm opens an opportunity for any blocks in a public cluster to join various global grids.
84 - Ying Mao , Yuqi Fu , Wenjia Zheng 2020
In the past decade, we have witnessed a dramatically increasing volume of data collected from varied sources. The explosion of data has transformed the world as more information is available for collection and analysis than ever before. To maximize t he utilization, various machine and deep learning models have been developed, e.g. CNN [1] and RNN [2], to study data and extract valuable information from different perspectives. While data-driven applications improve countless products, training models for hyperparameter tuning is still a time-consuming and resource-intensive process. Cloud computing provides infrastructure support for the training of deep learning applications. The cloud service providers, such as Amazon Web Services [3], create an isolated virtual environment (virtual machines and containers) for clients, who share physical resources, e.g., CPU and memory. On the cloud, resource management schemes are implemented to enable better sharing among users and boost the system-wide performance. However, general scheduling approaches, such as spread priority and balanced resource schedulers, do not work well with deep learning workloads. In this project, we propose SpeCon, a novel container scheduler that is optimized for shortlived deep learning applications. Based on virtualized containers, such as Kubernetes [4] and Docker [5], SpeCon analyzes the common characteristics of training processes. We design a suite of algorithms to monitor the progress of the training and speculatively migrate the slow-growing models to release resources for fast-growing ones. Specifically, the extensive experiments demonstrate that SpeCon improves the completion time of an individual job by up to 41.5%, 14.8% system-wide and 24.7% in terms of makespan.
In the era of data explosion, a growing number of data-intensive computing frameworks, such as Apache Hadoop and Spark, have been proposed to handle the massive volume of unstructured data in parallel. Since programming models provided by these frame works allow users to specify complex and diversified user-defined functions (UDFs) with predefined operations, the grand challenge of tuning up entire system performance arises if programmers do not fully understand the semantics of code, data, and runtime systems. In this paper, we design a holistic semantics-aware optimization for data-intensive applications using hybrid program analysis} (SODA) to assist programmers to tune performance issues. SODA is a two-phase framework: the offline phase is a static analysis that analyzes code and performance profiling data from the online phase of prior executions to generate a parameterized and instrumented application; the online phase is a dynamic analysis that keeps track of the applications execution and collects runtime information of data and system. Extensive experimental results on four real-world Spark applications show that SODA can gain up to 60%, 10%, 8%, faster than its original implementation, with the three proposed optimization strategies, i.e., cache management, operation reordering, and element pruning, respectively.
MPI applications matter. However, with the advent of many-core processors, traditional MPI applications are challenged to achieve satisfactory performance. This is due to the inability of these applications to respond to load imbalances, to reduce se rialization imposed by synchronous communication patterns, to overlap communication with computation and finally to deal with increasing memory overheads. The MPI specification provides asynchronous calls to mitigate some of these factors. However, application developers rarely make the effort to apply them efficiently. In this work, we present a methodology to develop hybrid applications called Hierarchical Domain Over-decomposition with Tasking (HDOT), that reduces programming effort by emphasizing the reuse of data partition schemes from process-level and applying them on task-level, allowing a top-down approach to express concurrency and allowing a natural coexistence between MPI and shared-memory programming models. Our integration of MPI and OmpSs-2 shows promising results in terms of programmability and performance measured on a set of applications.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا