Using a realistic molecular catalyst system, we conduct scaling studies of ab initio molecular dynamics simulations with the CP2K code on both Intel Xeon CPU and NVIDIA V100 GPU architectures. We explore process placement and affinity to gain additional performance improvements. We also use statistical methods to understand performance changes in spite of the variability in the runtime of each molecular dynamics timestep. We found that ideal conditions for CPU-only runs included at least four MPI ranks per node, bound evenly across the sockets, with the processing cores fully utilized by one OpenMP thread per core; reserving cores for the system showed no benefit. The CPU-only simulations scaled at 70% or more of ideal scaling up to 10 compute nodes, after which the returns diminished more quickly. Simulations on a single 40-core node with two NVIDIA V100 GPUs for acceleration achieved over a 3.7x speedup compared to the fastest single 36-core CPU-only node, and a 13% speedup over the fastest time we achieved across five CPU-only nodes.
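As a point of reference for the scaling figures quoted above, a minimal sketch of how parallel efficiency relative to ideal (linear) scaling can be derived from per-timestep wall times is shown below; the node counts and timings in the example are illustrative placeholders, not measurements from this study.

```python
# Minimal sketch: parallel efficiency relative to ideal (linear) scaling,
# computed from measured wall time per MD timestep. Values used in the
# example run are placeholders, not the paper's measurements.
def scaling_efficiency(times_by_nodes):
    """times_by_nodes: {node_count: seconds_per_timestep}.
    Efficiency = measured speedup / ideal speedup, relative to the smallest run."""
    base_n = min(times_by_nodes)
    base_t = times_by_nodes[base_n]
    return {n: (base_t / t) / (n / base_n) for n, t in times_by_nodes.items()}

if __name__ == "__main__":
    example = {1: 100.0, 5: 24.0, 10: 14.0}   # placeholder seconds per timestep
    for nodes, eff in sorted(scaling_efficiency(example).items()):
        print(f"{nodes:3d} nodes: {eff:.0%} of ideal")
```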
This work examines the performance of leading-edge systems designed for machine learning computing, including the NVIDIA DGX-2, Amazon Web Services (AWS) P3, IBM Power System Accelerated Compute Server AC922, and a consumer-grade Exxact TensorEX TS4 GPU server. Representative deep learning workloads from the fields of computer vision and natural language processing are the focus of the analysis, which is performed along a number of important dimensions. The performance of the communication interconnects and of large, high-throughput deep learning models is considered. Different potential usage models for the systems, standalone and in the cloud, are also examined. The effect of various optimizations of the deep learning models and system configurations is included in the analysis.
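The benchmark harness used in that study is not reproduced here; the following is a minimal, generic sketch of how per-device training throughput for a representative computer-vision workload might be measured. The framework (PyTorch/torchvision), model, batch size, and iteration counts are assumptions for illustration, not the configurations used in the paper.

```python
# Hypothetical sketch: measure training throughput (samples/s) for a
# representative CV model; all sizes and choices below are assumptions.
import time
import torch
import torchvision

def measure_throughput(batch_size=64, iters=50, warmup=10):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torchvision.models.resnet50().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()
    x = torch.randn(batch_size, 3, 224, 224, device=device)   # synthetic images
    y = torch.randint(0, 1000, (batch_size,), device=device)  # synthetic labels

    for i in range(warmup + iters):
        if i == warmup:                     # start timing only after warm-up
            if device == "cuda":
                torch.cuda.synchronize()
            t0 = time.perf_counter()
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    if device == "cuda":
        torch.cuda.synchronize()
    return batch_size * iters / (time.perf_counter() - t0)

if __name__ == "__main__":
    print(f"{measure_throughput():.1f} samples/s")
```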
The (ultra-)dense deployment of small-cell base stations (SBSs) endowed with cloud-like computing functionalities paves the way for pervasive mobile edge computing (MEC), enabling ultra-low latency and location awareness for a variety of emerging mobile applications and the Internet of Things. To handle spatially uneven computation workloads in the network, cooperation among SBSs via workload peer offloading is essential to avoid large computation latency at overloaded SBSs and to provide high quality of service to end users. However, performing effective peer offloading faces many unique challenges in small-cell networks due to the limited energy resources committed by self-interested SBS owners, uncertainties in the system dynamics, and the co-provisioning of radio access and computing services. This paper develops a novel online SBS peer offloading framework, called OPEN, by leveraging the Lyapunov technique, in order to maximize the long-term system performance while keeping the energy consumption of SBSs below individual long-term constraints. OPEN works online without requiring information about future system dynamics, yet provides provably near-optimal performance compared to the oracle solution that has complete future information. In addition, this paper formulates a novel peer offloading game among SBSs and analyzes its equilibrium and efficiency loss in terms of the price of anarchy in order to thoroughly understand the SBSs' strategic behaviors, thereby enabling decentralized and autonomous peer offloading decision making. Extensive simulations show that peer offloading among SBSs dramatically improves edge computing performance.
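For readers unfamiliar with the Lyapunov technique referenced above, the standard drift-plus-penalty construction on which such online frameworks are typically built is sketched below in generic notation; the symbols are illustrative and are not necessarily those used in the OPEN formulation.

```latex
% Generic drift-plus-penalty sketch (notation illustrative):
% q_i(t): virtual energy-deficit queue of SBS i with long-term budget \bar{e}_i,
% e_i(t): energy spent in slot t, D(t): per-slot delay objective, V > 0: trade-off weight.
\begin{align*}
  q_i(t+1) &= \max\{\, q_i(t) + e_i(t) - \bar{e}_i,\; 0 \,\}, \\
  L(t) &= \tfrac{1}{2}\sum_i q_i(t)^2, \qquad
  \Delta(t) = \mathbb{E}\!\left[ L(t+1) - L(t) \mid \mathbf{q}(t) \right], \\
  \text{each slot:}\quad & \min\; \Delta(t) + V\,\mathbb{E}\!\left[ D(t) \mid \mathbf{q}(t) \right].
\end{align*}
```

Minimizing this bound slot by slot requires no knowledge of future dynamics, yet the standard analysis yields performance within O(1/V) of the oracle while the virtual queues, and hence the long-term energy-budget violations, grow only as O(V).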
Cloud computing has become increasingly popular, and many cloud deployment options are available. Testing cloud performance enables us to choose a cloud deployment based on our requirements. In this paper, we present an innovative process, implemented in software, that allows us to assess the quality of cloud performance data. The process combines performance data from multiple machines, spanning user experience data, workload performance metrics, and readily available system performance data. Furthermore, we discuss the major challenges of bringing raw data into tidy data formats to enable subsequent analysis, and describe how our process uses several layers of assessment to validate the quality of the data processing procedure. We present a case study to demonstrate the effectiveness of the proposed process, and conclude with several future research directions worth investigating.
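As an illustration of the tidy-data reshaping step described above, a minimal sketch using pandas is given below; the file layout and column names are assumptions, not the schema used by the authors' software.

```python
# Hypothetical sketch: reshape per-machine performance logs into a tidy
# (one observation per row) table; column names below are assumptions.
import pandas as pd

def tidy_metrics(path):
    """Read a wide CSV (one row per timestamp, one column per metric) and
    melt it into tidy form: (timestamp, machine, metric, value)."""
    wide = pd.read_csv(path)
    return wide.melt(
        id_vars=["timestamp", "machine"],
        var_name="metric",
        value_name="value",
    ).dropna(subset=["value"])

def combine(paths):
    """Stack tidy frames from several machines/data sources for joint analysis."""
    return pd.concat([tidy_metrics(p) for p in paths], ignore_index=True)
```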
Memory-bound algorithms show complex performance and energy consumption behavior on multicore processors. We choose the lattice-Boltzmann method (LBM) on an Intel Sandy Bridge cluster as a prototype scenario to investigate whether and how single-chip performance and power characteristics can be generalized to the highly parallel case. First we analyze a sparse-lattice LBM implementation for complex geometries. Using a single-core performance model, we predict the intra-chip saturation characteristics and the optimal operating point in terms of energy to solution as a function of implementation details, clock frequency, vectorization, and the number of active cores per chip. We show that high single-core performance and a correct choice of the number of active cores per chip are the essential optimizations for lowest energy to solution at minimal performance degradation. We then extrapolate to the MPI-parallel level and quantify the energy-saving potential of various optimizations and execution modes; we find these guidelines to be even more important there, especially when communication overhead is non-negligible. In our setup we achieved energy savings of 35% in this case, compared to a naive approach. We also demonstrate that a simple, non-reflective reduction of the clock speed leaves most of the energy-saving potential unused.
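To make the energy-to-solution trade-off concrete, a simplified sketch of the kind of chip-level model implied above is shown below: performance saturates once memory bandwidth is exhausted, while power keeps rising with active cores and clock frequency, so the energy-optimal operating point lies near the saturation point. All coefficients are illustrative placeholders, not measured values from this study.

```python
# Simplified sketch of a chip-level energy-to-solution model (illustrative only).
def energy_to_solution(work, n_cores, f, p_core, p_sat,
                       w0=50.0, w1=2.0, w2=1.0):
    """work: total lattice-site updates; p_core: single-core performance at
    clock f (updates/s); p_sat: bandwidth-limited saturation performance;
    w0: baseline chip power; w1, w2: per-core power coefficients (linear and
    quadratic in f). All numeric defaults are placeholders."""
    perf = min(n_cores * p_core, p_sat)             # performance saturates once
    power = w0 + n_cores * (w1 * f + w2 * f ** 2)   # memory bandwidth is exhausted
    return power * (work / perf)                    # energy = power * runtime
```

Sweeping n_cores and f through such a model reproduces the qualitative conclusion above: adding cores or clock speed beyond the saturation point only adds power, not performance, and therefore wastes energy.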