
Software Performance Analysis

Published by: Michel Dagenais
Publication date: 2005
Research field: Informatics Engineering
Language: English





The key to speeding up applications is often understanding where the elapsed time is spent, and why. This document reviews in depth the full array of performance analysis tools and techniques available on Linux for this task, from the traditional tools like gcov and gprof, to the more advanced tools still under development like oprofile and the Linux Trace Toolkit. The focus is more on the underlying data collection and processing algorithms, and their overhead and precision, than on the cosmetic details of the graphical user interface frontends.
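As a minimal, hedged illustration of the traditional tools the abstract mentions, the sketch below is a toy C program instrumented for both gprof and gcov. The file and function names are made up for this example, and the flags assume GCC on Linux.

/* profile_demo.c -- toy program for trying gprof and gcov.
 * Build with instrumentation (assuming GCC):
 *   gcc -pg -fprofile-arcs -ftest-coverage -o profile_demo profile_demo.c
 * Run it once to produce gmon.out and the .gcda coverage data:
 *   ./profile_demo
 * Then inspect where time was spent and which lines executed:
 *   gprof ./profile_demo gmon.out    # flat profile and call graph
 *   gcov profile_demo.c              # per-line execution counts
 */
#include <stdio.h>

/* Deliberately slow so it dominates the flat profile. */
static double busy_work(long iterations) {
    double acc = 0.0;
    for (long i = 1; i <= iterations; i++)
        acc += 1.0 / (double)i;      /* harmonic sum: cheap but measurable */
    return acc;
}

int main(void) {
    double sum = 0.0;
    for (int round = 0; round < 100; round++)
        sum += busy_work(1000000L);
    printf("checksum: %f\n", sum);
    return 0;
}

gprof attributes elapsed time by periodically sampling the program counter and counting the call arcs inserted by -pg, while gcov reports exact per-line execution counts from the -fprofile-arcs instrumentation; the collection mechanisms behind tools of this kind, and their overhead and precision, are what the review examines.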




Read also

Block devices in computer operating systems typically correspond to disks or disk partitions, and are used to store files in a filesystem. Disks are not the only real or virtual devices that adhere to the block device model of a block-addressable stream of bytes. Files, remote devices, or even RAM may be used as virtual disks. This article examines several common combinations of block device layers used as virtual disks in the Linux operating system: disk partitions, loopback files, software RAID, Logical Volume Manager, and Network Block Devices. It measures their relative performance using different filesystems: Ext2, Ext3, ReiserFS, JFS, XFS, and NFS.
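As a rough sketch of one of the layer combinations measured above, the C program below builds a loopback-file virtual disk by shelling out to standard Linux tools (dd, losetup, mkfs.ext2, mount). The path, size, and loop device name are arbitrary assumptions for illustration, and the commands require root privileges.

/* loopfile_disk.c -- rough sketch: a regular file served as a virtual
 * disk through the loopback driver. Paths, size, and the loop device
 * name are assumptions for this example; run as root on Linux. */
#include <stdio.h>
#include <stdlib.h>

/* Echo a shell command and abort if it fails. */
static void run(const char *cmd) {
    printf("+ %s\n", cmd);
    if (system(cmd) != 0) {
        fprintf(stderr, "command failed: %s\n", cmd);
        exit(EXIT_FAILURE);
    }
}

int main(void) {
    /* 1. Create a 64 MiB sparse backing file. */
    run("dd if=/dev/zero of=/tmp/vdisk.img bs=1M count=0 seek=64");
    /* 2. Attach it to a loop device (assumes /dev/loop0 is free). */
    run("losetup /dev/loop0 /tmp/vdisk.img");
    /* 3. Make a filesystem on it and mount it like any other disk. */
    run("mkfs.ext2 -q /dev/loop0");
    run("mkdir -p /mnt/vdisk");
    run("mount /dev/loop0 /mnt/vdisk");
    puts("loopback virtual disk mounted at /mnt/vdisk");
    return 0;
}

Once mounted, the filesystem on /mnt/vdisk can be exercised exactly like one on a physical partition, which is how such layer combinations can be compared against each other.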
This work examines the performance of leading-edge systems designed for machine learning computing, including the NVIDIA DGX-2, Amazon Web Services (AWS) P3, IBM Power System Accelerated Compute Server AC922, and a consumer-grade Exxact TensorEX TS4 GPU server. Representative deep learning workloads from the fields of computer vision and natural language processing are the focus of the analysis. Performance analysis is performed along a number of important dimensions, including the performance of the communication interconnects and of large, high-throughput deep learning models. Different potential use models for the systems, standalone and in the cloud, are also examined. The effect of various optimizations of the deep learning models and system configurations is included in the analysis.
Lattice Boltzmann methods (LBM) are an important part of current computational fluid dynamics (CFD). They allow easy implementations and boundary handling. However, competitive time to solution depends not only on the choice of a reasonable method, but also on an efficient implementation on modern hardware. Hence, performance optimization has a long history in the lattice Boltzmann community. A variety of implementation options exists, with direct impact on solver performance. Experimenting with and evaluating each option is often hard, as the kernel itself is typically embedded in a larger code base. With our suite of lattice Boltzmann kernels we provide the infrastructure for such endeavors. Several kernels are already included, ranging from simple to fully optimized implementations. Although these kernels are not fully functional CFD solvers, they are equipped with a solid verification method. The kernels may act as a reference for performance comparisons and as a blueprint for optimization strategies. In this paper we give an overview of the available kernels, establish a performance model for each kernel, and present a comparison of implementations and recent architectures.
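The kernels in the suite are not reproduced in the abstract; purely as an illustrative sketch, the C code below implements a single-cell D2Q9 BGK collision step, the basic building block such kernels optimize. It favors clarity over performance and is not taken from the suite itself.

/* d2q9_collide.c -- illustrative single-cell D2Q9 BGK collision step.
 * Clarity over speed; a real solver adds streaming, boundaries, and
 * the data-layout and vectorization choices whose impact is measured. */
#include <stdio.h>

#define Q 9
static const int    cx[Q] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
static const int    cy[Q] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
static const double w[Q]  = { 4.0/9.0,
                              1.0/9.0, 1.0/9.0, 1.0/9.0, 1.0/9.0,
                              1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0 };

/* Relax the distributions f[] of one lattice cell toward equilibrium. */
static void collide_cell(double f[Q], double omega) {
    /* Macroscopic moments: density and velocity. */
    double rho = 0.0, ux = 0.0, uy = 0.0;
    for (int i = 0; i < Q; i++) {
        rho += f[i];
        ux  += cx[i] * f[i];
        uy  += cy[i] * f[i];
    }
    ux /= rho; uy /= rho;

    /* BGK relaxation toward the second-order equilibrium. */
    double usq = ux*ux + uy*uy;
    for (int i = 0; i < Q; i++) {
        double cu  = cx[i]*ux + cy[i]*uy;
        double feq = w[i] * rho * (1.0 + 3.0*cu + 4.5*cu*cu - 1.5*usq);
        f[i] -= omega * (f[i] - feq);
    }
}

int main(void) {
    double f[Q];
    for (int i = 0; i < Q; i++) f[i] = w[i];  /* cell at rest: rho = 1 */
    f[1] += 0.01;                             /* small perturbation */
    collide_cell(f, 1.0);                     /* omega = 1 / tau */
    for (int i = 0; i < Q; i++) printf("f[%d] = %f\n", i, f[i]);
    return 0;
}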
Using a realistic molecular catalyst system, we conduct scaling studies of ab initio molecular dynamics simulations using the CP2K code on both Intel Xeon CPU and NVIDIA V100 GPU architectures. We explore using process placement and affinity to gain additional performance improvements. We also use statistical methods to understand performance changes in spite of the variability in runtime for each molecular dynamics timestep. We found that ideal conditions for CPU runs included at least four MPI ranks per node, bound evenly across each socket, fully utilizing the processing cores with one OpenMP thread per core; no benefit was shown from reserving cores for the system. The CPU-only simulations scaled at 70% or more of ideal scaling up to 10 compute nodes, after which the returns began to diminish more quickly. Simulations on a single 40-core node with two NVIDIA V100 GPUs for acceleration achieved over 3.7x speedup compared to the fastest single 36-core CPU-only node, and showed a 13% speedup over the fastest time we achieved across five CPU-only nodes.
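One common way to verify placement and affinity settings like those discussed above is a small hybrid MPI/OpenMP probe in which every thread reports the CPU it is running on. The sketch below is a generic illustration, not code from the CP2K study; it assumes mpicc with OpenMP support and, for the example launch line, an Open MPI-style mpirun.

/* affinity_check.c -- generic sketch to verify rank/thread placement.
 * Build (assumed toolchain): mpicc -fopenmp -o affinity_check affinity_check.c
 * Launch example (Open MPI): mpirun -np 4 --map-by socket ./affinity_check */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>   /* sched_getcpu(), glibc */
#include <omp.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* FUNNELED: threads exist, but only the main thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each OpenMP thread prints the CPU it currently runs on, so
     * mis-bound ranks or oversubscribed cores are easy to spot. */
    #pragma omp parallel
    {
        printf("rank %d thread %d on cpu %d\n",
               rank, omp_get_thread_num(), sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}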