A Static Analysis-based Cross-Architecture Performance Prediction Using Machine Learning

Published by Urmish Thakker

Publication date: 2019

Research field: Informatics Engineering

Research language: English

Authors: Newsha Ardalani - Urmish Thakker - Aws Albarghouthi

Distributed, Parallel, and Cluster Computing - Machine Learning

Porting code from CPU to GPU is costly and time-consuming. Unless much time is invested in development and optimization, it is not obvious, a priori, how much speed-up is achievable or how much room is left for improvement. Knowing the potential speed-up a priori can be very useful: it can save hundreds of engineering hours and help programmers with prioritization and algorithm selection. We aim to address this problem using machine learning in a supervised setting, using solely the single-threaded source code of the program, without having to run or profile the code. We propose a static analysis-based cross-architecture performance prediction framework (Static XAPP), which relies solely on program properties collected using static analysis of the CPU source code and predicts whether the potential speed-up is above or below a given threshold. We offer preliminary results that show we can achieve 94% accuracy in binary classification, on average, across different thresholds.
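
As a rough illustration of the classification setup described in the abstract, the sketch below turns speed-ups into binary labels for several thresholds and reports accuracy averaged across those thresholds. It is not the Static XAPP implementation: the feature names, the toy data, and the 1-nearest-neighbor classifier with leave-one-out evaluation are all assumptions chosen to keep the example dependency-free.

```python
from math import dist

# Hypothetical static features per program:
# (loop nest depth, arithmetic ops per memory access, branch fraction),
# paired with an illustrative, made-up GPU-over-CPU speed-up.
PROGRAMS = [
    ((1, 0.5, 0.30), 1.2),
    ((1, 0.8, 0.25), 1.8),
    ((2, 2.0, 0.10), 6.0),
    ((2, 2.5, 0.08), 7.5),
    ((3, 4.0, 0.05), 15.0),
    ((3, 4.5, 0.04), 18.0),
]


def label(speedup, threshold):
    """Binary target: is the speed-up above the given threshold?"""
    return speedup > threshold


def predict_1nn(train, features):
    """Return the label of the training program closest in feature space."""
    nearest = min(train, key=lambda item: dist(item[0], features))
    return nearest[1]


def loo_accuracy(programs, threshold):
    """Leave-one-out accuracy of the 1-NN classifier for one threshold."""
    correct = 0
    for i, (features, speedup) in enumerate(programs):
        # Train on every program except the one being predicted.
        train = [
            (f, label(s, threshold))
            for j, (f, s) in enumerate(programs)
            if j != i
        ]
        if predict_1nn(train, features) == label(speedup, threshold):
            correct += 1
    return correct / len(programs)


def mean_accuracy(programs, thresholds):
    """Average the per-threshold accuracies, as the abstract reports them."""
    return sum(loo_accuracy(programs, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    print(mean_accuracy(PROGRAMS, thresholds=[3.0, 10.0]))
```

The toy data is deliberately easy to separate, so the number it prints says nothing about the accuracy reported in the paper; the point is only the shape of the pipeline: static features in, one binary classifier per threshold, accuracy averaged over thresholds.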

109 - Malte S. Kurz 2021

This paper explores serverless cloud computing for double machine learning. Being based on repeated cross-fitting, double machine learning is particularly well suited to exploit the high level of parallelism achievable with serverless computing. It allows fast on-demand estimations without additional cloud maintenance effort. We provide a prototype Python implementation, DoubleML-Serverless, for the estimation of double machine learning models with the serverless computing platform AWS Lambda and demonstrate its utility with a case study analyzing estimation times and costs.

Distributed, Parallel, and Cluster Computing - Machine Learning

Cross-architecture Tuning of Silicon and SiGe-based Quantum Devices Using Machine Learning

130 - B. Severin, D. T. Lennon, L. C. Camenzind 2021

The potential of Si and SiGe-based devices for the scaling of quantum circuits is tainted by device variability. Each device needs to be tuned to operation conditions. We take a key step towards tackling this variability with an algorithm that, without modification, is capable of tuning a 4-gate Si FinFET, a 5-gate GeSi nanowire and a 7-gate SiGe heterostructure double quantum dot device from scratch. We achieve tuning times of 30, 10, and 92 minutes, respectively. The algorithm also provides insight into the parameter space landscape for each of these devices. These results show that overarching solutions for the tuning of quantum devices are enabled by machine learning.

Mesoscale and Nanoscale Physics - Machine Learning - Quantum Physics

Machine Learning for Performance Prediction of Spark Cloud Applications

83 - Alexandre Maros, Fabricio Murai, Ana Paula Couto da Silva and Jussara M. Almeida 2021

Big data applications and analytics are employed in many sectors for a variety of goals: improving customer satisfaction, predicting market behavior or improving processes in public health. These applications consist of complex software stacks that are often run on cloud systems. Predicting execution times is important for estimating the cost of cloud services and for effectively managing the underlying resources at runtime. Machine Learning (ML), providing black box solutions to model the relationship between application performance and system configuration without requiring detailed knowledge of the system, has become a popular way of predicting the performance of big data applications. We investigate the cost-benefits of using supervised ML models for predicting the performance of applications on Spark, one of today's most widely used frameworks for big data analysis. We compare our approach with Ernest (an ML-based technique proposed in the literature by the Spark inventors) on a range of scenarios, application workloads, and cloud system configurations. Our experiments show that Ernest can accurately estimate the performance of very regular applications, but it fails when applications exhibit more irregular patterns and/or when extrapolating to bigger data set sizes. Results show that our models match or exceed Ernest's performance, sometimes enabling us to reduce the prediction error from 126-187% to only 5-19%.

Distributed, Parallel, and Cluster Computing - Performance

SimNet: Computer Architecture Simulation using Machine Learning

135 - Lingda Li, Santosh Pandey, Thomas Flynn 2021

While cycle-accurate simulators are essential tools for architecture research, design, and development, their practicality is limited by an extremely long time-to-solution for realistic problems under investigation. This work describes a concerted effort, where machine learning (ML) is used to accelerate discrete-event simulation. First, an ML-based instruction latency prediction framework that accounts for both static instruction/architecture properties and dynamic execution context is constructed. Then, a GPU-accelerated parallel simulator is implemented based on the proposed instruction latency predictor, and its simulation accuracy and throughput are validated and evaluated against a state-of-the-art simulator. Leveraging modern GPUs, the ML-based simulator outperforms traditional simulators significantly.

Hardware Architecture - Machine Learning

Performance Modeling and Analysis of a Hyperledger-based System Using GSPN

103 - Pu Yuan, Kan Zheng, Xiong Xiong 2020

As a highly scalable permissioned blockchain platform, Hyperledger Fabric supports a wide range of industry use cases ranging from governance to finance. In this paper, we propose a model to analyze the performance of a Hyperledger-based system by using Generalised Stochastic Petri Nets (GSPN). This model decomposes a transaction flow into multiple phases and provides a simulation-based approach to obtain the system latency and throughput with a specific arrival rate. Based on this model, we analyze the impact of different configurations of the ordering service on system performance to identify the bottleneck. Moreover, a mathematical configuration selection approach is proposed to determine the best configuration, which can maximize the system throughput. Finally, extensive experiments are performed on a running system to validate the proposed model and approaches.

Distributed, Parallel, and Cluster Computing - Performance