The Tribes of Machine Learning and the Realm of Computer Architecture

80 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ayaz Akram

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ayaz Akram - Jason Lowe-Power

التعلم الآلي هندسة العتاد

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Machine learning techniques have influenced the field of computer architecture like many other fields. This paper studies how the fundamental machine learning techniques can be applied towards computer architecture problems. We also provide a detailed survey of computer architecture research that employs different machine learning methods. Finally, we present some future opportunities and the outstanding challenges that need to be overcome to exploit full potential of machine learning for computer architecture.

قيم البحث

135 - Lingda Li , Santosh Pandey , Thomas Flynn 2021

While cycle-accurate simulators are essential tools for architecture research, design, and development, their practicality is limited by an extremely long time-to-solution for realistic problems under investigation. This work describes a concerted ef fort, where machine learning (ML) is used to accelerate discrete-event simulation. First, an ML-based instruction latency prediction framework that accounts for both static instruction/architecture properties and dynamic execution context is constructed. Then, a GPU-accelerated parallel simulator is implemented based on the proposed instruction latency predictor, and its simulation accuracy and throughput are validated and evaluated against a state-of-the-art simulator. Leveraging modern GPUs, the ML-based simulator outperforms traditional simulators significantly.

هندسة العتاد التعلم الآلي

Tetris: Re-architecting Convolutional Neural Network Computation for Machine Learning Accelerators

176 - Hang Lu , Xin Wei , Ning Lin 2018

Inference efficiency is the predominant consideration in designing deep learning accelerators. Previous work mainly focuses on skipping zero values to deal with remarkable ineffectual computation, while zero bits in non-zero values, as another major source of ineffectual computation, is often ignored. The reason lies on the difficulty of extracting essential bits during operating multiply-and-accumulate (MAC) in the processing element. Based on the fact that zero bits occupy as high as 68.9% fraction in the overall weights of modern deep convolutional neural network models, this paper firstly proposes a weight kneading technique that could eliminate ineffectual computation caused by either zero value weights or zero bits in non-zero weights, simultaneously. Besides, a split-and-accumulate (SAC) computing pattern in replacement of conventional MAC, as well as the corresponding hardware accelerator design called Tetris are proposed to support weight kneading at the hardware level. Experimental results prove that Tetris could speed up inference up to 1.50x, and improve power efficiency up to 5.33x compared with the state-of-the-art baselines.

التعلم الآلي هندسة العتاد

Apollo: Transferable Architecture Exploration

75 - Amir Yazdanbakhsh , Christof Angermueller , Berkin Akin 2021

The looming end of Moores Law and ascending use of deep learning drives the design of custom accelerators that are optimized for specific neural architectures. Architecture exploration for such accelerators forms a challenging constrained optimizatio n problem over a complex, high-dimensional, and structured input space with a costly to evaluate objective function. Existing approaches for accelerator design are sample-inefficient and do not transfer knowledge between related optimizations tasks with different design constraints, such as area and/or latency budget, or neural architecture configurations. In this work, we propose a transferable architecture exploration framework, dubbed Apollo, that leverages recent advances in black-box function optimization for sample-efficient accelerator design. We use this framework to optimize accelerator configurations of a diverse set of neural architectures with alternative design constraints. We show that our framework finds high reward design configurations (up to 24.6% speedup) more sample-efficiently than a baseline black-box optimization approach. We further show that by transferring knowledge between target architectures with different design constraints, Apollo is able to find optimal configurations faster and often with better objective value (up to 25% improvements). This encouraging outcome portrays a promising path forward to facilitate generating higher quality accelerators.

التعلم الآلي هندسة العتاد

NAAS: Neural Accelerator Architecture Search

93 - Yujun Lin , Mengtian Yang , Song Han 2021

Data-driven, automatic design space exploration of neural accelerator architecture is desirable for specialization and productivity. Previous frameworks focus on sizing the numerical architectural hyper-parameters while neglect searching the PE conne ctivities and compiler mappings. To tackle this challenge, we propose Neural Accelerator Architecture Search (NAAS) which holistically searches the neural network architecture, accelerator architecture, and compiler mapping in one optimization loop. NAAS composes highly matched architectures together with efficient mapping. As a data-driven approach, NAAS rivals the human design Eyeriss by 4.4x EDP reduction with 2.7% accuracy improvement on ImageNet under the same computation resource, and offers 1.4x to 3.5x EDP reduction than only sizing the architectural hyper-parameters.

التعلم الآلي هندسة العتاد

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

212 - Farah Fahim , Benjamin Hawks , Christian Herwig 2021

Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drasticall y improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends include an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.

التعلم الآلي هندسة العتاد أجهزة الكشف الفيزيائية