Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Apollo: Transferable Architecture Exploration

76 0 0.0 ( 0 )

Download Cite

Added by Amir Yazdanbakhsh

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Amir Yazdanbakhsh - Christof Angermueller - Berkin Akin

Machine Learning Hardware Architecture

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The looming end of Moores Law and ascending use of deep learning drives the design of custom accelerators that are optimized for specific neural architectures. Architecture exploration for such accelerators forms a challenging constrained optimization problem over a complex, high-dimensional, and structured input space with a costly to evaluate objective function. Existing approaches for accelerator design are sample-inefficient and do not transfer knowledge between related optimizations tasks with different design constraints, such as area and/or latency budget, or neural architecture configurations. In this work, we propose a transferable architecture exploration framework, dubbed Apollo, that leverages recent advances in black-box function optimization for sample-efficient accelerator design. We use this framework to optimize accelerator configurations of a diverse set of neural architectures with alternative design constraints. We show that our framework finds high reward design configurations (up to 24.6% speedup) more sample-efficiently than a baseline black-box optimization approach. We further show that by transferring knowledge between target architectures with different design constraints, Apollo is able to find optimal configurations faster and often with better objective value (up to 25% improvements). This encouraging outcome portrays a promising path forward to facilitate generating higher quality accelerators.

rate research

Learning Transferable Graph Exploration

133 - Hanjun Dai , Yujia Li , Chenglong Wang 2019

This paper considers the problem of efficient exploration of unseen environments, a key challenge in AI. We propose a `learning to explore framework where we learn a policy from a distribution of environments. At test time, presented with an unseen environment from the same distribution, the policy aims to generalize the exploration strategy to visit the maximum number of unique states in a limited number of steps. We particularly focus on environments with graph-structured state-spaces that are encountered in many important real-world applications like software testing and map building. We formulate this task as a reinforcement learning problem where the `exploration agent is rewarded for transitioning to previously unseen environment states and employ a graph-structured memory to encode the agents past trajectory. Experimental results demonstrate that our approach is extremely effective for exploration of spatial maps; and when applied on the challenging problems of coverage-guided software-testing of domain-specific programs and real-world mobile applications, it outperforms methods that have been hand-engineered by human experts.

Machine Learning Machine Learning

NAAS: Neural Accelerator Architecture Search

93 - Yujun Lin , Mengtian Yang , Song Han 2021

Data-driven, automatic design space exploration of neural accelerator architecture is desirable for specialization and productivity. Previous frameworks focus on sizing the numerical architectural hyper-parameters while neglect searching the PE connectivities and compiler mappings. To tackle this challenge, we propose Neural Accelerator Architecture Search (NAAS) which holistically searches the neural network architecture, accelerator architecture, and compiler mapping in one optimization loop. NAAS composes highly matched architectures together with efficient mapping. As a data-driven approach, NAAS rivals the human design Eyeriss by 4.4x EDP reduction with 2.7% accuracy improvement on ImageNet under the same computation resource, and offers 1.4x to 3.5x EDP reduction than only sizing the architectural hyper-parameters.

Machine Learning Hardware Architecture

The Tribes of Machine Learning and the Realm of Computer Architecture

79 - Ayaz Akram , Jason Lowe-Power 2020

Machine learning techniques have influenced the field of computer architecture like many other fields. This paper studies how the fundamental machine learning techniques can be applied towards computer architecture problems. We also provide a detailed survey of computer architecture research that employs different machine learning methods. Finally, we present some future opportunities and the outstanding challenges that need to be overcome to exploit full potential of machine learning for computer architecture.

Machine Learning Hardware Architecture

Randomized Transferable Machine

87 - Pengfei Wei , Tze Yun Leong 2020

Feature-based transfer is one of the most effective methodologies for transfer learning. Existing studies usually assume that the learned new feature representation is truly emph{domain-invariant}, and thus directly train a transfer model $mathcal{M}$ on source domain. In this paper, we consider a more realistic scenario where the new feature representation is suboptimal and small divergence still exists across domains. We propose a new learning strategy with a transfer model called Randomized Transferable Machine (RTM). More specifically, we work on source data with the new feature representation learned from existing feature-based transfer methods. The key idea is to enlarge source training data populations by randomly corrupting source data using some noises, and then train a transfer model $widetilde{mathcal{M}}$ that performs well on all the corrupted source data populations. In principle, the more corruptions are made, the higher the probability of the target data can be covered by the constructed source populations, and thus better transfer performance can be achieved by $widetilde{mathcal{M}}$. An ideal case is with infinite corruptions, which however is infeasible in reality. We develop a marginalized solution with linear regression model and dropout noise. With a marginalization trick, we can train an RTM that is equivalently to training using infinite source noisy populations without truly conducting any corruption. More importantly, such an RTM has a closed-form solution, which enables very fast and efficient training. Extensive experiments on various real-world transfer tasks show that RTM is a promising transfer model.

Machine Learning Artificial Intelligence

Low-complexity Architecture for AR(1) Inference

143 - A. Borges Jr. , R. J. Cintra , D. F. G. Coelho 2020

In this Letter, we propose a low-complexity estimator for the correlation coefficient based on the signed $operatorname{AR}(1)$ process. The introduced approximation is suitable for implementation in low-power hardware architectures. Monte Carlo simulations reveal that the proposed estimator performs comparably to the competing methods in literature with maximum error in order of $10^{-2}$. However, the hardware implementation of the introduced method presents considerable advantages in several relevant metrics, offering more than 95% reduction in dynamic power and doubling the maximum operating frequency when compared to the reference method.

Signal Processing Hardware Architecture Computation

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Apollo: Transferable Architecture Exploration

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions