Deep Fried Convnets

305 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zichao Yang

تاريخ النشر 2014

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zichao Yang - Marcin Moczulski - Misha Denil

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters, and consume the majority of the memory required to store the network parameters. Reducing the number of parameters while preserving essentially the same predictive performance is critically important for operating deep neural networks in memory constrained environments such as GPUs or embedded devices. In this paper we show how kernel methods, in particular a single Fastfood layer, can be used to replace all fully connected layers in a deep convolutional neural network. This novel Fastfood layer is also end-to-end trainable in conjunction with convolutional layers, allowing us to combine them into a new architecture, named deep fried convolutional networks, which substantially reduces the memory footprint of convolutional networks trained on MNIST and ImageNet with no drop in predictive performance.

قيم البحث

916 - Suyog Gupta , Ankur Agrawal , Kailash Gopalakrishnan 2015

Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited precision data representation and computation on neural network training. Within the context of low-precision f ixed-point computations, we observe the rounding scheme to play a crucial role in determining the networks behavior during training. Our results show that deep networks can be trained using only 16-bit wide fixed-point number representation when using stochastic rounding, and incur little to no degradation in the classification accuracy. We also demonstrate an energy-efficient hardware accelerator that implements low-precision fixed-point arithmetic with stochastic rounding.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

K-TanH: Efficient TanH For Deep Learning

371 - Abhisek Kundu , Alex Heinecke , Dhiraj Kalamkar 2019

We propose K-TanH, a novel, highly accurate, hardware efficient approximation of popular activation function TanH for Deep Learning. K-TanH consists of parameterized low-precision integer operations, such as, shift and add/subtract (no floating point operation needed) where parameters are stored in very small look-up tables that can fit in CPU registers. K-TanH can work on various numerical formats, such as, Float32 and BFloat16. High quality approximations to other activation functions, e.g., Sigmoid, Swish and GELU, can be derived from K-TanH. Our AVX512 implementation of K-TanH demonstrates $>5times$ speed up over Intel SVML, and it is consistently superior in efficiency over other approximations that use floating point arithmetic. Finally, we achieve state-of-the-art Bleu score and convergence results for training language translation model GNMT on WMT16 data sets with approximate TanH obtained via K-TanH on BFloat16 inputs.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

Learning Deep Representations with Probabilistic Knowledge Transfer

67 - Nikolaos Passalis , Anastasios Tefas 2018

Knowledge Transfer (KT) techniques tackle the problem of transferring the knowledge from a large and complex neural network into a smaller and faster one. However, existing KT methods are tailored towards classification tasks and they cannot be used efficiently for other representation learning tasks. In this paper a novel knowledge transfer technique, that is capable of training a student model that maintains the same amount of mutual information between the learned representation and a set of (possible unknown) labels as the teacher model, is proposed. Apart from outperforming existing KT techniques, the proposed method allows for overcoming several limitations of existing methods providing new insight into KT as well as novel KT applications, ranging from knowledge transfer from handcrafted feature extractors to {cross-modal} KT from the textual modality into the representation extracted from the visual modality of the data.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

A Pragmatic Look at Deep Imitation Learning

150 - Kai Arulkumaran , Dan Ogawa Lillrank 2021

The introduction of the generative adversarial imitation learning (GAIL) algorithm has spurred the development of scalable imitation learning approaches using deep neural networks. The GAIL objective can be thought of as 1) matching the expert policy s state distribution; 2) penalising the learned policys state distribution; and 3) maximising entropy. While theoretically motivated, in practice GAIL can be difficult to apply, not least due to the instabilities of adversarial training. In this paper, we take a pragmatic look at GAIL and related imitation learning algorithms. We implement and automatically tune a range of algorithms in a unified experimental setup, presenting a fair evaluation between the competing methods. From our results, our primary recommendation is to consider non-adversarial methods. Furthermore, we discuss the common components of imitation learning objectives, and present promising avenues for future research.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

Compiler-Level Matrix Multiplication Optimization for Deep Learning

119 - Huaqing Zhang , Xiaolin Cheng , Hui Zang 2019

An important linear algebra routine, GEneral Matrix Multiplication (GEMM), is a fundamental operator in deep learning. Compilers need to translate these routines into low-level code optimized for specific hardware. Compiler-level optimization of GEMM has significant performance impact on training and executing deep learning models. However, most deep learning frameworks rely on hardware-specific operator libraries in which GEMM optimization has been mostly achieved by manual tuning, which restricts the performance on different target hardware. In this paper, we propose two novel algorithms for GEMM optimization based on the TVM framework, a lightweight Greedy Best First Search (G-BFS) method based on heuristic search, and a Neighborhood Actor Advantage Critic (N-A2C) method based on reinforcement learning. Experimental results show significant performance improvement of the proposed methods, in both the optimality of the solution and the cost of search in terms of time and fraction of the search space explored. Specifically, the proposed methods achieve 24% and 40% savings in GEMM computation time over state-of-the-art XGBoost and RNN methods, respectively, while exploring only 0.1% of the search space. The proposed approaches have potential to be applied to other operator-level optimizations.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي