
Slot Machines: Discovering Winning Combinations of Random Weights in Neural Networks

Posted by: Maxwell Aladago
Publication date: 2021
Research field: Informatics engineering
Paper language: English





In contrast to traditional weight optimization in a continuous space, we demonstrate the existence of effective random networks whose weights are never updated. By selecting a weight among a fixed set of random values for each individual connection, our method uncovers combinations of random weights that match the performance of traditionally-trained networks of the same capacity. We refer to our networks as slot machines where each reel (connection) contains a fixed set of symbols (random values). Our backpropagation algorithm spins the reels to seek winning combinations, i.e., selections of random weight values that minimize the given loss. Quite surprisingly, we find that allocating just a few random values to each connection (e.g., 8 values per connection) yields highly competitive combinations despite being dramatically more constrained compared to traditionally learned weights. Moreover, finetuning these combinations often improves performance over the trained baselines. A randomly initialized VGG-19 with 8 values per connection contains a combination that achieves 91% test accuracy on CIFAR-10. Our method also achieves an impressive performance of 98.2% on MNIST for neural networks containing only random weights.
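The selection mechanism can be illustrated with a short sketch. The code below is a hedged Python/PyTorch illustration, not the authors' reference implementation: it assumes one learnable quality score per candidate value and a straight-through estimator so that gradients reach the scores while the forward pass uses the hard argmax selection; the paper's exact selection and update rules may differ.

# Minimal sketch of a "slot machine" linear layer (illustrative only): each
# connection holds K fixed random candidate values plus a learnable score per
# candidate; the forward pass uses the highest-scoring candidate, and a
# straight-through estimator lets gradients flow back into the scores.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlotMachineLinear(nn.Module):
    def __init__(self, in_features, out_features, k=8):
        super().__init__()
        # K fixed random candidate weights per connection (never updated).
        self.register_buffer(
            "candidates", torch.randn(out_features, in_features, k) * 0.1
        )
        # Learnable quality scores, one per candidate value.
        self.scores = nn.Parameter(torch.randn(out_features, in_features, k) * 0.01)

    def forward(self, x):
        # Hard selection: pick the candidate with the largest score.
        idx = self.scores.argmax(dim=-1, keepdim=True)
        hard = torch.gather(self.candidates, -1, idx).squeeze(-1)
        # Straight-through trick: use the hard weights in the forward pass but
        # route gradients through a softmax-weighted average of the candidates.
        soft = (F.softmax(self.scores, dim=-1) * self.candidates).sum(-1)
        weight = hard.detach() + soft - soft.detach()
        return F.linear(x, weight)


# Usage: the random candidates stay frozen; only the scores receive gradients.
layer = SlotMachineLinear(784, 256, k=8)
out = layer(torch.randn(32, 784))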




Read also

In this paper, we propose and investigate a new neural network architecture called the Neural Random Access Machine. It can manipulate and dereference pointers to an external variable-size random-access memory. The model is trained from pure input-output examples using backpropagation. We evaluate the new model on a number of simple algorithmic tasks whose solutions require pointer manipulation and dereferencing. Our results show that the proposed model can learn to solve algorithmic tasks of this type and is capable of operating on simple data structures such as linked lists and binary trees. For easier tasks, the learned solutions generalize to sequences of arbitrary length. Moreover, under some assumptions, memory access during inference can be done in constant time.
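The core primitive, differentiable pointer dereferencing on an external memory, can be sketched in a few lines. This is a hedged illustration under assumed conventions (a pointer as a probability distribution over memory slots, reads and writes as convex blends), not the paper's exact formulation.

# Soft (differentiable) memory access: a "pointer" is a distribution over
# slots, so dereferencing it is a weighted average of the memory contents.
import numpy as np

def soft_read(memory, pointer):
    # memory: (slots, width); pointer: (slots,) distribution summing to 1.
    return pointer @ memory

def soft_write(memory, pointer, value):
    # Blend the new value into each slot in proportion to the pointer mass.
    return (1 - pointer)[:, None] * memory + pointer[:, None] * value

memory = np.zeros((8, 8))
pointer = np.full(8, 1 / 8)          # a uniform "pointer" over 8 slots
memory = soft_write(memory, pointer, np.ones(8))
value = soft_read(memory, pointer)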
Binary neural networks (BNNs) have received increasing attention due to their superior reductions in computation and memory. Most existing works focus either on lessening the quantization error by minimizing the gap between the full-precision weights and their binarization, or on designing a gradient approximation to mitigate the gradient mismatch, while leaving the dead weights untouched. This leads to slow convergence when training BNNs. In this paper, for the first time, we explore the influence of dead weights, which refer to a group of weights that are barely updated during the training of BNNs, and then introduce the rectified clamp unit (ReCU) to revive the dead weights for updating. We prove that reviving the dead weights by ReCU results in a smaller quantization error. Besides, we also take into account the information entropy of the weights and mathematically analyze why weight standardization can benefit BNNs. We demonstrate the inherent contradiction between minimizing the quantization error and maximizing the information entropy, and then propose an adaptive exponential scheduler to identify the range of the dead weights. By considering the dead weights, our method offers not only faster BNN training but also state-of-the-art performance on CIFAR-10 and ImageNet compared with recent methods. Code is available at https://github.com/z-hXu/ReCU.
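As an illustration of the reviving idea only: the sketch below uses a generic quantile clamp, an assumption rather than the paper's exact ReCU definition, to pull latent weights stranded in the distribution's tails back into a bounded range before binarization so that they can change sign again during training.

import torch

def clamp_to_quantiles(w, tau=0.99):
    # Clamp latent weights into the [(1 - tau)-quantile, tau-quantile] range
    # so that "dead" weights in the tails start receiving useful updates again.
    lo = torch.quantile(w, 1.0 - tau).item()
    hi = torch.quantile(w, tau).item()
    return torch.clamp(w, min=lo, max=hi)

w = torch.randn(1024)
w_revived = clamp_to_quantiles(w)
w_binary = torch.sign(w_revived) * w_revived.abs().mean()  # standard BNN binarization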
It is well known that recurrent neural networks (RNNs) face limitations in learning long-term dependencies, limitations that have been addressed by the memory structures in long short-term memory (LSTM) networks. Matrix neural networks feature a matrix representation that inherently preserves the spatial structure of the data and has the potential to provide better memory structures than canonical neural networks that use a vector representation. Neural Turing machines (NTMs) are novel RNNs that implement the notion of a programmable computer with a neural network controller and can learn algorithms for copying, sorting, and associative recall tasks. In this paper, we study the augmentation of memory capacity with a matrix representation of RNNs and NTMs (MatNTMs). We investigate whether the matrix representation has a better memory capacity than the vector representation in conventional neural networks. We use a probabilistic model of memory capacity based on Fisher information and investigate how the memory capacity of matrix-representation networks is limited under various constraints, and, in general, without any constraints. In the unconstrained case, we find the upper bound on memory capacity to be $N^2$ for an $N \times N$ state matrix. The results from our experiments on synthetic algorithmic tasks show that MatNTMs have a better learning capacity than their counterparts.
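A minimal sketch of a matrix-valued recurrent update, under the assumption of a bilinear form $H_t = \tanh(U H_{t-1} V + A X_t B + C)$; the paper's exact parameterization may differ, but the point is that the hidden state remains a matrix, preserving the 2-D structure of the input.

# Matrix RNN step: the hidden state H is an n x n matrix, not a vector.
import numpy as np

rng = np.random.default_rng(0)
n = 4
U, V = rng.normal(size=(n, n)), rng.normal(size=(n, n))
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))
C = rng.normal(size=(n, n))

def step(H, X):
    # Bilinear maps on both sides keep the state a matrix.
    return np.tanh(U @ H @ V + A @ X @ B + C)

H = np.zeros((n, n))
for X in rng.normal(size=(5, n, n)):    # a short sequence of matrix inputs
    H = step(H, X)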
Tianyu Zhang, Lei Zhu, Qian Zhao (2019)
Quantization of the weights of deep neural networks (DNNs) has proven to be an effective solution for implementing DNNs on edge devices such as mobile phones, ASICs, and FPGAs, which lack sufficient resources to support computation involving millions of high-precision weights and multiply-accumulate operations. This paper proposes a novel method to compress vectors of high-precision DNN weights to ternary vectors, namely a cosine-similarity-based target non-retraining ternary (TNT) compression method. Our method leverages cosine similarity instead of the Euclidean distances commonly used in the literature and succeeds in reducing the size of the search space for optimal ternary vectors from $3^N$ to $N$, where $N$ is the dimension of the target vectors. As a result, the computational complexity for TNT to find theoretically optimal ternary vectors is only $O(N \log N)$. Moreover, our experiments show that when we ternarize DNN models with high-precision parameters, the quantized models retain sufficiently high accuracy that retraining is not necessary.
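The cosine-similarity argument can be made concrete with a small sketch (an illustration under assumed conventions, not the paper's reference code): for a fixed support size k, the ternary vector maximizing cosine similarity keeps the k largest-magnitude entries with their signs, so scanning k = 1..N after a single sort yields the O(N log N) cost quoted above.

import numpy as np


def ternarize_cosine(w):
    # Find t in {-1, 0, +1}^N and a scale alpha maximizing cos(w, t).
    n = w.size
    order = np.argsort(-np.abs(w))           # indices by |w_i|, descending
    sorted_abs = np.abs(w)[order]
    prefix = np.cumsum(sorted_abs)            # sums of the largest |w_i|
    ks = np.arange(1, n + 1)
    # cos(w, t_k) is proportional to prefix[k-1] / sqrt(k) for top-k support.
    scores = prefix / np.sqrt(ks)
    k = int(np.argmax(scores)) + 1
    t = np.zeros_like(w)
    t[order[:k]] = np.sign(w[order[:k]])
    alpha = prefix[k - 1] / k                 # least-squares scale for alpha * t
    return t, alpha


w = np.random.randn(64)
t, alpha = ternarize_cosine(w)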
Sho Sonoda, Ming Li, Feilong Cao (2020)
A random net is a shallow neural network in which the hidden layer is frozen with a random assignment and the output layer is trained by convex optimization. Using random weights for the hidden layer is an effective way to avoid the non-convexity inherent in standard gradient-descent learning, and it has recently been adopted in the study of deep learning theory. Here, we investigate the expressive power of random nets. We show that, despite the well-known fact that a shallow neural network is a universal approximator, a random net cannot achieve zero approximation error even for smooth functions. In particular, we prove that for a class of smooth functions, if the proposal distribution is compactly supported, then the approximation error is bounded below by a positive constant. Based on ridgelet analysis and harmonic analysis for neural networks, the proof uses the Plancherel theorem and an estimate of the truncated tail of the parameter distribution. We corroborate our theoretical results with various simulation studies, which offer two main take-home messages: (i) not every distribution for selecting random weights can build a universal approximator; (ii) a suitable assignment of random weights exists, but it is to some degree tied to the complexity of the target function.
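For concreteness, here is a minimal sketch of the random-net setup being analyzed, under assumed details (Gaussian random hidden weights, ReLU features, a ridge-regression readout): the hidden layer is drawn once and frozen, and only the linear output layer is fit, which is a convex problem.

import numpy as np


def fit_random_net(X, y, width=512, ridge=1e-3, rng=np.random.default_rng(0)):
    d = X.shape[1]
    W = rng.normal(size=(d, width))          # frozen random hidden weights
    b = rng.normal(size=width)
    H = np.maximum(X @ W + b, 0.0)           # random ReLU features
    # Convex readout: ridge regression on the random features.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(width), H.T @ y)
    return W, b, beta


def predict(X, W, b, beta):
    return np.maximum(X @ W + b, 0.0) @ beta


X, y = np.random.randn(200, 10), np.random.randn(200)
W, b, beta = fit_random_net(X, y)
yhat = predict(X, W, b, beta)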
