Neural Random-Access Machines


Published by: Karol Kurach

Publication date: 2015

Research field: Informatics Engineering

Paper language: English

Authors: Karol Kurach, Marcin Andrychowicz, Ilya Sutskever

Topics: Machine Learning, Neural and Evolutionary Computing


Abstract

In this paper, we propose and investigate a new neural network architecture called the Neural Random-Access Machine. It can manipulate and dereference pointers to an external variable-size random-access memory. The model is trained from pure input-output examples using backpropagation. We evaluate the new model on a number of simple algorithmic tasks whose solutions require pointer manipulation and dereferencing. Our results show that the proposed model can learn to solve algorithmic tasks of this type and is capable of operating on simple data structures such as linked lists and binary trees. For easier tasks, the learned solutions generalize to sequences of arbitrary length. Moreover, memory access during inference can be done in constant time under some assumptions.
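The pointer dereferencing mentioned in the abstract can be illustrated with a soft (differentiable) memory read. The following is a minimal sketch, assuming each memory cell and each pointer is stored as a probability distribution over addresses; the helper names `one_hot`, `soft_read`, and `soft_write` are hypothetical and the paper's exact formulation may differ.

```python
def one_hot(index, size):
    """Distribution that puts all probability mass on one address/value."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def soft_read(memory, pointer):
    """Dereference a soft pointer: mix the memory rows, weighted by the pointer."""
    width = len(memory[0])
    return [
        sum(pointer[i] * memory[i][j] for i in range(len(memory)))
        for j in range(width)
    ]


def soft_write(memory, pointer, value):
    """Blend `value` into each row in proportion to the pointer's mass on that row."""
    return [
        [(1.0 - pointer[i]) * memory[i][j] + pointer[i] * value[j]
         for j in range(len(value))]
        for i in range(len(memory))
    ]


# A tiny linked list stored as "next" pointers: cell 0 -> 2, cell 2 -> 3, cell 3 -> 1.
memory = [one_hot(v, 4) for v in (2, 0, 3, 1)]
head = one_hot(0, 4)
second = soft_read(memory, head)    # follows the link stored in cell 0
third = soft_read(memory, second)   # follows the link stored in cell 2

# Overwrite cell 1 so that it points to address 3.
updated = soft_write(memory, one_hot(1, 4), one_hot(3, 4))
```

Because both operations are weighted sums, gradients can flow through them, which is what makes training from input-output examples with backpropagation possible. With one-hot pointers the read touches a single cell, which is consistent with the constant-time access claim under such assumptions.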


Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi (2020)

Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters. Yet, improving the efficiency of large-scale optimization remains a vital and highly active area of research. Recent work has shown that deep neural networks can be optimized in randomly-projected subspaces of much smaller dimensionality than their native parameter space. While such training is promising for more efficient and scalable optimization schemes, its practical application is limited by inferior optimization performance. Here, we improve on recent random subspace approaches as follows. First, we show that keeping the random projection fixed throughout training is detrimental to optimization. We propose re-drawing the random subspace at each step, which yields significantly better performance. Second, we realize further improvements by applying independent projections to different parts of the network, making the approximation more efficient as network dimensionality grows. To implement these experiments, we leverage hardware-accelerated pseudo-random number generation to construct the random projections on demand at every optimization step, allowing us to distribute the computation of independent random directions across multiple workers with shared random seeds. This yields significant reductions in memory and is up to 10 times faster for the workloads in question.
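The difference between a fixed and a re-drawn random subspace can be seen on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a simple quadratic objective instead of a neural network, Python's software `random` module instead of hardware-accelerated generation, and hypothetical names (`random_basis`, `subspace_step`, `train`). The basis is rebuilt from a seed at every step, so it never needs to be stored and any worker that knows the seed can regenerate it.

```python
import random


def random_basis(seed, num_directions, dim):
    """Regenerate a random basis from a seed instead of keeping it in memory."""
    rng = random.Random(seed)
    scale = dim ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
            for _ in range(num_directions)]


def subspace_step(theta, grad, basis, lr):
    """Project the gradient onto the basis and step only within that subspace."""
    update = [0.0] * len(theta)
    for direction in basis:
        coeff = sum(g * d for g, d in zip(grad, direction))
        for k, d in enumerate(direction):
            update[k] += coeff * d
    return [t - lr * u for t, u in zip(theta, update)]


def loss(theta, target):
    """Toy quadratic objective: half the squared distance to the target."""
    return 0.5 * sum((t - y) ** 2 for t, y in zip(theta, target))


def train(target, steps, num_directions, lr=0.5, base_seed=0, redraw=True):
    theta = [0.0] * len(target)
    for step in range(steps):
        # Re-drawing means a fresh seed, and hence a fresh subspace, per step.
        seed = base_seed + step if redraw else base_seed
        basis = random_basis(seed, num_directions, len(target))
        grad = [t - y for t, y in zip(theta, target)]
        theta = subspace_step(theta, grad, basis, lr)
    return theta


target = [1.0] * 20
loss_fixed = loss(train(target, 200, 4, redraw=False), target)
loss_redrawn = loss(train(target, 200, 4, redraw=True), target)
```

With a fixed basis, the part of the error that lies outside the 4-dimensional subspace can never be reduced, whereas re-drawing lets successive steps cover different directions of the 20-dimensional parameter space.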

Topics: Machine Learning, Neural and Evolutionary Computing

Slot Machines: Discovering Winning Combinations of Random Weights in Neural Networks

Maxwell Mbabilla Aladago, Lorenzo Torresani (2021)

In contrast to traditional weight optimization in a continuous space, we demonstrate the existence of effective random networks whose weights are never updated. By selecting a weight among a fixed set of random values for each individual connection, our method uncovers combinations of random weights that match the performance of traditionally-trained networks of the same capacity. We refer to our networks as slot machines where each reel (connection) contains a fixed set of symbols (random values). Our backpropagation algorithm spins the reels to seek winning combinations, i.e., selections of random weight values that minimize the given loss. Quite surprisingly, we find that allocating just a few random values to each connection (e.g., 8 values per connection) yields highly competitive combinations despite being dramatically more constrained compared to traditionally learned weights. Moreover, finetuning these combinations often improves performance over the trained baselines. A randomly initialized VGG-19 with 8 values per connection contains a combination that achieves 91% test accuracy on CIFAR-10. Our method also achieves an impressive performance of 98.2% on MNIST for neural networks containing only random weights.
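The reel-and-symbol idea can be sketched for a single connection. This is a minimal illustration under assumptions: the candidate values are fixed, one score per candidate decides which value is used, and the scores are updated with a straight-through style rule (the gradient with respect to the weight, multiplied by each candidate value). That update rule and the names `select` and `update_scores` are assumptions made for illustration, not details confirmed by the abstract.

```python
def select(values, scores):
    """Use the candidate value whose score is currently the highest."""
    best = max(range(len(values)), key=lambda k: scores[k])
    return values[best]


def update_scores(values, scores, grad_wrt_weight, lr):
    """Assumed straight-through rule: only the scores change, never the values."""
    return [s - lr * grad_wrt_weight * v for s, v in zip(scores, values)]


# One connection ("reel") with four fixed random-like candidate values ("symbols").
values = [-0.8, 0.1, 0.6, 0.9]
scores = [0.4, 0.3, 0.2, 0.1]

weight_before = select(values, scores)   # highest score is at index 0

# Toy loss 0.5 * (w - 0.7)**2, so the gradient with respect to w is (w - 0.7).
grad = weight_before - 0.7
scores = update_scores(values, scores, grad, lr=1.0)

weight_after = select(values, scores)    # selection moved to a positive candidate
```

The point of the sketch is only the mechanism: training changes which candidate is selected, while the candidate values themselves stay at their random initial values.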

Topics: Machine Learning, Artificial Intelligence

Neural Attentive Multiview Machines

Oren Barkan, Ori Katz, Noam Koenigstein (2020)

An important problem in multiview representation learning is finding the optimal combination of views with respect to the specific task at hand. To this end, we introduce NAM: a Neural Attentive Multiview machine that learns multiview item representations and similarity by employing a novel attention mechanism. NAM harnesses multiple information sources and automatically quantifies their relevancy with respect to a supervised task. Finally, a very practical advantage of NAM is its robustness to datasets with missing views. We demonstrate the effectiveness of NAM for the task of movie and app recommendations. Our evaluations indicate that NAM outperforms single-view models as well as alternative multiview methods on item recommendation tasks, including cold-start scenarios.

Topics: Machine Learning, Information Retrieval

Dissecting Neural ODEs

Stefano Massaroli, Michael Poli, Jinkyoo Park (2020)

Continuous deep learning architectures have recently re-emerged as Neural Ordinary Differential Equations (Neural ODEs). This infinite-depth approach theoretically bridges the gap between deep learning and dynamical systems, offering a novel perspective. However, deciphering the inner workings of these models is still an open challenge, as most applications apply them as generic black-box modules. In this work we open the box, further developing the continuous-depth formulation with the aim of clarifying the influence of several design choices on the underlying dynamics.
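The continuous-depth view can be made concrete with a fixed-step integrator. The sketch below assumes the standard Neural ODE setup, where the hidden state follows dh/dt = f(h, t) and depth is replaced by integration time. Here `vector_field` is a hypothetical stand-in for the learned network, chosen as simple decay so the result can be checked against a closed form.

```python
import math


def vector_field(h, t):
    """Hypothetical stand-in for a learned network f(h, t); here simple decay."""
    return [-x for x in h]


def euler_integrate(f, h0, t0, t1, num_steps):
    """Forward Euler: each step resembles one residual block h <- h + dt * f(h, t)."""
    h = list(h0)
    dt = (t1 - t0) / num_steps
    t = t0
    for _ in range(num_steps):
        dh = f(h, t)
        h = [x + dt * d for x, d in zip(h, dh)]
        t += dt
    return h


h_final = euler_integrate(vector_field, [1.0, -2.0], 0.0, 1.0, 1000)
exact = [1.0 * math.exp(-1.0), -2.0 * math.exp(-1.0)]
```

Increasing `num_steps` corresponds to taking more, smaller layers, and the output approaches the exact solution of the differential equation. Practical Neural ODE implementations typically use adaptive solvers rather than forward Euler.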

Topics: Machine Learning, Neural and Evolutionary Computing

Binarized Neural Networks

Itay Hubara, Daniel Soudry, Ran El Yaniv (2016)

We introduce a method to train Binarized Neural Networks (BNNs): neural networks with binary weights and activations at run-time and when computing the parameter gradients at train-time. We conduct two sets of experiments, each based on a different framework, namely Torch7 and Theano, where we train BNNs on MNIST, CIFAR-10 and SVHN, and achieve nearly state-of-the-art results. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which might lead to a great increase in power-efficiency. Last but not least, we wrote a binary matrix multiplication GPU kernel with which it is possible to run our MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The code for training and running our BNNs is available.
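The replacement of arithmetic with bit-wise operations can be illustrated with a dot product between binarized vectors. This is a minimal sketch, assuming values are binarized to +1/-1 by sign and packed one bit per element. The helper names `binarize`, `pack_bits`, and `xnor_dot` are hypothetical, and the sketch is not the authors' GPU kernel.

```python
def binarize(xs):
    """Map real values to +1 / -1 by sign (zero is treated as +1 here)."""
    return [1 if x >= 0 else -1 for x in xs]


def pack_bits(signs):
    """Pack +1 / -1 values into an integer: bit i is set when signs[i] == +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors using XNOR and a popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # 1 wherever the signs match
    matches = bin(agree).count("1")        # popcount
    return 2 * matches - n                 # matches minus mismatches


weights = binarize([0.3, -1.2, 0.0, 2.5, -0.1])
activations = binarize([-0.7, -0.4, 1.1, 0.9, 0.2])

reference = sum(w * a for w, a in zip(weights, activations))
bitwise = xnor_dot(pack_bits(weights), pack_bits(activations), len(weights))
```

Each matching pair of signs contributes +1 and each mismatch contributes -1, so the multiply-accumulate loop reduces to one XNOR and one popcount per machine word.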

Topics: Machine Learning, Neural and Evolutionary Computing