
A network that learns Strassen multiplication


Posted by Veit Elser

Publication date: 2016

Research field: Information Engineering

Paper language: English

Author: Veit Elser

Subjects: Numerical Analysis; Neural and Evolutionary Computing


Abstract

We study neural networks whose only non-linear components are multipliers, to test a new training rule in a context where the precise representation of data is paramount. These networks are challenged to discover the rules of matrix multiplication, given many examples. By limiting the number of multipliers, the network is forced to discover the Strassen multiplication rules. This is the mathematical equivalent of finding low-rank decompositions of the $n\times n$ matrix multiplication tensor, $M_n$. We train these networks with the conservative learning rule, which makes minimal changes to the weights so as to give the correct output for each input at the time the input-output pair is received. Conservative learning needs a few thousand examples to find the rank-7 decomposition of $M_2$, and $10^5$ examples for the rank-23 decomposition of $M_3$ (the lowest rank known). High precision is critical, especially for $M_3$, to discriminate between true decompositions and border approximations.
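The equivalence between "Strassen multiplication" and a rank-7 decomposition of $M_2$ can be checked directly. The sketch below is a minimal illustration in pure Python, not code from the paper. It assumes row-major flattening of $A$, $B$ and $C$, so that $M_2$ is a 4×4×4 array with a 1 at each $(a_{ij}, b_{jk}, c_{ik})$, writes Strassen's seven products as factor triples $(u_r, v_r, w_r)$, and confirms that $\sum_{r=1}^{7} u_r\otimes v_r\otimes w_r = M_2$. The last function uses a standard textbook tensor (unrelated to $M_n$) to show what a border approximation looks like: rank-2 tensors whose error shrinks like $\varepsilon$ while their coefficients grow like $1/\varepsilon$, so at limited precision they can be mistaken for an exact decomposition. The names `matmul_tensor`, `STRASSEN`, `reconstruct`, `multiply_with` and `border_error` are hypothetical.

```python
from itertools import product


def matmul_tensor(n):
    """M_n as a sparse dict: entry 1 at (i*n+j, j*n+k, i*n+k), row-major flattening."""
    return {(i * n + j, j * n + k, i * n + k): 1
            for i, j, k in product(range(n), repeat=3)}


# Strassen's seven products for 2x2 matrices, flattened as [x11, x12, x21, x22].
# Each triple (u, v, w) means: m = (u . A) * (v . B), then C += w * m.
STRASSEN = [
    ([1, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1]),    # (A11+A22)(B11+B22)
    ([0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 1, -1]),   # (A21+A22) B11
    ([1, 0, 0, 0], [0, 1, 0, -1], [0, 1, 0, 1]),   # A11 (B12-B22)
    ([0, 0, 0, 1], [-1, 0, 1, 0], [1, 0, 1, 0]),   # A22 (B21-B11)
    ([1, 1, 0, 0], [0, 0, 0, 1], [-1, 1, 0, 0]),   # (A11+A12) B22
    ([-1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]),   # (A21-A11)(B11+B12)
    ([0, 1, 0, -1], [0, 0, 1, 1], [1, 0, 0, 0]),   # (A12-A22)(B21+B22)
]


def reconstruct(factors, size):
    """Sum the rank-one terms u (x) v (x) w, keeping only nonzero entries."""
    t = {}
    for a, b, c in product(range(size), repeat=3):
        s = sum(u[a] * v[b] * w[c] for u, v, w in factors)
        if s != 0:
            t[(a, b, c)] = s
    return t


def multiply_with(factors, A, B):
    """Multiply flattened matrices using exactly one multiplication per factor triple."""
    C = [0] * len(A)
    for u, v, w in factors:
        m = sum(x * y for x, y in zip(u, A)) * sum(x * y for x, y in zip(v, B))
        for c, coeff in enumerate(w):
            C[c] += coeff * m
    return C


def border_error(eps):
    """Max entry error of a rank-2 approximation to the rank-3 tensor
    W = a(x)a(x)b + a(x)b(x)a + b(x)a(x)a, with a = e0, b = e1 in 2D.

    The approximation is (1/eps) * [(a + eps*b)^(x)3 - a^(x)3].
    """
    x = [1.0, eps]  # components of a + eps*b
    worst = 0.0
    for i, j, k in product(range(2), repeat=3):
        cube_a = 1.0 if (i, j, k) == (0, 0, 0) else 0.0
        approx = (x[i] * x[j] * x[k] - cube_a) / eps
        exact = 1.0 if i + j + k == 1 else 0.0  # W is 1 where exactly one index is b
        worst = max(worst, abs(approx - exact))
    return worst


if __name__ == "__main__":
    print(reconstruct(STRASSEN, 4) == matmul_tensor(2))         # True
    print(multiply_with(STRASSEN, [1, 2, 3, 4], [5, 6, 7, 8]))  # [19, 22, 43, 50]
    for eps in (1e-1, 1e-3):
        print(eps, border_error(eps))  # error shrinks roughly like eps
```

Storing the tensor as a sparse dict keeps the comparison exact over integers, which matters for the same reason the abstract gives: a floating-point check with a loose tolerance could not tell an exact decomposition from a border approximation.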

Read also

Viviana Arrigoni, Annalisa Massini, 2019

Matrix multiplication $A^t A$ appears as an intermediate operation in the solution of a wide range of problems. In this paper, we propose a new cache-oblivious algorithm for the $A^t A$ multiplication. Our algorithm, A$\scriptstyle\mathsf{T}$A, calls the classical Strassen algorithm as a subroutine, decreasing the computational cost (expressed in number of performed products) of the conventional $A^t A$ multiplication to $\frac{2}{7}n^{\log_2 7}$. It works for generic rectangular matrices and exploits the symmetry of the resulting product matrix to save memory. We used the MPI paradigm to implement A$\scriptstyle\mathsf{T}$A in parallel, and we tested its performance on a small subset of nodes of the Galileo cluster. Experiments show good scalability and speed-up, thanks in part to the minimal number of messages exchanged by the communication system. Parallel overhead and the inherently sequential time fraction are negligible in the tested configurations.

Distributed, Parallel, and Cluster Computing; Mathematical Software

Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm

Brice Boyer, 2009

We propose several new schedules for Strassen-Winograd's matrix multiplication algorithm; they reduce the extra memory allocation requirements by three different means: by introducing a few pre-additions, by overwriting the input matrices, or by using a first recursive level of classical multiplication. In particular, we show two fully in-place schedules: one with the same number of operations, if the input matrices can be overwritten; the other slightly increasing the constant of the leading term of the complexity, if the input matrices are read-only. Many of these schedules were found by an implementation of an exhaustive search algorithm based on a pebble game.
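For reference, one level of the Strassen-Winograd variant (7 multiplications and 15 additions on 2×2 blocks) can be written as the sequence of temporaries below; memory-efficient schedules such as those described above decide where these temporaries live and which inputs they may overwrite. This is a minimal sketch on scalar entries that allocates every temporary freely, so it shows only the operation list, not any of the paper's schedules. The names `winograd_2x2` and `naive_2x2` are hypothetical.

```python
def winograd_2x2(A, B):
    """One level of the Strassen-Winograd scheme.

    A and B are ((x11, x12), (x21, x22)) tuples of numbers; in a recursive
    implementation each entry would be a matrix block instead.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # 8 pre-additions
    s1 = a21 + a22
    s2 = s1 - a11
    s3 = a11 - a21
    s4 = a12 - s2
    t1 = b12 - b11
    t2 = b22 - t1
    t3 = b22 - b12
    t4 = t2 - b21
    # 7 multiplications (the recursive calls in the block version)
    p1 = a11 * b11
    p2 = a12 * b21
    p3 = s4 * b22
    p4 = a22 * t4
    p5 = s1 * t1
    p6 = s2 * t2
    p7 = s3 * t3
    # 7 post-additions
    u1 = p1 + p2  # C11
    u2 = p1 + p6
    u3 = u2 + p7
    u4 = u2 + p5
    u5 = u4 + p3  # C12
    u6 = u3 - p4  # C21
    u7 = u3 + p5  # C22
    return ((u1, u5), (u6, u7))


def naive_2x2(A, B):
    """Conventional 8-multiplication product, used as a reference."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    return ((a11 * b11 + a12 * b21, a11 * b12 + a12 * b22),
            (a21 * b11 + a22 * b21, a21 * b12 + a22 * b22))


if __name__ == "__main__":
    print(winograd_2x2(((1, 2), (3, 4)), ((5, 6), (7, 8))))  # ((19, 22), (43, 50))
```

Because temporaries such as `u2` and `u3` are reused several times, the order in which they are computed and overwritten determines how much scratch memory a block implementation needs, which is the problem the schedules address.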

Mathematical Software

SGD on Neural Networks Learns Functions of Increasing Complexity

Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, 2019

We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity. This hypothesis can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime. We also show that the linear classifier learned in the initial stages is retained throughout the execution even if training is continued to the point of zero training error, and complement this with a theoretical result in a simplified model. Key to our work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information.

Machine Learning; Neural and Evolutionary Computing

A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems

Xuhui Meng, George Em Karniadakis, 2019

We propose a new composite neural network (NN) that can be trained on multi-fidelity data. It consists of three NNs: the first is trained on the low-fidelity data and is coupled to two high-fidelity NNs, one with activation functions and one without, in order to discover and exploit nonlinear and linear correlations, respectively, between the low-fidelity and high-fidelity data. We first demonstrate the accuracy of the new multi-fidelity NN in approximating several standard benchmark functions as well as a 20-dimensional function. We then extend the recently developed physics-informed neural networks (PINNs) so that they can be trained with multi-fidelity data sets (MPINNs). MPINNs contain four fully connected neural networks: the first approximates the low-fidelity data, the second and third construct the correlation between the low- and high-fidelity data and produce the multi-fidelity approximation, and this approximation is then used in the last NN, which encodes the partial differential equations (PDEs). Specifically, a relaxation parameter is introduced in the two high-fidelity NNs, which can be optimized to combine the linear and nonlinear sub-networks. By optimizing this parameter, the model can adaptively learn both the linear and complex nonlinear correlations between the low- and high-fidelity data. By training the MPINNs, we can (1) obtain the correlation between the low- and high-fidelity data, (2) infer quantities of interest from a few scattered data points, and (3) identify the unknown parameters in the PDEs. In particular, we employ MPINNs to learn the hydraulic conductivity field for unsaturated flows as well as the reactive models for reactive transport. The results demonstrate that MPINNs can achieve relatively high accuracy from a very small set of high-fidelity data.

Computational Physics

MinConvNets: A new class of multiplication-less Neural Networks

Xuecan Yang, Sumanta Chaudhuri, Laurence Likforman, 2021

Convolutional Neural Networks have achieved unprecedented success in image classification, recognition, and detection applications. However, their large-scale deployment in embedded devices is still limited by their huge computational requirements, i.e., millions of MAC operations per layer. This article introduces MinConvNets, in which the multiplications in forward propagation are approximated by minimum comparator operations. A minimum operation is much simpler to implement in hardware than a multiplier. First, a methodology for finding approximate operations based on statistical correlation is presented. We show that multipliers can be replaced by minimum operations in forward propagation under certain constraints, namely when the feature and weight vectors have similar means and variances. A modified training method that guarantees these constraints is proposed. Finally, we show that equivalent precision can be achieved during inference with MinConvNets by using transfer learning from well-trained exact CNNs.

Machine Learning; Neural and Evolutionary Computing
