Logarithmic Pruning is All You Need

94 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Omar Rivasplata

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Laurent Orseau - Marcus Hutter - Omar Rivasplata

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The Lottery Ticket Hypothesis is a conjecture that every large neural network contains a subnetwork that, when trained in isolation, achieves comparable performance to the large network. An even stronger conjecture has been proven recently: Every sufficiently overparameterized network contains a subnetwork that, at random initialization, but without training, achieves comparable accuracy to the trained large network. This latter result, however, relies on a number of strong assumptions and guarantees a polynomial factor on the size of the large network compared to the target function. In this work, we remove the most limiting assumptions of this previous work while providing significantly tighter bounds:the overparameterized network only needs a logarithmic factor (in all variables but depth) number of neurons per weight of the target subnetwork.

قيم البحث

94 - Eran Malach , Gilad Yehudai , Shai Shalev-Shwartz 2020

The lottery ticket hypothesis (Frankle and Carbin, 2018), states that a randomly-initialized network contains a small subnetwork such that, when trained in isolation, can compete with the performance of the original network. We prove an even stronger hypothesis (as was also conjectured in Ramanujan et al., 2019), showing that for every bounded distribution and every target network with bounded weights, a sufficiently over-parameterized neural network with random weights contains a subnetwork with roughly the same accuracy as the target network, without any further training.

التعلم الآلي التعلم الالي

Is Fast Adaptation All You Need?

299 - Khurram Javed , Hengshuai Yao , Martha White 2019

Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples. The core idea behind these approaches is to use fast adaptation and gen eralization -- two second-order metrics -- as training signals on a meta-training dataset. However, little attention has been given to other possible second-order metrics. In this paper, we investigate a different training signal -- robustness to catastrophic interference -- and demonstrate that representations learned by directing minimizing interference are more conducive to incremental learning than those learned by just maximizing fast adaptation.

التعلم الآلي التعلم الالي

MemGEN: Memory is All You Need

133 - Sylvain Gelly , Karol Kurach , Marcin Michalski 2018

We propose a new learning paradigm called Deep Memory. It has the potential to completely revolutionize the Machine Learning field. Surprisingly, this paradigm has not been reinvented yet, unlike Deep Learning. At the core of this approach is the tex tit{Learning By Heart} principle, well studied in primary schools all over the world. Inspired by poem recitation, or by $pi$ decimal memorization, we propose a concrete algorithm that mimics human behavior. We implement this paradigm on the task of generative modeling, and apply to images, natural language and even the $pi$ decimals as long as one can print them as text. The proposed algorithm even generated this paper, in a one-shot learning setting. In carefully designed experiments, we show that the generated samples are indistinguishable from the training examples, as measured by any statistical tests or metrics.

التعلم الآلي

Categorical Representation Learning: Morphism is All You Need

68 - Artan Sheshmani , Yizhuang You 2021

We provide a construction for categorical representation learning and introduce the foundations of $textit{categorifier}$. The central theme in representation learning is the idea of $textbf{everything to vector}$. Every object in a dataset $mathcal{ S}$ can be represented as a vector in $mathbb{R}^n$ by an $textit{encoding map}$ $E: mathcal{O}bj(mathcal{S})tomathbb{R}^n$. More importantly, every morphism can be represented as a matrix $E: mathcal{H}om(mathcal{S})tomathbb{R}^{n}_{n}$. The encoding map $E$ is generally modeled by a $textit{deep neural network}$. The goal of representation learning is to design appropriate tasks on the dataset to train the encoding map (assuming that an encoding is optimal if it universally optimizes the performance on various tasks). However, the latter is still a $textit{set-theoretic}$ approach. The goal of the current article is to promote the representation learning to a new level via a $textit{category-theoretic}$ approach. As a proof of concept, we provide an example of a text translator equipped with our technology, showing that our categorical learning model outperforms the current deep learning models by 17 times. The content of the current article is part of the recent US patent proposal (patent application number: 63110906).

التعلم الآلي الأنظمة المضطربة والشبكات العصبية الذكاء الاصطناعي

Segmentation is All You Need

151 - Zehua Cheng , Yuxiang Wu , Zhenghua Xu 2019

Region proposal mechanisms are essential for existing deep learning approaches to object detection in images. Although they can generally achieve a good detection performance under normal circumstances, their recall in a scene with extreme cases is u nacceptably low. This is mainly because bounding box annotations contain much environment noise information, and non-maximum suppression (NMS) is required to select target boxes. Therefore, in this paper, we propose the first anchor-free and NMS-free object detection model called weakly supervised multimodal annotation segmentation (WSMA-Seg), which utilizes segmentation models to achieve an accurate and robust object detection without NMS. In WSMA-Seg, multimodal annotations are proposed to achieve an instance-aware segmentation using weakly supervised bounding boxes; we also develop a run-data-based following algorithm to trace contours of objects. In addition, we propose a multi-scale pooling segmentation (MSP-Seg) as the underlying segmentation model of WSMA-Seg to achieve a more accurate segmentation and to enhance the detection accuracy of WSMA-Seg. Experimental results on multiple datasets show that the proposed WSMA-Seg approach outperforms the state-of-the-art detectors.

الرؤية الحاسوبية وتمييز الأنماط