
Train-by-Reconnect: Decoupling Locations of Weights from their Values

Published by Yushi Qiu
Publication date: 2020
Paper language: English





What makes untrained deep neural networks (DNNs) different from trained, performant ones? By zooming into the weights of well-trained DNNs, we found that it is the locations of the weights that hold most of the information encoded by training. Motivated by this observation, we hypothesize that the weights in DNNs trained with stochastic gradient-based methods can be separated into two dimensions: the locations of the weights and their exact values. To assess our hypothesis, we propose a novel method named Lookahead Permutation (LaPerm) that trains DNNs by reconnecting their weights. We empirically demonstrate the versatility of LaPerm while producing extensive evidence to support our hypothesis: when the initial weights are random and dense, our method demonstrates speed and performance similar to or better than that of regular optimizers, e.g., Adam; when the initial weights are random and sparse (many zeros), our method changes the way neurons connect and reaches accuracy comparable to that of a well-trained, fully initialized network; when the initial weights share a single value, our method finds weight-agnostic neural networks with far better-than-chance accuracy.
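A minimal sketch of the "reconnect" idea: keep the multiset of initial weight values fixed and change only where each value sits, for example by permuting a layer's initial values so that their rank order matches that of weights produced by a few ordinary optimizer steps. The function name, the NumPy setting, and the rank-matching rule are illustrative assumptions, not a reproduction of LaPerm.

```python
import numpy as np

def permute_to_match(initial_w, lookahead_w):
    """Reassign the fixed pool of initial values so their rank order
    matches the rank order of the looked-ahead weights (illustrative)."""
    flat_init = initial_w.ravel()
    flat_look = lookahead_w.ravel()
    order = np.argsort(flat_look)            # positions sorted by looked-ahead value
    permuted = np.empty_like(flat_init)
    permuted[order] = np.sort(flat_init)     # smallest initial value -> smallest position, etc.
    return permuted.reshape(initial_w.shape)

# Toy usage: the pool of values never changes, only where each value sits.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(4, 3))                       # fixed pool of initial values
w_look = w0 - 0.1 * rng.normal(size=(4, 3))        # stand-in for a few optimizer steps
w_new = permute_to_match(w0, w_look)
assert np.allclose(np.sort(w0.ravel()), np.sort(w_new.ravel()))  # same multiset of values
```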


Read also

ReLU neural networks have been the focus of many recent theoretical works trying to explain their empirical success. Nonetheless, there is still a gap between current theoretical results and empirical observations, even in the case of shallow (one hidden-layer) networks. For example, in the task of memorizing a random sample of size $m$ and dimension $d$, the best theoretical result requires the size of the network to be $\tilde{\Omega}(\frac{m^2}{d})$, while empirically a network of size slightly larger than $\frac{m}{d}$ is sufficient. To bridge this gap, we turn to study a simplified model for ReLU networks. We observe that a ReLU neuron is a product of a linear function with a gate (the latter determines whether the neuron is active or not), where both share a jointly trained weight vector. In this spirit, we introduce the Gated Linear Unit (GaLU), which simply decouples the linearity from the gating by assigning different vectors for each role. We show that GaLU networks allow us to get optimization and generalization results that are much stronger than those available for ReLU networks. Specifically, we show a memorization result for networks of size $\tilde{\Omega}(\frac{m}{d})$, and improved generalization bounds. Finally, we show that in some scenarios GaLU networks behave similarly to ReLU networks, hence proving to be a good choice of simplified model.
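To make the decoupling concrete, the sketch below contrasts a ReLU neuron, where one vector both produces the linear response and decides the gate, with a GaLU-style neuron that uses a separate vector for the gate. It is a minimal illustration; the variable names are chosen here rather than taken from the paper.

```python
import numpy as np

def relu_neuron(w, x):
    # linearity and gating share the same weight vector w
    pre = w @ x
    return pre * (pre > 0)

def galu_neuron(w, u, x):
    # linearity uses w; the gate uses a separate (e.g. fixed random) vector u
    return (w @ x) * (u @ x > 0)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
w = rng.normal(size=5)   # trained linear part
u = rng.normal(size=5)   # independent gating vector
print(relu_neuron(w, x), galu_neuron(w, u, x))
```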
In this paper, we provide a rigorous theoretical investigation of an online learning version of the Facility Location problem, motivated by emerging problems in real-world applications. In our formulation, we are given a set of sites and an online sequence of user requests. At each trial, the learner selects a subset of sites and then incurs a cost for each selected site and an additional cost which is the price of the user's connection to the nearest site in the selected subset. The problem may be solved by an application of the well-known Hedge algorithm. This would, however, require time and space exponential in the number of the given sites, which motivates our design of a novel quasi-linear time algorithm for this problem, with good theoretical guarantees on its performance.
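As a rough illustration of the per-trial cost in this formulation, the sketch below sums an opening cost for every selected site plus the connection price to the nearest selected site. The site names, opening costs, and distance function are hypothetical, and the sketch says nothing about the quasi-linear algorithm the paper proposes.

```python
def trial_cost(selected_sites, opening_cost, user, distance):
    """Cost of one trial: pay for every selected site plus the
    connection price to the nearest selected site (illustrative)."""
    if not selected_sites:
        raise ValueError("at least one site must be selected")
    open_cost = sum(opening_cost[s] for s in selected_sites)
    connect_cost = min(distance(user, s) for s in selected_sites)
    return open_cost + connect_cost

# Toy usage with sites placed on a line.
sites = {"a": 0.0, "b": 5.0, "c": 9.0}
opening = {"a": 1.0, "b": 2.0, "c": 0.5}
dist = lambda user, s: abs(user - sites[s])
print(trial_cost({"a", "c"}, opening, user=7.0, distance=dist))  # 1.0 + 0.5 + 2.0 = 3.5
```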
We describe our submission to the Extreme Value Analysis 2019 Data Challenge, in which teams were asked to predict extremes of sea surface temperature anomaly within spatio-temporal regions of missing data. We present a computational framework which reconstructs missing data using convolutional deep neural networks. Conditioned on incomplete data, we employ autoencoder-like models as multivariate conditional distributions from which possible reconstructions of the complete dataset are sampled using imputed noise. In order to mitigate bias introduced by any one particular model, a prediction ensemble is constructed to create the final distribution of extremal values. Our method does not rely on expert knowledge in order to accurately reproduce the dynamic features of a complex oceanographic system with minimal assumptions. The obtained results promise reusability and generalization to other domains.
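The sketch below schematically mimics the ensemble step: several stand-in "models" (plain functions here, assumed in place of trained convolutional autoencoders) each produce noisy reconstructions of the missing entries, and the pooled samples form an empirical distribution of the field's extremes. All names and models are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_reconstruction(model, observed, mask, noise_scale=0.5):
    """Fill the missing entries (mask == True) with one stochastic
    reconstruction drawn from a stand-in conditional model."""
    noise = rng.normal(scale=noise_scale, size=observed.shape)
    filled = observed.copy()
    filled[mask] = model(observed, mask)[mask] + noise[mask]
    return filled

# Stand-ins for trained autoencoder-like models (assumptions, not the paper's networks).
mean_model = lambda obs, mask: np.full_like(obs, obs[~mask].mean())
median_model = lambda obs, mask: np.full_like(obs, np.median(obs[~mask]))
ensemble = [mean_model, median_model]

observed = rng.normal(size=(8, 8))
mask = rng.random((8, 8)) < 0.3          # 30% of the field is missing

# Pool many sampled reconstructions into one distribution of the field maximum.
extremes = [sample_reconstruction(m, observed, mask).max()
            for m in ensemble for _ in range(100)]
print(np.quantile(extremes, [0.5, 0.95]))
```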
As machine learning systems get widely adopted for high-stake decisions, quantifying uncertainty over predictions becomes crucial. While modern neural networks are making remarkable gains in terms of predictive accuracy, characterizing uncertainty over the parameters of these models is challenging because of the high dimensionality and complex correlations of the network parameter space. This paper introduces a novel variational inference framework for Bayesian neural networks that (1) encodes complex distributions in high-dimensional parameter space with representations in a low-dimensional latent space, and (2) performs inference efficiently on the low-dimensional representations. Across a large array of synthetic and real-world datasets, we show that our method improves uncertainty characterization and model generalization when compared with methods that work directly in the parameter space.
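A minimal sketch of encoding high-dimensional weights with a low-dimensional latent variable: a Gaussian variational posterior lives in the latent space and a fixed linear map decodes samples into full weight vectors. The decoder, dimensions, and Gaussian form are assumptions made for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim, param_dim = 8, 10_000           # a few latent dimensions encode many weights
A = rng.normal(size=(param_dim, latent_dim)) / np.sqrt(latent_dim)  # fixed linear decoder

# Variational posterior q(z) = N(mu, diag(sigma^2)) lives in the latent space.
mu = np.zeros(latent_dim)
log_sigma = np.zeros(latent_dim)

def sample_weights():
    """Reparameterized sample: draw z ~ q(z), decode it into a full weight vector."""
    eps = rng.normal(size=latent_dim)
    z = mu + np.exp(log_sigma) * eps
    return A @ z                             # network weights implied by z

# Inference (e.g. maximizing an ELBO) would update only mu and log_sigma,
# i.e. 2 * latent_dim numbers instead of param_dim of them.
w = sample_weights()
print(w.shape, w.std())
```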
The ratio between the probabilities that two distributions $R$ and $P$ assign to a point $x$ is known as the importance weight or propensity score, and such ratios play a fundamental role in many different fields, most notably statistics and machine learning. Among their applications, importance weights are central to domain adaptation, anomaly detection, and estimation of various divergences such as the KL divergence. We consider the common setting where $R$ and $P$ are only given through samples from each distribution. The vast literature on estimating importance weights is either heuristic, or makes strong assumptions about $R$ and $P$ or about the importance weights themselves. In this paper, we explore a computational perspective on the estimation of importance weights, which factors in the limitations and possibilities obtainable with bounded computational resources. We significantly strengthen previous work that uses the MaxEntropy approach, which defines the importance weights based on a distribution $Q$ closest to $P$ that looks the same as $R$ on every set $C \in \mathcal{C}$, where $\mathcal{C}$ may be a huge collection of sets. We show that the MaxEntropy approach may fail to assign high average scores to sets $C \in \mathcal{C}$, even when the average of the ground-truth weights for the set is evidently large. We similarly show that it may overestimate the average scores for sets $C \in \mathcal{C}$. We therefore formulate Sandwiching bounds as a notion of set-wise accuracy for importance weights. We study these bounds to show that they capture natural completeness and soundness requirements from the weights. We present an efficient algorithm that, under standard learnability assumptions, computes weights which satisfy these bounds. Our techniques rely on a new notion of multicalibrated partitions of the domain of the distributions, which appear to be useful objects in their own right.
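To make the set-wise requirement concrete: under samples from $P$, the average importance weight over a set $C$ should recover the mass $R(C)$. The sketch below checks this for two known 1-D Gaussians using the true density ratio, purely to illustrate what a sandwiching-style accuracy criterion asks of estimated weights; it is not the paper's multicalibration-based algorithm.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# P = N(0, 1), R = N(1, 1); we only draw samples from P.
x = rng.normal(0.0, 1.0, size=200_000)
w = gauss_pdf(x, 1.0, 1.0) / gauss_pdf(x, 0.0, 1.0)   # ground-truth weights R/P

# One "set" C from the collection: the interval [0.5, inf).
in_C = x >= 0.5
empirical_RC = w[in_C].sum() / len(x)                 # E_P[w * 1_C] estimates R(C)
exact_RC = 0.5 * (1 + math.erf((1.0 - 0.5) / math.sqrt(2)))  # true mass of C under R
print(empirical_RC, exact_RC)                         # the two should nearly agree
```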
