What makes untrained deep neural networks (DNNs) different from trained, performant ones? By zooming into the weights of well-trained DNNs, we found that it is the locations of the weights that hold most of the information encoded by training. Motivated by this observation, we hypothesize that the weights in DNNs trained with stochastic gradient-based methods can be separated along two dimensions: the locations of the weights and their exact values. To assess this hypothesis, we propose a novel method named Lookahead Permutation (LaPerm), which trains DNNs by reconnecting their weights. We empirically demonstrate the versatility of LaPerm while producing extensive evidence to support our hypothesis: when the initial weights are random and dense, our method exhibits speed and performance similar to or better than that of regular optimizers, e.g., Adam; when the initial weights are random and sparse (many zeros), our method changes the way neurons connect and reaches accuracy comparable to that of a well-trained, fully initialized network; when the initial weights share a single value, our method finds weight-agnostic neural networks with far better-than-chance accuracy.
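The abstract leaves the training rule implicit; below is a minimal sketch of one plausible reading of "reconnecting the weights": an inner optimizer (plain SGD on a toy least-squares problem here) updates the weights freely, and every k steps they are replaced by a permutation of the fixed initial values whose ordering matches the inner optimizer's current weights, so only the arrangement of the values is ever learned. The sync-by-sorting rule, the name laperm_sync, and the toy problem are our own illustration under that assumption, not the paper's reference implementation.

```python
import numpy as np

def laperm_sync(w_init, w_fast):
    """Keep the *values* of w_init but arrange them in the *order*
    of the inner-optimizer weights w_fast (a sync-by-sorting rule)."""
    pool = np.sort(w_init.ravel())                  # fixed pool of values
    ranks = np.argsort(np.argsort(w_fast.ravel()))  # rank of each fast weight
    return pool[ranks].reshape(w_init.shape)        # a permutation of w_init

# Toy usage: SGD on least squares, with a permutation sync every k steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))
y = rng.normal(size=128)
w_init = rng.normal(size=16)   # these values stay frozen for all of training
w = w_init.copy()
k, lr = 5, 0.01
for step in range(1, 201):
    grad = X.T @ (X @ w - y) / len(y)   # ordinary inner-optimizer step
    w -= lr * grad
    if step % k == 0:
        w = laperm_sync(w_init, w)      # only the arrangement of values changes
```

Note that after every sync the multiset of weight values equals that of w_init exactly; training only ever chooses where each value sits.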
ReLU neural networks have been the focus of many recent theoretical works that try to explain their empirical success. Nonetheless, there is still a gap between current theoretical results and empirical observations, even in the case of shallow (on…
In this paper, we provide a rigorous theoretical investigation of an online learning version of the Facility Location problem, motivated by emerging problems in real-world applications. In our formulation, we are given a set of sites and an o…
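(Background we are confident of, stated in our own notation since the formulation above is cut off: in the classical uncapacitated facility location problem, given opening costs $f_i$ for the sites and distances $d(i, j)$ to clients $j$, one chooses a set $F$ of open sites minimizing
\[
\sum_{i \in F} f_i \;+\; \sum_{j} \min_{i \in F} d(i, j),
\]
and an online version typically reveals the clients one at a time; how this paper's variant departs from that template is not recoverable from the truncated abstract.)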
We describe our submission to the Extreme Value Analysis 2019 Data Challenge, in which teams were asked to predict extremes of sea surface temperature anomaly within spatio-temporal regions of missing data. We present a computational framework which r…
As machine learning systems are widely adopted for high-stakes decisions, quantifying uncertainty over predictions becomes crucial. While modern neural networks are making remarkable gains in predictive accuracy, characterizing uncertainty ov…
The ratio between the probabilities that two distributions $R$ and $P$ assign to a point $x$ is known as the importance weight or propensity score, and it plays a fundamental role in many different fields, most notably statistics and machine learning. Among its…
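(For context, in our own notation rather than the paper's: the importance weight is $w(x) = R(x)/P(x)$, and its fundamental role comes from the change-of-measure identity
\[
\mathbb{E}_{x \sim R}\big[f(x)\big] \;=\; \mathbb{E}_{x \sim P}\big[w(x)\, f(x)\big],
\]
so reweighting samples drawn from $P$ by $w$ recovers expectations under $R$.)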