It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long-standing question by Baum (1988), proving that \emph{deep threshold} networks can memorize $n$ points in $d$ dimensions using $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(e^{1/\delta^2}(d+\sqrt{n})+n)$ weights, where $\delta$ is the minimum distance between the points. In this work, we improve the dependence on $\delta$ from exponential to almost linear, proving that $\widetilde{\mathcal{O}}(\frac{1}{\delta}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient. Our construction uses Gaussian random weights only in the first layer, while all the subsequent layers use binary or integer weights. We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.
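The architecture described in the abstract, Gaussian random weights in the first layer followed by binary-weight threshold layers, can be sketched as below. This is only an illustrative toy forward pass under assumed dimensions (`d`, `m`, `n` are arbitrary), not the paper's actual memorization construction or its parameter counts.

```python
import numpy as np

rng = np.random.default_rng(0)

def threshold(z):
    # Heaviside (threshold) activation: output is 1 where z >= 0, else 0
    return (z >= 0).astype(float)

# Illustrative sizes, not taken from the paper's bounds
d, m, n = 16, 8, 5          # input dimension, hidden width, number of points
X = rng.standard_normal((n, d))

# First layer: the only layer with real-valued (Gaussian) weights
W1 = rng.standard_normal((d, m))
b1 = rng.standard_normal(m)
H = threshold(X @ W1 + b1)

# Subsequent layer: binary (+/-1) weights and an integer bias,
# mirroring the restriction to binary/integer weights after layer one
W2 = rng.choice([-1.0, 1.0], size=(m, 1))
b2 = np.zeros(1)
out = threshold(H @ W2 + b2).ravel()   # binary labels for the n points
```

The point of the sketch is the weight discretization: after the random Gaussian projection, every later layer operates only with binary or integer parameters.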
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of learned feat
Compared with the intense practical research activity on deep convolutional neural networks (DCNNs), the study of their theoretical behavior lags far behind. In particular, the universal consistency of DCNNs remains open. In this paper, we prove
We study the efficacy and efficiency of deep generative networks for approximating probability distributions. We prove that neural networks can transform a low-dimensional source distribution to a distribution that is arbitrarily close to a high-dime
The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance samp
We consider an Intelligent Reflecting Surface (IRS)-aided multiple-input single-output (MISO) system for downlink transmission. We compare the performance of Deep Reinforcement Learning (DRL) and conventional optimization methods in finding optimal p