Phase Transitions in Rate Distortion Theory and Deep Learning


Abstract in English

Rate distortion theory is concerned with optimally encoding a given signal class $mathcal{S}$ using a budget of $R$ bits, as $Rtoinfty$. We say that $mathcal{S}$ can be compressed at rate $s$ if we can achieve an error of $mathcal{O}(R^{-s})$ for encoding $mathcal{S}$; the supremal compression rate is denoted $s^ast(mathcal{S})$. Given a fixed coding scheme, there usually are elements of $mathcal{S}$ that are compressed at a higher rate than $s^ast(mathcal{S})$ by the given coding scheme; we study the size of this set of signals. We show that for certain nice signal classes $mathcal{S}$, a phase transition occurs: We construct a probability measure $mathbb{P}$ on $mathcal{S}$ such that for every coding scheme $mathcal{C}$ and any $s >s^ast(mathcal{S})$, the set of signals encoded with error $mathcal{O}(R^{-s})$ by $mathcal{C}$ forms a $mathbb{P}$-null-set. In particular our results apply to balls in Besov and Sobolev spaces that embed compactly into $L^2(Omega)$ for a bounded Lipschitz domain $Omega$. As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are generically sharp. We also provide quantitative and non-asymptotic bounds on the probability that a random $finmathcal{S}$ can be encoded to within accuracy $varepsilon$ using $R$ bits. This result is applied to the problem of approximately representing $finmathcal{S}$ to within accuracy $varepsilon$ by a (quantized) neural network that is constrained to have at most $W$ nonzero weights and is generated by an arbitrary learning procedure. We show that for any $s >s^ast(mathcal{S})$ there are constants $c,C$ such that, no matter how we choose the learning procedure, the probability of success is bounded from above by $minbig{1,2^{Ccdot Wlceillog_2(1+W)rceil^2 -ccdotvarepsilon^{-1/s}}big}$.

Download