
84 - Remi Gribonval 2020
We provide statistical learning guarantees for two unsupervised learning tasks in the context of compressive statistical learning, a general framework for resource-efficient large-scale learning that we introduced in a companion paper. The principle of compressive statistical learning is to compress a training collection, in one pass, into a low-dimensional sketch (a vector of random empirical generalized moments) that captures the information relevant to the considered learning task. We explicitly describe and analyze random feature functions whose empirical averages preserve the needed information for compressive clustering and compressive Gaussian mixture modeling with fixed known variance, and establish sufficient sketch sizes given the problem dimensions.
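The one-pass sketching step described above can be illustrated with a minimal sketch. It assumes random Fourier features with Gaussian frequencies as the feature functions, which is one common choice; the abstract does not specify the features, and the names `draw_frequencies` and `sketch`, the `scale` parameter, and the toy data are all hypothetical.

```python
import cmath
import random


def draw_frequencies(m, d, scale=1.0, seed=0):
    """Draw m random frequency vectors in dimension d (assumed Gaussian)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(m)]


def sketch(points, frequencies):
    """Average the features exp(i * <w_j, x>) over the data in a single pass."""
    totals = [0j] * len(frequencies)
    n = 0
    for x in points:  # each training point is read exactly once
        n += 1
        for j, w in enumerate(frequencies):
            phase = sum(wi * xi for wi, xi in zip(w, x))
            totals[j] += cmath.exp(1j * phase)
    if n == 0:
        raise ValueError("cannot sketch an empty collection")
    return [t / n for t in totals]


# Toy usage: three points in dimension 2, compressed to a sketch of size 8.
frequencies = draw_frequencies(m=8, d=2)
data = [[0.0, 1.0], [1.0, 0.5], [-0.5, 2.0]]
z = sketch(data, frequencies)
```

The sketch size `m` is fixed in advance and does not grow with the number of training points, which is what makes the representation low-dimensional; how large `m` must be is the question the abstract's sufficient sketch sizes address.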
We study the expressivity of deep neural networks. Measuring a network's complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We establish that allowing the networks to have certain types of skip connections does not change the resulting approximation spaces. We also discuss the role of the network's nonlinearity (also known as activation function) on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity and its powers, we relate the newly constructed spaces to classical Besov spaces. The established embeddings highlight that some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.
In this paper, we propose a way to combine two acceleration techniques for the $\ell_{1}$-regularized least squares problem: safe screening tests, which allow useless dictionary atoms to be eliminated; and the use of fast structured approximations of the dictionary matrix. To do so, we introduce a new family of screening tests, termed stable screening, which can cope with approximation errors on the dictionary atoms while keeping the safety of the test (i.e. zero risk of rejecting atoms belonging to the solution support). Some of the main existing screening tests are extended to this new framework. The proposed algorithm consists in using a coarser (but faster) approximation of the dictionary at the initial iterations and then switching to better approximations until eventually adopting the original dictionary. A systematic switching criterion based on the duality gap saturation and the screening ratio is derived. Simulation results show significant reductions in both computational complexity and execution times for a wide range of tested scenarios.
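To make the idea of a screening test that tolerates approximation errors concrete, here is a generic illustration. It is not one of the paper's tests. It assumes a spherical safe region (center `center`, radius `radius`) known to contain the dual solution, a dual constraint of the form $|a_j^T \theta| \leq$ `threshold`, and an approximate atom within Euclidean distance `eps` of the true atom; all names and numbers are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def stable_sphere_test(approx_atom, eps, center, radius, threshold=1.0):
    """Return True if the true atom can be discarded without risk.

    For any atom a with ||a - approx_atom|| <= eps and any theta in the
    sphere, the triangle and Cauchy-Schwarz inequalities give
    |a^T theta| <= |approx_atom^T center| + eps * ||center||
                   + radius * (||approx_atom|| + eps).
    If this upper bound is below the threshold, the atom is inactive.
    """
    bound = (abs(dot(approx_atom, center))
             + eps * norm(center)
             + radius * (norm(approx_atom) + eps))
    return bound < threshold


center, radius = [1.0, 0.0], 0.2
# With an exact atom (eps = 0) the test rejects this atom ...
exact_rejects = stable_sphere_test([0.8, 0.0], 0.0, center, radius)
# ... but with an approximation error of 0.05 the test stays conservative.
approx_rejects = stable_sphere_test([0.8, 0.0], 0.05, center, radius)
```

The extra `eps` terms are the price of working with an approximate dictionary: the test eliminates fewer atoms, but it never eliminates one that could belong to the solution support under the stated assumptions.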
In many applications it is useful to replace the Moore-Penrose pseudoinverse (MPP) by a different generalized inverse with more favorable properties. We may want, for example, to have many zero entries, but without giving up too much of the stability of the MPP. One way to quantify stability is by how much the Frobenius norm of a generalized inverse exceeds that of the MPP. In this paper we derive finite-size concentration bounds for the Frobenius norm of $\ell^p$-minimal generalized inverses of iid Gaussian matrices, with $1 \leq p \leq 2$. For $p = 1$ we prove exponential concentration of the Frobenius norm of the sparse pseudoinverse; for $p = 2$, we get a similar concentration bound for the MPP. Our proof is based on the convex Gaussian min-max theorem, but unlike previous applications which give asymptotic results, we derive finite-size bounds.
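The trade-off between sparsity and Frobenius norm can be seen on the smallest possible case, a single-row matrix. The sketch below assumes that "$\ell^p$-minimal" refers to the entrywise $\ell^p$ norm of the generalized inverse; for a nonzero row $a$, the $\ell^1$-minimal right inverse then puts all its weight on the largest-magnitude entry of $a$, while the MPP is $a^T / \|a\|_2^2$. The function names are hypothetical.

```python
import math


def frobenius_norm(column):
    return math.sqrt(sum(v * v for v in column))


def pseudoinverse_of_row(a):
    """Moore-Penrose pseudoinverse of a nonzero 1 x n matrix: a^T / ||a||^2."""
    squared_norm = sum(v * v for v in a)
    return [v / squared_norm for v in a]


def l1_minimal_inverse_of_row(a):
    """Entrywise l1-minimal x with a . x = 1: one nonzero, at the largest |a_i|."""
    j = max(range(len(a)), key=lambda i: abs(a[i]))
    return [1.0 / a[i] if i == j else 0.0 for i in range(len(a))]


a = [3.0, 4.0]
mpp = pseudoinverse_of_row(a)          # dense, smallest Frobenius norm
sparse = l1_minimal_inverse_of_row(a)  # one nonzero entry, larger norm
ratio = frobenius_norm(sparse) / frobenius_norm(mpp)
```

For this row the Frobenius norms are approximately 0.2 for the MPP and 0.25 for the sparse inverse, so the ratio of about 1.25 is the stability cost of sparsity in the sense described above.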
The computational complexity of a problem arising in the context of sparse optimization is considered, namely, the projection onto the set of $k$-cosparse vectors w.r.t. some given matrix $\Omega$. It is shown that this projection problem is (strongly) NP-hard, even in the special cases in which the matrix $\Omega$ contains only ternary or bipolar coefficients. Interestingly, this is in contrast to the projection onto the set of $k$-sparse vectors, which is trivially solved by keeping only the $k$ largest coefficients.
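The easy case mentioned at the end, projection onto the set of $k$-sparse vectors, can be sketched in a few lines. The function name is hypothetical, and "largest" is taken to mean largest in magnitude, which is what the Euclidean projection requires.

```python
def project_k_sparse(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    if k <= 0:
        return [0.0] * len(x)
    # Indices ordered by decreasing magnitude; ties are broken by position.
    order = sorted(range(len(x)), key=lambda i: -abs(x[i]))
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


x = [0.5, -3.0, 2.0, 0.1]
projected = project_k_sparse(x, 2)
```

This costs only a sort of the entries, whereas no comparably simple rule is available for the $k$-cosparse projection, which the abstract shows to be NP-hard.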
