Despite their many appealing properties, kernel methods are heavily affected by the curse of dimensionality. For instance, in the case of inner product kernels in $\mathbb{R}^d$, the Reproducing Kernel Hilbert Space (RKHS) norm is often very large for functions that depend strongly on a small subset of directions (ridge functions). Correspondingly, such functions are difficult to learn using kernel methods. This observation has motivated the study of generalizations of kernel methods, whereby the RKHS norm -- which is equivalent to a weighted $\ell_2$ norm -- is replaced by a weighted functional $\ell_p$ norm, which we refer to as the $\mathcal{F}_p$ norm. Unfortunately, the tractability of these approaches is unclear: the kernel trick is not available, and minimizing these norms requires solving an infinite-dimensional convex problem. We study random features approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size. Hence, learning with $\mathcal{F}_p$ norms is tractable in these cases. We introduce a proof technique based on uniform concentration in the dual, which can be of broader interest in the study of overparametrized models.
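As a hedged illustration of the random-features route described above, the sketch below fits a ridge-function target with ReLU random features under an $\ell_p$ penalty with $p>1$ in place of the usual $\ell_2$ one. The feature map, the values of p and lam, and the use of L-BFGS are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch (not the paper's algorithm): learn with an F_p-style penalty
# by sampling N random feature directions and fitting coefficients under an
# ell_p penalty with p > 1. All constants below are arbitrary choices.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d, n, N = 10, 200, 500        # input dim, sample size, number of random features
p, lam = 1.5, 1e-3            # ell_p exponent (p > 1) and regularization strength

# Ridge-function target depending on a single direction (hard for plain kernels).
w_star = np.zeros(d); w_star[0] = 1.0
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0)

# Random features: ReLU activations of random unit directions.
W = rng.standard_normal((N, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
Phi = np.maximum(X @ W.T, 0.0) / np.sqrt(N)

def objective(a):
    resid = Phi @ a - y
    return 0.5 * np.mean(resid ** 2) + lam * np.sum(np.abs(a) ** p)

def gradient(a):
    resid = Phi @ a - y
    return Phi.T @ resid / n + lam * p * np.sign(a) * np.abs(a) ** (p - 1)

# The penalty is differentiable for p > 1, so a smooth solver suffices here.
res = minimize(objective, np.zeros(N), jac=gradient, method="L-BFGS-B")
print("train MSE:", np.mean((Phi @ res.x - y) ** 2))
```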
Adversarial attacks during the testing phase of neural networks pose a challenge for their deployment in security-critical settings. These attacks can be performed by adding noise that is imperceptible to humans on top of the origina
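For concreteness, here is a minimal sketch of this kind of test-time perturbation: the fast gradient sign method applied to a toy logistic-regression classifier. It is a standard textbook illustration, not necessarily the attack (or defense) studied in this work, and all names and constants are made up for the example.

```python
# Minimal sketch of a gradient-sign perturbation on a toy linear classifier.
import numpy as np

rng = np.random.default_rng(0)
d = 20
w = rng.standard_normal(d)      # weights of a (pretend pre-trained) linear classifier
x = rng.standard_normal(d)      # a clean input
y = 1.0                         # its true label in {0, 1}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the cross-entropy loss with respect to the input x.
grad_x = (sigmoid(w @ x) - y) * w

# Small, bounded perturbation in the direction that increases the loss.
eps = 0.05
x_adv = x + eps * np.sign(grad_x)

print("clean score:      ", sigmoid(w @ x))
print("adversarial score:", sigmoid(w @ x_adv))
```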
We study rare-event simulation for a class of problems where the target hitting sets of interest are defined via modern machine learning tools such as neural networks and random forests. This problem is motivated by fast-emerging studies on the saf
Finding a good way to model probability densities is key to probabilistic inference. An ideal model should be able to concisely approximate any probability, while also being compatible with two main operations: multiplication of two models (product
Kernel methods have been developed as one of the standard approaches to nonlinear learning; however, they do not scale to large data sets due to their quadratic complexity in the number of samples. A number of kernel approximation methods have thus be
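As one concrete instance of such an approximation, the sketch below uses random Fourier features (Rahimi and Recht) to approximate a Gaussian kernel matrix with an explicit n-by-D feature matrix. This is a standard method shown only for illustration and is not necessarily the approach of this paper; the dimensions and bandwidth are arbitrary choices.

```python
# Minimal sketch of one standard kernel approximation: random Fourier
# features for the Gaussian kernel.
import numpy as np

rng = np.random.default_rng(0)
d, n, D, gamma = 5, 300, 2000, 0.5   # input dim, samples, feature count, kernel width

X = rng.standard_normal((n, d))

# Exact Gaussian kernel: k(x, x') = exp(-gamma * ||x - x'||^2).
sq = np.sum(X ** 2, axis=1)
K_exact = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# Random Fourier features: z(x) = sqrt(2/D) * cos(Wx + b), with W ~ N(0, 2*gamma*I).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

# Downstream learners train on the n-by-D matrix Z directly; the n-by-n
# Gram matrix is formed here only to check the approximation quality.
K_approx = Z @ Z.T
print("max abs error:", np.abs(K_exact - K_approx).max())
```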
We provide theoretical convergence guarantees on training Generative Adversarial Networks (GANs) via SGD. We consider learning a target distribution modeled by a 1-layer Generator network with a non-linear activation function $\phi(\cdot)$ parametrized