We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x} \mapsto \max(0, \mathbf{w} \cdot \mathbf{x})$ with $\mathbf{w} \in \mathbb{S}^{n-1}$. Our algorithm works in the challenging Reliable Agnostic learning model of Kalai, Kanade, and Mansour (2009), where the learner is given access to a distribution $\mathcal{D}$ on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\mathcal{D}$, for any convex, bounded, and Lipschitz loss function. The algorithm runs in polynomial time (in $n$) with respect to any distribution on $\mathbb{S}^{n-1}$ (the unit sphere in $n$ dimensions) and for any error parameter $\epsilon = \Omega(1/\log n)$ (this yields a PTAS for a question raised by F. Bach on the complexity of maximizing ReLUs). These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where $\epsilon$ must be $\Omega(1)$ and strong assumptions are required on the marginal distribution. We can compose our results to obtain the first set of efficient algorithms for learning constant-depth networks of ReLUs. Our techniques combine kernel methods and polynomial approximations with a dual-loss approach to convex programming. As a byproduct, we obtain a number of applications, including the first set of efficient algorithms for convex piecewise-linear fitting and the first efficient algorithms for noisy polynomial reconstruction of low-weight polynomials on the unit sphere.
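To make the two objectives above concrete, the following is a minimal Python sketch, not the paper's algorithm. It evaluates a fixed ReLU $\mathbf{x} \mapsto \max(0, \mathbf{w} \cdot \mathbf{x})$ on a toy sample and reports an empirical false-positive rate and an empirical loss on positively labeled inputs. The helper names (`relu`, `normalize`, `reliable_losses`, `absolute_loss`), the toy data, and the exact reading of "false positive" as "nonzero prediction on a zero-labeled input" are assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean norm so that it lies on the sphere S^{n-1}."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def relu(w, x):
    """Return max(0, w . x) for vectors given as sequences of floats."""
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)))


def absolute_loss(prediction, label):
    """A convex, Lipschitz loss; bounded when predictions and labels are bounded."""
    return abs(prediction - label)


def reliable_losses(h, samples, loss):
    """Empirical versions of the two quantities (assumed definitions).

    false_pos: fraction of samples with label 0 but nonzero prediction.
    pos_loss:  total loss on samples with positive label, divided by the
               full sample size.
    """
    n = len(samples)
    false_pos = sum(1 for x, y in samples if y == 0 and h(x) != 0) / n
    pos_loss = sum(loss(h(x), y) for x, y in samples if y > 0) / n
    return false_pos, pos_loss


# Hypothetical toy data: unit-norm inputs in two dimensions, arbitrary labels.
w = normalize([3.0, 4.0])  # w = (0.6, 0.8)
samples = [
    ([1.0, 0.0], 0.6),   # positive label, predicted exactly
    ([-1.0, 0.0], 0.0),  # zero label, prediction is 0
    ([0.0, 1.0], 0.0),   # zero label, prediction is 0.8 (a false positive)
    ([0.0, -1.0], 0.5),  # positive label, prediction is 0
]
false_pos, pos_loss = reliable_losses(lambda x: relu(w, x), samples, absolute_loss)
print(false_pos, pos_loss)
```

A reliable learner in this sense would look for a hypothesis that drives the first quantity toward zero while keeping the second competitive with the best ReLU that itself has no false positives; the sketch only evaluates one fixed hypothesis and does no learning.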
Polynomial inequalities lie at the heart of many mathematical disciplines. In this paper, we consider the fundamental computational task of automatically searching for proofs of polynomial inequalities. We adopt the framework of semi-algebraic proof
We give an $n^{O(\log\log n)}$-time membership query algorithm for properly and agnostically learning decision trees under the uniform distribution over $\{\pm 1\}^n$. Even in the realizable setting, the previous fastest runtime was $n^{O(\log n)}$, a cons
The training of two-layer neural networks with nonlinear activation functions is an important non-convex optimization problem with numerous applications and promising performance in layerwise deep learning. In this paper, we develop exact convex opti
We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dime
We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input $x \in \mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}|W^{\star}x|$, where $a \in \mathbb{R}^d$