
Robust Meta-learning for Mixed Linear Regression with Small Batches

Published by: Weihao Kong
Publication date: 2020
Paper language: English




A common challenge faced in practical supervised learning, such as medical image processing and robotic interactions, is that there are plenty of tasks but each task cannot afford to collect enough labeled examples to be learned in isolation. However, by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of $k$ linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing second-moment-based approaches show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with $\Omega(k^{1/2})$ examples each. However, this algorithm is brittle in two important scenarios. The predictions can be arbitrarily bad (i) even with only a few outliers in the dataset; or (ii) even if the medium-sized tasks are slightly smaller, with $o(k^{1/2})$ examples each. We introduce a spectral approach that is simultaneously robust under both scenarios. To this end, we first design a novel outlier-robust principal component analysis algorithm that achieves optimal accuracy. This is followed by a sum-of-squares algorithm to exploit the information from higher-order moments. Together, this approach is robust against outliers and achieves a graceful statistical trade-off; the lack of $\Omega(k^{1/2})$-size tasks can be compensated for with smaller tasks, which can now be as small as $O(\log k)$.
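As a concrete illustration of the second-moment step that both the existing approaches and the spectral method build on, the sketch below estimates the subspace spanned by the $k$ regression vectors from many small tasks, using the identity $\mathbb{E}[y_1 y_2 x_1 x_2^\top] = \sum_j p_j \beta_j \beta_j^\top$ for two examples drawn from the same task. This is a minimal, non-robust sketch assuming isotropic Gaussian features and at least two examples per task; the function name and the two-examples-per-task simplification are ours, and the paper's outlier-robust PCA and sum-of-squares refinements are not included.

```python
import numpy as np

def estimate_regressor_subspace(tasks, k):
    """Estimate the span of the k mixture regression vectors from many small tasks.

    Uses the second-moment identity E[y_1 y_2 x_1 x_2^T] = sum_j p_j beta_j beta_j^T,
    which holds when x is isotropic Gaussian and both examples come from the same
    (unknown) component beta_j.

    tasks: list of (X, y) pairs, X of shape (n_t, d), y of shape (n_t,);
           each task should have at least 2 examples.
    Returns a (d, k) orthonormal basis of the estimated span.
    """
    d = tasks[0][0].shape[1]
    M = np.zeros((d, d))
    count = 0
    for X, y in tasks:
        if X.shape[0] < 2:
            continue
        # The product of two examples from the same task is an unbiased
        # estimate of beta beta^T for that task's component.
        outer = y[0] * y[1] * np.outer(X[0], X[1])
        M += 0.5 * (outer + outer.T)  # symmetrize
        count += 1
    M /= max(count, 1)
    # Top-k eigenvectors of the symmetric moment estimate span the subspace
    # containing the regression vectors, up to estimation error.
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]
```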


Read also

In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data. These include data from medical image processing and robotic interaction. Even though each individual task cannot be meaningfully trained in isolation, one seeks to meta-learn across the tasks from past experiences by exploiting some similarities. We study a fundamental question of interest: when can abundant tasks with small data compensate for the lack of tasks with big data? We focus on a canonical scenario where each task is drawn from a mixture of $k$ linear regressions, and identify sufficient conditions for such a graceful exchange to hold: the total number of examples necessary with only small-data tasks scales similarly as when big-data tasks are available. To this end, we introduce a novel spectral approach and show that we can efficiently utilize small-data tasks with the help of $\tilde{\Omega}(k^{3/2})$ medium-data tasks, each with $\tilde{\Omega}(k^{1/2})$ examples.
We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1,b_1), (a_2,b_2), \ldots$, with $a_i$ drawn independently from a $d$-dimensional isotropic Gaussian, and where $b_i = \langle a_i, x\rangle + \eta_i$, for a fixed $x \in \mathbb{R}^d$ with $\|x\|_2 = 1$ and with independent noise $\eta_i$ drawn uniformly from the interval $[-2^{-d/5},2^{-d/5}]$. We show that any algorithm with at most $d^2/4$ bits of memory requires at least $\Omega(d \log \log \frac{1}{\epsilon})$ samples to approximate $x$ to $\ell_2$ error $\epsilon$ with probability of success at least $2/3$, for $\epsilon$ sufficiently small as a function of $d$. In contrast, for such $\epsilon$, $x$ can be recovered to error $\epsilon$ with probability $1-o(1)$ with memory $O\left(d^2 \log(1/\epsilon)\right)$ using $d$ examples. This represents the first nontrivial lower bound for regression with super-linear memory, and may open the door for strong memory/sample tradeoffs for continuous optimization.
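To make the setting concrete, here is a small numerical illustration of the data model and of the memory-unconstrained claim in the last sentences: with $d$ examples, solving the resulting $d \times d$ linear system (which stores on the order of $d^2$ numbers) recovers $x$ up to the $2^{-d/5}$-scale noise. The dimension and random seed are arbitrary choices of ours, and this does not, of course, touch the memory-bounded lower-bound construction itself.

```python
import numpy as np

# Data model: b_i = <a_i, x> + eta_i with ||x||_2 = 1 and noise drawn
# uniformly from [-2^{-d/5}, 2^{-d/5}]. With d examples, a memory-unconstrained
# solver simply inverts the d x d system, using ~d^2 numbers of memory.
d = 100
rng = np.random.default_rng(0)
x = rng.normal(size=d)
x /= np.linalg.norm(x)                                      # ||x||_2 = 1
A = rng.normal(size=(d, d))                                 # d isotropic Gaussian examples a_i
eta = rng.uniform(-2.0 ** (-d / 5), 2.0 ** (-d / 5), size=d)
b = A @ x + eta                                             # b_i = <a_i, x> + eta_i
x_hat = np.linalg.solve(A, b)                               # stores all of A: O(d^2) memory
print("ell_2 recovery error:", np.linalg.norm(x_hat - x))   # small, noise-scale error
```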
Given a linear regression setting, Iterative Least Trimmed Squares (ILTS) involves alternating between (a) selecting the subset of samples with the lowest current loss, and (b) re-fitting the linear model only on that subset. Both steps are very fast and simple. In this paper we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C). We first establish deterministic conditions (on the features, etc.) under which the ILTS iterate converges linearly to the closest mixture component. We also provide a global algorithm that uses ILTS as a subroutine to fully solve mixed linear regressions with corruptions. We then evaluate it for the widely studied setting of isotropic Gaussian features, and establish that we match or improve on existing results in terms of sample complexity. Finally, we provide an ODE analysis for a gradient-descent variant of ILTS that has optimal time complexity. Our results provide initial theoretical evidence that iteratively fitting to the best subset of samples -- a potentially widely applicable idea -- can provably provide state-of-the-art performance in bad training data settings.
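Since the abstract spells out the two alternating steps, a minimal sketch of that loop is easy to give. The function name, the fixed trim fraction, and the fixed iteration count below are our own illustrative choices; the paper's global mixture-recovery routine that wraps this subroutine is not shown.

```python
import numpy as np

def iterative_least_trimmed_squares(X, y, keep_frac=0.7, n_iters=50):
    """Alternate between (a) keeping the samples with the smallest current
    residuals and (b) re-fitting ordinary least squares on that subset."""
    n, d = X.shape
    m = int(keep_frac * n)                          # number of samples kept per round
    theta = np.linalg.lstsq(X, y, rcond=None)[0]    # initial fit on all samples
    for _ in range(n_iters):
        residuals = (X @ theta - y) ** 2
        keep = np.argsort(residuals)[:m]            # (a) lowest-loss subset
        theta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # (b) re-fit on it
    return theta
```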
Bayesian optimization is an efficient nonlinear optimization method where the queries are carefully selected to gather information about the optimum location. Thus, in the context of policy search, it has been called active policy search. The main ingredients of Bayesian optimization for sample efficiency are the probabilistic surrogate model and the optimal decision heuristics. In this work, we exploit those to provide robustness to different issues for policy search algorithms. We combine several methods and show how their interaction works better than the sum of the parts. First, to deal with input noise and provide a safe and repeatable policy, we use an improved version of unscented Bayesian optimization. Then, to deal with mismodeling errors and improve exploration, we use stochastic meta-policies for query selection and an adaptive kernel. We compare the proposed algorithm with previous results in several optimization benchmarks and robot tasks, such as pushing objects with a robot arm, or path finding with a rover.
Generating high-quality uncertainty estimates for sequential regression, particularly for deep recurrent networks, remains a challenging and open problem. Existing approaches often make restrictive assumptions (such as stationarity), yet still perform poorly in practice, particularly in the presence of real-world non-stationary signals and drift. This paper describes a flexible method that can generate symmetric and asymmetric uncertainty estimates, makes no assumptions about stationarity, and outperforms competitive baselines on both drift and non-drift scenarios. This work helps make sequential regression more effective and practical for use in real-world applications, and is a powerful new addition to the modeling toolbox for sequential uncertainty quantification in general.
