ترغب بنشر مسار تعليمي؟ اضغط هنا

Iterative Least Trimmed Squares for Mixed Linear Regression

83   0   0.0 ( 0 )
 نشر من قبل Yanyao Shen
 تاريخ النشر 2019
والبحث باللغة English




اسأل ChatGPT حول البحث

Given a linear regression setting, Iterative Least Trimmed Squares (ILTS) involves alternating between (a) selecting the subset of samples with lowest current loss, and (b) re-fitting the linear model only on that subset. Both steps are very fast and simple. In this paper we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C). We first establish deterministic conditions (on the features etc.) under which the ILTS iterate converges linearly to the closest mixture component. We also provide a global algorithm that uses ILTS as a subroutine, to fully solve mixed linear regressions with corruptions. We then evaluate it for the widely studied setting of isotropic Gaussian features, and establish that we match or better existing results in terms of sample complexity. Finally, we provide an ODE analysis for a gradient-descent variant of ILTS that has optimal time complexity. Our results provide initial theoretical evidence that iteratively fitting to the best subset of samples -- a potentially widely applicable idea -- can provably provide state of the art performance in bad training data settings.



قيم البحث

اقرأ أيضاً

The problem of fitting experimental data to a given model function $f(t; p_1,p_2,dots,p_N)$ is conventionally solved numerically by methods such as that of Levenberg-Marquardt, which are based on approximating the Chi-squared measure of discrepancy b y a quadratic function. Such nonlinear iterative methods are usually necessary unless the function $f$ to be fitted is itself a linear function of the parameters $p_n$, in which case an elementary linear Least Squares regression is immediately available. When linearity is present in some, but not all, of the parameters, we show how to streamline the optimization method by reducing the nonlinear activity to the nonlinear parameters only. Numerical examples are given to demonstrate the effectiveness of this approach. The main idea is to replace entries corresponding to the linear terms in the numerical difference quotients with an optimal value easily obtained by linear regression. More generally, the idea applies to minimization problems which are quadratic in some of the parameters. We show that the covariance matrix of $chi^2$ remains the same even though the derivatives are calculated in a different way. For this reason, the standard non-linear optimization methods can be fully applied.
In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data. These include data from medical image processing and robotic interaction. Even though each individual task can not be meaningfully trained in isolation, one seeks to meta-learn across the tasks from past experiences by exploiting some similarities. We study a fundamental question of interest: When can abundant tasks with small data compensate for lack of tasks with big data? We focus on a canonical scenario where each task is drawn from a mixture of $k$ linear regressions, and identify sufficient conditions for such a graceful exchange to hold; The total number of examples necessary with only small data tasks scales similarly as when big data tasks are available. To this end, we introduce a novel spectral approach and show that we can efficiently utilize small data tasks with the help of $tildeOmega(k^{3/2})$ medium data tasks each with $tildeOmega(k^{1/2})$ examples.
In recent studies on sparse modeling, $l_q$ ($0<q<1$) regularized least squares regression ($l_q$LS) has received considerable attention due to its superiorities on sparsity-inducing and bias-reduction over the convex counterparts. In this paper, we propose a Gauss-Seidel iterative thresholding algorithm (called GAITA) for solution to this problem. Different from the classical iterative thresholding algorithms using the Jacobi updating rule, GAITA takes advantage of the Gauss-Seidel rule to update the coordinate coefficients. Under a mild condition, we can justify that the support set and sign of an arbitrary sequence generated by GAITA will converge within finite iterations. This convergence property together with the Kurdyka-{L}ojasiewicz property of ($l_q$LS) naturally yields the strong convergence of GAITA under the same condition as above, which is generally weaker than the condition for the convergence of the classical iterative thresholding algorithms. Furthermore, we demonstrate that GAITA converges to a local minimizer under certain additional conditions. A set of numerical experiments are provided to show the effectiveness, particularly, much faster convergence of GAITA as compared with the classical iterative thresholding algorithms.
A common challenge faced in practical supervised learning, such as medical image processing and robotic interactions, is that there are plenty of tasks but each task cannot afford to collect enough labeled examples to be learned in isolation. However , by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of k linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing second moment based approaches show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with $Omega(k^{1/2})$ examples each. However, this algorithm is brittle in two important scenarios. The predictions can be arbitrarily bad (i) even with only a few outliers in the dataset; or (ii) even if the medium-sized tasks are slightly smaller with $o(k^{1/2})$ examples each. We introduce a spectral approach that is simultaneously robust under both scenarios. To this end, we first design a novel outlier-robust principal component analysis algorithm that achieves an optimal accuracy. This is followed by a sum-of-squares algorithm to exploit the information from higher order moments. Together, this approach is robust against outliers and achieves a graceful statistical trade-off; the lack of $Omega(k^{1/2})$-size tasks can be compensated for with smaller tasks, which can now be as small as $O(log k)$.
In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted. We first make a simple observation: in a variety of such settings, the evolution of train ing accuracy (as a function of training epochs) is different for clean and bad samples. Based on this we propose to iteratively minimize the trimmed loss, by alternating between (a) selecting samples with lowest current loss, and (b) retraining a model on only these samples. We prove that this process recovers the ground truth (with linear convergence rate) in generalized linear models with standard statistical assumptions. Experimentally, we demonstrate its effectiveness in three settings: (a) deep image classifiers with errors only in labels, (b) generative adversarial networks with bad training images, and (c) deep image classifiers with adversarial (image, label) pairs (i.e., backdoor attacks). For the well-studied setting of random label noise, our algorithm achieves state-of-the-art performance without having access to any a-priori guaranteed clean samples.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا