Robust Subspace Recovery with Adversarial Outliers

Added by Tyler Maunu
Publication date: 2019
Language: English





We study the problem of robust subspace recovery (RSR) in the presence of adversarial outliers. That is, we seek a subspace that contains a large portion of a dataset when some fraction of the data points are arbitrarily corrupted. We first examine a theoretical estimator that is intractable to calculate and use it to derive information-theoretic bounds of exact recovery. We then propose two tractable estimators: a variant of RANSAC and a simple relaxation of the theoretical estimator. The two estimators are fast to compute and achieve state-of-the-art theoretical performance in a noiseless RSR setting with adversarial outliers. The former estimator achieves better theoretical guarantees in the noiseless case, while the latter estimator is robust to small noise, and its guarantees significantly improve with non-adversarial models of outliers. We give a complete comparison of guarantees for the adversarial RSR problem, as well as a short discussion on the estimation of affine subspaces.
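To make the RANSAC idea concrete, here is a minimal sketch of a generic RANSAC-style procedure for the noiseless setting. It is not the specific variant proposed in the paper, whose details the abstract does not give. The function name `ransac_subspace`, the parameters `n_trials` and `tol`, and the choice to sample exactly `d` points per trial are all assumptions made for illustration.

```python
import numpy as np

def ransac_subspace(X, d, n_trials=200, tol=1e-8, seed=0):
    """Generic RANSAC-style search for a d-dimensional linear subspace.

    X is an (n, D) array of points. Returns an orthonormal basis (D, d) of the
    candidate subspace that contains the most points, plus that point count.
    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, D = X.shape
    best_basis, best_count = None, -1
    for _ in range(n_trials):
        # Sample d points and orthonormalize their span.
        idx = rng.choice(n, size=d, replace=False)
        Q, R = np.linalg.qr(X[idx].T)  # Q has shape (D, d)
        if np.min(np.abs(np.diag(R))) < tol:
            continue  # sampled points are (nearly) linearly dependent
        # Distance of every point to the candidate subspace.
        resid = X - (X @ Q) @ Q.T
        dist = np.linalg.norm(resid, axis=1)
        count = int(np.sum(dist <= tol))
        if count > best_count:
            best_basis, best_count = Q, count
    return best_basis, best_count
```

When the inliers lie exactly on a subspace and the outliers are in general position, a trial that draws only inliers yields a candidate containing every inlier, so the consensus count separates it from mixed draws. Noisy data would need a larger `tol` and a different acceptance rule.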




Read More

Chieh-Hsin Lai, Dongmian Zou, 2019
We propose a neural network for unsupervised anomaly detection with a novel robust subspace recovery layer (RSR layer). This layer seeks to extract the underlying subspace from a latent representation of the given data and removes outliers that lie away from this subspace. It is used within an autoencoder. The encoder maps the data into a latent space, from which the RSR layer extracts the subspace. The decoder then smoothly maps back the underlying subspace to a manifold close to the original inliers. Inliers and outliers are distinguished according to the distances between the original and mapped positions (small for inliers and large for outliers). Extensive numerical experiments with both image and document datasets demonstrate state-of-the-art precision and recall.
The subspace approximation problem with outliers, for given $n$ points in $d$ dimensions $x_{1}, \ldots, x_{n} \in \mathbb{R}^{d}$, an integer $1 \leq k \leq d$, and an outlier parameter $0 \leq \alpha \leq 1$, is to find a $k$-dimensional linear subspace of $\mathbb{R}^{d}$ that minimizes the sum of squared distances to its nearest $(1-\alpha)n$ points. More generally, the $\ell_{p}$ subspace approximation problem with outliers minimizes the sum of $p$-th powers of distances instead of the sum of squared distances. Even the case of robust PCA is non-trivial, and previous work requires additional assumptions on the input. Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem, a special case in which the $(1-\alpha)n$ inliers in the optimal solution are promised to lie exactly on a $k$-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard. We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we assume that the squared distance error of the optimal $k$-dimensional subspace summed over the optimal $(1-\alpha)n$ inliers is at least $\delta$ times its squared-error summed over all $n$ points, for some $0 < \delta \leq 1 - \alpha$. With this assumption, we give an efficient algorithm to find a subset of $\mathrm{poly}(k/\epsilon) \log(1/\delta) \log\log(1/\delta)$ points whose span contains a $k$-dimensional subspace that gives a multiplicative $(1+\epsilon)$-approximation to the optimal solution. The running time of our algorithm is linear in $n$ and $d$. Interestingly, our results hold even when the fraction of outliers $\alpha$ is large, as long as the obvious condition $0 < \delta \leq 1 - \alpha$ is satisfied.
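The objective defined in this abstract can be evaluated directly for any candidate subspace. The sketch below only computes the cost and is not the approximation algorithm the abstract describes. The function name `trimmed_subspace_cost` is hypothetical, and rounding $(1-\alpha)n$ to the nearest integer is an assumption for cases where it is not a whole number.

```python
import numpy as np

def trimmed_subspace_cost(X, B, alpha, p=2):
    """Cost of span(B) for l_p subspace approximation with outliers.

    X: (n, d) data points; B: (d, k) matrix whose columns span the candidate
    subspace; alpha: outlier fraction in [0, 1]. Sums the p-th powers of the
    distances from the nearest (1 - alpha) * n points to span(B).
    """
    Q, _ = np.linalg.qr(B)                    # orthonormal basis of span(B)
    resid = X - (X @ Q) @ Q.T                 # component orthogonal to the subspace
    dist = np.linalg.norm(resid, axis=1)
    m = int(round((1 - alpha) * X.shape[0]))  # assumed rounding of (1 - alpha) * n
    nearest = np.sort(dist)[:m]               # keep only the m closest points
    return float(np.sum(nearest ** p))
```

Setting `p=2` gives the sum of squared distances from the first sentence of the abstract. Other values of `p` give the $\ell_{p}$ generalization.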
Zixiu Wang, Yiwen Guo, Hu Ding, 2021
In this big data era, we often confront large-scale data in many machine learning tasks. A common approach for dealing with large-scale data is to build a small summary, e.g., a coreset, that can efficiently represent the original input. However, real-world datasets usually contain outliers, and most existing coreset construction methods are not resilient against outliers (in particular, the outliers can be located arbitrarily in the space by an adversarial attacker). In this paper, we propose a novel robust coreset method for the continuous-and-bounded learning problem (with outliers), which includes a broad range of popular optimization objectives in machine learning, like logistic regression and $k$-means clustering. Moreover, our robust coreset can be efficiently maintained in a fully-dynamic environment. To the best of our knowledge, this is the first robust and fully-dynamic coreset construction method for these optimization problems. We also conduct experiments to evaluate the effectiveness of our robust coreset in practice.
We propose a distributionally robust classification model with a fairness constraint that encourages the classifier to be fair in view of the equality of opportunity criterion. We use a type-$\infty$ Wasserstein ambiguity set centered at the empirical distribution to model distributional uncertainty and derive a conservative reformulation for the worst-case equal opportunity unfairness measure. We establish that the model is equivalent to a mixed binary optimization problem, which can be solved by standard off-the-shelf solvers. To improve scalability, we further propose a convex, hinge-loss-based model for large problem instances whose reformulation does not incur any binary variables. Moreover, we also consider the distributionally robust learning problem with a generic ground transportation cost to hedge against the uncertainties in the label and sensitive attribute. Finally, we numerically demonstrate that our proposed approaches improve fairness with negligible loss of predictive accuracy.
It is well known that simple short-sighted algorithms, such as gradient descent, generalize well in over-parameterized learning tasks due to their implicit regularization. However, it is unknown whether the implicit regularization of these algorithms can be extended to robust learning tasks, where a subset of samples may be grossly corrupted with noise. In this work, we provide a positive answer to this question in the context of the robust matrix recovery problem. In particular, we consider the problem of recovering a low-rank matrix from a number of linear measurements, where a subset of the measurements is corrupted with large noise. We show that a simple sub-gradient method converges to the true low-rank solution efficiently when it is applied to the over-parameterized $\ell_1$-loss function without any explicit regularization or rank constraint. Moreover, by building upon a new notion of restricted isometry property, called sign-RIP, we prove the robustness of the sub-gradient method against outliers in the over-parameterized regime. In particular, we show that, with Gaussian measurements, the sub-gradient method is guaranteed to converge to the true low-rank solution, even if an arbitrary fraction of the measurements are grossly corrupted with noise.
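The sub-gradient method mentioned in this abstract can be illustrated with a small sketch. It assumes a symmetric factorization $X = UU^{\top}$, measurements $y_i = \langle A_i, X \rangle$, a small random initialization, and a geometrically decaying step size. These choices, along with every function and parameter name, are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def l1_loss(U, A, y):
    """Mean absolute residual of the measurements <A_i, U U^T> against y."""
    pred = np.einsum("ijk,jk->i", A, U @ U.T)
    return float(np.mean(np.abs(pred - y)))

def l1_subgradient(U, A, y):
    """One subgradient of l1_loss with respect to U."""
    s = np.sign(np.einsum("ijk,jk->i", A, U @ U.T) - y)  # sign of each residual
    S = np.einsum("i,ijk->jk", s, A) / len(y)            # (1/m) * sum_i s_i A_i
    return (S + S.T) @ U

def subgradient_method(A, y, d, k, steps=500, eta0=0.05, decay=0.99, seed=0):
    """Run the sub-gradient iteration and return the best iterate seen.

    A: (m, d, d) measurement matrices; y: (m,) possibly corrupted measurements;
    k: number of columns of U (k = d is the fully over-parameterized case).
    Step-size schedule and initialization scale are assumed values.
    """
    rng = np.random.default_rng(seed)
    U = 1e-3 * rng.standard_normal((d, k))  # small random initialization
    best_U, best_loss = U.copy(), l1_loss(U, A, y)
    for t in range(steps):
        U = U - eta0 * decay**t * l1_subgradient(U, A, y)
        loss = l1_loss(U, A, y)
        if loss < best_loss:
            best_U, best_loss = U.copy(), loss
    return best_U, best_loss
```

No rank constraint or regularizer appears in the loop, which matches the setting the abstract describes. The sketch makes no claim about convergence for any particular problem size or corruption level.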
