No Arabic abstract
We consider the problem of subset selection for $ell_{p}$ subspace approximation, i.e., given $n$ points in $d$ dimensions, we need to pick a small, representative subset of the given points such that its span gives $(1+epsilon)$ approximation to the best $k$-dimensional subspace that minimizes the sum of $p$-th powers of distances of all the points to this subspace. Sampling-based subset selection techniques require adaptive sampling iterations with multiple passes over the data. Matrix sketching techniques give a single-pass $(1+epsilon)$ approximation for $ell_{p}$ subspace approximation but require additional passes for subset selection. In this work, we propose an MCMC algorithm to reduce the number of passes required by previous subset selection algorithms based on adaptive sampling. For $p=2$, our algorithm gives subset selection of nearly optimal size in only $2$ passes, whereas the number of passes required in previous work depend on $k$. Our algorithm picks a subset of size $mathrm{poly}(k/epsilon)$ that gives $(1+epsilon)$ approximation to the optimal subspace. The running time of the algorithm is $nd + d~mathrm{poly}(k/epsilon)$. We extend our results to the case when outliers are present in the datasets, and suggest a two pass algorithm for the same. Our ideas also extend to give a reduction in the number of passes required by adaptive sampling algorithms for $ell_{p}$ subspace approximation and subset selection, for $p geq 2$.
The subspace approximation problem with outliers, for given $n$ points in $d$ dimensions $x_{1},ldots, x_{n} in R^{d}$, an integer $1 leq k leq d$, and an outlier parameter $0 leq alpha leq 1$, is to find a $k$-dimensional linear subspace of $R^{d}$ that minimizes the sum of squared distances to its nearest $(1-alpha)n$ points. More generally, the $ell_{p}$ subspace approximation problem with outliers minimizes the sum of $p$-th powers of distances instead of the sum of squared distances. Even the case of robust PCA is non-trivial, and previous work requires additional assumptions on the input. Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem, a special case in which the $(1-alpha)n$ inliers in the optimal solution are promised to lie exactly on a $k$-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard. We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we assume that the squared distance error of the optimal $k$-dimensional subspace summed over the optimal $(1-alpha)n$ inliers is at least $delta$ times its squared-error summed over all $n$ points, for some $0 < delta leq 1 - alpha$. With this assumption, we give an efficient algorithm to find a subset of $poly(k/epsilon) log(1/delta) loglog(1/delta)$ points whose span contains a $k$-dimensional subspace that gives a multiplicative $(1+epsilon)$-approximation to the optimal solution. The running time of our algorithm is linear in $n$ and $d$. Interestingly, our results hold even when the fraction of outliers $alpha$ is large, as long as the obvious condition $0 < delta leq 1 - alpha$ is satisfied.
We propose and analyze two new MCMC sampling algorithms, the Vaidya walk and the John walk, for generating samples from the uniform distribution over a polytope. Both random walks are sampling algorithms derived from interior point methods. The former is based on volumetric-logarithmic barrier introduced by Vaidya whereas the latter uses Johns ellipsoids. We show that the Vaidya walk mixes in significantly fewer steps than the logarithmic-barrier based Dikin walk studied in past work. For a polytope in $mathbb{R}^d$ defined by $n >d$ linear constraints, we show that the mixing time from a warm start is bounded as $mathcal{O}(n^{0.5}d^{1.5})$, compared to the $mathcal{O}(nd)$ mixing time bound for the Dikin walk. The cost of each step of the Vaidya walk is of the same order as the Dikin walk, and at most twice as large in terms of constant pre-factors. For the John walk, we prove an $mathcal{O}(d^{2.5}cdotlog^4(n/d))$ bound on its mixing time and conjecture that an improved variant of it could achieve a mixing time of $mathcal{O}(d^2cdottext{polylog}(n/d))$. Additionally, we propose variants of the Vaidya and John walks that mix in polynomial time from a deterministic starting point. The speed-up of the Vaidya walk over the Dikin walk are illustrated in numerical examples.
In this article, we analyse the Kantorovich type exponential sampling operators and its linear combination. We derive the Voronovskaya type theorem and its quantitative estimates for these operators in terms of an appropriate K-functional. Further, we improve the order of approximation by using the convex type linear combinations of these operators. Subsequently, we prove the estimates concerning the order of convergence for these linear combinations. Finally, we give some examples of kernels along with the graphical representations.
The ethical concept of fairness has recently been applied in machine learning (ML) settings to describe a wide range of constraints and objectives. When considering the relevance of ethical concepts to subset selection problems, the concepts of diversity and inclusion are additionally applicable in order to create outputs that account for social power and access differentials. We introduce metrics based on these concepts, which can be applied together, separately, and in tandem with additional fairness constraints. Results from human subject experiments lend support to the proposed criteria. Social choice methods can additionally be leveraged to aggregate and choose preferable sets, and we detail how these may be applied.
For feature selection and related problems, we introduce the notion of classification game, a cooperative game, with features as players and hinge loss based characteristic function and relate a features contribution to Shapley value based error apportioning (SVEA) of total training error. Our major contribution is ($star$) to show that for any dataset the threshold 0 on SVEA value identifies feature subset whose joint interactions for label prediction is significant or those features that span a subspace where the data is predominantly lying. In addition, our scheme ($star$) identifies the features on which Bayes classifier doesnt depend but any surrogate loss function based finite sample classifier does; this contributes to the excess $0$-$1$ risk of such a classifier, ($star$) estimates unknown true hinge risk of a feature, and ($star$) relate the stability property of an allocation and negative valued SVEA by designing the analogue of core of classification game. Due to Shapley values computationally expensive nature, we build on a known Monte Carlo based approximation algorithm that computes characteristic function (Linear Programs) only when needed. We address the potential sample bias problem in feature selection by providing interval estimates for SVEA values obtained from multiple sub-samples. We illustrate all the above aspects on various synthetic and real datasets and show that our scheme achieves better results than existing recursive feature elimination technique and ReliefF in most cases. Our theoretically grounded classification game in terms of well defined characteristic function offers interpretability (which we formalize in terms of final task) and explainability of our framework, including identification of important features.