ترغب بنشر مسار تعليمي؟ اضغط هنا

Almost Tight Approximation Algorithms for Explainable Clustering

69   0   0.0 ( 0 )
 نشر من قبل Shyam Narayanan
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Recently, due to an increasing interest for transparency in artificial intelligence, several methods of explainable machine learning have been developed with the simultaneous goal of accuracy and interpretability by humans. In this paper, we study a recent framework of explainable clustering first suggested by Dasgupta et al.~cite{dasgupta2020explainable}. Specifically, we focus on the $k$-means and $k$-medians problems and provide nearly tight upper and lower bounds. First, we provide an $O(log k log log k)$-approximation algorithm for explainable $k$-medians, improving on the best known algorithm of $O(k)$~cite{dasgupta2020explainable} and nearly matching the known $Omega(log k)$ lower bound~cite{dasgupta2020explainable}. In addition, in low-dimensional spaces $d ll log k$, we show that our algorithm also provides an $O(d log^2 d)$-approximate solution for explainable $k$-medians. This improves over the best known bound of $O(d log k)$ for low dimensions~cite{laber2021explainable}, and is a constant for constant dimensional spaces. To complement this, we show a nearly matching $Omega(d)$ lower bound. Next, we study the $k$-means problem in this context and provide an $O(k log k)$-approximation algorithm for explainable $k$-means, improving over the $O(k^2)$ bound of Dasgupta et al. and the $O(d k log k)$ bound of cite{laber2021explainable}. To complement this we provide an almost tight $Omega(k)$ lower bound, improving over the $Omega(log k)$ lower bound of Dasgupta et al. Given an approximate solution to the classic $k$-means and $k$-medians, our algorithm for $k$-medians runs in time $O(kd log^2 k )$ and our algorithm for $k$-means runs in time $ O(k^2 d)$.



قيم البحث

اقرأ أيضاً

We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov games can be parameterized by a linear function over the current state, b oth players actions and the next state. In particular, we assume that we can control both players and aim to find the Nash Equilibrium by minimizing the duality gap. We propose an algorithm Nash-UCRL-VTR based on the principle Optimism-in-Face-of-Uncertainty. Our algorithm only needs to find a Coarse Correlated Equilibrium (CCE), which is computationally very efficient. Specifically, we show that Nash-UCRL-VTR can provably achieve an $tilde{O}(dHsqrt{T})$ regret, where $d$ is the linear function dimension, $H$ is the length of the game and $T$ is the total number of steps in the game. To access the optimality of our algorithm, we also prove an $tilde{Omega}( dHsqrt{T})$ lower bound on the regret. Our upper bound matches the lower bound up to logarithmic factors, which suggests the optimality of our algorithm.
We study a combinatorial problem called Minimum Maximal Matching, where we are asked to find in a general graph the smallest that can not be extended. We show that this problem is hard to approximate with a constant smaller than 2, assuming the Uniqu e Games Conjecture. As a corollary we show, that Minimum Maximal Matching in bipartite graphs is hard to approximate with constant smaller than $frac{4}{3}$, with the same assumption. With a stronger variant of the Unique Games Conjecture --- that is Small Set Expansion Hypothesis --- we are able to improve the hardness result up to the factor of $frac{3}{2}$.
We present an $(e^{O(p)} frac{log ell}{loglogell})$-approximation algorithm for socially fair clustering with the $ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $ell$ grou ps. The goal is to find a $k$-medians, $k$-means, or, more generally, $ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $sum_{u text{ in group }j} d(u,C)^p$. The socially fair clustering problem was independently proposed by Ghadiri, Samadi, and Vempala [2021] and Abbasi, Bhaskara, and Venkatasubramanian [2021]. Our algorithm improves and generalizes their $O(ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $Omega(ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $Theta(frac{log ell}{loglogell})$ for a fixed $p$. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. [2021].
In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9,18], and tensor clustering [8,34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [2]. However, there seem to be no approximation algorithms for Bregman co- and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact.
We consider the $k$-clustering problem with $ell_p$-norm cost, which includes $k$-median, $k$-means and $k$-center cost functions, under an individual notion of fairness proposed by Jung et al. [2020]: given a set of points $P$ of size $n$, a set of $k$ centers induces a fair clustering if for every point $vin P$, $v$ can find a center among its $n/k$ closest neighbors. Recently, Mahabadi and Vakilian [2020] showed how to get a $(p^{O(p)},7)$-bicriteria approximation for the problem of fair $k$-clustering with $ell_p$-norm cost: every point finds a center within distance at most $7$ times its distance to its $(n/k)$-th closest neighbor and the $ell_p$-norm cost of the solution is at most $p^{O(p)}$ times the cost of an optimal fair solution. In this work, for any $varepsilon>0$, we present an improved $(16^p +varepsilon,3)$-bicriteria approximation for the fair $k$-clustering with $ell_p$-norm cost. To achieve our guarantees, we extend the framework of [Charikar et al., 2002, Swamy, 2016] and devise a $16^p$-approximation algorithm for the facility location with $ell_p$-norm cost under matroid constraint which might be of an independent interest. Besides, our approach suggests a reduction from our individually fair clustering to a clustering with a group fairness requirement proposed by Kleindessner et al. [2019], which is essentially the median matroid problem [Krishnaswamy et al., 2011].

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا