Individual Fairness for $k$-Clustering

172 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ali Vakilian

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sepideh Mahabadi - Ali Vakilian

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We give a local search based algorithm for $k$-median and $k$-means (and more generally for any $k$-clustering with $ell_p$ norm cost function) from the perspective of individual fairness. More precisely, for a point $x$ in a point set $P$ of size $n$, let $r(x)$ be the minimum radius such that the ball of radius $r(x)$ centered at $x$ has at least $n/k$ points from $P$. Intuitively, if a set of $k$ random points are chosen from $P$ as centers, every point $xin P$ expects to have a center within radius $r(x)$. An individually fair clustering provides such a guarantee for every point $xin P$. This notion of fairness was introduced in [Jung et al., 2019] where they showed how to get an approximately feasible $k$-clustering with respect to this fairness condition. In this work, we show how to get a bicriteria approximation for fair $k$-clustering: The $k$-median ($k$-means) cost of our solution is within a constant factor of the cost of an optimal fair $k$-clustering, and our solution approximately satisfies the fairness condition (also within a constant factor). Further, we complement our theoretical bounds with empirical evaluation.

قيم البحث

150 - Debajyoti Kar , Sourav Medya , Debmalya Mandal 2021

Ensuring fairness in machine learning algorithms is a challenging and important task. We consider the problem of clustering a set of points while ensuring fairness constraints. While there have been several attempts to capture group fairness in the k -clustering problem, fairness at an individual level is not well-studied. We introduce a new notion of individual fairness in k-clustering based on features that are not necessarily used for clustering. We show that this problem is NP-hard and does not admit a constant factor approximation. We then design a randomized algorithm that guarantees approximation both in terms of minimizing the clustering distance objective as well as individual fairness under natural restrictions on the distance metric and fairness constraints. Finally, our experimental results validate that our algorithm produces lower clustering costs compared to existing algorithms while being competitive in individual fairness.

التعلم الآلي أجهزة الكمبيوتر والمجتمع بنى وهياكل البيانات والخوارزميات

Individual Fairness in Pipelines

186 - Cynthia Dwork , Christina Ilvento , Meena Jagadeesan 2020

It is well understood that a system built from individually fair components may not itself be individually fair. In this work, we investigate individual fairness under pipeline composition. Pipelines differ from ordinary sequential or repeated compos ition in that individuals may drop out at any stage, and classification in subsequent stages may depend on the remaining cohort of individuals. As an example, a company might hire a team for a new project and at a later point promote the highest performer on the team. Unlike other repeated classification settings, where the degree of unfairness degrades gracefully over multiple fair steps, the degree of unfairness in pipelines can be arbitrary, even in a pipeline with just two stages. Guided by a panoply of real-world examples, we provide a rigorous framework for evaluating different types of fairness guarantees for pipelines. We show that na{i}ve auditing is unable to uncover systematic unfairness and that, in order to ensure fairness, some form of dependence must exist between the design of algorithms at different stages in the pipeline. Finally, we provide constructions that permit flexibility at later stages, meaning that there is no need to lock in the entire pipeline at the time that the early stage is constructed.

أجهزة الكمبيوتر والمجتمع التعلم الآلي التعلم الالي

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

692 - Michael B. Cohen , Sam Elder , Cameron Musco 2014

We show how to approximate a data matrix $mathbf{A}$ with a much smaller sketch $mathbf{tilde A}$ that can be used to solve a general class of constrained k-rank approximation problems to within $(1+epsilon)$ error. Importantly, this class of problem s includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only `cover a good subspace for $bv{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(log k/epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.

بنى وهياكل البيانات والخوارزميات التعلم الآلي

Approximation Algorithms for Socially Fair Clustering

179 - Yury Makarychev , Ali Vakilian 2021

We present an $(e^{O(p)} frac{log ell}{loglogell})$-approximation algorithm for socially fair clustering with the $ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $ell$ grou ps. The goal is to find a $k$-medians, $k$-means, or, more generally, $ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $sum_{u text{ in group }j} d(u,C)^p$. The socially fair clustering problem was independently proposed by Ghadiri, Samadi, and Vempala [2021] and Abbasi, Bhaskara, and Venkatasubramanian [2021]. Our algorithm improves and generalizes their $O(ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $Omega(ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $Theta(frac{log ell}{loglogell})$ for a fixed $p$. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. [2021].

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي

A Center in Your Neighborhood: Fairness in Facility Location

84 - Christopher Jung , Sampath Kannan , Neil Lutz 2019

When selecting locations for a set of facilities, standard clustering algorithms may place unfair burden on some individuals and neighborhoods. We formulate a fairness concept that takes local population densities into account. In particular, given $ k$ facilities to locate and a population of size $n$, we define the neighborhood radius of an individual $i$ as the minimum radius of a ball centered at $i$ that contains at least $n/k$ individuals. Our objective is to ensure that each individual has a facility within at most a small constant factor of her neighborhood radius. We present several theoretical results: We show that optimizing this factor is NP-hard; we give an approximation algorithm that guarantees a factor of at most 2 in all metric spaces; and we prove matching lower bounds in some metric spaces. We apply a variant of this algorithm to real-world address data, showing that it is quite different from standard clustering algorithms and outperforms them on our objective function and balances the load between facilities more evenly.

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي