Fair Clustering Under a Bounded Cost

140 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Seyed Esmaeili

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Seyed A. Esmaeili - Brian Brubach - Aravind Srinivasan

التعلم الآلي بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space. A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness. In this model, the cost of the clustering objective increases due to enforcing fairness in the algorithm. The relative increase in the cost, the price of fairness, can indeed be unbounded. Therefore, in this paper we propose to treat an upper bound on the clustering objective as a constraint on the clustering problem, and to maximize equality of representation subject to it. We consider two fairness objectives: the group utilitarian objective and the group egalitarian objective, as well as the group leximin objective which generalizes the group egalitarian objective. We derive fundamental lower bounds on the approximation of the utilitarian and egalitarian objectives and introduce algorithms with provable guarantees for them. For the leximin objective we introduce an effective heuristic algorithm. We further derive impossibility results for other natural fairness objectives. We conclude with experimental results on real-world datasets that demonstrate the validity of our algorithms.

قيم البحث

اقرأ أيضاً

Proportionally Fair Clustering

94 - Xingyu Chen , Brandon Fain , Liang Lyu 2019

We extend the fair machine learning literature by considering the problem of proportional centroid clustering in a metric context. For clustering $n$ points with $k$ centers, we define fairness as proportionality to mean that any $n/k$ points are ent itled to form their own cluster if there is another center that is closer in distance for all $n/k$ points. We seek clustering solutions to which there are no such justified complaints from any subsets of agents, without assuming any a priori notion of protected subsets. We present and analyze algorithms to efficiently compute, optimize, and audit proportional solutions. We conclude with an empirical examination of the tradeoff between proportional solutions and the $k$-means objective.

التعلم الآلي بنى وهياكل البيانات والخوارزميات علوم الكمبيوتر ونظرية الألعاب

A New Notion of Individually Fair Clustering: $alpha$-Equitable $k$-Center

102 - Darshan Chakrabarti , John P. Dickerson , Seyed A. Esmaeili 2021

Clustering is a fundamental problem in unsupervised machine learning, and fair variants of it have recently received significant attention. In this work we introduce a novel definition of fairness for clustering problems. Specifically, in our model e ach point $j$ has a set of other points $mathcal{S}_j$ that it perceives as similar to itself, and it feels that it is fairly treated, if the quality of service it receives in the solution is $alpha$-close to that of the points in $mathcal{S}_j$. We begin our study by answering questions regarding the structure of the problem, namely for what values of $alpha$ the problem is well-defined, and what the behavior of the Price of Fairness (PoF) for it is. For the well-defined region of $alpha$, we provide efficient and easily implementable approximation algorithms for the $k$-center objective, which in certain cases also enjoy bounded PoF guarantees. We finally complement our analysis by an extensive suite of experiments that validates the effectiveness of our theoretical results.

التعلم الآلي بنى وهياكل البيانات والخوارزميات

A Pairwise Fair and Community-preserving Approach to k-Center Clustering

163 - Brian Brubach , Darshan Chakrabarti , John P. Dickerson 2020

Clustering is a foundational problem in machine learning with numerous applications. As machine learning increases in ubiquity as a backend for automated systems, concerns about fairness arise. Much of the current literature on fairness deals with di scrimination against protected classes in supervised learning (group fairness). We define a different notion of fair clustering wherein the probability that two points (or a community of points) become separated is bounded by an increasing function of their pairwise distance (or community diameter). We capture the situation where data points represent people who gain some benefit from being clustered together. Unfairness arises when certain points are deterministically separated, either arbitrarily or by someone who intends to harm them as in the case of gerrymandering election districts. In response, we formally define two new types of fairness in the clustering setting, pairwise fairness and community preservation. To explore the practicality of our fairness goals, we devise an approach for extending existing $k$-center algorithms to satisfy these fairness constraints. Analysis of this approach proves that reasonable approximations can be achieved while maintaining fairness. In experiments, we compare the effectiveness of our approach to classical $k$-center algorithms/heuristics and explore the tradeoff between optimal clustering and fairness.

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي

Variational Fair Clustering

185 - Imtiaz Masud Ziko , Eric Granger , Jing Yuan 2019

We propose a general variational framework of fair clustering, which integrates an original Kullback-Leibler (KL) fairness term with a large class of clustering objectives, including prototype or graph based. Fundamentally different from the existing combinatorial and spectral solutions, our variational multi-term approach enables to control the trade-off levels between the fairness and clustering objectives. We derive a general tight upper bound based on a concave-convex decomposition of our fairness term, its Lipschitz-gradient property and the Pinskers inequality. Our tight upper bound can be jointly optimized with various clustering objectives, while yielding a scalable solution, with convergence guarantee. Interestingly, at each iteration, it performs an independent update for each assignment variable. Therefore, it can be easily distributed for large-scale datasets. This scalability is important as it enables to explore different trade-off levels between the fairness and clustering objectives. Unlike spectral relaxation, our formulation does not require computing its eigenvalue decomposition. We report comprehensive evaluations and comparisons with state-of-the-art methods over various fair-clustering benchmarks, which show that our variational formulation can yield highly competitive solutions in terms of fairness and clustering objectives.

التعلم الآلي التعلم الالي

Probabilistic Fair Clustering

123 - Seyed A. Esmaeili , Brian Brubach , Leonidas Tsepenekas 2020

In clustering problems, a central decision-maker is given a complete metric graph over vertices and must provide a clustering of vertices that minimizes some objective function. In fair clustering problems, vertices are endowed with a color (e.g., me mbership in a group), and the features of a valid clustering might also include the representation of colors in that clustering. Prior work in fair clustering assumes complete knowledge of group membership. In this paper, we generalize prior work by assuming imperfect knowledge of group membership through probabilistic assignments. We present clustering algorithms in this more general setting with approximation ratio guarantees. We also address the problem of metric membership, where different groups have a notion of order and distance. Experiments are conducted using our proposed algorithms as well as baselines to validate our approach and also surface nuanced concerns when group membership is not known deterministically.

التعلم الآلي الذكاء الاصطناعي بنى وهياكل البيانات والخوارزميات