No Arabic abstract
Ensuring fairness in computational problems has emerged as a $key$ topic during recent years, buoyed by considerations for equitable resource distributions and social justice. It $is$ possible to incorporate fairness in computational problems from several perspectives, such as using optimization, game-theoretic or machine learning frameworks. In this paper we address the problem of incorporation of fairness from a $combinatorial$ $optimization$ perspective. We formulate a combinatorial optimization framework, suitable for analysis by researchers in approximation algorithms and related areas, that incorporates fairness in maximum coverage problems as an interplay between $two$ conflicting objectives. Fairness is imposed in coverage by using coloring constraints that $minimizes$ the discrepancies between number of elements of different colors covered by selected sets; this is in contrast to the usual discrepancy minimization problems studied extensively in the literature where (usually two) colors are $not$ given $a$ $priori$ but need to be selected to minimize the maximum color discrepancy of $each$ individual set. Our main results are a set of randomized and deterministic approximation algorithms that attempts to $simultaneously$ approximate both fairness and coverage in this framework.
We prove essentially tight lower bounds, conditionally to the Exponential Time Hypothesis, for two fundamental but seemingly very different cutting problems on surface-embedded graphs: the Shortest Cut Graph problem and the Multiway Cut problem. A cut graph of a graph $G$ embedded on a surface $S$ is a subgraph of $G$ whose removal from $S$ leaves a disk. We consider the problem of deciding whether an unweighted graph embedded on a surface of genus $g$ has a cut graph of length at most a given value. We prove a time lower bound for this problem of $n^{Omega(g/log g)}$ conditionally to ETH. In other words, the first $n^{O(g)}$-time algorithm by Erickson and Har-Peled [SoCG 2002, Discr. Comput. Geom. 2004] is essentially optimal. We also prove that the problem is W[1]-hard when parameterized by the genus, answering a 17-year old question of these authors. A multiway cut of an undirected graph $G$ with $t$ distinguished vertices, called terminals, is a set of edges whose removal disconnects all pairs of terminals. We consider the problem of deciding whether an unweighted graph $G$ has a multiway cut of weight at most a given value. We prove a time lower bound for this problem of $n^{Omega(sqrt{gt + g^2+t}/log(g+t))}$, conditionally to ETH, for any choice of the genus $gge0$ of the graph and the number of terminals $tge4$. In other words, the algorithm by the second author [Algorithmica 2017] (for the more general multicut problem) is essentially optimal; this extends the lower bound by the third author [ICALP 2012] (for the planar case). Reductions to planar problems usually involve a grid-like structure. The main novel idea for our results is to understand what structures instead of grids are needed if we want to exploit optimally a certain value $g$ of the genus.
Recently, various natural algorithmic problems have been shown to be $exists mathbb{R}$-complete. The reduction relied in many cases on the $exists mathbb{R}$-completeness of the problem ETR-INV, which served as a useful intermediate problem. Often some strengthening and modification of ETR-INV was required. This lead to a cluttered situation where no paper included all the previous details. Here, we give a streamlined exposition in a self-contained manner. We also explain and prove various universality results regarding ETR-INV. These notes should not be seen as a research paper with new results. However, we do describe some refinements of earlier results which might be useful for future research. We plan to extend and update this exposition as seems fit.
In this paper, we study the problem of fair classification in the presence of prior probability shifts, where the training set distribution differs from the test set. This phenomenon can be observed in the yearly records of several real-world datasets, such as recidivism records and medical expenditure surveys. If unaccounted for, such shifts can cause the predictions of a classifier to become unfair towards specific population subgroups. While the fairness notion called Proportional Equality (PE) accounts for such shifts, a procedure to ensure PE-fairness was unknown. In this work, we propose a method, called CAPE, which provides a comprehensive solution to the aforementioned problem. CAPE makes novel use of prevalence estimation techniques, sampling and an ensemble of classifiers to ensure fair predictions under prior probability shifts. We introduce a metric, called prevalence difference (PD), which CAPE attempts to minimize in order to ensure PE-fairness. We theoretically establish that this metric exhibits several desirable properties. We evaluate the efficacy of CAPE via a thorough empirical evaluation on synthetic datasets. We also compare the performance of CAPE with several popular fair classifiers on real-world datasets like COMPAS (criminal risk assessment) and MEPS (medical expenditure panel survey). The results indicate that CAPE ensures PE-fair predictions, while performing well on other performance metrics.
We initiate the study of fair classifiers that are robust to perturbations in the training distribution. Despite recent progress, the literature on fairness has largely ignored the design of fair and robust classifiers. In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples. We formulate a min-max objective function whose goal is to minimize a distributionally robust training loss, and at the same time, find a classifier that is fair with respect to a class of distributions. We first reduce this problem to finding a fair classifier that is robust with respect to the class of distributions. Based on online learning algorithm, we develop an iterative algorithm that provably converges to such a fair and robust solution. Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.
The remarkable scientific return and legacy of LSST, in the era that it will define, will not only be realized in the breakthrough science that will be achieved with catalog data. This Big Data survey will shape the way the entire astronomical community advances -- or fails to embrace -- new ways of approaching astronomical research and data. In this white paper, we address the NRC template questions 4,5,6,8 and 9, with a focus on the unique challenges for smaller, and often under-resourced, institutions, including institutions dedicated to underserved minority populations, in the efficient and effective use of LSST data products to maximize LSSTs scientific return.