How do we learn from biased data? Historical datasets often reflect historical prejudices; sensitive or protected attributes may affect the observed treatments and outcomes. Classification algorithms tasked with predicting outcomes accurately from these datasets tend to replicate these biases. We advocate a causal modeling approach to learning from biased data, exploring the relationship between fair classification and intervention. We propose a causal model in which the sensitive attribute confounds both the treatment and the outcome. Building on prior work in deep learning and generative modeling, we describe how to learn the parameters of this causal model from observational data alone, even in the presence of unobserved confounders. We show experimentally that fairness-aware causal modeling provides better estimates of the causal effects among the sensitive attribute, the treatment, and the outcome. We further present evidence that estimating these causal effects can help learn policies that are both more accurate and more fair when learning from a historically biased dataset.
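To make the confounding structure concrete, here is a minimal sketch (illustrative, not the paper's model or estimator): a sensitive attribute A confounds both a treatment T and an outcome Y, and a simple backdoor adjustment over A recovers the causal effect of T on Y that a naive comparison misestimates. The variable names and coefficients are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

A = rng.binomial(1, 0.5, size=n)                  # sensitive attribute (confounder)
T = rng.binomial(1, 0.2 + 0.6 * A)                # treatment depends on A
Y = 1.0 * T + 2.0 * A + rng.normal(0, 1, size=n)  # outcome depends on T and A (true effect of T is 1.0)

# Naive (confounded) comparison: E[Y | T=1] - E[Y | T=0]
naive = Y[T == 1].mean() - Y[T == 0].mean()

# Backdoor adjustment over A: sum_a (E[Y | T=1, A=a] - E[Y | T=0, A=a]) P(A=a)
adjusted = sum(
    (Y[(T == 1) & (A == a)].mean() - Y[(T == 0) & (A == a)].mean()) * (A == a).mean()
    for a in (0, 1)
)

print(f"naive estimate:    {naive:.2f}")    # biased away from 1.0 by the confounding through A
print(f"adjusted estimate: {adjusted:.2f}") # close to the true effect of 1.0
```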
Causal discovery aims to recover the causal structures or models underlying observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed variables may not be the underlying causal variables (e.g., image pixels), but are instead generated by latent causal variables or confounders that are themselves causally related. To this end, in this paper we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector characterized by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of the variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., that in the absence of confounders the causes are independent of the residual obtained by regressing the effect on the causes, can be seen as a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.
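A rough numerical illustration of the GIN condition, under the simplifying assumptions that $\omega$ is taken from the left null space of the sample cross-covariance $\mathbb{E}[\mathbf{Y}\mathbf{Z}^{\intercal}]$ and that independence is assessed with a plain (biased) HSIC statistic with RBF kernels; the bandwidth choice and the synthetic data are illustrative, not the paper's estimator or test.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    sq = np.sum(X**2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(x, Z, sigma=1.0):
    # Biased HSIC statistic; near zero when x and Z are independent.
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K = rbf_gram(x.reshape(-1, 1), sigma)
    L = rbf_gram(Z, sigma)
    return np.trace(K @ H @ L @ H) / n**2

def gin_statistic(Y, Z):
    Yc, Zc = Y - Y.mean(0), Z - Z.mean(0)
    cross_cov = Yc.T @ Zc / len(Y)            # sample E[Y Z^T]
    _, _, Vt = np.linalg.svd(cross_cov.T)
    omega = Vt[-1]                            # omega^T E[Y Z^T] = 0 by construction
    surrogate = Y @ omega                     # omega^T Y: uncorrelated with Z, so test full independence
    return hsic(surrogate, Z)

rng = np.random.default_rng(0)
n = 1000
latent = rng.standard_exponential(n)          # non-Gaussian latent common cause of Y and Z
Y = np.column_stack([latent + 0.1 * rng.standard_normal(n),
                     2 * latent + 0.1 * rng.standard_normal(n)])
Z = np.column_stack([3 * latent + 0.1 * rng.standard_normal(n)])
print(gin_statistic(Y, Z))                    # small value: GIN plausibly holds for (Y, Z)
```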
Unsupervised meta-learning approaches rely on synthetic meta-tasks that are created using techniques such as random selection, clustering, and/or augmentation. Unfortunately, clustering and augmentation are domain-dependent, and thus require either manual tweaking or expensive learning. In this work, we describe an approach that generates meta-tasks using generative models. A critical component is a novel approach to sampling from the latent space that generates objects grouped into synthetic classes, which form the training and validation data of a meta-task. We find that the proposed approach, LAtent Space Interpolation Unsupervised Meta-learning (LASIUM), outperforms or is competitive with current unsupervised learning baselines on few-shot classification tasks on the most widely used benchmark datasets. In addition, the approach promises to be applicable, without manual tweaking, over a wider range of domains than previous approaches.
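As a rough sketch of the kind of latent-space sampling described (the actual LASIUM sampling strategies and the generative model are not reproduced here), each synthetic class can be defined by an anchor point in the latent space of a pretrained generator, with in-class examples obtained by decoding nearby latent vectors; `decode` below is a placeholder for that generator.

```python
import numpy as np

def decode(z):
    # Placeholder for a trained GAN/VAE decoder; here it just returns the latent vector.
    return z

def sample_meta_task(n_way=5, k_shot=1, k_query=15, latent_dim=64, noise_scale=0.3, seed=0):
    rng = np.random.default_rng(seed)
    anchors = rng.standard_normal((n_way, latent_dim))     # one latent anchor per synthetic class
    support, query, labels_s, labels_q = [], [], [], []
    for c, z in enumerate(anchors):
        # In-class samples: decode latents near the class anchor.
        zs = z + noise_scale * rng.standard_normal((k_shot + k_query, latent_dim))
        imgs = [decode(v) for v in zs]
        support += imgs[:k_shot]; labels_s += [c] * k_shot
        query += imgs[k_shot:];   labels_q += [c] * k_query
    return (np.stack(support), np.array(labels_s)), (np.stack(query), np.array(labels_q))

(support_x, support_y), (query_x, query_y) = sample_meta_task()
print(support_x.shape, query_x.shape)   # (5, 64) and (75, 64) with the placeholder decoder
```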
We seek to learn a representation on a large annotated data source that generalizes to a target domain using limited new supervision. Many prior approaches to this problem have focused on learning disentangled representations so that as individual factors vary in a new domain, only a portion of the representation need be updated. In this work, we seek the generalization power of disentangled representations, but relax the requirement of explicit latent disentanglement and instead encourage linearity of individual factors of variation by requiring them to be manipulable by learned linear transformations. We dub these transformations latent canonicalizers, as they aim to modify the value of a factor to a pre-determined (but arbitrary) canonical value (e.g., recoloring the image foreground to black). Assuming a source domain with access to meta-labels specifying the factors of variation within an image, we demonstrate experimentally that our method helps reduce the number of observations needed to generalize to a similar target domain when compared to a number of supervised baselines.
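The core operation can be sketched as follows, with the caveat that the module names, the per-factor linear maps, and the training signal below are assumptions for illustration rather than the paper's exact architecture or losses: each factor of variation gets a learned linear transformation of the latent code that pushes that factor toward its canonical value.

```python
import torch
import torch.nn as nn

class LatentCanonicalizers(nn.Module):
    def __init__(self, latent_dim, factor_names):
        super().__init__()
        # One linear transformation of the latent code per factor of variation.
        self.maps = nn.ModuleDict({name: nn.Linear(latent_dim, latent_dim)
                                   for name in factor_names})

    def canonicalize(self, z, factor):
        return self.maps[factor](z)

latent_dim = 128
model = LatentCanonicalizers(latent_dim, ["fg_color", "bg_color", "pose"])

# Illustrative training signal: the canonicalized latent of an image should match the
# latent of the same image with that factor set to its canonical value (e.g., foreground
# recolored to black), with such pairs available in the meta-labeled source domain.
z = torch.randn(32, latent_dim)
z_canonical_target = torch.randn(32, latent_dim)   # stand-in for latents of canonicalized images
loss = nn.functional.mse_loss(model.canonicalize(z, "fg_color"), z_canonical_target)
loss.backward()
```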
Kernel classifiers and regressors designed for structured data, such as sequences, trees, and graphs, have significantly advanced a number of interdisciplinary areas such as computational biology and drug design. Typically, kernels are designed beforehand for a data type, either by exploiting statistics of the structures or by making use of probabilistic generative models, and a discriminative classifier is then learned on top of the kernels via convex optimization. However, this elegant two-stage approach also prevents kernel methods from scaling up to millions of data points and from exploiting discriminative information to learn feature representations. We propose structure2vec, an effective and scalable approach to structured data representation based on the idea of embedding latent variable models into feature spaces and learning such feature spaces using discriminative information. Interestingly, structure2vec extracts features by performing a sequence of function mappings in a way similar to graphical model inference procedures, such as mean field and belief propagation. In applications involving millions of data points, we show that structure2vec runs twice as fast and produces models that are $10,000$ times smaller, while achieving state-of-the-art predictive performance.
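A toy illustration of the mean-field-style embedding updates mentioned above (the parameterization and the random weights here are placeholders; in structure2vec the maps are learned end-to-end with discriminative information): each node's embedding is repeatedly recomputed from its own features and the aggregated embeddings of its neighbours, then pooled into a graph-level representation.

```python
import numpy as np

def embed_graph(node_feats, adj, dim=16, n_iters=4, seed=0):
    rng = np.random.default_rng(seed)
    n, d = node_feats.shape
    W1 = 0.1 * rng.standard_normal((d, dim))    # maps node features
    W2 = 0.1 * rng.standard_normal((dim, dim))  # maps aggregated neighbour embeddings
    mu = np.zeros((n, dim))
    for _ in range(n_iters):                    # fixed-point-style updates, as in mean field
        neighbour_sum = adj @ mu
        mu = np.tanh(node_feats @ W1 + neighbour_sum @ W2)
    return mu.sum(axis=0)                       # simple sum readout for a graph-level embedding

# Toy graph: a 4-node path with 3-dimensional node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.eye(4, 3)
print(embed_graph(feats, adj).shape)            # (16,)
```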
Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice, factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fairness research. Therefore we ask: How can we train an ML model to improve fairness when we do not even know the protected group memberships? In this work we address this problem by proposing Adversarially Reweighted Learning (ARL). In particular, we hypothesize that non-protected features and task labels are valuable for identifying fairness issues, and can be used to co-train an adversarial reweighting approach for improving fairness. Our results show that ARL improves Rawlsian Max-Min fairness, with notable AUC improvements for worst-case protected groups on multiple datasets, outperforming state-of-the-art alternatives.
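A compressed sketch of the adversarial reweighting idea, with the caveat that the architectures, the weight normalization, and the training schedule below are simplified assumptions rather than the paper's exact recipe: the adversary sees only non-protected features and labels and assigns example weights, the learner minimizes the reweighted loss, and the adversary is updated to maximize it.

```python
import torch
import torch.nn as nn

d = 10
learner = nn.Linear(d, 1)                        # main classifier (outputs logits)
adversary = nn.Linear(d + 1, 1)                  # maps (x, y) to an example-weight logit
opt_l = torch.optim.Adam(learner.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)

x = torch.randn(256, d)                          # placeholder non-protected features
y = (torch.rand(256, 1) > 0.5).float()           # placeholder task labels

for step in range(100):
    per_example = nn.functional.binary_cross_entropy_with_logits(
        learner(x), y, reduction="none")
    w = torch.sigmoid(adversary(torch.cat([x, y], dim=1)))
    w = 1.0 + len(w) * w / w.sum()               # positive weights, roughly normalized

    # Learner step: minimize the reweighted loss (weights treated as constants).
    loss = (w.detach() * per_example).mean()
    opt_l.zero_grad(); loss.backward(); opt_l.step()

    # Adversary step: maximize the reweighted loss (per-example losses treated as constants).
    adv_loss = -(w * per_example.detach()).mean()
    opt_a.zero_grad(); adv_loss.backward(); opt_a.step()
```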