Unified Adversarial Invariance


Abstract

We present a unified invariance framework for supervised neural networks. The framework induces invariance to nuisance factors of data without requiring any nuisance annotations, and can additionally use labeled information about biasing factors to force their removal from the latent embedding, enabling fair predictions. Invariance to nuisance factors is achieved by learning a split representation of data through competitive training between the prediction task and a reconstruction task coupled with disentanglement, whereas invariance to biasing factors is achieved by penalizing the network if the latent embedding contains any information about them. We describe an adversarial instantiation of this framework and provide an analysis of how it works. Our model outperforms previous work at inducing invariance to nuisance factors without using any labeled information about such variables, and achieves state-of-the-art performance at learning invariance to biasing factors in fairness settings.
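As a concrete illustration of this setup, the sketch below implements a minimal version of the split-representation idea in PyTorch. All names, layer sizes, and the use of a gradient-reversal layer in place of alternating min-max optimization are illustrative assumptions rather than the paper's exact architecture: an encoder splits the input into a predictive embedding e1 and a nuisance embedding e2, a predictor uses only e1, a decoder reconstructs the input from a dropout-corrupted e1 together with e2, two disentanglers adversarially try to predict each embedding from the other, and a bias discriminator adversarially tries to recover the biasing factor from e1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates the gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def grad_reverse(x):
    return GradReverse.apply(x)

class SplitInvarianceModel(nn.Module):
    """Hypothetical sketch: all module names and dimensions are illustrative."""
    def __init__(self, x_dim=784, e_dim=32, y_dim=10, s_dim=2):
        super().__init__()
        # Encoder heads producing the split embedding: e1 (task) and e2 (nuisance).
        self.enc1 = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, e_dim))
        self.enc2 = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, e_dim))
        self.pred = nn.Linear(e_dim, y_dim)  # prediction task sees only e1
        # Decoder reconstructs x from dropout-corrupted e1 plus e2, so nuisance
        # information is pushed into e2 by the competing reconstruction objective.
        self.dec = nn.Sequential(nn.Linear(2 * e_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))
        self.dis12 = nn.Linear(e_dim, e_dim)      # adversary: predict e2 from e1
        self.dis21 = nn.Linear(e_dim, e_dim)      # adversary: predict e1 from e2
        self.bias_disc = nn.Linear(e_dim, s_dim)  # adversary: predict bias s from e1
        self.drop = nn.Dropout(0.5)

    def forward(self, x):
        e1, e2 = self.enc1(x), self.enc2(x)
        y_hat = self.pred(e1)
        x_hat = self.dec(torch.cat([self.drop(e1), e2], dim=1))
        # Gradient reversal lets each adversary minimize its own loss while the
        # encoder receives the opposite gradient, removing the targeted information.
        e2_from_e1 = self.dis12(grad_reverse(e1))
        e1_from_e2 = self.dis21(grad_reverse(e2))
        s_hat = self.bias_disc(grad_reverse(e1))
        return y_hat, x_hat, e1, e2, e2_from_e1, e1_from_e2, s_hat

# Toy batch with hypothetical shapes: inputs x, task labels y, bias labels s.
model = SplitInvarianceModel()
x = torch.randn(8, 784)
y = torch.randint(0, 10, (8,))
s = torch.randint(0, 2, (8,))

y_hat, x_hat, e1, e2, p12, p21, s_hat = model(x)
loss = (F.cross_entropy(y_hat, y)            # prediction task
        + F.mse_loss(x_hat, x)               # competing reconstruction task
        + F.mse_loss(p12, e2.detach())       # disentanglement adversaries
        + F.mse_loss(p21, e1.detach())
        + F.cross_entropy(s_hat, s))         # bias-removal adversary
loss.backward()
```

Minimizing this combined loss trains the prediction, reconstruction, and adversary heads jointly, while the reversed gradients push the encoder to strip nuisance and bias information out of e1, approximating the competitive training the abstract describes.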
