Discovery and Separation of Features for Invariant Representation Learning


Published by Ayush Jaiswal

Publication date: 2019

Research field: Informatics Engineering, Mathematical Statistics

Paper language: English

Authors: Ayush Jaiswal - Rob Brekelmans - Daniel Moyer

Machine learning


Abstract

Supervised machine learning models often associate irrelevant nuisance factors with the prediction target, which hurts generalization. We propose a framework for training robust neural networks that induces invariance to nuisances by learning to discover and separate predictive and nuisance factors of data. We present an information-theoretic formulation of our approach, from which we derive training objectives and connections with previous methods. Empirical results on a wide array of datasets show that the proposed framework achieves state-of-the-art performance, without requiring nuisance annotations during training.


Claudia Shi, Victor Veitch, David Blei, 2020

The defining challenge for causal inference from observational data is the presence of "confounders", covariates that affect both treatment assignment and the outcome. To address this challenge, practitioners collect and adjust for the covariates, hoping that they adequately correct for confounding. However, including every observed covariate in the adjustment runs the risk of including "bad controls", variables that induce bias when they are conditioned on. The problem is that we do not always know which variables in the covariate set are safe to adjust for and which are not. To address this problem, we develop Nearly Invariant Causal Estimation (NICE). NICE uses invariant risk minimization (IRM) [Arj19] to learn a representation of the covariates that, under some assumptions, strips out bad controls but preserves sufficient information to adjust for confounding. Adjusting for the learned representation, rather than the covariates themselves, avoids the induced bias and provides valid causal inferences. We evaluate NICE on both synthetic and semi-synthetic data. When the covariates contain unknown collider variables and other bad controls, NICE performs better than adjusting for all the covariates.
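As a rough illustration of the IRM ingredient mentioned above, the following sketch computes the IRMv1-style penalty for a squared-error loss: the squared gradient of each environment's risk with respect to a dummy scale factor fixed at 1. The abstract does not say which IRM variant NICE uses, so this is an assumption, and the function names (`irmv1_penalty`, `irm_objective`) and the `penalty_weight` parameter are hypothetical. It is not the NICE estimator itself.

```python
def irmv1_penalty(predictions, targets):
    """Squared gradient of the mean squared error w.r.t. a dummy scale w, at w = 1.

    With risk R(w) = mean((w * f - y)^2), dR/dw at w = 1 is mean(2 * (f - y) * f).
    """
    n = len(predictions)
    grad = sum(2.0 * (f - y) * f for f, y in zip(predictions, targets)) / n
    return grad ** 2


def irm_objective(environments, penalty_weight=1.0):
    """Sum over environments of risk plus weighted invariance penalty.

    `environments` is a list of (predictions, targets) pairs, one per environment.
    """
    total = 0.0
    for predictions, targets in environments:
        n = len(predictions)
        risk = sum((f - y) ** 2 for f, y in zip(predictions, targets)) / n
        total += risk + penalty_weight * irmv1_penalty(predictions, targets)
    return total


# A predictor that fits an environment exactly incurs no penalty there.
perfect = irmv1_penalty([1.0, 2.0], [1.0, 2.0])
# A predictor that is off by a constant is penalized.
biased = irmv1_penalty([1.0, 1.0], [2.0, 2.0])
```

In a real training loop the predictions would come from a classifier applied to the learned representation, and the gradient would be obtained by automatic differentiation rather than the closed form used here.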

Machine learning

Neural Networks Regularization Through Class-wise Invariant Representation Learning

Soufiane Belharbi, Clement Chatelain, Romain Herault, 2017

Training deep neural networks is known to require a large number of training samples. However, in many applications only a few training samples are available. In this work, we tackle the issue of training neural networks for classification tasks when few training samples are available. We attempt to solve this issue by proposing a new regularization term that constrains the hidden layers of a network to learn class-wise invariant representations. In our regularization framework, learning invariant representations is generalized to class membership, where samples of the same class should have the same representation. Numerical experiments on MNIST and its variants show that our proposal helps improve the generalization of neural networks, particularly when trained with few samples. The source code of our framework is available at https://github.com/sbelharbi/learning-class-invariant-features.
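The idea that samples of the same class should share a representation can be sketched as a penalty on the distance between hidden representations of same-class pairs. The squared Euclidean distance and the function name `classwise_invariance_penalty` are assumptions made for illustration; the abstract does not specify the dissimilarity measure, and the authors' repository linked above is the reference for the actual method.

```python
from itertools import combinations


def classwise_invariance_penalty(representations, labels):
    """Mean squared Euclidean distance over all pairs of same-class samples.

    `representations` is a list of equal-length feature vectors (one hidden
    layer's output per sample) and `labels` holds the matching class labels.
    Returns 0.0 when no two samples share a class.
    """
    total, count = 0.0, 0
    for (h_i, y_i), (h_j, y_j) in combinations(zip(representations, labels), 2):
        if y_i == y_j:
            total += sum((a - b) ** 2 for a, b in zip(h_i, h_j))
            count += 1
    return total / count if count else 0.0


# Two class-0 samples that differ, plus one class-1 sample that is ignored
# because it has no same-class partner.
penalty = classwise_invariance_penalty(
    [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]],
    [0, 0, 1],
)
```

During training, such a term would be added to the supervised loss with a weighting coefficient, so that the network is pushed toward identical representations within each class while still fitting the labels.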

Machine learning

Spectral Discovery of Jointly Smooth Features for Multimodal Data

Felix Dietrich, Or Yair, Rotem Mulayoff, 2020

In this paper, we propose a spectral method for deriving functions that are jointly smooth on multiple observed manifolds. This allows us to register measurements of the same phenomenon by heterogeneous sensors, and to reject sensor-specific noise. Our method is unsupervised and primarily consists of two steps. First, using kernels, we obtain a subspace spanning smooth functions on each separate manifold. Then, we apply a spectral method to the obtained subspaces and discover functions that are jointly smooth on all manifolds. We show analytically that our method is guaranteed to provide a set of orthogonal functions that are as jointly smooth as possible, ordered by increasing Dirichlet energy from the smoothest to the least smooth. In addition, we show that the extracted functions can be efficiently extended to unseen data using the Nyström method. We demonstrate the proposed method on both simulated and real measured data and compare the results to nonlinear variants of the seminal Canonical Correlation Analysis (CCA). In particular, we show superior results for sleep stage identification. In addition, we show how the proposed method can be leveraged for finding minimal realizations of parameter spaces of nonlinear dynamical systems.

Machine learning

Learning transferable and discriminative features for unsupervised domain adaptation

Yuntao Du, Ruiting Zhang, Xiaowen Zhang, 2020

Despite remarkable progress in supervised learning, it is very difficult to induce a supervised classifier without any labeled data. Unsupervised domain adaptation is able to overcome this challenge by transferring knowledge from a labeled source domain to an unlabeled target domain. Transferability and discriminability are two key criteria for characterizing the superiority of feature representations to enable successful domain adaptation. In this paper, a novel method called learning TransFerable and Discriminative Features for unsupervised domain adaptation (TFDF) is proposed to optimize these two objectives simultaneously. On the one hand, distribution alignment is performed to reduce domain discrepancy and learn more transferable representations. Instead of adopting Maximum Mean Discrepancy (MMD), which only captures first-order statistical information to measure distribution discrepancy, we adopt a recently proposed statistic called Maximum Mean and Covariance Discrepancy (MMCD), which captures both first-order and second-order statistical information in the reproducing kernel Hilbert space (RKHS). On the other hand, we propose to explore both local discriminative information via manifold regularization and global discriminative information via minimizing the proposed class confusion objective to learn more discriminative features. We integrate these two objectives into the Structural Risk Minimization (RSM) framework and learn a domain-invariant classifier. Comprehensive experiments are conducted on five real-world datasets and the results verify the effectiveness of the proposed method.
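The contrast between first-order and second-order statistics can be made concrete with a small sketch. It assumes a linear kernel, under which the squared MMD reduces to the squared distance between the two sample means, and it pairs this with a plain covariance gap (squared Frobenius distance between sample covariance matrices). The covariance gap is only an illustrative second-order term, not the MMCD definition from the paper, and all function names are hypothetical.

```python
def mean_vector(samples):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def linear_mmd_sq(source, target):
    """Squared MMD under a linear kernel: squared distance between the means."""
    mu_s, mu_t = mean_vector(source), mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def covariance(samples):
    """Biased (divide by n) sample covariance matrix as nested lists."""
    n = len(samples)
    mu = mean_vector(samples)
    d = len(mu)
    return [
        [sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / n for j in range(d)]
        for i in range(d)
    ]


def covariance_gap_sq(source, target):
    """Squared Frobenius distance between the two sample covariance matrices."""
    c_s, c_t = covariance(source), covariance(target)
    return sum(
        (a - b) ** 2 for row_s, row_t in zip(c_s, c_t) for a, b in zip(row_s, row_t)
    )


# Two domains with the same mean but different spread.
source = [[-1.0], [1.0]]
target = [[-3.0], [3.0]]
first_order = linear_mmd_sq(source, target)       # blind to the difference
second_order = covariance_gap_sq(source, target)  # detects the difference
```

Here the mean-based statistic is zero even though the two domains clearly differ, while the covariance-based term is not, which is the motivation the abstract gives for adding second-order information.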

Machine learning

Representation Learning for Out-Of-Distribution Generalization in Reinforcement Learning

Andrea Dittadi, Frederik Trauble, Manuel Wuthrich, 2021

Learning data representations that are useful for various downstream tasks is a cornerstone of artificial intelligence. While existing methods are typically evaluated on downstream tasks such as classification or generative image quality, we propose to assess representations through their usefulness in downstream control tasks, such as reaching or pushing objects. By training over 10,000 reinforcement learning policies, we extensively evaluate to what extent different representation properties affect out-of-distribution (OOD) generalization. Finally, we demonstrate zero-shot transfer of these policies from simulation to the real world, without any domain randomization or fine-tuning. This paper aims to establish the first systematic characterization of the usefulness of learned representations for real-world OOD downstream tasks.

Machine learning