Learning Invariant Representations using Inverse Contrastive Loss

Published by: Aditya Kumar Akash

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Aditya Kumar Akash, Vishnu Suresh Lokhande, Sathya N. Ravi

Machine learning, Artificial intelligence, Computer vision and pattern recognition

Abstract

Learning invariant representations is a critical first step in a number of machine learning tasks. A common approach corresponds to the so-called information bottleneck principle in which an application dependent function of mutual information is carefully chosen and optimized. Unfortunately, in practice, these functions are not suitable for optimization purposes since these losses are agnostic of the metric structure of the parameters of the model. We introduce a class of losses for learning representations that are invariant to some extraneous variable of interest by inverting the class of contrastive losses, i.e., inverse contrastive loss (ICL). We show that if the extraneous variable is binary, then optimizing ICL is equivalent to optimizing a regularized MMD divergence. More generally, we also show that if we are provided a metric on the sample space, our formulation of ICL can be decomposed into a sum of convex functions of the given distance metric. Our experimental results indicate that models obtained by optimizing ICL achieve significantly better invariance to the extraneous variable for a fixed desired level of accuracy. In a variety of experimental settings, we show applicability of ICL for learning invariant representations for both continuous and discrete extraneous variables.
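The abstract states that, for a binary extraneous variable, optimizing ICL is equivalent to optimizing a regularized MMD divergence. The sketch below is not the authors' ICL; it only illustrates the MMD quantity itself, using a biased estimate with a Gaussian kernel. The function names, the bandwidth, and the toy representations are assumptions made for illustration.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y), x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(group0, group1, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (mean_kernel(group0, group0, bandwidth)
            + mean_kernel(group1, group1, bandwidth)
            - 2.0 * mean_kernel(group0, group1, bandwidth))

# Hypothetical 2-D representations, split by a binary extraneous variable c.
z_c0 = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
z_c1_shifted = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]    # c leaks into z
z_c1_aligned = [[0.1, 0.0], [-0.2, 0.1], [0.0, -0.1]]  # z roughly invariant to c

print(mmd_squared(z_c0, z_c1_shifted))
print(mmd_squared(z_c0, z_c1_aligned))
```

A representation that is invariant to the binary variable should give a small value, as in the second call; a large value, as in the first, indicates that the two groups are still distinguishable in representation space.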

Yuejiang Liu, Qi Yan, Alexandre Alahi (2020)

Learning socially-aware motion representations is at the core of recent advances in multi-agent problems, such as human motion forecasting and robot navigation in crowds. Despite promising progress, existing representations learned with neural networks still struggle to generalize in closed-loop predictions (e.g., output colliding trajectories). This issue largely arises from the non-i.i.d. nature of sequential prediction in conjunction with ill-distributed training data. Intuitively, if the training data only comes from human behaviors in safe spaces, i.e., from positive examples, it is difficult for learning algorithms to capture the notion of negative examples like collisions. In this work, we aim to address this issue by explicitly modeling negative examples through self-supervision: (i) we introduce a social contrastive loss that regularizes the extracted motion representation by discerning the ground-truth positive events from synthetic negative ones; (ii) we construct informative negative samples based on our prior knowledge of rare but dangerous circumstances. Our method substantially reduces the collision rates of recent trajectory forecasting, behavioral cloning and reinforcement learning algorithms, outperforming state-of-the-art methods on several benchmarks. Our code is available at https://github.com/vita-epfl/social-nce.

Machine learning, Artificial intelligence, Computer vision and pattern recognition
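The abstract by Liu et al. above describes a contrastive loss that discerns a positive event from synthetic negative ones. As a rough illustration of that idea, the sketch below computes a generic InfoNCE-style loss for a single query. It is not taken from the social-nce repository; the function names, the dot-product similarity, the temperature value, and the toy vectors are all assumptions.

```python
import math

def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss for one query: negative log-softmax score of the positive key."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]

# Hypothetical embeddings: the positive key points the same way as the query.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

loss_good = info_nce(query, positive, negatives)
# Swapping a negative into the positive slot should give a much larger loss.
loss_bad = info_nce(query, negatives[0], [positive, negatives[1]])
print(loss_good < loss_bad)
```

The loss is small when the query is closer to the positive than to every negative, which is the behavior a contrastive regularizer of this kind rewards.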

Learning Video Representations using Contrastive Bidirectional Transformer

Chen Sun, Fabien Baradel, Kevin Murphy (2019)

This paper proposes a self-supervised learning approach for video features that results in significantly improved performance on downstream tasks (such as video classification, captioning and segmentation) compared to existing methods. Our method extends the BERT model for text sequences to the case of sequences of real-valued feature vectors, by replacing the softmax loss with noise contrastive estimation (NCE). We also show how to learn representations from sequences of visual features and sequences of words derived from ASR (automatic speech recognition), and show that such cross-modal training (when possible) helps even more.

Machine learning, Computer vision and pattern recognition

Understanding self-supervised Learning Dynamics without Contrastive Pairs

Yuandong Tian, Xinlei Chen, Surya Ganguli (2021)

While contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point (positive pairs) and maximizing the distance between views from different data points (negative pairs), recent non-contrastive SSL methods (e.g., BYOL and SimSiam) show remarkable performance without negative pairs, with an extra learnable predictor and a stop-gradient operation. A fundamental question arises: why do these methods not collapse into trivial representations? We answer this question via a simple theoretical study and propose a novel approach, DirectPred, that directly sets the linear predictor based on the statistics of its inputs, without gradient training. On ImageNet, it performs comparably with more complex two-layer non-linear predictors that employ BatchNorm and outperforms a linear predictor by 2.5% in 300-epoch training (and 5% in 60-epoch). DirectPred is motivated by our theoretical study of the nonlinear learning dynamics of non-contrastive SSL in simple linear networks. Our study yields conceptual insights into how non-contrastive SSL methods learn, how they avoid representational collapse, and how multiple factors, like predictor networks, stop-gradients, exponential moving averages, and weight decay all come into play. Our simple theory recapitulates the results of real-world ablation studies in both STL-10 and ImageNet. Code is released at https://github.com/facebookresearch/luckmatters/tree/master/ssl.

Machine learning, Artificial intelligence, Computer vision and pattern recognition

Unsupervised Learning of Disentangled Representations from Video

Emily Denton, Vighnesh Birodkar (2017)

We present a new model DrNET that learns disentangled image representations from video. Our approach leverages the temporal coherence of video and a novel adversarial loss to learn a representation that factorizes each frame into a stationary part and a temporally varying component. The disentangled representation can be used for a range of tasks. For example, applying a standard LSTM to the time-varying components enables prediction of future frames. We evaluate our approach on a range of synthetic and real videos, demonstrating the ability to coherently generate hundreds of steps into the future.

Machine learning, Artificial intelligence, Computer vision and pattern recognition

Understanding and Achieving Efficient Robustness with Adversarial Supervised Contrastive Learning

Anh Bui, Trung Le, He Zhao (2021)

Contrastive learning (CL) has recently emerged as an effective approach to learning representations in a range of downstream tasks. Central to this approach is the selection of positive (similar) and negative (dissimilar) sets to give the model the opportunity to "contrast" between data and class representation in the latent space. In this paper, we investigate CL for improving model robustness using adversarial samples. We first designed and performed a comprehensive study to understand how adversarial vulnerability behaves in the latent space. Based on this empirical evidence, we propose an effective and efficient supervised contrastive learning method to achieve model robustness against adversarial attacks. Moreover, we propose a new sample selection strategy that optimizes the positive/negative sets by removing redundancy and improving correlation with the anchor. Experiments conducted on benchmark datasets show that our Adversarial Supervised Contrastive Learning (ASCL) approach outperforms the state-of-the-art defenses by 2.6% in terms of robust accuracy, whilst our ASCL with the proposed selection strategy can further gain a 1.4% improvement with only 42.8% positives and 6.3% negatives compared with ASCL without a selection strategy.

Machine learning, Artificial intelligence, Computer vision and pattern recognition