Structured Disentangled Representations

385 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Babak Esmaeili

تاريخ النشر 2018

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Babak Esmaeili - Hao Wu - Sarthak Jain

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle discrete factors of variation. We propose a two-level hierarchical objective to control relative degree of statistical independence between blocks of variables and individual variables within blocks. We derive this objective as a generalization of the evidence lower bound, which allows us to explicitly represent the trade-offs between mutual information between data and representation, KL divergence between representation and prior, and coverage of the support of the empirical data distribution. Experiments on a variety of datasets demonstrate that our objective can not only disentangle discrete variables, but that doing so also improves disentanglement of other variables and, importantly, generalization even to unseen combinations of factors.

قيم البحث

79 - Leonhard Helminger , Abdelaziz Djelouah , Markus Gross 2018

We present a deep generative model that learns disentangled static and dynamic representations of data from unordered input. Our approach exploits regularities in sequential data that exist regardless of the order in which the data is viewed. The res ult of our factorized graphical model is a well-organized and coherent latent space for data dynamics. We demonstrate our method on several synthetic dynamic datasets and real video data featuring various facial expressions and head poses.

التعلم الالي التعلم الآلي

Revisiting Factorizing Aggregated Posterior in Learning Disentangled Representations

62 - Ze Cheng , Juncheng Li , Chenxu Wang 2020

In the problem of learning disentangled representations, one of the promising methods is to factorize aggregated posterior by penalizing the total correlation of sampled latent variables. However, this well-motivated strategy has a blind spot: there is a disparity between the sampled latent representation and its corresponding mean representation. In this paper, we provide a theoretical explanation that low total correlation of sampled representation cannot guarantee low total correlation of the mean representation. Indeed, we prove that for the multivariate normal distributions, the mean representation with arbitrarily high total correlation can have a corresponding sampled representation with bounded total correlation. We also propose a method to eliminate this disparity. Experiments show that our model can learn a mean representation with much lower total correlation, hence a factorized mean representation. Moreover, we offer a detailed explanation of the limitations of factorizing aggregated posterior: factor disintegration. Our work indicates a potential direction for future research of disentangled learning.

التعلم الالي الذكاء الاصطناعي التعلم الآلي

Domain Agnostic Learning with Disentangled Representations

147 - Xingchao Peng , Zijun Huang , Ximeng Sun 2019

Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains. Yet the current literature assumes that the separation of target data into distinct domains is known as a priori. In this paper, we propose the task of Domain-Agnostic Learning (DAL): How to transfer knowledge from a labeled source domain to unlabeled data from arbitrary target domains? To tackle this problem, we devise a novel Deep Adversarial Disentangled Autoencoder (DADA) capable of disentangling domain-specific features from class identity. We demonstrate experimentally that when the target domain labels are unknown, DADA leads to state-of-the-art performance on several image classification datasets.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Learning Disentangled Expression Representations from Facial Images

106 - Marah Halawa , Manuel Wollhaf , Eduardo Vellasques 2020

Face images are subject to many different factors of variation, especially in unconstrained in-the-wild scenarios. For most tasks involving such images, e.g. expression recognition from video streams, having enough labeled data is prohibitively expen sive. One common strategy to tackle such a problem is to learn disentangled representations for the different factors of variation of the observed data using adversarial learning. In this paper, we use a formulation of the adversarial loss to learn disentangled representations for face images. The used model facilitates learning on single-task datasets and improves the state-of-the-art in expression recognition with an accuracy of60.53%on the AffectNetdataset, without using any additional data.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Structured Prediction Cascades

424 - David Weiss , Benjamin Sapp , Ben Taskar 2012

Structured prediction tasks pose a fundamental trade-off between the need for model complexity to increase predictive power and the limited computational resources for inference in the exponentially-sized output spaces such models require. We formula te and develop the Structured Prediction Cascade architecture: a sequence of increasingly complex models that progressively filter the space of possible outputs. The key principle of our approach is that each model in the cascade is optimized to accurately filter and refine the structured output state space of the next model, speeding up both learning and inference in the next layer of the cascade. We learn cascades by optimizing a novel convex loss function that controls the trade-off between the filtering efficiency and the accuracy of the cascade, and provide generalization bounds for both accuracy and efficiency. We also extend our approach to intractable models using tree-decomposition ensembles, and provide algorithms and theory for this setting. We evaluate our approach on several large-scale problems, achieving state-of-the-art performance in handwriting recognition and human pose recognition. We find that structured prediction cascades allow tremendous speedups and the use of previously intractable features and models in both settings.

التعلم الالي التعلم الآلي