ﻻ يوجد ملخص باللغة العربية
Humans are able to create rich representations of their external reality. Their internal representations allow for cross-modality inference, where available perceptions can induce the perceptual experience of missing input modalities. In this paper, we contribute the Multimodal Hierarchical Variational Auto-encoder (MHVAE), a hierarchical multimodal generative model for representation learning. Inspired by human cognitive models, the MHVAE is able to learn modality-specific distributions, of an arbitrary number of modalities, and a joint-modality distribution, responsible for cross-modality inference. We formally derive the models evidence lower bound and propose a novel methodology to approximate the joint-modality posterior based on modality-specific representation dropout. We evaluate the MHVAE on standard multimodal datasets. Our model performs on par with other state-of-the-art generative models regarding joint-modality reconstruction from arbitrary input modalities and cross-modality inference.
Graph representation learning is a fundamental problem for modeling relational data and benefits a number of downstream applications. Traditional Bayesian-based graph models and recent deep learning based GNN either suffer from impracticability or la
Deep generative models often perform poorly in real-world applications due to the heterogeneity of natural data sets. Heterogeneity arises from data containing different types of features (categorical, ordinal, continuous, etc.) and features of the s
Deep learning models have demonstrated outstanding performance in several problems, but their training process tends to require immense amounts of computational and human resources for training and labeling, constraining the types of problems that ca
Multimodal learning for generative models often refers to the learning of abstract concepts from the commonality of information in multiple modalities, such as vision and language. While it has proven effective for learning generalisable representati
While neural networks for learning representation of multi-view data have been previously proposed as one of the state-of-the-art multi-view dimension reduction techniques, how to make the representation discriminative with only a small amount of lab