Tensorial Mixture Models

Published by Or Sharir

Publication date: 2016

Research field: Informatics Engineering

Paper language: English

Authors: Or Sharir - Ronen Tamari - Nadav Cohen

Abstract

Casting neural networks in generative frameworks is a highly sought-after endeavor these days. Contemporary methods, such as Generative Adversarial Networks, capture some of the generative capabilities, but not all. In particular, they lack the ability of tractable marginalization, and thus are not suitable for many tasks. Other methods, based on arithmetic circuits and sum-product networks, do allow tractable marginalization, but their performance is challenged by the need to learn the structure of a circuit. Building on the tractability of arithmetic circuits, we leverage concepts from tensor analysis, and derive a family of generative models we call Tensorial Mixture Models (TMMs). TMMs assume a simple convolutional network structure, and in addition, lend themselves to theoretical analyses that allow comprehensive understanding of the relation between their structure and their expressive properties. We thus obtain a generative model that is tractable on one hand, and on the other hand, allows effective representation of rich distributions in an easily controlled manner. These two capabilities are brought together in the task of classification under missing data, where TMMs deliver state-of-the-art accuracies with seamless implementation and design.
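To make "tractable marginalization" concrete, here is a minimal sketch of classification under missing data. It is not a TMM: it assumes a much simpler stand-in, a per-class mixture of factorized Gaussians, and every name in it (`marginal_likelihood`, `classify`, `MODELS`, `PRIORS`) and every parameter value is hypothetical. The point it illustrates is only that when each mixture component factorizes over input entries, a missing entry is integrated out exactly by dropping its factor.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def marginal_likelihood(x, weights, means, stds):
    """Likelihood of the observed entries of x under a mixture of
    factorized Gaussians. Entries equal to None are treated as missing
    and marginalized out exactly."""
    total = 0.0
    for w, mu, sd in zip(weights, means, stds):
        comp = w
        for xi, m, s in zip(x, mu, sd):
            if xi is None:
                continue  # a 1-D density integrates to 1, so the factor drops out
            comp *= gaussian_pdf(xi, m, s)
        total += comp
    return total

def classify(x, class_models, priors):
    """Pick the class maximizing prior * marginal likelihood of observed entries."""
    scores = {
        label: priors[label] * marginal_likelihood(x, *model)
        for label, model in class_models.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy models: (weights, means, stds) per class, two input entries.
MODELS = {
    "a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "b": ([1.0], [[4.0, 4.0]], [[1.0, 1.0]]),
}
PRIORS = {"a": 0.5, "b": 0.5}

label_first_missing = classify([None, 3.9], MODELS, PRIORS)
label_second_missing = classify([0.2, None], MODELS, PRIORS)
```

The same classifier handles any missingness pattern at test time without retraining or imputation, which is the property the abstract attributes to tractable generative models.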

344 - Yingtao Tian, Jesse Engel 2019

End-to-end optimization has achieved state-of-the-art performance on many specific problems, but there is no straightforward way to combine pretrained models for new problems. Here, we explore improving modularity by learning a post-hoc interface between two existing models to solve a new task. Specifically, we take inspiration from neural machine translation, and cast the challenging problem of cross-modal domain transfer as unsupervised translation between the latent spaces of pretrained deep generative models. By abstracting away the data representation, we demonstrate that it is possible to transfer across different modalities (e.g., image-to-audio) and even different types of generative models (e.g., VAE-to-GAN). We compare to state-of-the-art techniques and find that a straightforward variational autoencoder is able to best bridge the two generative models through learning a shared latent space. We can further impose supervised alignment of attributes in both domains with a classifier in the shared latent space. Through qualitative and quantitative evaluations, we demonstrate that locality and semantic alignment are preserved through the transfer process, as indicated by high transfer accuracies and smooth interpolations within a class. Finally, we show this modular structure speeds up training of new interface models by several orders of magnitude by decoupling it from expensive retraining of base generative models.

Machine Learning, Neural and Evolutionary Computing

Fast Neural Models for Symbolic Regression at Scale

76 - Allan Costa, Rumen Dangovski, Owen Dugan 2020

Deep learning owes much of its success to the astonishing expressiveness of neural networks. However, this comes at the cost of complex, black-boxed models that extrapolate poorly beyond the domain of the training dataset, conflicting with goals of finding analytic expressions to describe science, engineering and real world data. Under the hypothesis that the hierarchical modularity of such laws can be captured by training a neural network, we introduce OccamNet, a neural network model that finds interpretable, compact, and sparse solutions for fitting data, à la Occam's razor. Our model defines a probability distribution over a non-differentiable function space. We introduce a two-step optimization method that samples functions and updates the weights with backpropagation based on cross-entropy matching in an evolutionary strategy: we train by biasing the probability mass toward better fitting solutions. OccamNet is able to fit a variety of symbolic laws including simple analytic functions, recursive programs, implicit functions, simple image classification, and can noticeably outperform state-of-the-art symbolic regression methods on real world regression datasets. Our method requires minimal memory footprint, does not require AI accelerators for efficient training, fits complicated functions in minutes of training on a single CPU, and demonstrates significant performance gains when scaled on a GPU. Our implementation, demonstrations and instructions for reproducing the experiments are available at https://github.com/druidowm/OccamNet_Public.

Machine Learning, Neural and Evolutionary Computing

Self-training Converts Weak Learners to Strong Learners in Mixture Models

117 - Spencer Frei, Difan Zou, Zixiang Chen 2021

We consider a binary classification problem when the data comes from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.
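The self-training loop described in this abstract can be sketched roughly as follows. This is a simplified illustration and not the authors' exact algorithm or analysis setting: it assumes a symmetric two-Gaussian mixture with identity covariance, full-batch gradient steps on the logistic loss of the pseudolabeled margins, and renormalization of $\boldsymbol{\beta}_t$ after each step. The step size, sample size, starting vector, and all function names are hypothetical choices made for the demo.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def sample_mixture(n, mu, rng):
    """Unlabeled draws from 0.5*N(mu, I) + 0.5*N(-mu, I) (assumed toy data)."""
    data = []
    for _ in range(n):
        sign = rng.choice((-1.0, 1.0))
        data.append([sign * m + rng.gauss(0.0, 1.0) for m in mu])
    return data

def self_train(beta, data, steps=50, lr=1.0):
    """Iterate: pseudolabel with sgn(<beta, x>), take one gradient step on the
    logistic loss of the pseudolabeled margins, then renormalize beta."""
    beta = normalize(beta)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x in data:
            s = dot(beta, x)
            y_hat = 1.0 if s >= 0 else -1.0           # pseudolabel
            weight = 1.0 / (1.0 + math.exp(abs(s)))   # sigmoid(-y_hat * s)
            for j, xj in enumerate(x):
                grad[j] -= y_hat * weight * xj / len(data)
        beta = normalize([b - lr * g for b, g in zip(beta, grad)])
    return beta

rng = random.Random(0)
mu = [2.0, 0.0, 0.0, 0.0, 0.0]                    # true mean direction (assumed)
data = sample_mixture(2000, mu, rng)
beta_pl = normalize([0.6, 0.8, 0.0, 0.0, 0.0])    # weak initial pseudolabeler
beta_final = self_train(beta_pl, data)

cos_before = dot(beta_pl, normalize(mu))
cos_after = dot(beta_final, normalize(mu))
```

In this toy setting the Bayes-optimal classifier is the sign of the projection onto `mu`, so the cosine between the learned direction and `mu` serves as a proxy for how close the self-trained classifier gets to it. No labels are used after initialization.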

Machine Learning, Optimization and Control

Stochastic Approximation for Online Tensorial Independent Component Analysis

83 - Chris Junchi Li, Michael I. Jordan 2020

Independent component analysis (ICA) has been a popular dimension reduction tool in statistical machine learning and signal processing. In this paper, we present a convergence analysis for an online tensorial ICA algorithm, by viewing the problem as a nonconvex stochastic approximation problem. For estimating one component, we provide a dynamics-based analysis to prove that our online tensorial ICA algorithm with a specific choice of stepsize achieves a sharp finite-sample error bound. In particular, under a mild assumption on the data-generating distribution and a scaling condition such that $d^4/T$ is sufficiently small up to a polylogarithmic factor of data dimension $d$ and sample size $T$, a sharp finite-sample error bound of $\tilde{O}(\sqrt{d/T})$ can be obtained.

Machine Learning, Optimization and Control

DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for Uncertainty Inference

102 - Jiyang Xie, Zhanyu Ma, Jing-Hao Xue 2020

This paper proposes a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based uncertainty inference (UI) in deep neural network (DNN)-based image recognition. In the DS-UI, we combine the classifier of a DNN, i.e., the last fully-connected (FC) layer, with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer. Unlike existing UI methods for DNNs, which only calculate the means or modes of the DNN output distributions, the proposed MoGMM-FC layer acts as a probabilistic interpreter for the features that are inputs of the classifier to directly calculate their probability density for the DS-UI. In addition, we propose a dual-supervised stochastic gradient-based variational Bayes (DS-SGVB) algorithm for the MoGMM-FC layer optimization. Unlike conventional SGVB and optimization algorithms in other UI methods, the DS-SGVB not only models the samples in the specific class for each Gaussian mixture model (GMM) in the MoGMM, but also considers the negative samples from other classes for the GMM to reduce the intra-class distances and enlarge the inter-class margins simultaneously for enhancing the learning ability of the MoGMM-FC layer in the DS-UI. Experimental results show the DS-UI outperforms the state-of-the-art UI methods in misclassification detection. We further evaluate the DS-UI in open-set out-of-domain/-distribution detection and find statistically significant improvements. Visualizations of the feature spaces demonstrate the superiority of the DS-UI.

Machine Learning