ترغب بنشر مسار تعليمي؟ اضغط هنا

Multi-task additive models with shared transfer functions based on dictionary learning

224   0   0.0 ( 0 )
 نشر من قبل Alhussein Fawzi
 تاريخ النشر 2015
والبحث باللغة English




اسأل ChatGPT حول البحث

Additive models form a widely popular class of regression models which represent the relation between covariates and response variables as the sum of low-dimensional transfer functions. Besides flexibility and accuracy, a key benefit of these models is their interpretability: the transfer functions provide visual means for inspecting the models and identifying domain-specific relations between inputs and outputs. However, in large-scale problems involving the prediction of many related tasks, learning independently additive models results in a loss of model interpretability, and can cause overfitting when training data is scarce. We introduce a novel multi-task learning approach which provides a corpus of accurate and interpretable additive models for a large number of related forecasting tasks. Our key idea is to share transfer functions across models in order to reduce the model complexity and ease the exploration of the corpus. We establish a connection with sparse dictionary learning and propose a new efficient fitting algorithm which alternates between sparse coding and transfer function updates. The former step is solved via an extension of Orthogonal Matching Pursuit, whose properties are analyzed using a novel recovery condition which extends existing results in the literature. The latter step is addressed using a traditional dictionary update rule. Experiments on real-world data demonstrate that our approach compares favorably to baseline methods while yielding an interpretable corpus of models, revealing structure among the individual tasks and being more robust when training data is scarce. Our framework therefore extends the well-known benefits of additive models to common regression settings possibly involving thousands of tasks.



قيم البحث

اقرأ أيضاً

Machine Learning algorithms are increasingly being used in recent years due to their flexibility in model fitting and increased predictive performance. However, the complexity of the models makes them hard for the data analyst to interpret the result s and explain them without additional tools. This has led to much research in developing various approaches to understand the model behavior. In this paper, we present the Explainable Neural Network (xNN), a structured neural network designed especially to learn interpretable features. Unlike fully connected neural networks, the features engineered by the xNN can be extracted from the network in a relatively straightforward manner and the results displayed. With appropriate regularization, the xNN provides a parsimonious explanation of the relationship between the features and the output. We illustrate this interpretable feature--engineering property on simulated examples.
69 - Shin Matsushima 2018
A generalized additive model (GAM, Hastie and Tibshirani (1987)) is a nonparametric model by the sum of univariate functions with respect to each explanatory variable, i.e., $f({mathbf x}) = sum f_j(x_j)$, where $x_jinmathbb{R}$ is $j$-th component o f a sample ${mathbf x}in mathbb{R}^p$. In this paper, we introduce the total variation (TV) of a function as a measure of the complexity of functions in $L^1_{rm c}(mathbb{R})$-space. Our analysis shows that a GAM based on TV-regularization exhibits a Rademacher complexity of $O(sqrt{frac{log p}{m}})$, which is tight in terms of both $m$ and $p$ in the agnostic case of the classification problem. In result, we obtain generalization error bounds for finite samples according to work by Bartlett and Mandelson (2002).
Style transfer is a significant problem of machine learning with numerous successful applications. In this work, we present a novel style transfer framework building upon infinite task learning and vector-valued reproducing kernel Hilbert spaces. We instantiate the idea in emotion transfer where the goal is to transform facial images to different target emotions. The proposed approach provides a principled way to gain explicit control over the continuous style space. We demonstrate the efficiency of the technique on popular facial emotion benchmarks, achieving low reconstruction cost and high emotion classification accuracy.
This paper introduces a new nonlinear dictionary learning method for histograms in the probability simplex. The method leverages optimal transport theory, in the sense that our aim is to reconstruct histograms using so-called displacement interpolati ons (a.k.a. Wasserstein barycenters) between dictionary atoms; such atoms are themselves synthetic histograms in the probability simplex. Our method simultaneously estimates such atoms, and, for each datapoint, the vector of weights that can optimally reconstruct it as an optimal transport barycenter of such atoms. Our method is computationally tractable thanks to the addition of an entropic regularization to the usual optimal transportation problem, leading to an approximation scheme that is efficient, parallel and simple to differentiate. Both atoms and weights are learned using a gradient-based descent method. Gradients are obtained by automatic differentiation of the generalized Sinkhorn iterations that yield barycenters with entropic smoothing. Because of its formulation relying on Wasserstein barycenters instead of the usual matrix product between dictionary and codes, our method allows for nonlinear relationships between atoms and the reconstruction of input data. We illustrate its application in several different image processing settings.
Conditional computation and modular networks have been recently proposed for multitask learning and other problems as a way to decompose problem solving into multiple reusable computational blocks. We propose a new approach for learning modular netwo rks based on the isometric version of ResNet with all residual blocks having the same configuration and the same number of parameters. This architectural choice allows adding, removing and changing the order of residual blocks. In our method, the modules can be invoked repeatedly and allow knowledge transfer to novel tasks by adjusting the order of computation. This allows soft weight sharing between tasks with only a small increase in the number of parameters. We show that our method leads to interpretable self-organization of modules in case of multi-task learning, transfer learning and domain adaptation while achieving competitive results on those tasks. From practical perspective, our approach allows to: (a) reuse existing modules for learning new task by adjusting the computation order, (b) use it for unsupervised multi-source domain adaptation to illustrate that adaptation to unseen data can be achieved by only manipulating the order of pretrained modules, (c) show how our approach can be used to increase accuracy of existing architectures for image classification tasks such as ImageNet, without any parameter increase, by reusing the same block multiple times.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا