Boosting a Model Zoo for Multi-Task and Continual Learning

86 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Rahul Ramesh

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Rahul Ramesh - Pratik Chaudhari

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Leveraging data from multiple tasks, either all at once, or incrementally, to learn one model is an idea that lies at the heart of multi-task and continual learning methods. Ideally, such a model predicts each task more accurately than if the task were trained in isolation. We show using tools in statistical learning theory (i) how tasks can compete for capacity, i.e., including a particular task can deteriorate the accuracy on a given task, and (ii) that the ideal set of tasks that one should train together in order to perform well on a given task is different for different tasks. We develop methods to discover such competition in typical benchmark datasets which suggests that the prevalent practice of training with all tasks leaves performance on the table. This motivates our Model Zoo, which is a boosting-based algorithm that builds an ensemble of models, each of which is very small, and it is trained on a smaller set of tasks. Model Zoo achieves large gains in prediction accuracy compared to state-of-the-art methods across a variety of existing benchmarks in multi-task and continual learning, as well as more challenging ones of our creation. We also show that even a model trained independently on all tasks outperforms all existing multi-task and continual learning methods.

قيم البحث

88 - David G. Nagy , GergH{o} Orban 2017

Both the human brain and artificial learning agents operating in real-world or comparably complex environments are faced with the challenge of online model selection. In principle this challenge can be overcome: hierarchical Bayesian inference provid es a principled method for model selection and it converges on the same posterior for both off-line (i.e. batch) and online learning. However, maintaining a parameter posterior for each model in parallel has in general an even higher memory cost than storing the entire data set and is consequently clearly unfeasible. Alternatively, maintaining only a limited set of models in memory could limit memory requirements. However, sufficient statistics for one model will usually be insufficient for fitting a different kind of model, meaning that the agent loses information with each model change. We propose that episodic memory can circumvent the challenge of limited memory-capacity online model selection by retaining a selected subset of data points. We design a method to compute the quantities necessary for model selection even when the data is discarded and only statistics of one (or few) learnt models are available. We demonstrate on a simple model that a limited-sized episodic memory buffer, when the content is optimised to retain data with statistics not matching the current representation, can resolve the fundamental challenge of online model selection.

التعلم الآلي التعلم الالي

A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning

122 - Nicholas C. Landolfi , Garrett Thomas , Tengyu Ma 2019

The aim of multi-task reinforcement learning is two-fold: (1) efficiently learn by training against multiple tasks and (2) quickly adapt, using limited samples, to a variety of new tasks. In this work, the tasks correspond to reward functions for env ironments with the same (or similar) dynamical models. We propose to learn a dynamical model during the training process and use this model to perform sample-efficient adaptation to new tasks at test time. We use significantly fewer samples by performing policy optimization only in a virtual environment whose transitions are given by our learned dynamical model. Our algorithm sequentially trains against several tasks. Upon encountering a new task, we first warm-up a policy on our learned dynamical model, which requires no new samples from the environment. We then adapt the dynamical model with samples from this policy in the real environment. We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy over MAML, a state-of-the-art meta-learning algorithm, on these tasks.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Task-agnostic Continual Learning with Hybrid Probabilistic Models

121 - Polina Kirichenko , Mehrdad Farajtabar , Dushyant Rao 2021

Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative app roach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting, all leveraging the invertibility and exact likelihood which are uniquely enabled by the normalizing flow model. We use the generative capabilities of the flow to avoid catastrophic forgetting through generative replay and a novel functional regularization technique. For task identification, we use state-of-the-art anomaly detection techniques based on measuring the typicality of the models statistics. We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.

التعلم الآلي التعلم الالي

Efficient Continual Learning with Modular Networks and Task-Driven Priors

97 - Tom Veniat , Ludovic Denoyer , MarcAurelio Ranzato 2020

Existing literature in Continual Learning (CL) has focused on overcoming catastrophic forgetting, the inability of the learner to recall how to perform tasks observed in the past. There are however other desirable properties of a CL system, such as t he ability to transfer knowledge from previous tasks and to scale memory and compute sub-linearly with the number of tasks. Since most current benchmarks focus only on forgetting using short streams of tasks, we first propose a new suite of benchmarks to probe CL algorithms across these new axes. Finally, we introduce a new modular architecture, whose modules represent atomic skills that can be composed to perform a certain task. Learning a task reduces to figuring out which past modules to re-use, and which new modules to instantiate to solve the current task. Our learning algorithm leverages a task-driven prior over the exponential search space of all possible ways to combine modules, enabling efficient learning on long streams of tasks. Our experiments show that this modular architecture and learning algorithm perform competitively on widely used CL benchmarks while yielding superior performance on the more challenging benchmarks we introduce in this work.

التعلم الآلي

Task-wise Split Gradient Boosting Trees for Multi-center Diabetes Prediction

105 - Mingcheng Chen , Zhenghui Wang , Zhiyun Zhao 2021

Diabetes prediction is an important data science application in the social healthcare domain. There exist two main challenges in the diabetes prediction task: data heterogeneity since demographic and metabolic data are of different types, data insuff iciency since the number of diabetes cases in a single medical center is usually limited. To tackle the above challenges, we employ gradient boosting decision trees (GBDT) to handle data heterogeneity and introduce multi-task learning (MTL) to solve data insufficiency. To this end, Task-wise Split Gradient Boosting Trees (TSGB) is proposed for the multi-center diabetes prediction task. Specifically, we firstly introduce task gain to evaluate each task separately during tree construction, with a theoretical analysis of GBDTs learning objective. Secondly, we reveal a problem when directly applying GBDT in MTL, i.e., the negative task gain problem. Finally, we propose a novel split method for GBDT in MTL based on the task gain statistics, named task-wise split, as an alternative to standard feature-wise split to overcome the mentioned negative task gain problem. Extensive experiments on a large-scale real-world diabetes dataset and a commonly used benchmark dataset demonstrate TSGB achieves superior performance against several state-of-the-art methods. Detailed case studies further support our analysis of negative task gain problems and provide insightful findings. The proposed TSGB method has been deployed as an online diabetes risk assessment software for early diagnosis.

التعلم الآلي