ﻻ يوجد ملخص باللغة العربية
Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of model parameters, reducing storage cost and improving prediction accuracy. The common sharing practice is to share bottom layers of a deep neural network among tasks while using separate top layers for each task. In this work, we revisit this common practice via an empirical study on fine-grained image classification tasks and make two surprising observations. (1) Using separate bottom-layer parameters could achieve significantly better performance than the common practice and this phenomenon holds for different number of tasks jointly trained on different backbone architectures with different quantity of task-specific parameters. (2) A multi-task model with a small proportion of task-specific parameters from bottom layers can achieve competitive performance with independent models trained on each task separately and outperform a state-of-the-art MTL framework. Our observations suggest that people rethink the current sharing paradigm and adopt the new strategy of using separate bottom-layer parameters as a stronger baseline for model design in MTL.
Offline reinforcement learning (RL) algorithms have shown promising results in domains where abundant pre-collected data is available. However, prior methods focus on solving individual problems from scratch with an offline dataset without considerin
An important challenge in reinforcement learning is training agents that can solve a wide variety of tasks. If tasks depend on each other (e.g. needing to learn to walk before learning to run), curriculum learning can speed up learning by focusing on
Multi-task learning is a powerful method for solving multiple correlated tasks simultaneously. However, it is often impossible to find one single solution to optimize all the tasks, since different tasks might conflict with each other. Recently, a no
We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. The basic approach is to learn a model patch - a small set of parameters - that will specialize to each task, instead of fine-tun
Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naively training all tasks together in one model often degrades performance, and exhaustively searching through combinatio