Do you want to publish a course? Click here

TSO: Curriculum Generation using continuous optimization

71   0   0.0 ( 0 )
 Added by Dipankar Sarkar
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

The training of deep learning models poses vast challenges of including parameter tuning and ordering of training data. Significant research has been done in Curriculum learning for optimizing the sequence of training data. Recent works have focused on using complex reinforcement learning techniques to find the optimal data ordering strategy to maximize learning for a given network. In this paper, we present a simple and efficient technique based on continuous optimization. We call this new approach Training Sequence Optimization (TSO). There are three critical components in our proposed approach: (a) An encoder network maps/embeds training sequence into continuous space. (b) A predictor network uses the continuous representation of a strategy as input and predicts the accuracy for fixed network architecture. (c) A decoder further maps a continuous representation of a strategy to the ordered training dataset. The performance predictor and encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of optimal training data ordering with potentially better accuracy. Experiments show that we can gain 2AP with our generated optimal curriculum strategy over the random strategy using the CIFAR-100 dataset and have better boosts than the state of the art CL algorithms. We do an ablation study varying the architecture, dataset and sample sizes showcasing our approachs robustness.

rate research

Read More

In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following. We develop a training curriculum that uses a nominal number of expert demonstrations and trains the agent in a manner that draws parallels from one of the ways in which humans learn to perform complex tasks, i.e by starting from the goal and working backwards. We test our method on the BabyAI platform and show an improvement in sample efficiency for some of its tasks compared to a PPO (proximal policy optimization) baseline.
For reinforcement learning (RL), it is challenging for an agent to master a task that requires a specific series of actions due to sparse rewards. To solve this problem, reverse curriculum generation (RCG) provides a reverse expansion approach that automatically generates a curriculum for the agent to learn. More specifically, RCG adapts the initial state distribution from the neighborhood of a goal to a distance as training proceeds. However, the initial state distribution generated for each iteration might be biased, thus making the policy overfit or slowing down the reverse expansion rate. While training RCG for actor-critic (AC) based RL algorithms, this poor generalization and slow convergence might be induced by the tight coupling between an AC pair. Therefore, we propose a parallelized approach that simultaneously trains multiple AC pairs and periodically exchanges their critics. We empirically demonstrate that this proposed approach can improve RCG in performance and convergence, and it can also be applied to other AC based RL algorithms with adapted initial state distribution.
We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e.,~policies that do not take the agent to undesirable situations. We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them. Our algorithms can use any standard policy gradient (PG) method, such as deep deterministic policy gradient (DDPG) or proximal policy optimization (PPO), to train a neural network policy, while guaranteeing near-constraint satisfaction for every policy update by projecting either the policy parameter or the action onto the set of feasible solutions induced by the state-dependent linearized Lyapunov constraints. Compared to the existing constrained PG algorithms, ours are more data efficient as they are able to utilize both on-policy and off-policy data. Moreover, our action-projection algorithm often leads to less conservative policy updates and allows for natural integration into an end-to-end PG training pipeline. We evaluate our algorithms and compare them with the state-of-the-art baselines on several simulated (MuJoCo) tasks, as well as a real-world indoor robot navigation problem, demonstrating their effectiveness in terms of balancing performance and constraint satisfaction. Videos of the experiments can be found in the following link: https://drive.google.com/file/d/1pzuzFqWIE710bE2U6DmS59AfRzqK2Kek/view?usp=sharing.
Recently, variational auto-encoder (VAE) based approaches have made impressive progress on improving the diversity of generated responses. However, these methods usually suffer the cost of decreased relevance accompanied by diversity improvements. In this paper, we propose a novel multimodal response generation framework with exemplar augmentation and curriculum optimization to enhance relevance and diversity of generated responses. First, unlike existing VAE-based models that usually approximate a simple Gaussian posterior distribution, we present a Gaussian mixture posterior distribution (i.e, multimodal) to further boost response diversity, which helps capture complex semantics of responses. Then, to ensure that relevance does not decrease while diversity increases, we fully exploit similar examples (exemplars) retrieved from the training data into posterior distribution modeling to augment response relevance. Furthermore, to facilitate the convergence of Gaussian mixture prior and posterior distributions, we devise a curriculum optimization strategy to progressively train the model under multiple training criteria from easy to hard. Experimental results on widely used SwitchBoard and DailyDialog datasets demonstrate that our model achieves significant improvements compared to strong baselines in terms of diversity and relevance.
293 - Lei Shen , Yang Feng 2020
Emotion-controllable response generation is an attractive and valuable task that aims to make open-domain conversations more empathetic and engaging. Existing methods mainly enhance the emotion expression by adding regularization terms to standard cross-entropy loss and thus influence the training process. However, due to the lack of further consideration of content consistency, the common problem of response generation tasks, safe response, is intensified. Besides, query emotions that can help model the relationship between query and response are simply ignored in previous models, which would further hurt the coherence. To alleviate these problems, we propose a novel framework named Curriculum Dual Learning (CDL) which extends the emotion-controllable response generation to a dual task to generate emotional responses and emotional queries alternatively. CDL utilizes two rewards focusing on emotion and content to improve the duality. Additionally, it applies curriculum learning to gradually generate high-quality responses based on the difficulties of expressing various emotions. Experimental results show that CDL significantly outperforms the baselines in terms of coherence, diversity, and relation to emotion factors.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا