Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup

Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

memory limited setup, scaling deep contrastive, limited setup

Abstract

Contrastive learning has been applied successfully to learn vector representations of text. Previous research demonstrated that learning high-quality representations benefits from a batch-wise contrastive loss with a large number of negatives. In practice, the technique of in-batch negatives is used: for each example in a batch, the positives of the other batch examples serve as its negatives, which avoids encoding extra negatives. This, however, still conditions each example's loss on all batch examples and requires fitting the entire large batch into GPU memory. This paper introduces a gradient caching technique that decouples backpropagation between the contrastive loss and the encoder, removing the encoder backward pass's data dependency along the batch dimension. As a result, gradients can be computed for one subset of the batch at a time, leading to almost constant memory usage.
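The idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the paper's implementation: the linear encoder `W`, the shared encoder for both sides of each pair, and every function name below are assumptions made for illustration. The sketch computes representations first, caches the loss gradient with respect to each representation, and then accumulates the encoder gradient one chunk at a time.

```python
import math

def encode(W, x):
    """Toy linear encoder (an assumption): returns W @ x for one example."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def contrastive_loss(Q, P):
    """In-batch-negative loss: mean over i of -log softmax_j(q_i . p_j)[i]."""
    total = 0.0
    for i, q in enumerate(Q):
        scores = [dot(q, p) for p in P]
        m = max(scores)
        total += m + math.log(sum(math.exp(s - m) for s in scores)) - scores[i]
    return total / len(Q)

def representation_grads(Q, P):
    """Gradient of the loss w.r.t. every representation (the cache)."""
    n, d = len(Q), len(Q[0])
    dQ = [[0.0] * d for _ in range(n)]
    dP = [[0.0] * d for _ in range(n)]
    for i, q in enumerate(Q):
        scores = [dot(q, p) for p in P]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        for j, p in enumerate(P):
            # dL/dS_ij for S_ij = q_i . p_j is (softmax_ij - [i == j]) / n
            g = (exps[j] / z - (1.0 if i == j else 0.0)) / n
            for k in range(d):
                dQ[i][k] += g * p[k]
                dP[j][k] += g * q[k]
    return dQ, dP

def grad_cache_step(W, X, Y, chunk_size):
    """Return (loss, dL/dW), revisiting the encoder one chunk at a time."""
    n = len(X)
    # Step 1: forward pass that keeps only the representations.
    Q = [encode(W, x) for x in X]
    P = [encode(W, y) for y in Y]
    # Step 2: full-batch loss and cached gradients w.r.t. representations.
    loss = contrastive_loss(Q, P)
    dQ, dP = representation_grads(Q, P)
    # Step 3: per chunk, push the cached gradients through the encoder.
    # For q = W x the contribution to dL/dW is outer(dq, x).
    dW = [[0.0] * len(W[0]) for _ in W]
    for start in range(0, n, chunk_size):
        for i in range(start, min(start + chunk_size, n)):
            for inputs, grads in ((X, dQ), (Y, dP)):
                for r in range(len(W)):
                    for c in range(len(W[0])):
                        dW[r][c] += grads[i][r] * inputs[i][c]
    return loss, dW
```

In this toy setting the accumulated gradient does not depend on `chunk_size`, because the cross-example coupling lives entirely in the cached representation gradients. With a deep encoder and an autograd framework, step 3 would re-run the forward pass per chunk with gradient tracking enabled, so that only one chunk's activations are held in memory at a time.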

References used

https://aclanthology.org/

Efficient Dialogue Complementary Policy Learning via Deep Q-network Policy and Episodic Memory Policy

1008 - Association for Computational Linguistics 2021 (article)

Deep reinforcement learning has shown great potential in training dialogue policies. However, its favorable performance comes at the cost of many rounds of interaction. Most existing dialogue policy methods rely on a single learning system, while the human brain has two specialized learning and memory systems that help it find good solutions without requiring copious examples. Inspired by the human brain, this paper proposes a novel complementary policy learning (CPL) framework, which exploits the complementary advantages of the episodic memory (EM) policy and the deep Q-network (DQN) policy to achieve fast and effective dialogue policy learning. To coordinate between the two policies, we propose a confidence controller that controls the complementary time according to their relative efficacy at different stages. Furthermore, memory connectivity and time pruning are proposed to guarantee the flexible and adaptive generalization of the EM policy in dialogue tasks. Experimental results on three dialogue datasets show that our method significantly outperforms existing methods relying on a single learning system.

deep q-network policy, deep q-network, complementary policy learning

Memory-Based Semantic Parsing

510 - Association for Computational Linguistics 2021 (article)

We present a memory-based model for context-dependent semantic parsing. Previous approaches focus on enabling the decoder to copy or modify the parse from the previous utterance, assuming there is a dependency between the current and previous parses. In this work, we propose to represent contextual information using an external memory. We learn a context memory controller that manages the memory by maintaining the cumulative meaning of sequential user utterances. We evaluate our approach on three semantic parsing benchmarks. Experimental results show that our model can better process context-dependent information and demonstrates improved performance without using task-specific decoders.

semantic parsing, memory-based semantic parsing, dependent semantic parsing

Pre-training a BERT with Curriculum Learning by Increasing Block-Size of Input Text

603 - Association for Computational Linguistics 2021 (article)

Recently, pre-trained language representation models such as BERT and RoBERTa have achieved significant results in a wide range of natural language processing (NLP) tasks, but they incur extremely high computational cost. Curriculum Learning (CL) is one of the potential solutions to alleviate this problem. CL is a training strategy where training samples are given to models in a meaningful order instead of random sampling. In this work, we propose a new CL method which gradually increases the block-size of input text for training the self-attention mechanism of BERT and its variants using the maximum available batch-size. Experiments in low-resource settings show that our approach outperforms the baseline in terms of convergence speed and final performance on downstream tasks.

tag sequence, increasing block-size, learning by increasing

Diversity-Aware Batch Active Learning for Dependency Parsing

550 - Association for Computational Linguistics 2021 (article)

While the predictive performance of modern statistical dependency parsers relies heavily on the availability of expensive expert-annotated treebank data, not all annotations contribute equally to the training of the parsers. In this paper, we attempt to reduce the number of labeled examples needed to train a strong dependency parser using batch active learning (AL). In particular, we investigate whether enforcing diversity in the sampled batches, using determinantal point processes (DPPs), can improve over their diversity-agnostic counterparts. Simulation experiments on an English newswire corpus show that selecting diverse batches with DPPs is superior to strong selection strategies that do not enforce batch diversity, especially during the initial stages of the learning process. Additionally, our diversity-aware strategy is robust under a corpus duplication setting, where diversity-agnostic sampling strategies exhibit significant degradation.

batch active learning, batch active

Counter-Contrastive Learning for Language GANs

622 - Association for Computational Linguistics 2021 (article)

Generative Adversarial Networks (GANs) have achieved great success in image synthesis, but have proven difficult to apply to natural language generation. Challenges arise from the uninformative learning signals passed from the discriminator. In other words, the poor learning signals limit the learning capacity for generating languages with rich structures and semantics. In this paper, we propose to adopt the counter-contrastive learning (CCL) method to support the generator's training in language GANs. In contrast to standard GANs that adopt a simple binary classifier to discriminate whether a sample is real or fake, we employ a counter-contrastive learning signal that advances the training of language synthesizers by (1) pulling the language representations of generated and real samples together and (2) pushing apart representations of real samples to compete with the discriminator and thus prevent the discriminator from being overtrained. We evaluate our method on both synthetic and real benchmarks and yield competitive performance compared to previous GANs for adversarial sequence generation.

generative adversarial networks, counter-contrastive learning
