The Power of Scale for Parameter-Efficient Prompt Tuning

Publisher: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Abstract

In this work, we explore "prompt tuning," a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signals from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's few-shot learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant because large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed "prefix tuning" of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer and enables efficient "prompt ensembling." We release code and model checkpoints to reproduce our experiments.
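
The core mechanism, a few trainable "soft prompt" vectors prepended to the input embeddings of a model whose weights stay frozen, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the frozen "model" is a mean-pooling linear classifier rather than T5, the data are invented, and the names `EMBED`, `HEAD`, `forward`, and `tune_prompt` are hypothetical. Only the prompt receives gradient updates, and the same frozen model is reused with a separate prompt per task.

```python
import math
import random

# Toy illustration of prompt tuning (hypothetical setup, not the paper's T5 code):
# a frozen "model" mean-pools its input embeddings and applies a frozen linear head.
# Only the soft prompt, K trainable vectors prepended to the input, is updated.

random.seed(0)
VOCAB, D, K = 10, 4, 3  # vocabulary size, embedding width, prompt length

# Frozen parameters: shared by every task and never modified during tuning.
EMBED = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(VOCAB)]
HEAD = [random.uniform(-1, 1) for _ in range(D)]


def forward(prompt, token_ids):
    """Return (P(label = 1), sequence length) for [soft prompt ; token embeddings]."""
    rows = prompt + [EMBED[t] for t in token_ids]
    pooled = [sum(r[i] for r in rows) / len(rows) for i in range(D)]
    z = sum(h * w for h, w in zip(pooled, HEAD))
    return 1.0 / (1.0 + math.exp(-z)), len(rows)


def log_loss(prompt, data):
    """Mean binary cross-entropy of the frozen model conditioned on `prompt`."""
    total = 0.0
    for tokens, y in data:
        p, _ = forward(prompt, tokens)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(data)


def tune_prompt(data, steps=200, lr=0.5):
    """Per-example gradient descent on the soft prompt only."""
    prompt = [[0.0] * D for _ in range(K)]
    for _ in range(steps):
        for tokens, y in data:
            p, n = forward(prompt, tokens)
            # dLoss/dz = p - y and dz/dprompt[j][i] = HEAD[i] / n for every row j.
            for j in range(K):
                for i in range(D):
                    prompt[j][i] -= lr * (p - y) * HEAD[i] / n
    return prompt


# One frozen model, two tasks, two prompts of K * D = 12 parameters each.
inputs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
task_a = [(t, 1) for t in inputs]
task_b = [(t, 0) for t in inputs]
prompt_a, prompt_b = tune_prompt(task_a), tune_prompt(task_b)
baseline = [[0.0] * D for _ in range(K)]
print(log_loss(baseline, task_a), "->", log_loss(prompt_a, task_a))
print(log_loss(baseline, task_b), "->", log_loss(prompt_b, task_b))
```

Because this toy model mean-pools, a prompt can only shift the frozen model's score; in a Transformer such as T5, the prepended vectors instead interact with every input token through self-attention.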

Related research

Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages

Association for Computational Linguistics, 2021 (article)

Most combinations of NLP tasks and language varieties lack in-domain examples for supervised training because of the paucity of annotated data. How can neural models make sample-efficient generalizations from task-language combinations with available data to low-resource ones? In this work, we propose a Bayesian generative model for the space of neural parameters. We assume that this space can be factorized into latent variables for each language and each task. We infer the posteriors over such latent variables based on data from seen task-language combinations through variational inference. This enables zero-shot classification on unseen combinations at prediction time. For instance, given training data for named entity recognition (NER) in Vietnamese and for part-of-speech (POS) tagging in Wolof, our model can perform accurate predictions for NER in Wolof. In particular, we experiment with a typologically diverse sample of 33 languages from 4 continents and 11 families, and show that our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods. Our code is available at github.com/cambridgeltl/parameter-factorization.

Keywords: parameter space factorization, space factorization, zero-shot learning
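
The intuition behind factorizing parameter space can be shown with a deliberately simplified, non-Bayesian sketch. It assumes, hypothetically, that the parameter vector for a task-language pair is the sum of a task vector and a language vector, fits both from seen pairs by per-pair gradient steps, and then composes an unseen pair. This replaces the paper's variational inference over latent variables with point estimates; the toy numbers are invented, and the extra seen pair (POS, Vietnamese) is an assumption needed here to link the NER and Wolof factors.

```python
# Simplified, non-Bayesian stand-in for parameter space factorization (assumption):
# params(task, lang) ~= task_vec[task] + lang_vec[lang], fitted on seen pairs only.


def fit_factors(observed, dim, sweeps=2000, lr=0.1):
    """Fit additive task and language factors by per-pair gradient steps."""
    task_vec = {t: [0.0] * dim for t, _ in observed}
    lang_vec = {l: [0.0] * dim for _, l in observed}
    for _ in range(sweeps):
        for (task, lang), target in observed.items():
            for i in range(dim):
                err = task_vec[task][i] + lang_vec[lang][i] - target[i]
                task_vec[task][i] -= lr * err
                lang_vec[lang][i] -= lr * err
    return task_vec, lang_vec


def compose(task_vec, lang_vec, task, lang):
    """Zero-shot parameters for a pair that may never have been observed together."""
    return [a + b for a, b in zip(task_vec[task], lang_vec[lang])]


# Invented toy "parameters" for three seen task-language pairs.
seen = {
    ("NER", "vi"): [1.0, 2.0],
    ("POS", "vi"): [0.5, 1.0],
    ("POS", "wo"): [-0.5, 3.0],
}
tasks, langs = fit_factors(seen, dim=2)
print(compose(tasks, langs, "NER", "wo"))  # close to [0.0, 4.0]
```

The prediction for (NER, wo) equals (NER, vi) - (POS, vi) + (POS, wo), so it is the same for every exact fit of the additive model; the paper's Bayesian treatment additionally infers posteriors over the latent factors.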

Using Optimal Transport as Alignment Objective for fine-tuning Multilingual Contextualized Embeddings

Association for Computational Linguistics, 2021 (article)

Recent studies have proposed different methods to improve multilingual word representations in contextualized settings, including techniques that align source and target embedding spaces. For contextualized embeddings, alignment becomes more complex because context must also be taken into consideration. In this work, we propose using Optimal Transport (OT) as an alignment objective during fine-tuning to further improve multilingual contextualized representations for downstream cross-lingual transfer. This approach does not require word-alignment pairs prior to fine-tuning, which may lead to sub-optimal matching; instead, it learns the word alignments within context in an unsupervised manner. It also allows different types of mappings thanks to soft matching between source and target sentences. We benchmark our proposed method on two tasks (XNLI and XQuAD) and achieve improvements over baselines as well as competitive results compared to similar recent works.

Keywords: optimal transport, multilingual contextualized embeddings, multilingual contextualized
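
To make "soft matching" concrete, the following sketch computes an entropy-regularized optimal transport plan between two sentences' token embeddings using Sinkhorn iterations. It assumes uniform token weights, a cosine cost, and invented 3-dimensional embeddings; the abstract does not give the paper's exact OT formulation or regularization, so this is a generic illustration of OT-based soft alignment, not the authors' objective. The plan acts as a soft word alignment, and the total transport cost is the kind of quantity one would minimize during fine-tuning.

```python
import math

# Generic entropy-regularized OT (Sinkhorn) between two sets of token embeddings.
# Uniform marginals and cosine cost are assumptions for illustration.


def cosine_cost(x, y):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


def sinkhorn(src, tgt, reg=0.1, iters=200):
    """Return (transport plan, transport cost); plan[i][j] is a soft alignment weight."""
    n, m = len(src), len(tgt)
    cost = [[cosine_cost(s, t) for t in tgt] for s in src]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m  # uniform mass per token
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total


# Invented embeddings: target token j is a noisy copy of source token (j + 1) % 3.
source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [[0.0, 0.9, 0.1], [0.1, 0.0, 0.9], [0.95, 0.05, 0.0]]
plan, total = sinkhorn(source, target)
alignment = [max(range(len(target)), key=lambda j: plan[i][j]) for i in range(len(source))]
print(alignment, round(total, 4))
```

Unlike a hard one-to-one alignment, each row of `plan` spreads mass over all target tokens, which is what permits one-to-many or many-to-one mappings when embeddings are ambiguous.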

The Degree of Participation of the Cooperating Teacher in Field Training for Kindergarten Student Teachers in the College of Education at Al-Baath University

Al-Baath University, 2015 (research paper)

The aim of this research is to determine the degree of participation of the cooperating teacher in the three stages of field training (the introductory, observation, and participation stages), from the viewpoint of student teachers majoring in kindergarten education at the College of Education, Al-Baath University.

Keywords: cooperating teacher, field training, student teachers, kindergarten specialization, kindergarten student teachers in the College of Education

MM-AVS: A Full-Scale Dataset for Multi-modal Summarization

Association for Computational Linguistics, 2021 (article)

Multimodal summarization is becoming increasingly significant, as it is the basis for question answering, Web search, and many other downstream tasks. However, its learning materials have lacked a holistic organization that integrates resources from various modalities, and have therefore lagged behind the research progress of this field. In this study, we release a full-scale multimodal dataset that comprehensively gathers documents, summaries, images, captions, videos, audio, transcripts, and titles in English from CNN and Daily Mail. To the best of our knowledge, this is the first collection that spans all modalities and comprises nearly all types of materials available in this community. In addition, we devise a baseline model on the new dataset that employs a newly proposed Jump-Attention mechanism based on transcripts. The experimental results validate the important supporting role of external information in multimodal summarization.

Keywords: multi-modal summarization, multi-modal, web search

An efficient algorithm for finding shortest paths to all vertices in a graph

Al-Baath University, 2015 (research paper)

The all-nodes shortest paths problem is undoubtedly one of the most basic problems in algorithmic graph theory. In this paper, we introduce a simple and efficient algorithm for the all-nodes shortest paths problem in directed (and undirected) graphs. In this problem, we find the shortest path from a given source node to all other nodes in the graph, where the shortest path is a path of minimum cost, i.e., minimum sum of edge weights. We prove that the complexity of the proposed algorithm depends only on the number of edges in the graph, and we show that the algorithm runs in linear time, O(m), where m is the number of edges, which we consider the best running time among all such algorithms. We also compare the complexity of the proposed algorithm with that of well-known shortest path algorithms, and the results show that the proposed algorithm has the best complexity.

Keywords: Graph, Shortest path, Nodes, Edges
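
For reference, the problem defined in the abstract above (shortest paths from one source to every other node, with path cost equal to the sum of edge weights) is classically solved by Dijkstra's algorithm when all weights are non-negative. The sketch below is that standard baseline, not the paper's proposed O(m) algorithm, whose details the abstract does not give; with a binary heap it runs in O((n + m) log n) time. The adjacency-dictionary input format is an assumption for illustration.

```python
import heapq


def dijkstra(adj, source):
    """Single-source shortest paths for non-negative edge weights.

    adj maps node -> list of (neighbor, weight); unreachable nodes are omitted
    from the returned distance dictionary.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path to u was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Directed toy graph; list each edge in both directions for an undirected graph.
graph = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2), ("d", 5)],
    "b": [("d", 1)],
}
print(dijkstra(graph, "a"))
```

Here the direct edge a→b (cost 4) loses to the path a→c→b (cost 3), which in turn makes a→c→b→d (cost 4) cheaper than a→c→d (cost 6).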
