
Domain Divergences: A Survey and Empirical Analysis


Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Keywords: measures, higher-order measures, divergence

Abstract

Domain divergence plays a significant role in estimating the performance of a model in new domains. While there is a large literature on divergence measures, researchers find it hard to choose an appropriate divergence for a given NLP application. We address this shortcoming both by surveying the literature and through an empirical study. We develop a taxonomy of divergence measures consisting of three classes, namely Information-theoretic, Geometric, and Higher-order measures, and identify the relationships between them. Further, to understand the common use-cases of these measures, we recognise three novel applications and use them to organise the literature: 1) Data Selection, 2) Learning Representation, and 3) Decisions in the Wild. From this, we find that Information-theoretic measures are prevalent for 1) and 3), while Higher-order measures are more common for 2). To further help researchers choose appropriate measures for predicting the drop in performance, an important aspect of Decisions in the Wild, we perform a correlation analysis spanning 130 domain adaptation scenarios, 3 varied NLP tasks and 12 divergence measures identified in our survey. To calculate these divergences, we consider current contextual word representations (CWR) and contrast them with older distributed representations. We find that traditional measures over word distributions still serve as strong baselines, while higher-order measures with CWR are effective.
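The three classes in the taxonomy can be made concrete with one representative measure each. The following is a minimal, stdlib-only Python sketch, not the paper's implementation: Jensen-Shannon divergence over add-one-smoothed unigram distributions (information-theoretic), cosine distance between mean representation vectors (geometric), and a biased RBF-kernel estimate of squared Maximum Mean Discrepancy (MMD) over sets of vectors (a kernel statistic often grouped with higher-order measures). The smoothing scheme, the kernel bandwidth `gamma`, the use of Pearson's r for the correlation analysis, and all function names are assumptions made for illustration; the toy corpora and 2-d vectors stand in for real text and CWR embeddings and are not data from the study.

```python
import math
from collections import Counter


def unigram_distribution(tokens, vocab):
    """Add-one smoothed unigram probabilities over a shared vocabulary.

    Assumption: add-one smoothing over the union vocabulary; the paper's
    exact preprocessing may differ.
    """
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return [(counts[w] + 1) / total for w in vocab]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Information-theoretic: symmetric divergence, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_distance(u, v):
    """Geometric: 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel; gamma is an illustrative bandwidth choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two sets of vectors."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def pearson(xs, ys):
    """Pearson correlation, e.g. between divergences and performance drops.

    Assumption: Pearson's r is used here for illustration only.
    """
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy source/target corpora (illustrative only, not data from the paper).
    source = "the patient was given a dose of the drug".split()
    target = "the striker scored a goal in the final".split()
    vocab = sorted(set(source) | set(target))
    p = unigram_distribution(source, vocab)
    q = unigram_distribution(target, vocab)
    print("JS over word distributions:", jensen_shannon(p, q))

    # Toy 2-d vectors standing in for contextual sentence representations.
    src_vecs = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1]]
    tgt_vecs = [[1.0, 0.9], [1.1, 1.2], [0.9, 1.0]]
    print("cosine distance of means:",
          cosine_distance(mean_vector(src_vecs), mean_vector(tgt_vecs)))
    print("squared MMD (RBF):", mmd_squared(src_vecs, tgt_vecs))
```

In a correlation analysis of the kind described above, one would compute a divergence for each source/target pair and pass the resulting list, together with the observed performance drops, to `pearson`; a rank correlation is a common alternative when only the ordering of domains matters.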

Evaluating the carbon footprint of NLP methods: a survey and analysis of existing tools

Association for Computational Linguistics, 2021 (article)

Modern Natural Language Processing (NLP) makes intensive use of deep learning methods because of the accuracy they offer for a variety of applications. Due to the significant environmental impact of deep learning, cost-benefit analysis including carbon footprint as well as accuracy measures has been suggested to better document the use of NLP methods for research or deployment. In this paper, we review the tools that are available to measure energy use and CO2 emissions of NLP methods. We describe the scope of the measures provided and compare the use of six tools (carbon tracker, experiment impact tracker, green algorithms, ML CO2 impact, energy usage and cumulator) on named entity recognition experiments performed on different computational set-ups (local server vs. computing facility). Based on these findings, we propose actionable recommendations to accurately measure the environmental impact of NLP experiments.

Representing Numbers in NLP: a Survey and a Vision

Association for Computational Linguistics, 2021 (article)

NLP systems rarely give special consideration to numbers found in text. This starkly contrasts with the consensus in neuroscience that, in the brain, numbers are represented differently from words. We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods. We break down the subjective notion of numeracy into 7 subtasks, arranged along two dimensions: granularity (exact vs approximate) and units (abstract vs grounded). We analyze the myriad representational choices made by over a dozen previously published number encoders and decoders. We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP, composed of design trade-offs and a unified evaluation.

Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU

Association for Computational Linguistics, 2021 (article)

Incremental processing allows interactive systems to respond based on partial inputs, which is a desirable property, e.g. in dialogue agents. The currently popular Transformer architecture inherently processes sequences as a whole, abstracting away the notion of time. Recent work attempts to apply Transformers incrementally via restart-incrementality, repeatedly feeding increasingly longer input prefixes to an unchanged model to produce partial outputs. However, this approach is computationally costly and does not scale efficiently to long sequences. In parallel, we witness efforts to make Transformers more efficient, e.g. the Linear Transformer (LT) with a recurrence mechanism. In this work, we examine the feasibility of LT for incremental NLU in English. Our results show that the recurrent LT model has better incremental performance and faster inference speed than the standard Transformer and LT with restart-incrementality, at the cost of part of the non-incremental (full sequence) quality. We show that the performance drop can be mitigated by training the model to wait for right context before committing to an output, and that training with input prefixes is beneficial for delivering correct partial outputs.

Natural Language Processing Meets Quantum Physics: A Survey and Categorization

Association for Computational Linguistics, 2021 (article)

Recent research has investigated quantum NLP, designing algorithms that process natural language in quantum computers, and also quantum-inspired algorithms that improve NLP performance on classical computers. In this survey, we review representative methods at the intersection of NLP and quantum physics in the past ten years, categorizing them according to the use of quantum theory, the linguistic targets that are modeled, and the downstream application. The literature review ends with a discussion of the key factors behind the success achieved by existing work, as well as the challenges ahead, with the goal of better understanding the promises and further directions.

Paraphrase Generation: A Survey of the State of the Art

Association for Computational Linguistics, 2021 (article)

This paper focuses on paraphrase generation, which is a widely studied natural language generation task in NLP. With the development of neural models, paraphrase generation research has gradually shifted to neural methods in recent years. This shift has provided architectures for the contextualized representation of input text and for generating fluent, diverse and human-like paraphrases. This paper surveys various approaches to paraphrase generation with a main focus on neural methods.
