Grouping Words with Semantic Diversity

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by: Shamra Editor

Keywords: semantic diversity, deep learning-based NLP, learning-based NLP systems

Abstract

Deep Learning-based NLP systems can be sensitive to unseen tokens and hard to learn with high-dimensional inputs, which critically hinder learning generalization. We introduce an approach by grouping input words based on their semantic diversity to simplify input language representation with low ambiguity. Since the semantically diverse words reside in different contexts, we are able to substitute words with their groups and still distinguish word meanings relying on their contexts. We design several algorithms that compute diverse groupings based on random sampling, geometric distances, and entropy maximization, and we prove formal guarantees for the entropy-based algorithms. Experimental results show that our methods generalize NLP models and demonstrate enhanced accuracy on POS tagging and LM tasks and significant improvements on medium-scale machine translation tasks, up to +6.5 BLEU points. Our source code is available at https://github.com/abdulrafae/dg.
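The abstract does not spell out the grouping algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the simplest case mentioned (random sampling): each vocabulary word is assigned to one of a fixed number of groups, and tokens are then replaced by their group label. The function names, the `G_UNK` label for unseen words, and the use of Shannon entropy over group frequencies as a diversity proxy are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def random_grouping(vocab, num_groups, seed=0):
    """Assign every word to one of num_groups groups uniformly at random.

    Hypothetical baseline; the paper's actual sampling scheme may differ.
    """
    rng = random.Random(seed)
    # Sort the vocabulary so the assignment is reproducible for a given seed.
    return {word: rng.randrange(num_groups) for word in sorted(vocab)}


def substitute(tokens, grouping):
    """Replace each token with its group label; unseen words map to G_UNK."""
    return [f"G{grouping[tok]}" if tok in grouping else "G_UNK" for tok in tokens]


def group_entropy(tokens, grouping):
    """Shannon entropy (in bits) of the group-frequency distribution.

    Used here only as an illustrative proxy; the objective of the paper's
    entropy-based algorithms is not given in the abstract.
    """
    counts = Counter(grouping[tok] for tok in tokens if tok in grouping)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


corpus = "the cat sat on the mat".split()
grouping = random_grouping(set(corpus), num_groups=3)
print(substitute(corpus, grouping))
print(group_entropy(corpus, grouping))
```

Because several words share one group label, the model sees a much smaller input vocabulary, and the surrounding context is left to disambiguate words that fall into the same group.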

References used

https://aclanthology.org/

Bag-of-Words Baselines for Semantic Code Search

Association for Computational Linguistics, 2021 (article)

The task of semantic code search is to retrieve code snippets from a source code corpus based on an information need expressed in natural language. The semantic gap between natural language and programming languages has long been regarded as one of the most significant obstacles to the effectiveness of keyword-based information retrieval (IR) methods. It is a common assumption that "traditional" bag-of-words IR methods are poorly suited for semantic code search: our work empirically investigates this assumption. Specifically, we examine the effectiveness of two traditional IR methods, namely BM25 and RM3, on the CodeSearchNet Corpus, which consists of natural language queries paired with relevant code snippets. We find that the two keyword-based methods outperform several pre-BERT neural models. We also compare several code-specific data pre-processing strategies and find that specialized tokenization improves effectiveness.
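For readers unfamiliar with BM25, the sketch below scores a small set of documents against a query using the standard Okapi BM25 formula. It is an illustration under stated assumptions, not the paper's experimental setup: the whitespace tokenizer, the smoothed IDF variant, the conventional defaults `k1=1.2` and `b=0.75`, and the toy documents are all choices made here. RM3 query expansion is not shown.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF (the "+ 1" keeps the value non-negative).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy "code snippets", already split into plain tokens for simplicity.
docs = [
    "def read file path open",
    "def sort list items",
    "def parse json string",
]
print(bm25_scores("sort list", docs))
```

The abstract's finding that specialized tokenization helps would correspond to replacing the naive `split()` above with a code-aware tokenizer, for example one that splits identifiers on underscores or camel case.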

Keywords: semantic code search, code search, semantic code

On the Difficulty of Segmenting Words with Attention

Association for Computational Linguistics, 2021 (article)

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition, attention can be used to locate and segment the words. We show, however, that even on monolingual data this approach is brittle. In our experiments with different input types, data sizes, and segmentation algorithms, only models trained to predict phones from words succeed in the task. Models trained to predict words from either phones or speech (i.e., the opposite direction needed to generalize to new data) yield much worse results, suggesting that attention-based segmentation is only useful in limited scenarios.

Keywords: difficulty of segmenting, segmenting words, difficulty

Controlling Dialogue Generation with Semantic Exemplars

Association for Computational Linguistics, 2021 (article)

Dialogue systems pretrained with large language models generate locally coherent responses, but lack fine-grained control over responses necessary to achieve specific goals. A promising method to control response generation is exemplar-based generation, in which models edit exemplar responses that are retrieved from training data, or hand-written to strategically address discourse-level goals, to fit new dialogue contexts. We present an Exemplar-based Dialogue Generation model, EDGE, that uses the semantic frames present in exemplar responses to guide response generation. We show that controlling dialogue generation based on the semantic frames of exemplars improves the coherence of generated responses, while preserving semantic meaning and conversation goals present in exemplar responses.

Keywords: controlling dialogue generation, dialogue generation, exemplar-based dialogue generation

Testing Cross-Database Semantic Parsers With Canonical Utterances

Association for Computational Linguistics, 2021 (article)

The benchmark performance of cross-database semantic parsing has climbed steadily in recent years, catalyzed by the wide adoption of pre-trained language models. Yet existing work has shown that state-of-the-art cross-database semantic parsers struggle to generalize to novel user utterances, databases and query structures. To obtain transparent details on the strengths and limitations of these models, we propose a diagnostic testing approach based on controlled synthesis of canonical natural language and SQL pairs. Inspired by the CheckList, we characterize a set of essential capabilities for cross-database semantic parsing models, and detail the method for synthesizing the corresponding test data. We evaluated a variety of high-performing models using the proposed approach, and identified several non-obvious weaknesses across models (e.g. unable to correctly select many columns). Our dataset and code are released as a test suite at http://github.com/hclent/BehaviorCheckingSemPar.

Keywords: cross-database semantic parsers, cross-database semantic

ntust-nlp-2 at ROCLING-2021 Shared Task: BERT-based semantic analyzer with word-level information

Association for Computational Linguistics, 2021 (article)

In this paper, we propose a BERT-based dimensional semantic analyzer that incorporates word-level information. Our model achieved the best results in three of four metrics on the ROCLING 2021 Shared Task "Dimensional Sentiment Analysis for Educational Texts". We conducted a series of experiments to compare the effectiveness of different pre-trained methods. The results also show that our method significantly improves performance over classic methods. Based on the experiments, we also discuss the impact of model architectures and datasets.

Keywords: word-level information, BERT-based semantic analyzer
