Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Optimizing NLU Reranking Using Entity Resolution Signals in Multi-domain Dialog Systems

تحسين إعادة تأهب NLU باستخدام إشارات دقة الكيان في أنظمة حوار متعددة المجال

686 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In dialog systems, the Natural Language Understanding (NLU) component typically makes the interpretation decision (including domain, intent and slots) for an utterance before the mentioned entities are resolved. This may result in intent classification and slot tagging errors. In this work, we propose to leverage Entity Resolution (ER) features in NLU reranking and introduce a novel loss term based on ER signals to better learn model weights in the reranking framework. In addition, for a multi-domain dialog scenario, we propose a score distribution matching method to ensure scores generated by the NLU reranking models for different domains are properly calibrated. In offline experiments, we demonstrate our proposed approach significantly outperforms the baseline model on both single-domain and cross-domain evaluations.

References used

https://aclanthology.org/

rate research

Improving Named Entity Recognition in Spoken Dialog Systems by Context and Speech Pattern Modeling

732 - Association for Computation Linguistics 2021 مقالة

While named entity recognition (NER) from speech has been around as long as NER from written text has, the accuracy of NER from speech has generally been much lower than that of NER from text. The rise in popularity of spoken dialog systems such as S iri or Alexa highlights the need for more accurate NER from speech because NER is a core component for understanding what users said in dialogs. Deployed spoken dialog systems receive user input in the form of automatic speech recognition (ASR) transcripts, and simply applying NER model trained on written text to ASR transcripts often leads to low accuracy because compared to written text, ASR transcripts lack important cues such as punctuation and capitalization. Besides, errors in ASR transcripts also make NER from speech challenging. We propose two models that exploit dialog context and speech pattern clues to extract named entities more accurately from open-domain dialogs in spoken dialog systems. Our results show the benefit of modeling dialog context and speech patterns in two settings: a standard setting with random partition of data and a more realistic but also more difficult setting where many named entities encountered during deployment are unseen during training.

استراتيجيات الاستشارة spoken dialog systems improving named entity نظم الحوار المنطوق صناعة حمض الفوسفور

Multilingual Paraphrase Generation For Bootstrapping New Features in Task-Oriented Dialog Systems

753 - Association for Computation Linguistics 2021 مقالة

The lack of labeled training data for new features is a common problem in rapidly changing real-world dialog systems. As a solution, we propose a multilingual paraphrase generation model that can be used to generate novel utterances for a target feat ure and target language. The generated utterances can be used to augment existing training data to improve intent classification and slot labeling models. We evaluate the quality of generated utterances using intrinsic evaluation metrics and by conducting downstream evaluation experiments with English as the source language and nine different target languages. Our method shows promise across languages, even in a zero-shot setting where no seed data is available.

task-oriented dialog systems dialog systems multilingual paraphrase generation أنظمة الحوار إعادة صياغة إعادة صياغة متعددة اللغات صناعة حمض الفوسفور

Revisiting Multi-Domain Machine Translation

763 - Association for Computation Linguistics 2021 مقالة

When building machine translation systems, one often needs to make the best out of heterogeneous sets of parallel data in training, and to robustly handle inputs from unexpected domains in testing. This multi-domain scenario has attracted a lot of re cent work that fall under the general umbrella of transfer learning. In this study, we revisit multi-domain machine translation, with the aim to formulate the motivations for developing such systems and the associated expectations with respect to performance. Our experiments with a large sample of multi-domain systems show that most of these expectations are hardly met and suggest that further work is needed to better analyze the current behaviour of multi-domain systems and to make them fully hold their promises.

multi-domain machine translation revisiting multi-domain machine ترجمة متعددة المجالات إعادة النظر آلة متعددة المجالات صناعة حمض الفوسفور

Multi-Sentence Knowledge Selection in Open-Domain Dialogue

1276 - Association for Computation Linguistics 2021 مقالة

Incorporating external knowledge sources effectively in conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes certain brittle assumptions on knowledge sources to simplify the overall task, such as the existence of a single relevant knowledge sentence per context. In this work, we evaluate the existing state of open-domain conversation knowledge selection, showing where the existing methodologies regarding data and evaluation are flawed. We then improve on them by proposing a new framework for collecting relevant knowledge, and create an augmented dataset based on the Wizard of Wikipedia (WOW) corpus, which we call WOW++. WOW++ averages 8 relevant knowledge sentences per dialogue context, embracing the inherent ambiguity of open-domain dialogue knowledge selection. We then benchmark various knowledge ranking algorithms on this augmented dataset with both intrinsic evaluation and extrinsic measures of response quality, showing that neural rerankers that use WOW++ can outperform rankers trained on standard datasets.

multi-sentence knowledge selection open-domain dialogue knowledge selection اختيار المعرفة متعددة الجملة الحوار مفتوح المجال اختيار المعرفة صناعة حمض الفوسفور المزيد..

Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

1179 - Association for Computation Linguistics 2021 مقالة

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grain ed structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.

named entity disambiguation entity disambiguation غموض كيان اسمه غزول الكيان صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Optimizing NLU Reranking Using Entity Resolution Signals in Multi-domain Dialog Systems

تحسين إعادة تأهب NLU باستخدام إشارات دقة الكيان في أنظمة حوار متعددة المجال

Ask ChatGPT about the research

Read More

suggested questions