ترغب بنشر مسار تعليمي؟ اضغط هنا

A Survey on Spoken Language Understanding: Recent Advances and New Frontiers

66   0   0.0 ( 0 )
 نشر من قبل Libo Qin
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Spoken Language Understanding (SLU) aims to extract the semantics frame of user queries, which is a core component in a task-oriented dialog system. With the burst of deep neural networks and the evolution of pre-trained language models, the research of SLU has obtained significant breakthroughs. However, there remains a lack of a comprehensive survey summarizing existing approaches and recent trends, which motivated the work presented in this article. In this paper, we survey recent advances and new frontiers in SLU. Specifically, we give a thorough review of this research field, covering different aspects including (1) new taxonomy: we provide a new perspective for SLU filed, including single model vs. joint model, implicit joint modeling vs. explicit joint modeling in joint model, non pre-trained paradigm vs. pre-trained paradigm;(2) new frontiers: some emerging areas in complex SLU as well as the corresponding challenges; (3) abundant open-source resources: to help the community, we have collected, organized the related papers, baseline projects and leaderboard on a public website where SLU researchers could directly access to the recent progress. We hope that this survey can shed a light on future research in SLU field.

قيم البحث

اقرأ أيضاً

Dialogue systems have attracted more and more attention. Recent advances on dialogue systems are overwhelmingly contributed by deep learning techniques, which have been employed to enhance a wide range of big data applications such as computer vision , natural language processing, and recommender systems. For dialogue systems, deep learning can leverage a massive amount of data to learn meaningful feature representations and response generation strategies, while requiring a minimum amount of hand-crafting. In this article, we give an overview to these recent advances on dialogue systems from various perspectives and discuss some possible research directions. In particular, we generally divide existing dialogue systems into task-oriented and non-task-oriented models, then detail how deep learning techniques help them with representative algorithms and finally discuss some appealing research directions that can bring the dialogue system research into a new frontier.
Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we rel ease SLURP, a new SLU package containing the following: (1) A new challenging dataset in English spanning 18 domains, which is substantially bigger and linguistically more diverse than existing datasets; (2) Competitive baselines based on state-of-the-art NLU and ASR systems; (3) A new transparent metric for entity labelling which enables a detailed error analysis for identifying potential areas of improvement. SLURP is available at https: //github.com/pswietojanski/slurp.
In the NLP community, recent years have seen a surge of research activities that address machines ability to perform deep language understanding which goes beyond what is explicitly stated in text, rather relying on reasoning and knowledge of the wor ld. Many benchmark tasks and datasets have been created to support the development and evaluation of such natural language inference ability. As these benchmarks become instrumental and a driving force for the NLP research community, this paper aims to provide an overview of recent benchmarks, relevant knowledge resources, and state-of-the-art learning and inference approaches in order to support a better understanding of this growing field.
Word Error Rate (WER) has been the predominant metric used to evaluate the performance of automatic speech recognition (ASR) systems. However, WER is sometimes not a good indicator for downstream Natural Language Understanding (NLU) tasks, such as in tent recognition, slot filling, and semantic parsing in task-oriented dialog systems. This is because WER takes into consideration only literal correctness instead of semantic correctness, the latter of which is typically more important for these downstream tasks. In this study, we propose a novel Semantic Distance (SemDist) measure as an alternative evaluation metric for ASR systems to address this issue. We define SemDist as the distance between a reference and hypothesis pair in a sentence-level embedding space. To represent the reference and hypothesis as a sentence embedding, we exploit RoBERTa, a state-of-the-art pre-trained deep contextualized language model based on the transformer architecture. We demonstrate the effectiveness of our proposed metric on various downstream tasks, including intent recognition, semantic parsing, and named entity recognition.
Spoken Language Understanding (SLU) typically comprises of an automatic speech recognition (ASR) followed by a natural language understanding (NLU) module. The two modules process signals in a blocking sequential fashion, i.e., the NLU often has to w ait for the ASR to finish processing on an utterance basis, potentially leading to high latencies that render the spoken interaction less natural. In this paper, we propose recurrent neural network (RNN) based incremental processing towards the SLU task of intent detection. The proposed methodology offers lower latencies than a typical SLU system, without any significant reduction in system accuracy. We introduce and analyze different recurrent neural network architectures for incremental and online processing of the ASR transcripts and compare it to the existing offline systems. A lexical End-of-Sentence (EOS) detector is proposed for segmenting the stream of transcript into sentences for intent classification. Intent detection experiments are conducted on benchmark ATIS, Snips and Facebooks multilingual task oriented dialog datasets modified to emulate a continuous incremental stream of words with no utterance demarcation. We also analyze the prospects of early intent detection, before EOS, with our proposed system.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا