ﻻ يوجد ملخص باللغة العربية
Spoken language understanding (SLU), which is a core component of the task-oriented dialogue system, has made substantial progress in the research of single-turn dialogue. However, the performance in multi-turn dialogue is still not satisfactory in the sense that the existing multi-turn SLU methods have low portability and compatibility for other single-turn SLU models. Further, existing multi-turn SLU methods do not exploit the historical predicted results when predicting the current utterance, which wastes helpful information. To gap those shortcomings, in this paper, we propose a novel Result-based Portable Framework for SLU (RPFSLU). RPFSLU allows most existing single-turn SLU models to obtain the contextual information from multi-turn dialogues and takes full advantage of predicted results in the dialogue history during the current prediction. Experimental results on the public dataset KVRET have shown that all SLU models in baselines acquire enhancement by RPFSLU on multi-turn SLU tasks.
Spoken Language Understanding (SLU) typically comprises of an automatic speech recognition (ASR) followed by a natural language understanding (NLU) module. The two modules process signals in a blocking sequential fashion, i.e., the NLU often has to w
Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we rel
Spoken language understanding (SLU) refers to the process of inferring the semantic information from audio signals. While the neural transformers consistently deliver the best performance among the state-of-the-art neural architectures in field of na
Spoken dialogue systems such as Siri and Alexa provide great convenience to peoples everyday life. However, current spoken language understanding (SLU) pipelines largely depend on automatic speech recognition (ASR) modules, which require a large amou
Visually-grounded models of spoken language understanding extract semantic information directly from speech, without relying on transcriptions. This is useful for low-resource languages, where transcriptions can be expensive or impossible to obtain.