Exploring Recurrent, Memory and Attention Based Architectures for Scoring Interactional Aspects of Human-Machine Text Dialog

Added by: Vikram Ramanarayanan

Publication date: 2020

Field: Informatics Engineering

Research language: English

Authors: Vikram Ramanarayanan, Matthew Mulholland, Debanjan Ghosh

Abstract

An important step toward helping English language learners improve their conversational speaking proficiency is the automated scoring of multiple aspects of interactional competence, followed by targeted feedback. Building on previous work in this direction, this paper investigates multiple neural architectures (recurrent, attention-based, and memory-based) alongside feature-engineered models for the automated scoring of interactional and topic development aspects of text dialog data. We conducted experiments on a conversational database of text dialogs between human learners and a cloud-based dialog system; the dialogs were triple-scored along multiple dimensions of conversational proficiency. We find that a fusion of multiple architectures performs competently on this automated scoring task relative to expert inter-rater agreement, with (i) hand-engineered features passed to a support vector learner and (ii) transformer-based architectures contributing most prominently to the fusion.
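The abstract does not say how the model outputs are fused or which agreement statistic is used. As a minimal sketch, assuming score-level fusion by a weighted average and quadratic weighted kappa (a statistic commonly reported in automated scoring) as the agreement measure, the Python below fuses per-model scores, maps them back to an integer rating scale, and measures agreement with a human rating. The function names and the equal default weights are hypothetical and are not the authors' method.

```python
import math
from typing import List, Optional, Sequence


def fuse_scores(model_preds: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Assumed late fusion: weighted average of each model's score per dialog."""
    n_models = len(model_preds)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # hypothetical equal weights
    total = sum(weights)
    n_items = len(model_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, model_preds)) / total
        for i in range(n_items)
    ]


def to_integer_scores(scores: Sequence[float], lo: int, hi: int) -> List[int]:
    """Clip to the rating scale [lo, hi] and round half up to an integer score."""
    return [int(math.floor(min(max(s, lo), hi) + 0.5)) for s in scores]


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             lo: int, hi: int) -> float:
    """Chance-corrected agreement between two ratings on an ordinal scale."""
    k = hi - lo + 1
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - lo][y - lo] += 1
    n = len(a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 if den == 0 else 1.0 - num / den
```

The same `quadratic_weighted_kappa` call can be applied to pairs of human raters, which gives the inter-rater baseline that a fused system would be compared against.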

Related research

Attention-Based Models for Text-Dependent Speaker Verification

F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno (2017)

Attention-based models have recently shown strong performance on a range of tasks, such as speech recognition, machine translation, and image captioning, owing to their ability to summarize relevant information that spans the entire length of an input sequence. In this paper, we analyze the use of attention mechanisms for sequence summarization in our end-to-end text-dependent speaker recognition system. We explore different topologies of the attention layer and their variants, and compare different pooling methods on the attention weights. Ultimately, we show that attention-based models can improve the Equal Error Rate (EER) of our speaker verification system by 14% relative to our non-attention LSTM baseline model.

Audio and Speech Processing, Machine Learning, Sound
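The attention-based sequence summarization described above can be sketched in plain Python. This assumes one simple topology: a linear scoring layer (weights `w`, bias `b`, hypothetical names) applied to each frame-level vector, such as an LSTM output, followed by a softmax over frames and a weighted sum. The paper compares several topologies and pooling variants, and this sketch is not claimed to match any one of them.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(frames: Sequence[Sequence[float]],
                   w: Sequence[float], b: float = 0.0) -> List[float]:
    """Summarize frame-level vectors into one utterance-level vector."""
    # Scalar relevance score per frame from an assumed linear scoring layer.
    scores = [sum(wi * hi for wi, hi in zip(w, h)) + b for h in frames]
    alphas = softmax(scores)  # attention weights, summing to 1
    dim = len(frames[0])
    # Attention-weighted average of the frames, dimension by dimension.
    return [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
```

With all-zero scoring weights every frame gets equal weight and the result is the plain mean; as the learned weights sharpen, pooling concentrates on the highest-scoring frames.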

HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception

Yota Ueda, Kazuki Fujii, Yuki Saito (2021)

We propose a conditional generative adversarial network (GAN) that incorporates humans' perceptual evaluations. A deep neural network (DNN)-based GAN generator can represent a real-data distribution accurately but cannot represent a human-acceptable distribution, i.e., the range of data that humans accept as natural regardless of whether the data are real. The HumanGAN was proposed to model the human-acceptable distribution: a DNN-based generator is trained using a human-based discriminator, i.e., humans' perceptual evaluations, instead of the GAN's DNN-based discriminator. However, the HumanGAN cannot represent conditional distributions. This paper proposes the HumanACGAN, a theoretical extension of the HumanGAN, to handle conditional human-acceptable distributions. Our HumanACGAN trains a DNN-based conditional generator by treating humans not only as a discriminator but also as an auxiliary classifier. The generator is trained to deceive both the human-based discriminator, which scores unconditioned naturalness, and the human-based classifier, which scores class-conditioned perceptual acceptability. Training can be carried out with the backpropagation algorithm using humans' perceptual evaluations. Our experimental results in phoneme perception demonstrate that the HumanACGAN can successfully train this conditional generator.

Human-Computer Interaction, Machine Learning, Sound
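Training against human judgments needs the gradient of a score that can only be queried, not differentiated. One generic way to obtain such a gradient is a random-perturbation (evolution-strategies style) estimate. The sketch below assumes that approach and uses a placeholder `score` function in place of human raters. The estimate with respect to a generated sample would then be chained through the generator with ordinary backpropagation. The abstract does not state that HumanACGAN uses exactly this estimator, and the function and parameter names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def estimate_gradient(score: Callable[[Sequence[float]], float],
                      x: Sequence[float], sigma: float = 0.1,
                      n_pairs: int = 1000, seed: int = 0) -> List[float]:
    """Antithetic random-perturbation estimate of d score / d x.

    `score` is a placeholder (assumption) for a human rating of sample x,
    e.g. perceived naturalness; only its outputs are used.
    """
    rng = random.Random(seed)
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random direction
        plus = [xi + sigma * e for xi, e in zip(x, eps)]
        minus = [xi - sigma * e for xi, e in zip(x, eps)]
        # Central difference of the black-box score along this direction.
        diff = (score(plus) - score(minus)) / (2.0 * sigma)
        for d in range(dim):
            grad[d] += diff * eps[d]
    return [g / n_pairs for g in grad]  # average over perturbation pairs
```

With real listeners, every call to `score` is a perceptual judgment, so `n_pairs` would be far smaller than in a simulation and the estimate correspondingly noisier.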

Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension

Hongyu Gong, Yelong Shen, Dian Yu (2020)

In this paper, we study machine reading comprehension (MRC) on long texts, where a model takes a lengthy document and a question as input and extracts a text span from the document as the answer. State-of-the-art models tend to use a pretrained transformer model (e.g., BERT) to encode the joint contextual information of the document and question. However, these transformer-based models can only take fixed-length input (e.g., 512 tokens). To handle longer inputs, previous approaches usually split them into equally spaced segments and predict answers from each segment independently, without considering information from other segments. As a result, they may produce segments that fail to cover the correct answer span or retain insufficient context around it, which significantly degrades performance. Moreover, they are less capable of answering questions that need cross-segment information. We propose letting a model learn to chunk more flexibly via reinforcement learning: the model decides the next segment to process, in either direction. We also employ recurrent mechanisms to let information flow across segments. Experiments on three MRC datasets (CoQA, QuAC, and TriviaQA) demonstrate the effectiveness of our proposed recurrent chunking mechanisms: we obtain segments that are more likely to contain complete answers while providing sufficient context around the ground-truth answers for better predictions.

Computation and Language, Machine Learning
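To make the baseline's failure mode concrete, the sketch below (hypothetical helper names) splits a token sequence into fixed-length windows advanced by a fixed stride and checks whether an answer span falls entirely inside any window. It models only the equally spaced chunking that the paper improves on, not the learned reinforcement-learning chunking policy; the stride parameter is an assumption.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets


def fixed_chunks(n_tokens: int, max_len: int, stride: int) -> List[Span]:
    """Equally spaced, fixed-length windows over a token sequence."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    chunks: List[Span] = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        chunks.append((start, end))
        if end == n_tokens:  # last window reaches the end of the document
            break
        start += stride
    return chunks


def covers(chunk: Span, answer: Span) -> bool:
    """True if the whole answer span lies inside the chunk."""
    return chunk[0] <= answer[0] and answer[1] <= chunk[1]
```

For example, with 10 tokens, `max_len=4`, and `stride=3`, `fixed_chunks` returns the windows (0, 4), (3, 7), and (6, 10); an answer occupying tokens 2 through 5, i.e. the span (2, 6), is not fully contained in any of them.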

A Machine Learning Approach for Predicting Human Preference for Graph Layouts

Shijun Cai, Seok-Hee Hong, Jialiang Shen (2021)

Understanding which graph layouts humans prefer, and why, is important and challenging because of the highly complex visual perception and cognition system of the human brain. In this paper, we present the first machine learning approach for predicting human preference for graph layouts. Datasets with human preference labels are generally limited and insufficient for training deep networks. To address this, we train our deep learning model with transfer learning, i.e., by exploiting quality metrics such as shape-based metrics, edge crossings, and stress, which have been shown to correlate with human preference for graph layouts. Experimental results on ground-truth human preference datasets show that our model can successfully predict human preference for graph layouts. To the best of our knowledge, this is the first approach for predicting the qualitative evaluation of graph layouts using data from human preference experiments.

Human-Computer Interaction, Machine Learning
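Edge crossings, one of the quality metrics named above, can be computed directly from a straight-line drawing. The sketch below counts pairs of edges whose segments properly cross, skipping pairs that share an endpoint. The helper names are hypothetical, and how the paper normalizes this metric or feeds it into its transfer-learning setup is not shown.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int]


def orient(p: Point, q: Point, r: Point) -> float:
    """Twice the signed area of triangle pqr (> 0 if counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """Proper crossing test; touching or collinear overlaps are not counted."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def edge_crossings(pos: Dict[int, Point], edges: List[Edge]) -> int:
    """Number of pairs of straight-line edges that cross in the drawing."""
    count = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue  # edges sharing an endpoint are not counted as crossings
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            count += 1
    return count
```

Drawing the complete graph on four nodes as a unit square with both diagonals gives exactly one crossing, between the two diagonals.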

IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval

Hui Chen, Guiguang Ding, Xudong Liu (2020)

Enabling bidirectional retrieval of images and texts is important for understanding the correspondence between vision and language. Existing methods leverage the attention mechanism to explore this correspondence in a fine-grained manner. However, most of them treat all semantics equally and thus align them uniformly, regardless of their diverse complexities. In fact, semantics are diverse (i.e., they involve different kinds of semantic concepts), and humans usually follow a latent structure to combine them into understandable language. Such sophisticated correspondences may be difficult for existing methods to capture optimally. To address this deficiency, we propose an Iterative Matching with Recurrent Attention Memory (IMRAM) method, in which correspondences between images and texts are captured through multiple steps of alignment. Specifically, we introduce an iterative matching scheme to explore such fine-grained correspondence progressively. A memory distillation unit refines alignment knowledge from early steps to later ones. Experimental results on three benchmark datasets, Flickr8K, Flickr30K, and MS COCO, show that IMRAM achieves state-of-the-art performance, demonstrating its effectiveness. Experiments on a practical business advertisement dataset, named Ads{}, further validate the applicability of our method in practical scenarios.

Computer Vision and Pattern Recognition
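The idea of aligning text and image over several steps can be illustrated with a small cross-attention loop. This is a minimal sketch under stated assumptions: each word vector attends over image region vectors with softmax weights on cosine similarity, the per-step scores are summed, and the query update between steps is a hypothetical normalized sum standing in for IMRAM's learned memory distillation unit. Only the text-to-image direction is shown, and the feature vectors would come from separate image and text encoders.

```python
import math
from typing import List, Sequence

Vec = List[float]


def normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # leave zero vectors unchanged
    return [x / n for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def attend(query: Sequence[float], regions: Sequence[Sequence[float]],
           temperature: float = 10.0) -> Vec:
    """Softmax-weighted sum of region features, weighted by cosine to the query."""
    logits = [temperature * cosine(query, r) for r in regions]
    m = max(logits)
    weights = [math.exp(s - m) for s in logits]
    z = sum(weights)
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) / z for d in range(dim)]


def iterative_match(words: Sequence[Sequence[float]],
                    regions: Sequence[Sequence[float]], steps: int = 2) -> float:
    """Image-text similarity summed over several alignment steps (assumed)."""
    queries = [list(w) for w in words]
    total = 0.0
    for _ in range(steps):
        attended = [attend(q, regions) for q in queries]
        # Step score: mean cosine between each query and its attended context.
        total += sum(cosine(q, c) for q, c in zip(queries, attended)) / len(queries)
        # Hypothetical memory update: blend each query with its attended context.
        queries = [normalize([qi + ci for qi, ci in zip(q, c)])
                   for q, c in zip(queries, attended)]
    return total
```

A word vector that matches a region exactly keeps a cosine of 1 at every step, while an unrelated pair starts near 0, so matched image-text pairs accumulate higher totals across steps.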
