Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

VoiceCoach: Interactive Evidence-based Training for Voice Modulation Skills in Public Speaking

57 0 0.0 ( 0 )

Download Cite

Added by Xingbo Wang

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Xingbo Wang - Haipeng Zeng - Yong Wang

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The modulation of voice properties, such as pitch, volume, and speed, is crucial for delivering a successful public speech. However, it is challenging to master different voice modulation skills. Though many guidelines are available, they are often not practical enough to be applied in different public speaking situations, especially for novice speakers. We present VoiceCoach, an interactive evidence-based approach to facilitate the effective training of voice modulation skills. Specifically, we have analyzed the voice modulation skills from 2623 high-quality speeches (i.e., TED Talks) and use them as the benchmark dataset. Given a voice input, VoiceCoach automatically recommends good voice modulation examples from the dataset based on the similarity of both sentence structures and voice modulation skills. Immediate and quantitative visual feedback is provided to guide further improvement. The expert interviews and the user study provide support for the effectiveness and usability of VoiceCoach.

rate research

Teddy: A System for Interactive Review Analysis

340 - Xiong Zhang , Jonathan Engel , Sara Evensen 2020

Reviews are integral to e-commerce services and products. They contain a wealth of information about the opinions and experiences of users, which can help better understand consumer decisions and improve user experience with products and services. Today, data scientists analyze reviews by developing rules and models to extract, aggregate, and understand information embedded in the review text. However, working with thousands of reviews, which are typically noisy incomplete text, can be daunting without proper tools. Here we first contribute results from an interview study that we conducted with fifteen data scientists who work with review text, providing insights into their practices and challenges. Results suggest data scientists need interactive systems for many review analysis tasks. In response we introduce Teddy, an interactive system that enables data scientists to quickly obtain insights from reviews and improve their extraction and modeling pipelines.

Human-Computer Interaction Computation and Language Machine Learning

An interactive dashboard for searching and comparing soccer performance scores

118 - Paolo Cintia , Giovanni Mauro , Luca Pappalardo 2021

The performance of soccer players is one of most discussed aspects by many actors in the soccer industry: from supporters to journalists, from coaches to talent scouts. Unfortunately, the dashboards available online provide no effective way to compare the evolution of the performance of players or to find players behaving similarly on the field. This paper describes the design of a web dashboard that interacts via APIs with a performance evaluation algorithm and provides graphical tools that allow the user to perform many tasks, such as to search or compare players by age, role or trend of growth in their performance, find similar players based on their pitching behavior, change the algorithms parameters to obtain customized performance scores. We also describe an example of how a talent scout can interact with the dashboard to find young, promising talents.

Human-Computer Interaction Artificial Intelligence Information Retrieval

Pathological voice adaptation with autoencoder-based voice conversion

156 - Marc Illa , Bence Mark Halpern , Rob van Son 2021

In this paper, we propose a new approach to pathological speech synthesis. Instead of using healthy speech as a source, we customise an existing pathological speech sample to a new speakers voice characteristics. This approach alleviates the evaluation problem one normally has when converting typical speech to pathological speech, as in our approach, the voice conversion (VC) model does not need to be optimised for speech degradation but only for the speaker change. This change in the optimisation ensures that any degradation found in naturalness is due to the conversion process and not due to the model exaggerating characteristics of a speech pathology. To show a proof of concept of this method, we convert dysarthric speech using the UASpeech database and an autoencoder-based VC technique. Subjective evaluation results show reasonable naturalness for high intelligibility dysarthric speakers, though lower intelligibility seems to introduce a marginal degradation in naturalness scores for mid and low intelligibility speakers compared to ground truth. Conversion of speaker characteristics for low and high intelligibility speakers is successful, but not for mid. Whether the differences in the results for the different intelligibility levels is due to the intelligibility levels or due to the speakers needs to be further investigated.

Sound Computation and Language Audio and Speech Processing

LitStoryTeller: An Interactive System for Visual Exploration of Scientific Papers Leveraging Named entities and Comparative Sentences

268 - Qing Ping , Chaomei Chen 2017

The present study proposes LitStoryTeller, an interactive system for visually exploring the semantic structure of a scientific article. We demonstrate how LitStoryTeller could be used to answer some of the most fundamental research questions, such as how a new method was built on top of existing methods, based on what theoretical proof and experimental evidences. More importantly, LitStoryTeller can assist users to understand the full and interesting story a scientific paper, with a concise outline and important details. The proposed system borrows a metaphor from screen play, and visualizes the storyline of a scientific paper by arranging its characters (scientific concepts or terminologies) and scenes (paragraphs/sentences) into a progressive and interactive storyline. Such storylines help to preserve the semantic structure and logical thinking process of a scientific paper. Semantic structures, such as scientific concepts and comparative sentences, are extracted using existing named entity recognition APIs and supervised classifiers, from a scientific paper automatically. Two supplementary views, ranked entity frequency view and entity co-occurrence network view, are provided to help users identify the main plot of such scientific storylines. When collective documents are ready, LitStoryTeller also provides a temporal entity evolution view and entity community view for collection digestion.

Human-Computer Interaction Computation and Language Digital Libraries

GAVIN: Gaze-Assisted Voice-Based Implicit Note-taking

281 - Anam Ahmad Khan , Joshua Newn , Ryan Kelly 2021

Annotation is an effective reading strategy people often undertake while interacting with digital text. It involves highlighting pieces of text and making notes about them. Annotating while reading in a desktop environment is considered trivial but, in a mobile setting where people read while hand-holding devices, the task of highlighting and typing notes on a mobile display is challenging. In this paper, we introduce GAVIN, a gaze-assisted voice note-taking application, which enables readers to seamlessly take voice notes on digital documents by implicitly anchoring them to text passages. We first conducted a contextual enquiry focusing on participants note-taking practices on digital documents. Using these findings, we propose a method which leverages eye-tracking and machine learning techniques to annotate voice notes with reference text passages. To evaluate our approach, we recruited 32 participants performing voice note-taking. Following, we trained a classifier on the data collected to predict text passage where participants made voice notes. Lastly, we employed the classifier to built GAVIN and conducted a user study to demonstrate the feasibility of the system. This research demonstrates the feasibility of using gaze as a resource for implicit anchoring of voice notes, enabling the design of systems that allow users to record voice notes with minimal effort and high accuracy.

Human-Computer Interaction

comments

Fetching comments

Al-Andalus University for Medical Sciences

Additional details More universities

VoiceCoach: Interactive Evidence-based Training for Voice Modulation Skills in Public Speaking

Ask ChatGPT about the research

No Arabic abstract

Read More