
Look who's not talking


Added by Joon Son Chung

Publication date: 2020

Fields: Informatics Engineering, Electronic Engineering

Language: English

Authors: Youngki Kwon, Hee Soo Heo, Jaesung Huh

Subjects: Sound; Audio and Speech Processing


The objective of this work is speaker diarisation of speech recordings in the wild. The ability to determine speech segments is a crucial part of diarisation systems, and errors at this stage account for a large proportion of overall diarisation errors. In this paper, we present a simple but effective solution for speech activity detection based on speaker embeddings. In particular, we discover that the norm of the speaker embedding is an extremely effective indicator of speech activity. The method does not require an independent model for speech activity detection, and therefore allows speaker diarisation to be performed using a unified representation for both speaker modelling and speech activity detection. We perform a number of experiments on in-house and public datasets, in which our method outperforms popular baselines.
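The idea of using the embedding norm as a speech activity indicator can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the embedding extractor, the window length, or the decision rule, so the function names, the fixed threshold, the hop size, and the assumption that a larger norm indicates speech are all hypothetical choices made for illustration.

```python
import math


def embedding_norms(embeddings):
    """Return the L2 norm of each per-window embedding vector."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]


def speech_mask(embeddings, threshold):
    """Label a window as speech when its embedding norm exceeds `threshold`.

    Assumption: larger norms indicate speech. The threshold value is
    hypothetical and would have to be tuned on held-out data.
    """
    return [norm > threshold for norm in embedding_norms(embeddings)]


def mask_to_segments(mask, hop_seconds):
    """Merge consecutive speech windows into (start, end) times in seconds."""
    segments = []
    start = None
    for i, is_speech in enumerate(mask):
        if is_speech and start is None:
            start = i  # a speech run begins at this window
        elif not is_speech and start is not None:
            segments.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:
        # The recording ends while a speech run is still open.
        segments.append((start * hop_seconds, len(mask) * hop_seconds))
    return segments


# Toy embeddings (made up for illustration), one vector per 0.5 s window.
toy_embeddings = [[0.1, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 0.2], [5.0, 0.0]]
toy_mask = speech_mask(toy_embeddings, threshold=1.0)
toy_segments = mask_to_segments(toy_mask, hop_seconds=0.5)
```

Because the same embeddings would later be clustered by speaker, a sketch like this needs no separate speech activity model, which is the unification the abstract describes.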


Look Who's Talking: Active Speaker Detection in the Wild

You Jin Kim, Hee-Soo Heo, Soyeon Choe (2021)

In this work, we present a novel audio-visual dataset for active speaker detection in the wild. A speaker is considered active when his or her face is visible and the voice is audible simultaneously. Although active speaker detection is a crucial pre-processing step for many audio-visual tasks, there is no existing dataset of natural human speech to evaluate the performance of active speaker detection. We therefore curate the Active Speakers in the Wild (ASW) dataset which contains videos and co-occurring speech segments with dense speech activity labels. Videos and timestamps of audible segments are parsed and adopted from VoxConverse, an existing speaker diarisation dataset that consists of videos in the wild. Face tracks are extracted from the videos and active segments are annotated based on the timestamps of VoxConverse in a semi-automatic way. Two reference systems, a self-supervised system and a fully supervised one, are evaluated on the dataset to provide the baseline performances of ASW. Cross-domain evaluation is conducted in order to show the negative effect of dubbed videos in the training data.

Computer Vision and Pattern Recognition; Sound; Audio and Speech Processing

Look Who's Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog

Charles Welch, Veronica Perez-Rosas, Jonathan K. Kummerfeld (2019)

We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners. The corpus consists of half a million instant messages, across several messaging platforms. We focus our analyses on seven speaker attributes, each of which partitions the set of speakers, namely: gender; relative age; family member; romantic partner; classmate; co-worker; and native to the same country. In addition to the content of the messages, we examine conversational aspects such as the time messages are sent, messaging frequency, psycholinguistic word categories, linguistic mirroring, and graph-based features reflecting how people in the corpus mention each other. We present two sets of experiments predicting each attribute using (1) short context windows; and (2) a larger set of messages. We find that using all features leads to gains of 9-14% over using message text only.

Computation and Language; Artificial Intelligence

Look Who's Talking: Interpretable Machine Learning for Assessing Italian SMEs Credit Default

Lisa Crosato, Caterina Liberati, Marco Repetto (2021)

Academic research and the financial industry have recently paid great attention to Machine Learning algorithms due to their power to solve complex learning tasks. In the field of firms' default prediction, however, the lack of interpretability has prevented the extensive adoption of black-box models. To overcome this drawback while maintaining the high performance of black-box models, this paper relies on a model-agnostic approach. Accumulated Local Effects and Shapley values are used to shape the predictors' impact on the likelihood of default and rank them according to their contribution to the model outcome. Prediction is achieved by two Machine Learning algorithms (eXtreme Gradient Boosting and FeedForward Neural Network) compared with three standard discriminant models. Results show that our analysis of the Italian Small and Medium Enterprises manufacturing industry benefits from the overall highest classification power by the eXtreme Gradient Boosting algorithm without giving up a rich interpretation framework.

Machine Learning; Econometrics

Look Who's Talking Now: Implications of AV's Explanations on Driver's Trust, AV Preference, Anxiety and Mental Workload

Na Du, Jacob Haspiel, Qiaoning Zhang (2019)

Explanations given by automation are often used to promote automation adoption. However, it remains unclear whether explanations promote acceptance of automated vehicles (AVs). In this study, we conducted a within-subject experiment in a driving simulator with 32 participants, using four different conditions. The four conditions included: (1) no explanation, (2) explanation given before or (3) after the AV acted and (4) the option for the driver to approve or disapprove the AV's action after hearing the explanation. We examined four AV outcomes: trust, preference for AV, anxiety and mental workload. Results suggest that explanations provided before an AV acted were associated with higher trust in and preference for the AV, but there was no difference in anxiety and workload. These results have important implications for the adoption of AVs.

Human-Computer Interaction; Computers and Society; Robotics

Who's talking first? Consensus or lack thereof in coevolving opinion formation models

Cecilia Nardini (2007)

We investigate different opinion formation models on adaptive network topologies. Depending on the dynamical process, rewiring can either (i) lead to the elimination of interactions between agents in different states, and accelerate the convergence to a consensus state or break the network into non-interacting groups or (ii) counter-intuitively, favor the existence of diverse interacting groups for exponentially long times. The mean-field analysis allows us to elucidate the mechanisms at play. Strikingly, allowing the interacting agents to bear more than one opinion at the same time drastically changes the model's behavior and leads to fast consensus.

Physics and Society
