Explanations given by automation are often used to promote automation adoption. However, it remains unclear whether explanations promote acceptance of automated vehicles (AVs). In this study, we conducted a within-subject experiment in a driving simulator with 32 participants, using four different conditions. The four conditions were: (1) no explanation, (2) an explanation given before the AV acted, (3) an explanation given after the AV acted, and (4) the option for the driver to approve or disapprove the AV's action after hearing the explanation. We examined four AV outcomes: trust, preference for AV, anxiety and mental workload. Results suggest that explanations provided before an AV acted were associated with higher trust in and preference for the AV, but there was no difference in anxiety and workload. These results have important implications for the adoption of AVs.
The objective of this work is speaker diarisation of speech recordings in the wild. The ability to determine speech segments is a crucial part of diarisation systems, accounting for a large proportion of errors. In this paper, we present a simple but effective solution for speech activity detection based on speaker embeddings. In particular, we discover that the norm of the speaker embedding is an extremely effective indicator of speech activity. The method does not require an independent model for speech activity detection, and therefore allows speaker diarisation to be performed using a unified representation for both speaker modelling and speech activity detection. We perform a number of experiments on in-house and public datasets, in which our method outperforms popular baselines.
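As a rough illustration of the idea that the embedding norm can indicate speech activity, the following minimal sketch thresholds the L2 norm of per-frame embeddings and merges active frames into segments. It is not the authors' implementation: the function names, the toy embeddings, the threshold value, the frame shift, and the assumption that a larger norm means speech are all hypothetical choices made for this example.

```python
import numpy as np


def speech_activity_from_norms(embeddings, threshold):
    """Mark a frame as speech when its embedding L2 norm exceeds the threshold.

    Assumption for this sketch: larger norm means speech is present.
    """
    norms = np.linalg.norm(embeddings, axis=1)
    return norms > threshold


def mask_to_segments(mask, frame_shift):
    """Merge consecutive active frames into (start, end) times in seconds."""
    segments = []
    start = None
    for i, active in enumerate(mask):
        if active and start is None:
            start = i  # a new speech segment begins
        elif not active and start is not None:
            segments.append((start * frame_shift, i * frame_shift))
            start = None
    if start is not None:  # close a segment that runs to the last frame
        segments.append((start * frame_shift, len(mask) * frame_shift))
    return segments


# Toy 2-dimensional "embeddings" (hypothetical values, one row per frame).
frames = np.array([
    [0.1, 0.0],  # norm 0.1
    [3.0, 4.0],  # norm 5.0
    [6.0, 8.0],  # norm 10.0
    [0.0, 0.2],  # norm 0.2
    [3.0, 4.0],  # norm 5.0
])

mask = speech_activity_from_norms(frames, threshold=1.0)
segments = mask_to_segments(mask, frame_shift=0.5)
print(mask.tolist())
print(segments)
```

In practice the threshold would need to be tuned on held-out data, and the embeddings would come from the same speaker model used for diarisation, which is what makes a separate speech activity detector unnecessary.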
In this work, we present a novel audio-visual dataset for active speaker detection in the wild. A speaker is considered active when his or her face is visible and the voice is audible simultaneously. Although active speaker detection is a crucial pre-processing step for many audio-visual tasks, there is no existing dataset of natural human speech to evaluate the performance of active speaker detection. We therefore curate the Active Speakers in the Wild (ASW) dataset, which contains videos and co-occurring speech segments with dense speech activity labels. Videos and timestamps of audible segments are parsed and adopted from VoxConverse, an existing speaker diarisation dataset that consists of videos in the wild. Face tracks are extracted from the videos and active segments are annotated based on the timestamps of VoxConverse in a semi-automatic way. Two reference systems, a self-supervised system and a fully supervised one, are evaluated on the dataset to provide baseline performance on ASW. Cross-domain evaluation is conducted in order to show the negative effect of dubbed videos in the training data.
To better understand the impacts of similarities and dissimilarities in human and AV personalities, we conducted an experimental study with 443 individuals. Generally, similarities in human and AV personalities led to a higher perception of AV safety only when both were high in specific personality traits. Dissimilarities in human and AV personalities also yielded a higher perception of AV safety, but only when the AV was higher than the human in a particular personality trait.
We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners. The corpus consists of half a million instant messages, across several messaging platforms. We focus our analyses on seven speaker attributes, each of which partitions the set of speakers, namely: gender; relative age; family member; romantic partner; classmate; co-worker; and native to the same country. In addition to the content of the messages, we examine conversational aspects such as the time messages are sent, messaging frequency, psycholinguistic word categories, linguistic mirroring, and graph-based features reflecting how people in the corpus mention each other. We present two sets of experiments predicting each attribute using (1) short context windows; and (2) a larger set of messages. We find that using all features leads to gains of 9-14% over using message text only.
Academic research and the financial industry have recently paid great attention to Machine Learning algorithms due to their power to solve complex learning tasks. In the field of firm default prediction, however, the lack of interpretability has prevented the extensive adoption of black-box models. To overcome this drawback while maintaining the high performance of black-box models, this paper relies on a model-agnostic approach. Accumulated Local Effects and Shapley values are used to shape the predictors' impact on the likelihood of default and rank them according to their contribution to the model outcome. Prediction is achieved by two Machine Learning algorithms (eXtreme Gradient Boosting and FeedForward Neural Network) compared with three standard discriminant models. Results show that our analysis of the Italian Small and Medium Enterprises manufacturing industry benefits from the overall highest classification power, achieved by the eXtreme Gradient Boosting algorithm, without giving up a rich interpretation framework.
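To make the ranking of predictors by Shapley value more concrete, the sketch below computes exact Shapley values by enumerating all feature subsets for a toy scoring function, then ranks the features by absolute contribution. It is only an illustration under stated assumptions: the linear `toy_score` function, its weights, the all-zero baseline, and the practice of replacing absent features with baseline values are hypothetical choices for this example, not the models or the estimation procedure used in the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value
    (an assumption of this sketch). Cost grows as 2**n, so this is
    only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical default-risk score over three made-up predictors."""
    return 2.0 * z[0] + 1.0 * z[1] - 0.5 * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_score, x, baseline)

# Rank predictors by the magnitude of their contribution, largest first.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
print(phi)
print(ranking)
```

For a linear scoring function like this one, each Shapley value reduces to the weight times the feature's deviation from the baseline, and the values sum to the difference between the score at `x` and the score at the baseline. For tree ensembles or neural networks, exact enumeration is usually infeasible and approximate or model-specific estimators are used instead.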