ﻻ يوجد ملخص باللغة العربية
Speaker Diarization (i.e. determining who spoke and when?) for multi-speaker naturalistic interactions such as Peer-Led Team Learning (PLTL) sessions is a challenging task. In this study, we propose robust speaker clustering based on mixture of multivariate von Mises-Fisher distributions. Our diarization pipeline has two stages: (i) ground-truth segmentation; (ii) proposed speaker clustering. The ground-truth speech activity information is used for extracting i-Vectors from each speechsegment. We post-process the i-Vectors with principal component analysis for dimension reduction followed by lengthnormalization. Normalized i-Vectors are high-dimensional unit vectors possessing discriminative directional characteristics. We model the normalized i-Vectors with a mixture model consisting of multivariate von Mises-Fisher distributions. K-means clustering with cosine distance is chosen as baseline approach. The evaluation data is derived from: (i) CRSS-PLTL corpus; and (ii) three-meetings subset of AMI corpus. The CRSSPLTL data contain audio recordings of PLTL sessions which is student-led STEM education paradigm. Proposed approach is consistently better than baseline leading to upto 44.48% and 53.68% relative improvements for PLTL and AMI corpus, respectively. Index Terms: Speaker clustering, von Mises-Fisher distribution, Peer-led team learning, i-Vector, Naturalistic Audio.
Speaker diarization determines who spoke and when? in an audio stream. In this study, we propose a model-based approach for robust speaker clustering using i-vectors. The ivectors extracted from different segments of same speaker are correlated. We m
Speaker clustering is the task of forming speaker-specific groups based on a set of utterances. In this paper, we address this task by using Dominant Sets (DS). DS is a graph-based clustering algorithm with interesting properties that fits well to ou
The von Mises-Fisher distribution is one of the most widely used probability distributions to describe directional data. Finite mixtures of von Mises-Fisher distributions have found numerous applications. However, the likelihood function for the fini
Robust estimation of location and concentration parameters for the von Mises-Fisher distribution is discussed. A key reparametrisation is achieved by expressing the two parameters as one vector on the Euclidean space. With this representation, we fir
Large performance degradation is often observed for speaker ver-ification systems when applied to a new domain dataset. Givenan unlabeled target-domain dataset, unsupervised domain adaptation(UDA) methods, which usually leverage adversarial training