ﻻ يوجد ملخص باللغة العربية
In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification. In contrast to the commonly used loss functions for speech enhancement such as the L2 loss, the VoiceID loss is based on the feedback from a speaker verification model to generate a ratio mask. The generated ratio mask is multiplied pointwise with the original spectrogram to filter out unnecessary components for speaker verification. In the experiments, we observed that the enhancement network, after training with the VoiceID loss, is able to ignore a substantial amount of time-frequency bins, such as those dominated by noise, for verification. The resulting model consistently improves the speaker verification system on both clean and noisy conditions.
Multi-task learning (MTL) and attention mechanism have been proven to effectively extract robust acoustic features for various speech-related tasks in noisy environments. In this study, we propose an attention-based MTL (ATM) approach that integrates
The goal of this paper is text-independent speaker verification where utterances come from in the wild videos and may contain irrelevant signal. While speaker verification is naturally a pair-wise problem, existing methods to produce the speaker embe
Meta-learning (ML) has recently become a research hotspot in speaker verification (SV). We introduce two methods to improve the meta-learning training for SV in this paper. For the first method, a backbone embedding network is first jointly trained w
The ResNet-based architecture has been widely adopted to extract speaker embeddings for text-independent speaker verification systems. By introducing the residual connections to the CNN and standardizing the residual blocks, the ResNet structure is c
Emotional state of a speaker is found to have significant effect in speech production, which can deviate speech from that arising from neutral state. This makes identifying speakers with different emotions a challenging task as generally the speaker