ﻻ يوجد ملخص باللغة العربية
Deep neural network based speaker recognition systems can easily be deceived by an adversary using minuscule imperceptible perturbations to the input speech samples. These adversarial attacks pose serious security threats to the speaker recognition systems that use speech biometric. To address this concern, in this work, we propose a new defense mechanism based on a hybrid adversarial training (HAT) setup. In contrast to existing works on countermeasures against adversarial attacks in deep speaker recognition that only use class-boundary information by supervised cross-entropy (CE) loss, we propose to exploit additional information from supervised and unsupervised cues to craft diverse and stronger perturbations for adversarial training. Specifically, we employ multi-task objectives using CE, feature-scattering (FS), and margin losses to create adversarial perturbations and include them for adversarial training to enhance the robustness of the model. We conduct speaker recognition experiments on the Librispeech dataset, and compare the performance with state-of-the-art projected gradient descent (PGD)-based adversarial training which employs only CE objective. The proposed HAT improves adversarial accuracy by absolute 3.29% and 3.18% for PGD and Carlini-Wagner (CW) attacks respectively, while retaining high accuracy on benign examples.
Robust speaker recognition, including in the presence of malicious attacks, is becoming increasingly important and essential, especially due to the proliferation of several smart speakers and personal agents that interact with an individuals voice co
In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. We adopt a recently proposed unsupervised adversarial invariance arc
In real-life applications, the performance of speaker recognition systems always degrades when there is a mismatch between training and evaluation data. Many domain adaptation methods have been successfully used for eliminating the domain mismatches
Attacking deep learning based biometric systems has drawn more and more attention with the wide deployment of fingerprint/face/speaker recognition systems, given the fact that the neural networks are vulnerable to the adversarial examples, which have
The goal of this work is to train robust speaker recognition models without speaker labels. Recent works on unsupervised speaker representations are based on contrastive learning in which they encourage within-utterance embeddings to be similar and a