Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification


Abstract in English

Speaker verification systems often degrade substantially when applied to a dataset from a new domain. Given an unlabeled target-domain dataset, unsupervised domain adaptation (UDA) methods, which usually rely on adversarial training strategies, are commonly used to bridge the performance gap caused by the domain mismatch. However, such an adversarial training strategy uses only the distribution information of the target-domain data and cannot ensure a performance improvement on the target domain. In this paper, we incorporate a self-supervised learning strategy into the unsupervised domain adaptation system and propose a self-supervised learning based domain adaptation approach (SSDA). Compared to the traditional UDA method, the new SSDA training strategy can fully leverage the potential label information from the target domain while simultaneously adapting the speaker discrimination ability learned on the source domain. We evaluate the proposed approach on the VoxCeleb (labeled source domain) and CnCeleb (unlabeled target domain) datasets. The best SSDA system obtains a 10.2% Equal Error Rate (EER) on the CnCeleb dataset without using any CnCeleb speaker labels, which is also a state-of-the-art result on this corpus.
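
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an Equal Error Rate can be estimated from verification trial scores. It is not the paper's evaluation code: the function name `equal_error_rate` and the toy scores are hypothetical, and the sketch simply sweeps every observed score as a decision threshold and reports the point where the false acceptance and false rejection rates are closest.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER from same-speaker (target) and
    different-speaker (nontarget) trial scores.

    Assumption: a higher score means "more likely the same speaker",
    and a trial is accepted when its score is >= the threshold.
    """
    # Candidate thresholds: every distinct score seen in either list.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))

    best_gap = None
    best_eer = None
    for t in thresholds:
        # False acceptance rate: nontarget trials wrongly accepted.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection rate: target trials wrongly rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            # Report the midpoint of the two rates at the closest crossing.
            best_eer = (far + frr) / 2
    return best_eer


# Toy example with made-up scores (not from the paper).
toy_targets = [0.9, 0.8, 0.7, 0.4]
toy_nontargets = [0.6, 0.3, 0.2, 0.1]
print(f"EER = {equal_error_rate(toy_targets, toy_nontargets):.2%}")
```

A lower EER is better; an EER of 10.2% means that, at the operating point where both error types are equally likely, about one trial in ten is misclassified.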
