
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020


Added by Xiong Xiao

Publication date 2020

Fields: Electronic Engineering, Informatics Engineering

Research language: English

Authors Xiong Xiao - Naoyuki Kanda - Zhuo Chen

Audio and Speech Processing Sound


This paper describes the Microsoft speaker diarization system for monaural multi-talker recordings in the wild, evaluated in the diarization track of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We first explain how the system design addresses the issues that arise in handling real multi-talker recordings. We then present the details of its components: a Res2Net-based speaker embedding extractor, conformer-based continuous speech separation with leakage filtering, and a modified DOVER (Diarization Output Voting Error Reduction) method for system fusion. We evaluate the systems on the data set provided by the VoxSRC 2020 challenge, which contains real-life multi-talker audio collected from YouTube. Our best system achieves a diarization error rate (DER) of 3.71% on the development set and 6.23% on the evaluation set, ranking first in the diarization track of the challenge.
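The DER quoted in the abstract is conventionally defined as the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time. A minimal sketch of that ratio follows; the function name, parameter names, and example durations are illustrative assumptions and are not taken from the paper or the challenge scoring tools.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """Return DER as a fraction of total reference speech time.

    All arguments are durations in seconds. The names are hypothetical;
    official scoring tools also handle collars and overlapped speech,
    which this sketch ignores.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    if min(false_alarm, missed, confusion) < 0:
        raise ValueError("error durations must be non-negative")
    return (false_alarm + missed + confusion) / total_speech


# Made-up durations, chosen only to illustrate the arithmetic.
der = diarization_error_rate(false_alarm=1.0, missed=2.0,
                             confusion=0.5, total_speech=100.0)
print(f"DER = {der:.2%}")
```

Because the numerator counts errors against reference speech only in the denominator, a DER above 100% is possible when a system produces a lot of false-alarm speech.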


Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020

Hee Soo Heo, Bong-Jin Lee, Jaesung Huh (2020)

This report describes our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020. We perform a careful analysis of speaker recognition models based on the popular ResNet architecture, and train a number of variants using a range of loss functions. Our results show significant improvements over most existing works without the use of model ensembles or post-processing. We release the training code and pre-trained models as unofficial baselines for this year's challenge.

Audio and Speech Processing Sound

XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021

Jie Wang, Fuchuang Tong, Zhicong Chen (2021)

This paper describes the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021. For track 2, we evaluate two systems, ResNet34-SE and ECAPA-TDNN. For track 4, an important part of our system is the VAD module, which greatly improves performance. Our best track 4 submission obtains a DER of 5.54% and a JER of 27.11% on the evaluation set, and a DER of 2.92% and a JER of 20.84% on the development set.

Audio and Speech Processing Sound

The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge

Weiqing Wang, Danwei Cai, Qingjian Lin (2021)

This report describes the submission of the DKU-DukeECE-Lenovo team to track 4 of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021. Our system includes a voice activity detection (VAD) model, a speaker embedding model, two clustering-based speaker diarization systems with different similarity measurements, two different overlapped speech detection (OSD) models, and a target-speaker voice activity detection (TS-VAD) model. Our final submission, consisting of 5 independent systems, achieves a DER of 5.07% on the challenge test set.

Audio and Speech Processing Sound

The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021

Keke Wang, Xudong Mao, Hao Wu (2021)

This paper describes the ByteDance speaker diarization system for the fourth track of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). VoxSRC-21 provides both the dev set and the test set of VoxConverse for validation, and a standalone test set for evaluation. We first collect the duration and signal-to-noise ratio (SNR) of all audio and find that the distributions of the VoxConverse test set and the VoxSRC-21 test set are the closest. Our system consists of voice activity detection (VAD), speaker embedding extraction, spectral clustering followed by a re-clustering step based on agglomerative hierarchical clustering (AHC), and overlapped speech detection and handling. Finally, we integrate systems with different time scales using DOVER-Lap. Our best system achieves a diarization error rate (DER) of 5.15% on the evaluation set, ranking second in the diarization track of the challenge.

Sound Audio and Speech Processing

The DKU-DukeECE Systems for VoxCeleb Speaker Recognition Challenge 2020

Weiqing Wang, Danwei Cai, Xiaoyi Qin (2020)

In this paper, we present the system submission for the VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20) by the DKU-DukeECE team. For track 1, we explore various kinds of state-of-the-art front-end extractors with different pooling layers and objective loss functions. For track 3, we employ an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). For track 4, we investigate the whole system pipeline for speaker diarization, including voice activity detection (VAD), uniform segmentation, speaker embedding extraction, and clustering.
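The uniform segmentation step named in the track 4 pipeline can be sketched as cutting each VAD speech region into fixed-length, overlapping windows that are then passed to the embedding extractor. The sketch below is only an illustration of that general idea: the function name and the 1.5 s window and 0.75 s hop are assumptions, not values reported by the authors.

```python
def uniform_segments(start, end, window=1.5, hop=0.75):
    """Split one speech region [start, end) into overlapping windows.

    Times are in seconds. The final window is truncated at `end`.
    `window` and `hop` are assumed defaults, not the paper's settings.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    segments = []
    t = start
    while t < end:
        segments.append((t, min(t + window, end)))
        if t + window >= end:
            break  # the region is fully covered
        t += hop
    return segments


# A 3-second speech region found by a (hypothetical) VAD.
for seg in uniform_segments(0.0, 3.0):
    print(seg)
```

Shorter windows give finer speaker-change resolution but noisier embeddings, which is one reason systems in this listing fuse outputs computed at different time scales.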

Audio and Speech Processing
