Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

The SpeakIn System for VoxCeleb Speaker Recognition Challange 2021

98 0 0.0 ( 0 )

Download Cite

Added by Minqiang Xu Dr.

Publication date 2021

fields Informatics Engineering Electronic Engineering

and research's language is English

Authors Miao Zhao - Yufeng Ma - Min Liu

Sound Audio and Speech Processing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This report describes our submission to the track 1 and track 2 of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC 2021). Both track 1 and track 2 share the same speaker verification system, which only uses VoxCeleb2-dev as our training set. This report explores several parts, including data augmentation, network structures, domain-based large margin fine-tuning, and back-end refinement. Our system is a fusion of 9 models and achieves first place in these two tracks of VoxSRC 2021. The minDCF of our submission is 0.1034, and the corresponding EER is 1.8460%.

rate research

The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021

94 - Keke Wang , Xudong Mao , Hao Wu 2021

This paper describes the ByteDance speaker diarization system for the fourth track of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). The VoxSRC-21 provides both the dev set and test set of VoxConverse for use in validation and a standalone test set for evaluation. We first collect the duration and signal-to-noise ratio (SNR) of all audio and find that the distribution of the VoxConverses test set and the VoxSRC-21s test set is more closer. Our system consists of voice active detection (VAD), speaker embedding extraction, spectral clustering followed by a re-clustering step based on agglomerative hierarchical clustering (AHC) and overlapped speech detection and handling. Finally, we integrate systems with different time scales using DOVER-Lap. Our best system achieves 5.15% of the diarization error rate (DER) on evaluation set, ranking the second at the diarization track of the challenge.

Sound Audio and Speech Processing

Beijing ZKJ-NPU Speaker Verification System for VoxCeleb Speaker Recognition Challenge 2021

170 - Li Zhang , Huan Zhao , Qinling Meng 2021

In this report, we describe the Beijing ZKJ-NPU team submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We participated in the fully supervised speaker verification track 1 and track 2. In the challenge, we explored various kinds of advanced neural network structures with different pooling layers and objective loss functions. In addition, we introduced the ResNet-DTCF, CoAtNet and PyConv networks to advance the performance of CNN-based speaker embedding model. Moreover, we applied embedding normalization and score normalization at the evaluation stage. By fusing 11 and 14 systems, our final best performances (minDCF/EER) on the evaluation trails are 0.1205/2.8160% and 0.1175/2.8400% respectively for track 1 and 2. With our submission, we came to the second place in the challenge for both tracks.

Sound Audio and Speech Processing

XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021

141 - Jie Wang , Fuchuang Tong , Zhicong Chen 2021

This paper describes the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021. For track 2, we evaluate two systems including ResNet34-SE and ECAPA-TDNN. For track 4, an important part of our system is VAD module which greatly improves the performance. Our best submission on the track 4 obtained on the evaluation set DER 5.54% and JER 27.11%, while the performance on the development set is DER 2.92% and JER 20.84%.

Audio and Speech Processing Sound

The xx205 System for the VoxCeleb Speaker Recognition Challenge 2020

153 - Xu Xiang 2020

This report describes the systems submitted to the first and second tracks of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020, which ranked second in both tracks. Three key points of the system pipeline are explored: (1) investigating multiple CNN architectures including ResNet, Res2Net and dual path network (DPN) to extract the x-vectors, (2) using a composite angular margin softmax loss to train the speaker models, and (3) applying score normalization and system fusion to boost the performance. Measured on the VoxSRC-20 Eval set, the best submitted systems achieve an EER of $3.808%$ and a MinDCF of $0.1958$ in the close-condition track 1, and an EER of $3.798%$ and a MinDCF of $0.1942$ in the open-condition track 2, respectively.

Sound Computation and Language Audio and Speech Processing

Query Expansion System for the VoxCeleb Speaker Recognition Challenge 2020

121 - Yu-Sen Cheng , Chun-Liang Shih , Tien-Hong Lo 2020

In this report, we describe our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. Two approaches are adopted. One is to apply query expansion on speaker verification, which shows significant progress compared to baseline in the study. Another is to use Kaldi extract x-vector and to combine its Probabilistic Linear Discriminant Analysis (PLDA) score with ResNet score.

Sound Computation and Language Audio and Speech Processing

comments

Fetching comments

University of Babylon

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

The SpeakIn System for VoxCeleb Speaker Recognition Challange 2021

Ask ChatGPT about the research

No Arabic abstract

Read More