
The SpeakIn System for VoxCeleb Speaker Recognition Challenge 2021

Added by Dr. Minqiang Xu
Publication date: 2021
Research language: English





This report describes our submission to Tracks 1 and 2 of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC 2021). Both tracks share the same speaker verification system, which uses only VoxCeleb2-dev as its training set. The report covers data augmentation, network structures, domain-based large-margin fine-tuning, and back-end refinement. Our system is a fusion of 9 models and achieves first place in both tracks of VoxSRC 2021. The minDCF of our submission is 0.1034, and the corresponding EER is 1.8460%.
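For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how EER and normalized minDCF can be computed from a list of trial scores. It is not the challenge's official scoring code. The function names are hypothetical, and the default `p_target`, `c_miss`, and `c_fa` values are assumptions chosen for illustration.

```python
import numpy as np

def error_rates(scores, labels):
    """False-accept and false-reject rates at every candidate threshold.

    scores: higher means "more likely the same speaker".
    labels: 1 for target trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    target = scores[labels == 1]
    nontarget = scores[labels == 0]
    # Every distinct score is a threshold; +inf covers "reject everything".
    thresholds = np.append(np.unique(scores), np.inf)
    # A trial is accepted when its score is >= the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    frr = np.array([(target < t).mean() for t in thresholds])
    return far, frr

def compute_eer(scores, labels):
    """Equal error rate: the point where FAR and FRR are closest."""
    far, frr = error_rates(scores, labels)
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)

def compute_min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost (parameter defaults are assumptions)."""
    far, frr = error_rates(scores, labels)
    cost = c_miss * p_target * frr + c_fa * (1 - p_target) * far
    # Normalize by the cost of the best trivial accept-all or reject-all system.
    return float(cost.min() / min(c_miss * p_target, c_fa * (1 - p_target)))
```

A lower value is better for both metrics. EER weights the two error types equally, while minDCF penalizes false accepts more heavily when the assumed target prior is small.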




Keke Wang, Xudong Mao, Hao Wu (2021)
This paper describes the ByteDance speaker diarization system for the fourth track of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). VoxSRC-21 provides both the dev set and the test set of VoxConverse for validation, and a standalone test set for evaluation. We first collect the duration and signal-to-noise ratio (SNR) of all audio and find that the distributions of the VoxConverse test set and the VoxSRC-21 test set are closer to each other. Our system consists of voice activity detection (VAD), speaker embedding extraction, spectral clustering followed by a re-clustering step based on agglomerative hierarchical clustering (AHC), and overlapped speech detection and handling. Finally, we integrate systems with different time scales using DOVER-Lap. Our best system achieves a diarization error rate (DER) of 5.15% on the evaluation set, ranking second in the diarization track of the challenge.
In this report, we describe the Beijing ZKJ-NPU team's submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We participated in the fully supervised speaker verification Tracks 1 and 2. In the challenge, we explored various advanced neural network structures with different pooling layers and objective loss functions. In addition, we introduced the ResNet-DTCF, CoAtNet and PyConv networks to improve the performance of the CNN-based speaker embedding model. Moreover, we applied embedding normalization and score normalization at the evaluation stage. By fusing 11 and 14 systems, our final best performances (minDCF/EER) on the evaluation trials are 0.1205/2.8160% and 0.1175/2.8400% for Tracks 1 and 2, respectively. With our submission, we placed second in the challenge for both tracks.
This paper describes the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021. For Track 2, we evaluate two systems, ResNet34-SE and ECAPA-TDNN. For Track 4, an important part of our system is the VAD module, which greatly improves performance. Our best submission on Track 4 obtained a DER of 5.54% and a JER of 27.11% on the evaluation set, while the performance on the development set is a DER of 2.92% and a JER of 20.84%.
Xu Xiang (2020)
This report describes the systems submitted to the first and second tracks of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020, which ranked second in both tracks. Three key points of the system pipeline are explored: (1) investigating multiple CNN architectures, including ResNet, Res2Net and dual path network (DPN), to extract the x-vectors, (2) using a composite angular margin softmax loss to train the speaker models, and (3) applying score normalization and system fusion to boost performance. Measured on the VoxSRC-20 Eval set, the best submitted systems achieve an EER of 3.808% and a MinDCF of 0.1958 in the closed-condition Track 1, and an EER of 3.798% and a MinDCF of 0.1942 in the open-condition Track 2.
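The abstract above mentions score normalization without naming a variant. As an illustration only, here is a minimal sketch of symmetric normalization (S-norm) over cosine scores, one commonly used variant. That the report used this particular variant is an assumption, and the function name and the cohort array are hypothetical.

```python
import numpy as np

def s_norm(enroll, test, cohort):
    """Symmetric score normalization of a cosine score (illustrative sketch).

    enroll, test: 1-D speaker embeddings.
    cohort: 2-D array of impostor embeddings, one per row (hypothetical input).
    """
    # Length-normalize everything so dot products are cosine similarities.
    enroll = enroll / np.linalg.norm(enroll)
    test = test / np.linalg.norm(test)
    cohort = cohort / np.linalg.norm(cohort, axis=1, keepdims=True)

    raw = float(enroll @ test)
    # Score each side of the trial against the cohort.
    enroll_cohort = cohort @ enroll
    test_cohort = cohort @ test

    # Z-normalize the raw score with each side's cohort statistics, then average.
    z_enroll = (raw - enroll_cohort.mean()) / enroll_cohort.std()
    z_test = (raw - test_cohort.mean()) / test_cohort.std()
    return 0.5 * (z_enroll + z_test)
```

The normalized score is symmetric in its two embeddings, so swapping the enrollment and test sides leaves it unchanged.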
In this report, we describe our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. Two approaches are adopted. One is to apply query expansion to speaker verification, which shows significant progress over the baseline in the study. The other is to use Kaldi to extract x-vectors and to combine their Probabilistic Linear Discriminant Analysis (PLDA) scores with the ResNet scores.
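Combining scores from two back-ends, such as PLDA and a ResNet cosine score, usually requires putting them on a common scale first. The sketch below shows one simple option: z-normalize each system's scores across trials, then take a weighted average. This is an assumption for illustration, not the method described in the report. The function name and the equal default weights are hypothetical.

```python
import numpy as np

def fuse_scores(system_scores, weights=None):
    """Weighted average of per-system scores after z-normalization.

    system_scores: array-like of shape (n_systems, n_trials).
    weights: optional per-system weights (equal weights if omitted).
    """
    scores = np.asarray(system_scores, dtype=float)
    # Bring each system to zero mean and unit variance across trials,
    # so that systems with large score ranges do not dominate.
    mean = scores.mean(axis=1, keepdims=True)
    std = scores.std(axis=1, keepdims=True)
    normed = (scores - mean) / std
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    # One fused score per trial.
    return weights @ normed
```

In practice the weights are often tuned on a development set rather than fixed by hand.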