Do you want to publish a course? Click here

Discussion on domain generalization in the cross-device speaker verification system

مناقشة حول تعميم المجال في نظام التحقق من مكبرات الصوت

245   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

In this paper, we use domain generalization to improve the performance of the cross-device speaker verification system. Based on a trainable speaker verification system, we use domain generalization algorithms to fine-tune the model parameters. First, we use the VoxCeleb2 dataset to train ECAPA-TDNN as a baseline model. Then, use the CHT-TDSV dataset and the following domain generalization algorithms to fine-tune it: DANN, CDNN, Deep CORAL. Our proposed system tests 10 different scenarios in the NSYSU-TDSV dataset, including a single device and multiple devices. Finally, in the scenario of multiple devices, the best equal error rate decreased from 18.39 in the baseline to 8.84. Successfully achieved cross-device identification on the speaker verification system.



References used
https://aclanthology.org/
rate research

Read More

We describe our experience with providing automatic simultaneous spoken language translation for an event with human interpreters. We provide a detailed overview of the systems we use, focusing on their interconnection and the issues it brings. We pr esent our tools to monitor the pipeline and a web application to present the results of our SLT pipeline to the end users. Finally, we discuss various challenges we encountered, their possible solutions and we suggest improvements for future deployments.
For children, the system trained on a large corpus of adult speakers performed worse than a system trained on a much smaller corpus of children's speech. This is due to the acoustic mismatch between training and testing data. To capture more acoustic variability we trained a shared system with mixed data from adults and children. The shared system yields the best EER for children with no degradation for adults. Thus, the single system trained with mixed data is applicable for speaker verification for both adults and children.
The importance of building semantic parsers which can be applied to new domains and generate programs unseen at training has long been acknowledged, and datasets testing out-of-domain performance are becoming increasingly available. However, little o r no attention has been devoted to learning algorithms or objectives which promote domain generalization, with virtually all existing approaches relying on standard supervised learning. In this work, we use a meta-learning framework which targets zero-shot domain generalization for semantic parsing. We apply a model-agnostic training algorithm that simulates zero-shot parsing by constructing virtual train and test sets from disjoint domains. The learning objective capitalizes on the intuition that gradient steps that improve source-domain performance should also improve target-domain performance, thus encouraging a parser to generalize to unseen target domains. Experimental results on the (English) Spider and Chinese Spider datasets show that the meta-learning objective significantly boosts the performance of a baseline parser.
As the average life expectancy of Chinese people rises, the health care problems of the elderly are becoming more diverse, and the demand for long-term care is also increasing. Therefore, how to help the elderly have a good quality of life and mainta in their dignity is what we need to think about. This research intends to explore the characteristics of natural language of normal aging people through a deep model. First, we collect information through focus groups so that the elders can naturally interact with other participants in the process. Then, through the word vector model and regression model, an executive function prediction model based on dialogue data is established to help understand the degradation trajectory of executive function and establish an early warning.
While neural networks produce state-of-the- art performance in several NLP tasks, they generally depend heavily on lexicalized information, which transfer poorly between domains. Previous works have proposed delexicalization as a form of knowledge di stillation to reduce the dependency on such lexical artifacts. However, a critical unsolved issue that remains is how much delexicalization to apply: a little helps reduce overfitting, but too much discards useful information. We propose Group Learning, a knowledge and model distillation approach for fact verification in which multiple student models have access to different delexicalized views of the data, but are encouraged to learn from each other through pair-wise consistency losses. In several cross-domain experiments between the FEVER and FNC fact verification datasets, we show that our approach learns the best delexicalization strategy for the given training dataset, and outperforms state-of-the-art classifiers that rely on the original data.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا