New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Speaker Verification Experiments for Adults and Children Using Shared Embedding Spaces

تجارب التحقق من المتكلم للبالغين والأطفال الذين يستخدمون مساحات التضمين المشترك

442 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

shared embedding spaces embedding spaces speaker verification experiments أماكن التضمين المشترك تضمين المساحات تجارب التحقق من المتكلم صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

For children, the system trained on a large corpus of adult speakers performed worse than a system trained on a much smaller corpus of children's speech. This is due to the acoustic mismatch between training and testing data. To capture more acoustic variability we trained a shared system with mixed data from adults and children. The shared system yields the best EER for children with no degradation for adults. Thus, the single system trained with mixed data is applicable for speaker verification for both adults and children.

References used

https://aclanthology.org/

rate research

Incorporating speaker embedding and post-filter network for improving speaker similarity of personalized speech synthesis system

381 - Association for Computation Linguistics 2021 مقالة

In recent years, speech synthesis system can generate speech with high speech quality. However, multi-speaker text-to-speech (TTS) system still require large amount of speech data for each target speaker. In this study, we would like to construct a m ulti-speaker TTS system by incorporating two sub modules into artificial neural network-based speech synthesis system to alleviate this problem. First module is to add speaker embedding into encoding module for generating speech while a large amount of the speech data from target speaker is not necessary. For speaker embedding method, in our study, two main speaker embedding methods, namely speaker verification embedding and voice conversion embedding, are compared to deciding which one is suitable for our personalized TTS system. Second, we substituted the conventional post-net module, which is adopted to enhance the output spectrum sequence, to further improving the speech quality of the generated speech utterance. Here, a post-filter network is used. Finally, experiment results showed that the speaker embedding is useful by adding it into encoding module and the resultant speech utterance indeed perceived as the target speaker. Also, the post-filter network not only improving the speech quality and also enhancing the speaker similarity of the generated speech utterances. The constructed TTS system can generate a speech utterance of the target speaker in fewer than 2 seconds. In the future, we would like to further investigate the controllability of the speaking rate or perceived emotion state of the generated speech.

speech synthesis system tts system نظام تخليق الكلام نظام TTS. صناعة حمض الفوسفور

A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space

321 - Association for Computation Linguistics 2021 مقالة

In cross-lingual language models, representations for many different languages live in the same space. Here, we investigate the linguistic and non-linguistic factors affecting sentence-level alignment in cross-lingual pretrained language models for 1 01 languages and 5,050 language pairs. Using BERT-based LaBSE and BiLSTM-based LASER as our models, and the Bible as our corpus, we compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance, as well as four intrinsic measures of vector space alignment and isomorphism. We then examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics. The results of our analyses show that word order agreement and agreement in morphological complexity are two of the strongest linguistic predictors of cross-linguality. We also note in-family training data as a stronger predictor than language-specific training data across the board. We verify some of our linguistic findings by looking at the effect of morphological segmentation on English-Inuktitut alignment, in addition to examining the effect of word order agreement on isomorphism for 66 zero-shot language pairs from a different corpus. We make the data and code for our experiments publicly available.

massively multilingual analysis shared embedding space massively multilingual تحليل متعدد اللغات بشكل كبير مشترك مساحة التضمين متعددة اللغات بشكل كبير صناعة حمض الفوسفور المزيد..

Discussion on domain generalization in the cross-device speaker verification system

259 - Association for Computation Linguistics 2021 مقالة

In this paper, we use domain generalization to improve the performance of the cross-device speaker verification system. Based on a trainable speaker verification system, we use domain generalization algorithms to fine-tune the model parameters. First , we use the VoxCeleb2 dataset to train ECAPA-TDNN as a baseline model. Then, use the CHT-TDSV dataset and the following domain generalization algorithms to fine-tune it: DANN, CDNN, Deep CORAL. Our proposed system tests 10 different scenarios in the NSYSU-TDSV dataset, including a single device and multiple devices. Finally, in the scenario of multiple devices, the best equal error rate decreased from 18.39 in the baseline to 8.84. Successfully achieved cross-device identification on the speaker verification system.

speaker verification system cross-device speaker verification speaker verification نظام التحقق من المتكلم التحقق من مكبر الصوت عبر الأجهزة التحقق من المتكلم صناعة حمض الفوسفور المزيد..

Overview and Insights from the SCIVER shared task on Scientific Claim Verification

431 - Association for Computation Linguistics 2021 مقالة

We present an overview of the SCIVER shared task, presented at the 2nd Scholarly Document Processing (SDP) workshop at NAACL 2021. In this shared task, systems were provided a scientific claim and a corpus of research abstracts, and asked to identify which articles Support or Refute the claim as well as provide evidentiary sentences justifying those labels. 11 teams made a total of 14 submissions to the shared task leaderboard, leading to an improvement of more than +23 F1 on the primary task evaluation metric. In addition to surveying the participating systems, we provide several insights into modeling approaches to support continued progress and future research on the important and challenging task of scientific claim verification.

sciver shared task scientific claim verification سكيف مشترك المهمة التحقق العلمي التحقق صناعة حمض الفوسفور

Multi-source Neural Topic Modeling in Multi-view Embedding Spaces

258 - Association for Computation Linguistics 2021 مقالة

Though word embeddings and topics are complementary representations, several past works have only used pretrained word embeddings in (neural) topic modeling to address data sparsity in short-text or small collection of documents. This work presents a novel neural topic modeling framework using multi-view embed ding spaces: (1) pretrained topic-embeddings, and (2) pretrained word-embeddings (context-insensitive from Glove and context-sensitive from BERT models) jointly from one or many sources to improve topic quality and better deal with polysemy. In doing so, we first build respective pools of pretrained topic (i.e., TopicPool) and word embeddings (i.e., WordPool). We then identify one or more relevant source domain(s) and transfer knowledge to guide meaningful learning in the sparse target domain. Within neural topic modeling, we quantify the quality of topics and document representations via generalization (perplexity), interpretability (topic coherence) and information retrieval (IR) using short-text, long-text, small and large document collections from news and medical domains. Introducing the multi-source multi-view embedding spaces, we have shown state-of-the-art neural topic modeling using 6 source (high-resource) and 5 target (low-resource) corpora.

مكمل multi-view embedding spaces مساحات تضمين متعددة نمذجة الموضوع صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Speaker Verification Experiments for Adults and Children Using Shared Embedding Spaces

تجارب التحقق من المتكلم للبالغين والأطفال الذين يستخدمون مساحات التضمين المشترك

Ask ChatGPT about the research

Read More

suggested questions