
Vocal markers from sustained phonation in Huntington's Disease

Added by Rachid Riad
Publication date: 2020
Language: English





Disease-modifying treatments are currently being assessed in neurodegenerative diseases. Huntington's Disease represents a unique opportunity to design automatic sub-clinical markers, even in premanifest gene carriers. We investigated phonatory impairments as potential clinical markers and propose them both for diagnosis and for the follow-up of gene carriers. We used two sets of features: Phonatory features and Modulation Power Spectrum Features. We found that phonation is not sufficient to identify the sub-clinical disorders of premanifest gene carriers. According to our regression results, Phonatory features are suitable for predicting clinical performance in Huntington's Disease.
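As a concrete illustration of this kind of analysis, the sketch below extracts a few standard phonatory measures (local jitter, local shimmer, mean HNR) from sustained-vowel recordings and regresses them against a clinical score. It assumes the praat-parselmouth and scikit-learn packages; the feature set, file list and score variable are illustrative placeholders, not the authors' exact pipeline.

```python
import numpy as np
import parselmouth
from parselmouth.praat import call
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def phonatory_features(wav_path):
    # Local jitter, local shimmer and mean HNR from a sustained vowel.
    snd = parselmouth.Sound(wav_path)
    pulses = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    jitter = call(pulses, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, pulses], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    hnr = call(snd.to_harmonicity_cc(), "Get mean", 0, 0)
    return [jitter, shimmer, hnr]

# `wav_files` and `clinical_scores` are hypothetical placeholders for a
# labelled cohort (one sustained-vowel recording and one score per speaker).
X = np.array([phonatory_features(f) for f in wav_files])
y = np.array(clinical_scores)
print(cross_val_score(Ridge(alpha=1.0), X, y, scoring="r2", cv=5))
```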



Related Research

Phonation, or the vibration of the vocal folds, is the primary source of vocalization in the production of voiced sounds by humans. It is a complex bio-mechanical process that is highly sensitive to changes in the speaker's respiratory parameters. Since most symptomatic cases of COVID-19 present with moderate to severe impairment of respiratory functions, we hypothesize that signatures of COVID-19 may be observable by examining the vibrations of the vocal folds. Our goal is to validate this hypothesis and to quantitatively characterize the observed changes so that COVID-19 can be detected from voice. To do this, we use a dynamical system model of vocal fold oscillation and solve it with our recently developed ADLES algorithm to obtain vocal fold oscillation patterns directly from recorded speech. Experimental results on a clinically curated dataset of COVID-19 positive and negative subjects reveal characteristic patterns of vocal fold oscillations that are correlated with COVID-19. We show that these patterns are prominent and discriminative enough that even simple classifiers such as logistic regression yield high detection accuracies using just recordings of isolated extended vowels.
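To make the last point concrete, the sketch below shows the kind of simple classifier referred to above: a logistic regression over per-recording oscillation features. The ADLES solver itself is not reproduced; `features` and `labels` are hypothetical arrays (one row of oscillation-pattern statistics per extended-vowel recording, label 1 for a COVID-19 positive subject).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# `features`: (n_recordings, n_features) oscillation-pattern statistics
# `labels`: 0/1 COVID-19 status per recording (hypothetical placeholders)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auc = cross_val_score(clf, features, labels, cv=5, scoring="roc_auc")
print("Mean cross-validated AUC:", np.mean(auc))
```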
We propose a flexible framework that handles both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances in variational autoencoders. It employs separate encoders to learn disentangled latent representations of singer identity and vocal technique, with a joint decoder for reconstruction. Conversion is carried out by simple vector arithmetic in the learned latent spaces. Both a quantitative analysis and a visualization of the converted spectrograms show that our model is able to disentangle singer identity and vocal technique and successfully convert these attributes. To the best of our knowledge, this is the first work to jointly tackle conversion of singer identity and vocal technique with a deep learning approach.
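A schematic PyTorch sketch of the conversion step described above follows: disentangled codes from separate encoders are recombined by simple vector arithmetic and passed to the joint decoder. The toy linear layers and dimensions stand in for the learned encoders and decoder and are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

FEAT_DIM, Z_DIM = 80, 16                     # toy frame and latent sizes
singer_enc = nn.Linear(FEAT_DIM, Z_DIM)      # stands in for the singer-identity encoder
technique_enc = nn.Linear(FEAT_DIM, Z_DIM)   # stands in for the vocal-technique encoder
decoder = nn.Linear(2 * Z_DIM, FEAT_DIM)     # joint decoder

def convert(frame_src, frame_ref_singer, frame_ref_technique):
    # Disentangled latent codes for the source frame.
    z_singer = singer_enc(frame_src)
    z_tech = technique_enc(frame_src)
    # Vector arithmetic in the latent spaces: shift each code toward the
    # attribute carried by the corresponding reference frame.
    z_singer = z_singer + (singer_enc(frame_ref_singer) - z_singer)
    z_tech = z_tech + (technique_enc(frame_ref_technique) - z_tech)
    # The joint decoder reconstructs a frame with the swapped attributes.
    return decoder(torch.cat([z_singer, z_tech], dim=-1))

converted = convert(torch.randn(1, FEAT_DIM),
                    torch.randn(1, FEAT_DIM),
                    torch.randn(1, FEAT_DIM))
```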
This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCC), which are widely used in speech applications such as ASR but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural net. Second, the spectral envelope information contained in the MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network (GAN)-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained given only MFCC information at test time.
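The underlying idea, that MFCCs retain enough spectral-envelope information to be approximately inverted, can be illustrated with librosa's generic MFCC inversion (assuming the librosa and soundfile packages and a placeholder input file); the paper's own pipeline with predicted F0, a pitch-synchronous excitation model and a GAN noise model is not reproduced here.

```python
import librosa
import soundfile as sf

# "speech.wav" is a placeholder path to any mono recording.
y, sr = librosa.load("speech.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)          # analysis: MFCC features
mel = librosa.feature.inverse.mfcc_to_mel(mfcc)             # MFCC -> mel spectral envelope
y_hat = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr)  # rough Griffin-Lim reconstruction
sf.write("reconstructed.wav", y_hat, sr)
```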
The I4U consortium was established to facilitate joint entries to the NIST speaker recognition evaluations (SRE). The latest such joint submission was to SRE 2018, in which the I4U entry was among the best-performing systems. SRE18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the results and lessons learned from the twelve sub-systems and their fusion submitted to SRE18. It is also our intention to present a shared view on the advancements, progress, and major paradigm shifts that we have witnessed as SRE participants over the past decade, from SRE08 to SRE18. In this regard, we have seen, among others, a paradigm shift from supervector representations to deep speaker embeddings, and a switch of research challenge from channel compensation to domain adaptation.
Ye Bai, Jiangyan Yi, Jianhua Tao (2019)
Attention-based encoder-decoder (AED) models have achieved promising performance in speech recognition. However, because of the end-to-end training, an AED model is usually trained only with speech-text paired data, and it is challenging to incorporate external text-only data. Another issue is that an AED model does not use the right context of a text token while predicting that token. To alleviate these two issues, we propose a unified method called LST (Learn Spelling from Teachers) to integrate knowledge from external text-only data into an AED model and to leverage the whole context of a sentence. The method has two stages. First, in the representation stage, a language model is trained on the text; this can be seen as compressing the knowledge in the text into the LM. Then, in the transfer stage, this knowledge is transferred to the AED model via teacher-student learning. To further use the whole context of a text sentence, we propose an LM called the causal cloze completer (COR), which estimates the probability of a token given both its left and its right context. Therefore, with LST training, the AED model can leverage the whole context of the sentence. Unlike fusion-based methods, which use the LM during decoding, the proposed method adds no extra complexity at inference time. We conduct experiments on two public Chinese datasets of different scales, AISHELL-1 and AISHELL-2. The experimental results demonstrate the effectiveness of leveraging external text-only data and the whole sentence context with our proposed method, compared with baseline hybrid systems and AED-based systems.
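The transfer stage can be summarised as training the AED decoder on a mixture of the usual cross-entropy against ground-truth tokens and a soft-label term that matches the teacher LM's distribution. The sketch below shows such a combined loss in PyTorch; the interpolation weight `lam` and the tensor names are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def lst_loss(student_logits, teacher_logits, targets, lam=0.5):
    # student_logits, teacher_logits: (batch, seq_len, vocab)
    # targets: (batch, seq_len) ground-truth token ids
    ce = F.cross_entropy(student_logits.transpose(1, 2), targets)
    kl = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1),
                  reduction="batchmean")
    # Interpolate the hard-label and teacher-distribution terms.
    return (1.0 - lam) * ce + lam * kl

# Example with random tensors standing in for decoder and LM outputs.
B, T, V = 2, 5, 100
loss = lst_loss(torch.randn(B, T, V), torch.randn(B, T, V),
                torch.randint(0, V, (B, T)))
```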
