Synthesized speech from articulatory movements can have real-world use for patients with vocal cord disorders, situations requiring silent speech, or in high-noise environments. In this work, we present EMA2S, an end-to-end multimodal articulatory-to-speech system that directly converts articulatory movements to speech signals. We use a neural-network-based vocoder combined with multimodal joint training, incorporating spectrogram, mel-spectrogram, and deep features. The experimental results confirm that the multimodal approach of EMA2S outperforms the baseline system in terms of both objective and subjective evaluation metrics. Moreover, results demonstrate that joint mel-spectrogram and deep-feature loss training can effectively improve system performance.
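A minimal sketch of what such multimodal joint training might look like, assuming a simple weighted sum of L1 reconstruction losses over the spectrogram, mel-spectrogram, and deep-feature representations; the weights, tensor shapes, and dictionary keys below are illustrative assumptions, not taken from the EMA2S paper.

```python
import torch
import torch.nn as nn


class JointMultimodalLoss(nn.Module):
    """Hypothetical joint loss: a weighted sum of spectrogram,
    mel-spectrogram, and deep-feature reconstruction terms.
    The weights are placeholders, not values from the paper."""

    def __init__(self, w_spec=1.0, w_mel=1.0, w_deep=1.0):
        super().__init__()
        self.w_spec, self.w_mel, self.w_deep = w_spec, w_mel, w_deep
        self.l1 = nn.L1Loss()

    def forward(self, pred, target):
        # pred/target are dicts holding the three predicted/reference representations.
        loss_spec = self.l1(pred["spectrogram"], target["spectrogram"])
        loss_mel = self.l1(pred["mel"], target["mel"])
        loss_deep = self.l1(pred["deep"], target["deep"])
        return (self.w_spec * loss_spec
                + self.w_mel * loss_mel
                + self.w_deep * loss_deep)


if __name__ == "__main__":
    # Toy tensors standing in for vocoder outputs and reference features.
    pred = {"spectrogram": torch.randn(2, 257, 100),
            "mel": torch.randn(2, 80, 100),
            "deep": torch.randn(2, 256, 100)}
    target = {k: torch.randn_like(v) for k, v in pred.items()}
    print(JointMultimodalLoss()(pred, target).item())
```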
Although deep learning algorithms are widely used for improving speech enhancement (SE) performance, the performance remains limited under highly challenging conditions, such as unseen noise or noise signals with low signal-to-noise ratios (SNRs). This study provides a pilot investigation of a novel multimodal audio-articulatory-movement SE (AAMSE) model to enhance SE performance under such challenging conditions. Articulatory movement features and acoustic signals were used as inputs to waveform-mapping-based and spectral-mapping-based SE systems with three fusion strategies. In addition, an ablation study was conducted to evaluate SE performance using a limited number of articulatory movement sensors. Experimental results confirm that, by combining the modalities, the AAMSE model notably improves SE performance in terms of speech quality and intelligibility compared to conventional audio-only SE baselines.
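As one illustration of how the two modalities could be combined, the sketch below shows an early-fusion variant in which frame-level noisy-speech features and articulatory-movement features are concatenated before a spectral-mapping SE network; the layer sizes, feature dimensions, and the specific fusion point are assumptions for illustration and do not reproduce the AAMSE architecture.

```python
import torch
import torch.nn as nn


class EarlyFusionSE(nn.Module):
    """Illustrative early-fusion SE network: noisy spectral frames and
    articulatory-movement features are concatenated and mapped to
    enhanced spectral frames. Dimensions are placeholders."""

    def __init__(self, audio_dim=257, ema_dim=18, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + ema_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, audio_dim),  # enhanced spectral frame
        )

    def forward(self, noisy_spec, ema_feat):
        # Concatenate the two modalities frame by frame (early fusion).
        fused = torch.cat([noisy_spec, ema_feat], dim=-1)
        return self.net(fused)


if __name__ == "__main__":
    noisy = torch.randn(4, 100, 257)  # batch, frames, spectral bins
    ema = torch.randn(4, 100, 18)     # batch, frames, sensor coordinates
    enhanced = EarlyFusionSE()(noisy, ema)
    print(enhanced.shape)  # torch.Size([4, 100, 257])
```

Intermediate or late fusion would instead merge the modalities after separate encoders or at the output stage; the choice of fusion point is one of the design axes the study compares.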