
Speech Synthesis Systems (أنظمة تركيب الكلام)

Publication date: 2018
Language: Arabic
Created by: Adel Arar





No English abstract

Related research

The paper describes BUT's English-to-German offline speech translation (ST) systems developed for IWSLT 2021. They are based on jointly trained Automatic Speech Recognition-Machine Translation models. Their performance is evaluated on the MuST-C Common test set. In this work, we study their efficiency from the perspective of having a large amount of separate ASR and MT training data and a smaller amount of speech-translation training data. The large amounts of ASR and MT training data are used to pre-train the ASR and MT models. The speech-translation data is then used to jointly optimize the ASR-MT models by defining an end-to-end differentiable path from speech to translations. For this purpose, we use the internal continuous representations from the ASR decoder as the input to the MT module. We show that speech translation can be further improved by training the ASR decoder jointly with the MT module on a large amount of text-only MT training data. We also show significant improvements from training an ASR module capable of generating punctuated text, rather than leaving punctuation to the MT module.
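To make that coupling concrete, here is a minimal PyTorch sketch of the idea: the MT module consumes the ASR decoder's continuous hidden states rather than discrete tokens, so gradients flow from the translation loss back into the speech side. The layer choices and sizes below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointASRMT(nn.Module):
    def __init__(self, n_mels=80, hidden=256, src_vocab=1000, tgt_vocab=1000):
        super().__init__()
        # ASR branch: audio features -> source-language hidden states
        self.asr_encoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.asr_decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.asr_out = nn.Linear(hidden, src_vocab)    # supervised with transcripts
        # MT branch: reads the ASR decoder's *continuous* states, not its argmax,
        # keeping the whole speech-to-translation path differentiable
        self.mt_encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.mt_out = nn.Linear(hidden, tgt_vocab)     # supervised with translations

    def forward(self, feats):                 # feats: (batch, frames, n_mels)
        enc, _ = self.asr_encoder(feats)
        dec, _ = self.asr_decoder(enc)
        asr_logits = self.asr_out(dec)        # ASR loss keeps this branch grounded
        mt_hidden, _ = self.mt_encoder(dec)   # gradients flow back into the ASR decoder
        return asr_logits, self.mt_out(mt_hidden)
```

Training would sum a cross-entropy loss on each head; because the MT branch also accepts text-side representations, text-only MT data can keep updating the shared decoder-to-MT path, which is the effect the abstract describes.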
This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by the IMS team. We utilize state-of-the-art models combined with several data augmentation, multi-task, and transfer learning approaches for the automatic speech recognition (ASR) and machine translation (MT) steps of our cascaded system. Moreover, we also explore the feasibility of a full end-to-end speech translation (ST) model given a very constrained amount of ground-truth labeled data. Our best system achieves the best performance among all submitted systems for Congolese Swahili to English and French, with BLEU scores of 7.7 and 13.7 respectively, and the second-best result for Coastal Swahili to English, with a BLEU score of 14.9.
Speech recognition is one of the most prominent modern technologies, having entered use in various fields of life, whether medical, security, or industrial. Accordingly, many related systems have been developed, which differ from each other in their feature extraction and classification methods. In this research, three speech recognition systems were created. They differ from each other in the methods used during the feature extraction stage: the first system used the MFCC algorithm, the second used the LPCC algorithm, and the third used the PLP algorithm. All three systems used an HMM as the classifier. First, the performance of the speech recognition process was studied and evaluated for each proposed system separately. After that, a combination algorithm was applied to each pair of the studied systems in order to study its effect on improving the speech recognition process. Two kinds of errors (simultaneous errors and dependent errors) were used to evaluate the complementarity of each pair of systems and to study the effectiveness of the combination in improving recognition performance. The comparison results show that the best improvement was obtained by combining the MFCC and PLP algorithms, with a recognition rate of 93.4%.
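As a rough illustration of one such pipeline (the paper's exact feature and model configurations are not given here), the sketch below extracts MFCCs with librosa and scores one Gaussian HMM per class with hmmlearn; `training_data`, a mapping from labels to audio file paths, is hypothetical.

```python
import numpy as np
import librosa                 # MFCC feature extraction
from hmmlearn import hmm       # Gaussian HMM classifier

def mfcc_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

# Train one HMM per word/class on that class's recordings
models = {}
for label, paths in training_data.items():        # training_data: hypothetical
    feats = [mfcc_features(p) for p in paths]
    X, lengths = np.vstack(feats), [len(f) for f in feats]
    m = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=25)
    m.fit(X, lengths)
    models[label] = m

def classify(path):
    X = mfcc_features(path)
    # Pick the class whose HMM assigns the highest log-likelihood
    return max(models, key=lambda lbl: models[lbl].score(X))
```

Swapping the MFCC extractor for an LPCC or PLP front end gives the other two systems; a combination step could then compare or fuse the per-system likelihoods.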
While named entity recognition (NER) from speech has been around as long as NER from written text, the accuracy of NER from speech has generally been much lower than that of NER from text. The rise in popularity of spoken dialog systems such as Siri or Alexa highlights the need for more accurate NER from speech, because NER is a core component for understanding what users said in dialogs. Deployed spoken dialog systems receive user input in the form of automatic speech recognition (ASR) transcripts, and simply applying an NER model trained on written text to ASR transcripts often leads to low accuracy, because compared to written text, ASR transcripts lack important cues such as punctuation and capitalization. In addition, errors in ASR transcripts make NER from speech challenging. We propose two models that exploit dialog context and speech pattern clues to extract named entities more accurately from open-domain dialogs in spoken dialog systems. Our results show the benefit of modeling dialog context and speech patterns in two settings: a standard setting with a random partition of the data, and a more realistic but also more difficult setting where many named entities encountered during deployment are unseen during training.
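The mismatch is easy to reproduce. A small sketch, using spaCy's off-the-shelf English model as a stand-in (an assumption; the paper's own models are not shown), runs a text-trained NER model on a written sentence and on the same sentence rendered ASR-style, lowercase and unpunctuated:

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # NER model trained on written text

written = "Ask Siri when Barack Obama last visited Paris."
asr_style = "ask siri when barack obama last visited paris"  # no caps/punctuation

for text in (written, asr_style):
    ents = [(e.text, e.label_) for e in nlp(text).ents]
    print(f"{text!r} -> {ents}")
```

The ASR-style line typically yields fewer or mislabeled entity spans, which is exactly the gap the proposed dialog-context and speech-pattern models aim to close.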
Transformer-based models have gained increasing popularity achieving state-of-the-art performance in many research fields including speech translation. However, Transformer's quadratic complexity with respect to the input sequence length prevents its adoption as is with audio signals, which are typically represented by long sequences. Current solutions resort to an initial sub-optimal compression based on a fixed sampling of raw audio features. Therefore, potentially useful linguistic information is not accessible to higher-level layers in the architecture. To solve this issue, we propose Speechformer, an architecture that, thanks to reduced memory usage in the attention layers, avoids the initial lossy compression and aggregates information only at a higher level according to more informed linguistic criteria. Experiments on three language pairs (en→de/es/nl) show the efficacy of our solution, with gains of up to 0.8 BLEU on the standard MuST-C corpus and of up to 4.0 BLEU in a low resource scenario.
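A toy sketch of one plausible instance of such "more informed linguistic criteria" (illustrative only, not the paper's code): instead of fixed subsampling, consecutive frames that an auxiliary CTC layer assigns to the same label are averaged into a single vector, so compression follows predicted content boundaries.

```python
import torch

def ctc_compress(frames: torch.Tensor, ctc_logits: torch.Tensor) -> torch.Tensor:
    """frames: (T, d); ctc_logits: (T, vocab). Returns (T', d) with T' <= T."""
    labels = ctc_logits.argmax(dim=-1)
    merged, start = [], 0
    for t in range(1, len(labels) + 1):
        # close a run when the predicted label changes (or at the end)
        if t == len(labels) or labels[t] != labels[start]:
            merged.append(frames[start:t].mean(dim=0))
            start = t
    return torch.stack(merged)

frames, logits = torch.randn(100, 64), torch.randn(100, 32)
print(ctc_compress(frames, logits).shape)  # some (T', 64) with T' <= 100
```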
