
Sequential Randomized Smoothing for Adversarially Robust Speech Recognition


Publication date: 2021
Language: English





While Automatic Speech Recognition has been shown to be vulnerable to adversarial attacks, defenses against these attacks are still lagging. Existing, naive defenses can be partially broken with an adaptive attack. In classification tasks, the Randomized Smoothing paradigm has been shown to be effective at defending models. However, it is difficult to apply this paradigm to ASR tasks, due to their complexity and the sequential nature of their outputs. Our paper overcomes some of these challenges by leveraging speech-specific tools like enhancement and ROVER voting to design an ASR model that is robust to perturbations. We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.
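To make the smoothing idea concrete (this is a rough sketch, not the paper's exact pipeline, which also uses speech enhancement and full ROVER alignment), the snippet below decodes several Gaussian-perturbed copies of an utterance with a generic ASR system and combines the hypotheses with a simple per-position majority vote; the `asr_transcribe` callable, the noise level `sigma`, and the voting rule are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def smoothed_transcribe(waveform, asr_transcribe, sigma=0.01, n_samples=8, seed=0):
    """Decode several Gaussian-perturbed copies of `waveform` and vote.

    waveform       : 1-D float array (e.g. 16 kHz mono audio in [-1, 1])
    asr_transcribe : callable mapping a waveform to a transcript string
                     (assumed interface; plug in any ASR system)
    sigma          : standard deviation of the additive Gaussian noise
    n_samples      : number of noisy copies to decode
    """
    rng = np.random.default_rng(seed)
    hypotheses = []
    for _ in range(n_samples):
        noisy = waveform + rng.normal(0.0, sigma, size=waveform.shape)
        hypotheses.append(asr_transcribe(noisy).split())

    # Crude stand-in for ROVER: majority vote per word position,
    # padding shorter hypotheses with an empty token.
    max_len = max(len(h) for h in hypotheses)
    voted = []
    for i in range(max_len):
        words = [h[i] if i < len(h) else "" for h in hypotheses]
        best, _ = Counter(words).most_common(1)[0]
        if best:
            voted.append(best)
    return " ".join(voted)
```

The noise level trades off clean-speech accuracy against robustness, which is why the paper pairs smoothing with enhancement and proper hypothesis alignment (ROVER); the per-position vote above is only a placeholder for that step.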



Related research

Traditional translation systems trained on written documents perform well for text-based translation but not as well for speech-based applications. We aim to adapt translation models to speech by introducing actual lexical errors from ASR and segmentation errors from automatic punctuation into our translation training data. We introduce an inverted projection approach that projects automatically detected system segments onto human transcripts and then re-segments the gold translations to align with the projected human transcripts. We demonstrate that this overcomes the train-test mismatch present in other training approaches. The new projection approach achieves gains of over 1 BLEU point over a baseline that is exposed to the human transcripts and segmentations, and these gains hold for both IWSLT data and YouTube data.
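A minimal sketch of the projection idea under strong simplifying assumptions: segment boundaries are taken as word offsets and projected proportionally rather than through the word-level alignment the paper relies on, and `project_boundaries` / `resegment` are hypothetical helpers, not the authors' code.

```python
def project_boundaries(sys_words, sys_boundaries, target_words):
    """Carry ASR-segment boundaries (word offsets into the system transcript)
    over to another token sequence by relative position -- a crude stand-in
    for a real word-level alignment."""
    ratio = len(target_words) / max(len(sys_words), 1)
    return sorted({min(len(target_words), round(b * ratio)) for b in sys_boundaries})

def resegment(tokens, boundaries):
    """Cut a token list at the given word offsets."""
    segments, start = [], 0
    for b in boundaries + [len(tokens)]:
        if b > start:
            segments.append(tokens[start:b])
            start = b
    return segments

# Toy example: the break the ASR system placed after "er" is projected onto
# the human transcript and, by the same rule, onto the gold translation.
sys_words   = "i think er we can start now".split()
human_words = "I think we can start now".split()
gold_words  = "Ich denke wir können jetzt anfangen".split()
sys_boundaries = [3]

human_segments = resegment(human_words, project_boundaries(sys_words, sys_boundaries, human_words))
gold_segments  = resegment(gold_words, project_boundaries(sys_words, sys_boundaries, gold_words))
```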
Unifying acoustic and linguistic representation learning has become increasingly crucial to transfer the knowledge learned on the abundance of high-resource language data for low-resource speech recognition. Existing approaches simply cascade pre-trained acoustic and language models to learn the transfer from speech to text. However, how to solve the representation discrepancy of speech and text is unexplored, which hinders the utilization of acoustic and linguistic information. Moreover, previous works simply replace the embedding layer of the pre-trained language model with the acoustic features, which may cause the catastrophic forgetting problem. In this work, we introduce Wav-BERT, a cooperative acoustic and linguistic representation learning method to fuse and utilize the contextual information of speech and text. Specifically, we unify a pre-trained acoustic model (wav2vec 2.0) and a language model (BERT) into an end-to-end trainable framework. A Representation Aggregation Module is designed to aggregate acoustic and linguistic representation, and an Embedding Attention Module is introduced to incorporate acoustic information into BERT, which can effectively facilitate the cooperation of two pre-trained models and thus boost the representation learning. Extensive experiments show that our Wav-BERT significantly outperforms the existing approaches and achieves state-of-the-art performance on low-resource speech recognition.
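The sketch below shows one way such a fusion could be wired in PyTorch; the `acoustic_encoder` and `text_encoder` arguments stand in for wav2vec 2.0 and BERT, the cross-attention block is only a guess at what a Representation Aggregation Module might look like, and the Embedding Attention Module is omitted. None of this is the paper's implementation.

```python
import torch
import torch.nn as nn

class RepresentationAggregation(nn.Module):
    """Toy aggregation block: cross-attention from text states to acoustic
    states, followed by a residual connection and layer norm."""
    def __init__(self, dim, n_heads=8):          # dim must be divisible by n_heads
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_states, acoustic_states):
        fused, _ = self.attn(text_states, acoustic_states, acoustic_states)
        return self.norm(text_states + fused)

class WavBertSketch(nn.Module):
    """Minimal stand-in for a cooperative acoustic/linguistic model.

    acoustic_encoder : module mapping a waveform batch to (B, T_a, dim)
    text_encoder     : module mapping token ids to (B, T_t, dim)
    Both are assumed interfaces playing the roles of wav2vec 2.0 and BERT.
    """
    def __init__(self, acoustic_encoder, text_encoder, dim, vocab_size):
        super().__init__()
        self.acoustic_encoder = acoustic_encoder
        self.text_encoder = text_encoder
        self.aggregate = RepresentationAggregation(dim)
        self.head = nn.Linear(dim, vocab_size)   # e.g. token logits for decoding

    def forward(self, waveform, token_ids):
        acoustic = self.acoustic_encoder(waveform)
        text = self.text_encoder(token_ids)
        fused = self.aggregate(text, acoustic)
        return self.head(fused)
```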
In general, the aim of an automatic speech recognition system is to write down what is said. State-of-the-art continuous speech recognition systems consist of four basic modules: signal processing, acoustic modeling, language modeling, and the search engine. Isolated-word recognition systems, in contrast, do not contain a language model, the module responsible for connecting words together to form understandable sentences.
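A minimal sketch of that four-module decomposition, with each module passed in as an assumed callable rather than a concrete implementation:

```python
def recognize(waveform, extract_features, acoustic_model, language_model, search):
    """Classical continuous-ASR pipeline as described above; every argument
    is a stand-in callable for one of the four modules."""
    features = extract_features(waveform)        # signal processing
    acoustic_scores = acoustic_model(features)   # acoustic modeling
    # The search engine combines acoustic scores with language-model scores
    # to find the most likely word sequence; an isolated-word recognizer
    # would skip the language model and pick the best-scoring word directly.
    return search(acoustic_scores, language_model)
```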
Due to the recent advances in natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models only consider semantic correction while the phonetic features of words are neglected. Semantic-only post-correction consequently degrades performance, since homophonic errors are fairly common in Chinese ASR. In this paper, we propose a novel approach that collectively exploits the contextualized representation and the phonetic information between the error and its replacing candidates to reduce the error rate of Chinese ASR. Our experimental results on real-world speech recognition datasets show that the proposed method achieves an evidently lower CER than the baseline model, which uses a pre-trained BERT MLM as the corrector.
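As a rough illustration of combining semantic and phonetic evidence (not the paper's method), the snippet below rescores replacement candidates with a weighted mix of a masked-LM probability and a phonetic-string similarity; `mlm_prob`, `phonetic`, and the weight `alpha` are assumed interfaces and parameters.

```python
import difflib

def rescore_candidates(candidates, mlm_prob, phonetic, error_token, alpha=0.5):
    """Pick a replacement for a suspected ASR error.

    mlm_prob(cand) : masked-LM probability of the candidate in context
    phonetic(tok)  : phonetic transcription of a token (e.g. pinyin)
    Both callables are assumed interfaces, not the paper's exact ones.
    """
    err_ph = phonetic(error_token)
    scored = []
    for cand in candidates:
        semantic = mlm_prob(cand)
        # String similarity of the phonetic forms as a cheap stand-in
        # for a learned phonetic distance.
        phon = difflib.SequenceMatcher(None, phonetic(cand), err_ph).ratio()
        scored.append((alpha * semantic + (1 - alpha) * phon, cand))
    return max(scored)[1]
```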
Due to the popularity of intelligent dialogue assistant services, speech emotion recognition has become more and more important. In the communication between humans and machines, emotion recognition and emotion analysis can enhance the interaction between machines and humans. This study uses a CNN+LSTM model to implement speech emotion recognition (SER) processing and prediction. The experimental results show that the CNN+LSTM model achieves better performance than a traditional NN model.
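A toy CNN+LSTM classifier over mel-spectrogram frames in the spirit of the model described above; the layer sizes, the number of mel bands, and the number of emotion classes are illustrative guesses, not the study's configuration.

```python
import torch
import torch.nn as nn

class CnnLstmSER(nn.Module):
    """Convolutional front-end over mel features, LSTM over time,
    and a linear classifier on the final hidden state."""
    def __init__(self, n_mels=40, n_emotions=4, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_emotions)

    def forward(self, mel):                # mel: (batch, n_mels, frames)
        x = self.cnn(mel)                  # (batch, 64, frames // 2)
        x = x.transpose(1, 2)              # (batch, frames // 2, 64)
        _, (h, _) = self.lstm(x)           # final hidden state
        return self.classifier(h[-1])      # emotion logits

# Usage sketch: logits = CnnLstmSER()(torch.randn(8, 40, 200))
```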
