
Speech Enhancement and Denoising Using Wavelet


Publication date: 2015. Research language: Arabic. Created by Shamra Editor.





In this project we study wavelets and the wavelet transform, and the possibility of employing them in the processing and analysis of the speech signal in order to enhance the signal and remove noise from it. We present different algorithms based on the wavelet transform, describe how to apply them to remove noise from speech, and compare their results with those of some traditional speech-enhancement algorithms.
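The core idea the project describes, removing noise by transforming the signal into the wavelet domain and shrinking small coefficients, can be sketched as follows. This is a minimal, pure-Python illustration using a single-level Haar transform with soft thresholding; the function names and the choice of Haar wavelet are illustrative assumptions, not the project's actual implementation.

```python
def haar_dwt(x):
    """One-level Haar DWT: split x (even length) into approximation and detail coefficients."""
    s = 2 ** 0.5
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse one-level Haar DWT: recombine coefficient pairs into the signal."""
    s = 2 ** 0.5
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; coefficients below t vanish."""
    return [max(abs(c) - t, 0.0) * (1 if c >= 0 else -1) for c in coeffs]

def denoise(x, t):
    """Denoise by thresholding the detail (high-frequency) coefficients only."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, soft_threshold(detail, t))
```

In practice the decomposition is applied over several levels and the threshold is estimated from the noise level (e.g., from the detail coefficients themselves); with `t = 0` the transform simply reconstructs the input.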

Related research

The masking-based speech enhancement method pursues a multiplicative mask applied to the spectrogram of the input noise-corrupted utterance, and a deep neural network (DNN) is often used to learn the mask. In particular, the features commonly used for automatic speech recognition can serve as the input of the DNN to learn a well-behaved mask that significantly reduces the noise distortion of processed utterances. This study proposes to preprocess the input speech features for the ideal ratio mask (IRM)-based DNN by lowpass filtering in order to alleviate the noise components. In particular, we employ the discrete wavelet transform (DWT) to decompose the temporal speech feature sequence and scale down the detail coefficients, which correspond to the highpass portion of the sequence. Preliminary experiments conducted on a subset of the TIMIT corpus reveal that the proposed method makes the resulting IRM achieve higher speech quality and intelligibility for babble noise-corrupted signals than the original IRM, indicating that the lowpass-filtered temporal feature sequence can learn a superior IRM network for speech enhancement.
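The lowpass-filtering step this abstract describes — decompose a temporal feature sequence with the DWT and scale down the detail (highpass) coefficients before reconstruction — can be sketched as below. This is a minimal one-level Haar version; the scale factor of 0.5 and the function name are illustrative assumptions, not values from the paper.

```python
def lowpass_filter_sequence(seq, detail_scale=0.5):
    """Attenuate the highpass portion of a feature sequence via a one-level Haar DWT."""
    s = 2 ** 0.5
    # Analysis: split into approximation (lowpass) and detail (highpass) coefficients.
    approx = [(seq[i] + seq[i + 1]) / s for i in range(0, len(seq), 2)]
    detail = [(seq[i] - seq[i + 1]) / s for i in range(0, len(seq), 2)]
    # Scale down the detail coefficients, which carry rapid frame-to-frame variation.
    detail = [detail_scale * d for d in detail]
    # Synthesis: inverse Haar DWT reassembles the smoothed sequence.
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out
```

With `detail_scale=1.0` the sequence is reconstructed unchanged; with `detail_scale=0.0` each pair of frames collapses to its mean, i.e., full lowpass smoothing. The paper applies this per feature dimension over the utterance's frame sequence.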
Weather forecasting (especially rainfall) is one of the most important and challenging operational tasks carried out by meteorological services all over the world. It is furthermore a complicated procedure that requires multiple specialized fields of expertise. In this paper, a model based on artificial neural networks (ANNs) and the wavelet transform is proposed as a tool to predict consecutive monthly rainfalls (1933-2009) taken from the Homs Meteorological Station, based on the preceding rainfall data. A feed-forward neural network with the back-propagation algorithm is used for learning and forecasting, where the rainfall time series is decomposed into detail and approximation coefficients at three levels of analysis using the discrete wavelet transform (DWT). The study found that the wavelet neural network (WNN) structured (5-8-8-8-1) is able to predict the monthly rainfall at the Homs station over the long term, with a coefficient of determination and root mean squared error of 0.98 and 7.74 mm, respectively. The wavelet transform provides a useful feature-based analysis of the data, which improves the performance of the model; the technique was applied here in ANN models for rainfall because it is simple and can be applied to other models.
This memo describes the NTR-TSU submission for the SIGTYP 2021 Shared Task on predicting language IDs from speech. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, demanding a need for domain- and speaker-invariant language ID systems. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer shows promising results for the language identification task.
Bias mitigation approaches reduce models' dependence on sensitive features of data, such as social group tokens (SGTs), resulting in equal predictions across the sensitive features. In hate speech detection, however, equalizing model predictions may ignore important differences among targeted social groups, as hate speech can contain stereotypical language specific to each SGT. Here, to take the specific language about each SGT into account, we rely on counterfactual fairness and equalize predictions among counterfactuals, generated by changing the SGTs. Our method evaluates the similarity in sentence likelihoods (via pre-trained language models) among counterfactuals, to treat SGTs equally only within interchangeable contexts. By applying logit pairing to equalize outcomes on the restricted set of counterfactuals for each instance, we improve fairness metrics while preserving model performance on hate speech detection.
This paper describes the approach we used to detect hope speech in the HopeEDI dataset. We experimented with two approaches. In the first approach, we used contextual embeddings to train classifiers using logistic regression, random forest, SVM, and LSTM-based models. The second approach involved a majority-voting ensemble of 11 models obtained by fine-tuning pre-trained transformer models (BERT, ALBERT, RoBERTa, IndicBERT) after adding an output layer. We found that the second approach was superior for English, Tamil, and Malayalam. Our solution achieved weighted F1 scores of 0.93, 0.75, and 0.49 for English, Malayalam, and Tamil respectively, ranking 1st in English, 8th in Malayalam, and 11th in Tamil.
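The majority-voting ensemble mentioned in the abstract above can be sketched in a few lines. The labels and vote values here are hypothetical stand-ins; in the actual system each vote would come from a fine-tuned transformer classifier.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models (ties broken by first occurrence)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-model predictions for one input sentence.
votes = ["hope", "hope", "not_hope", "hope", "not_hope"]
print(majority_vote(votes))  # prints "hope"
```

With an odd number of models (11 in the paper) and a binary label set, ties cannot occur, which is one practical reason to pick an odd ensemble size.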