Detection of Infant Crying in Real-World Home Environments Using Deep Learning

Added by Xuewen Yao

Publication date: 2020

Fields: Electronic Engineering, Informatics Engineering

Language: English

Authors: Xuewen Yao, Megan Micheletti, Mckensey Johnson

In the domain of social signal processing, audio event detection is a promising avenue for accessing daily behaviors that contribute to health and well-being. However, despite advances in mobile computing and machine learning, audio behavior detection models are largely constrained to data collected in controlled settings, such as call centers. This is problematic as it means their performance is unlikely to generalize to real-world applications. In this paper, we present a novel dataset of infant distress vocalizations compiled from over 780 hours of real-world audio data, collected via recorders worn by infants. We develop a model that combines deep spectrum and acoustic features to detect and classify infant distress vocalizations, which dramatically outperforms models trained on equivalent real-world data (F1 score of 0.630 vs 0.166). We end by discussing how dataset size can facilitate such gains in accuracy, critical when considering noisy and complex naturalistic data.
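The F1 score quoted above is the harmonic mean of precision and recall. As a minimal sketch, assuming per-frame binary labels (1 = distress vocalization, 0 = anything else), it could be computed as follows. The function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def f1_from_labels(y_true, y_pred):
    """Return the F1 score for binary labels (1 = positive class)."""
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, for illustration only (not data from the paper).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 0, 1, 0, 1, 0, 1, 0]
toy_f1 = f1_from_labels(toy_true, toy_pred)
```

Because F1 ignores true negatives, it is a common choice when the positive class is rare, as crying is likely to be in day-long home recordings.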

Measuring Mother-Infant Emotions By Audio Sensing

Xuewen Yao, Dong He, Tiancheng Jing (2019)

It has been suggested in the developmental psychology literature that the communication of affect between mothers and their infants correlates with the socioemotional and cognitive development of infants. In this study, we obtained day-long audio recordings of 10 mother-infant pairs in order to study their affect communication in speech, with a focus on mothers' speech. In order to build a model for speech emotion detection, we used the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and trained a convolutional neural network model that is able to classify 6 different emotions at 70% accuracy. We applied our model to mothers' speech and found that the dominant predicted emotions were angry and sad, which was not true. Based on our own observations, we concluded that emotional speech databases made with the help of actors cannot generalize well to real-life settings, suggesting an active learning or unsupervised approach in the future.

Audio and Speech Processing, Machine Learning, Sound

Personalization of Hearing Aid Compression by Human-In-Loop Deep Reinforcement Learning

Nasim Alamdari, Edward Lobarinas (2020)

Existing prescriptive compression strategies used in hearing aid fitting are designed based on gain averages from a group of users, which are not necessarily optimal for a specific user. Nearly half of hearing aid users prefer settings that differ from the commonly prescribed settings. This paper presents a human-in-loop deep reinforcement learning approach that personalizes hearing aid compression to achieve improved hearing perception. The developed approach is designed to learn a specific user's hearing preferences in order to optimize compression based on the user's feedback. Both simulation and subject testing results are reported, which demonstrate the effectiveness of the developed personalized compression.

Audio and Speech Processing, Machine Learning, Sound

Fast computation of loudness using a deep neural network

Josef Schlittenlacher, Richard E. Turner, Brian C. J. Moore (2019)

The present paper introduces a deep neural network (DNN) for predicting the instantaneous loudness of a sound from its time waveform. The DNN was trained using the output of a more complex model, called the Cambridge loudness model. While a modern PC can perform a few hundred loudness computations per second using the Cambridge loudness model, it can perform more than 100,000 per second using the DNN, allowing real-time calculation of loudness. The root-mean-square deviation between the predictions of instantaneous loudness level using the two models was less than 0.5 phon for unseen types of sound. We think that the general approach of simulating a complex perceptual model by a much faster DNN can be applied to other perceptual models to make them run in real time.

Audio and Speech Processing, Machine Learning, Sound

StutterNet: Stuttering Detection Using Time Delay Neural Network

Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch (2021)

This paper introduces StutterNet, a novel deep learning based stuttering detection method capable of detecting and identifying various types of disfluencies. Most of the existing work in this domain uses automatic speech recognition (ASR) combined with language models for stuttering detection. Compared to the existing work, which depends on the ASR module, our method relies solely on the acoustic signal. We use a time-delay neural network (TDNN) suitable for capturing contextual aspects of the disfluent utterances. We evaluate our system on the UCLASS stuttering dataset consisting of more than 100 speakers. Our method achieves promising results and outperforms the state-of-the-art residual neural network based method. The number of trainable parameters of the proposed method is also substantially smaller due to the parameter sharing scheme of TDNN.

Audio and Speech Processing, Machine Learning, Sound
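The parameter sharing mentioned in the StutterNet abstract comes from applying the same weights at every time step to a small set of context frames. The following is a minimal NumPy sketch of one generic TDNN layer, not the StutterNet architecture itself; the function name, the context offsets, and the layer sizes are assumptions chosen for illustration.

```python
import numpy as np


def tdnn_layer(x, weights, bias, offsets):
    """Apply one time-delay layer with ReLU.

    x:       (T, d_in) sequence of acoustic feature frames
    weights: (len(offsets), d_in, d_out), one matrix per context offset
    bias:    (d_out,)
    offsets: frame offsets relative to the current frame, e.g. (-2, 0, 2)
    """
    num_frames = x.shape[0]
    # Only keep time steps whose full context lies inside the sequence.
    start = max(0, -min(offsets))
    stop = num_frames - max(0, max(offsets))
    outputs = []
    for t in range(start, stop):
        acc = bias.copy()
        # The same weights are reused at every t: this is the parameter sharing.
        for k, off in enumerate(offsets):
            acc = acc + x[t + off] @ weights[k]
        outputs.append(np.maximum(acc, 0.0))
    return np.stack(outputs)


# Hypothetical sizes: 10 frames of 4 features, 5 output units, 3 context offsets.
demo_x = np.ones((10, 4))
demo_w = np.ones((3, 4, 5))
demo_b = np.zeros(5)
demo_out = tdnn_layer(demo_x, demo_w, demo_b, offsets=(-2, 0, 2))
demo_param_count = demo_w.size + demo_b.size
```

The parameter count depends only on the number of offsets and the layer widths, not on the sequence length, which is why a TDNN can cover a wide temporal context with relatively few trainable parameters.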

Personalized Keyphrase Detection using Speaker and Environment Information

Rajeev Rikhye, Quan Wang, Qiao Liang (2021)

In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary. The system is implemented with an end-to-end trained automatic speech recognition (ASR) model and a text-independent speaker verification model. To address the challenge of detecting these keyphrases under various noisy conditions, a speaker separation model is added to the feature frontend of the speaker verification model, and an adaptive noise cancellation (ANC) algorithm is included to exploit cross-microphone noise coherence. Our experiments show that the text-independent speaker verification model largely reduces the false triggering rate of the keyphrase detection, while the speaker separation model and adaptive noise cancellation largely reduce false rejections.

Audio and Speech Processing, Machine Learning, Sound
