In social signal processing, audio event detection is a promising avenue for capturing daily behaviors that contribute to health and well-being. However, despite advances in mobile computing and machine learning, audio behavior detection models are largely constrained to data collected in controlled settings, such as call centers. This is problematic because their performance is unlikely to generalize to real-world applications. In this paper, we present a novel dataset of infant distress vocalizations compiled from over 780 hours of real-world audio collected via recorders worn by infants. We develop a model that combines deep spectrum and acoustic features to detect and classify infant distress vocalizations; this model dramatically outperforms models trained on equivalent real-world data (F1 score of 0.630 vs. 0.166). We end by discussing how dataset size can facilitate such gains in accuracy, which is critical when working with noisy and complex naturalistic data.
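The abstract does not say how the deep spectrum and acoustic features are combined. A minimal sketch, assuming simple early (feature-level) fusion, standardizes each feature block column by column and concatenates the blocks per audio segment before they reach a classifier. The sketch also shows how a binary F1 score, such as the 0.630 and 0.166 figures, is computed for the distress class. The function names (`zscore_columns`, `early_fusion`, `f1_score`) and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    # A constant column has pstdev 0; fall back to 1.0 to avoid dividing by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def early_fusion(deep_feats, acoustic_feats):
    """Hypothetical early fusion: concatenate per-segment feature vectors.

    Each block is standardized separately first so that neither feature set
    dominates purely because of its numeric scale.
    """
    if len(deep_feats) != len(acoustic_feats):
        raise ValueError("both feature sets must describe the same segments")
    deep_z = zscore_columns(deep_feats)
    acoustic_z = zscore_columns(acoustic_feats)
    return [d + a for d, a in zip(deep_z, acoustic_z)]


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2PR / (P + R) for the given positive (e.g. distress) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # convention: no true positives gives F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data: 3 segments, 2 deep spectrum dims + 1 acoustic dim.
deep = [[0.2, 1.5], [0.4, 1.1], [0.9, 0.3]]
acoustic = [[220.0], [310.0], [450.0]]
fused = early_fusion(deep, acoustic)
print(len(fused), len(fused[0]))  # 3 3
print(round(f1_score([1, 0, 1, 1], [1, 1, 0, 1]), 3))  # 0.667
```

Standardizing each block before concatenation is one common design choice; late fusion (combining the outputs of separate per-feature classifiers) would be an equally plausible reading of "combines".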
It has been suggested in the developmental psychology literature that the communication of affect between mothers and their infants correlates with the socioemotional and cognitive development of infants. In this study, we obtained day-long audio recordi
Existing prescriptive compression strategies used in hearing aid fitting are designed around gain averages from a group of users, which are not necessarily optimal for a specific user. Nearly half of hearing aid users prefer settings that differ fro
The present paper introduces a deep neural network (DNN) for predicting the instantaneous loudness of a sound from its time waveform. The DNN was trained using the output of a more complex model, called the Cambridge loudness model. While a modern PC
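The training setup described above, where a fast network learns to reproduce the outputs of a slower reference model, can be illustrated with a toy teacher-student sketch. It is heavily simplified under stated assumptions: the hypothetical `teacher_loudness` function is not the Cambridge loudness model but the textbook rule that loudness in sones doubles for every 10 phons above 40 phons, and the student is a least-squares quadratic rather than a DNN operating on the time waveform. Only the idea of labeling training inputs with a slow model's outputs carries over; `solve` and `fit_student` are illustrative helpers.

```python
def teacher_loudness(level_phon):
    """Hypothetical stand-in for a slow reference model (NOT the Cambridge model).

    Textbook rule: loudness in sones doubles for every 10 phons above 40 phons.
    """
    return 2.0 ** ((level_phon - 40.0) / 10.0)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_student(inputs, targets, degree=2):
    """Fit a cheap polynomial 'student' to teacher outputs by least squares."""
    lo, hi = min(inputs), max(inputs)
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0 or 1.0

    def basis(x):
        u = (x - mid) / half  # rescale to [-1, 1] for numerical stability
        return [u ** k for k in range(degree + 1)]

    feats = [basis(x) for x in inputs]
    p = degree + 1
    # Normal equations (F^T F) w = F^T y.
    ftf = [[sum(f[i] * f[j] for f in feats) for j in range(p)] for i in range(p)]
    fty = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(p)]
    weights = solve(ftf, fty)
    return lambda x: sum(w * b for w, b in zip(weights, basis(x)))


# Label training inputs with the (expensive) teacher once, offline.
levels = [40.0 + i for i in range(41)]  # 40 to 80 phons
targets = [teacher_loudness(x) for x in levels]
student = fit_student(levels, targets)

# At run time only the cheap student is evaluated.
mean_t = sum(targets) / len(targets)
ss_tot = sum((t - mean_t) ** 2 for t in targets)
ss_res = sum((student(x) - t) ** 2 for x, t in zip(levels, targets))
print("R^2 of student vs teacher:", 1 - ss_res / ss_tot)
```

The cost asymmetry is the point of the design: the teacher is queried only while building the training set, after which predictions come from the cheap student alone.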
This paper introduces StutterNet, a novel deep-learning-based stuttering detection system capable of detecting and identifying various types of disfluencies. Most of the existing work in this domain uses automatic speech recognition (ASR) combined with lang
In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary. The system is implemented with an end-to-end trained automatic speech recognit