Investigation of the Assessment of Infant Vocalizations by Laypersons

34 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Franz Anders

تاريخ النشر 2021

مجال البحث هندسة إلكترونية

والبحث باللغة English

تأليف Franz Anders - Mario Hlawitschka -

معالجة الصوت والكلام

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The goal of this investigation was the assessment of acoustic infant vocalizations by laypersons. More specifically, the goal was to identify (1) the set of most salient classes for infant vocalizations, (2) their relationship to each other and to affective ratings, and (3) proposals for classification schemes based on these labels and relationships. The assessment behavior of laypersons has not yet been investigated, as current infant vocalization classification schemes have been aimed at professional and scientific applications. The study methodology was based on the Nijmegen protocol, in which participants rated vocalization recordings regarding acoustic class labels, and continuous affective scales valence, tense arousal and energetic arousal. We determined consensus stimuli ratings as well as stimuli similarities based on participant ratings. Our main findings are: (1) we identified 9 salient labels, (2) valence has the overall greatest association to label ratings, (3) there is a strong association between label and valence ratings in the negative valence space, but low association for neutral labels, and (4) stimuli separability is highest when grouping labels into 3 - 5 classes. We finally propose two classification schemes based on these findings.

قيم البحث

143 - Xuewen Yao , Dong He , Tiancheng Jing 2019

It has been suggested in developmental psychology literature that the communication of affect between mothers and their infants correlates with the socioemotional and cognitive development of infants. In this study, we obtained day-long audio recordi ngs of 10 mother-infant pairs in order to study their affect communication in speech with a focus on mothers speech. In order to build a model for speech emotion detection, we used the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and trained a Convolutional Neural Nets model which is able to classify 6 different emotions at 70% accuracy. We applied our model to mothers speech and found the dominant emotions were angry and sad, which were not true. Based on our own observations, we concluded that emotional speech databases made with the help of actors cannot generalize well to real-life settings, suggesting an active learning or unsupervised approach in the future.

معالجة الصوت والكلام التعلم الآلي أنظمة الصوت في الحاسوب

Bringing in the outliers: A sparse subspace clustering approach to learn a dictionary of mouse ultrasonic vocalizations

65 - Jiaxi Wang , Karel Mundnich , Allison T. Knoll 2020

Mice vocalize in the ultrasonic range during social interactions. These vocalizations are used in neuroscience and clinical studies to tap into complex behaviors and states. The analysis of these ultrasonic vocalizations (USVs) has been traditionally a manual process, which is prone to errors and human bias, and is not scalable to large scale analysis. We propose a new method to automatically create a dictionary of USVs based on a two-step spectral clustering approach, where we split the set of USVs into inlier and outlier data sets. This approach is motivated by the known degrading performance of sparse subspace clustering with outliers. We apply spectral clustering to the inlier data set and later find the clusters for the outliers. We propose quantitative and qualitative performance measures to evaluate our method in this setting, where there is no ground truth. Our approach outperforms two baselines based on k-means and spectral clustering in all of the proposed performance measures, showing greater distances between clusters and more variability between clusters.

معالجة الصوت والكلام أنظمة الصوت في الحاسوب

A Comparison Study on Infant-Parent Voice Diarization

56 - Junzhe Zhu , Mark Hasegawa-Johnson , Nancy McElwain 2020

We design a framework for studying prelinguistic child voicefrom 3 to 24 months based on state-of-the-art algorithms in di-arization. Our system consists of a time-invariant feature ex-tractor, a context-dependent embedding generator, and a clas-sifi er. We study the effect of swapping out different compo-nents of the system, as well as changing loss function, to findthe best performance. We also present a multiple-instancelearning technique that allows us to pre-train our parame-ters on larger datasets with coarser segment boundary labels.We found that our best system achieved 43.8% DER on testdataset, compared to 55.4% DER achieved by LENA soft-ware. We also found that using convolutional feature extrac-tor instead of logmel features significantly increases the per-formance of neural diarization.

معالجة الصوت والكلام أنظمة الصوت في الحاسوب

Detection of Infant Crying in Real-World Home Environments Using Deep Learning

100 - Xuewen Yao , Megan Micheletti , Mckensey Johnson 2020

In the domain of social signal processing, audio event detection is a promising avenue for accessing daily behaviors that contribute to health and well-being. However, despite advances in mobile computing and machine learning, audio behavior detectio n models are largely constrained to data collected in controlled settings, such as call centers. This is problematic as it means their performance is unlikely to generalize to real-world applications. In this paper, we present a novel dataset of infant distress vocalizations compiled from over 780 hours of real-world audio data, collected via recorders worn by infants. We develop a model that combines deep spectrum and acoustic features to detect and classify infant distress vocalizations, which dramatically outperforms models trained on equivalent real-world data (F1 score of 0.630 vs 0.166). We end by discussing how dataset size can facilitate such gains in accuracy, critical when considering noisy and complex naturalistic data.

معالجة الصوت والكلام التعلم الآلي أنظمة الصوت في الحاسوب

ASR-Free Pronunciation Assessment

92 - Sitong Cheng , Zhixin Liu , Lantian Li 2020

Most of the pronunciation assessment methods are based on local features derived from automatic speech recognition (ASR), e.g., the Goodness of Pronunciation (GOP) score. In this paper, we investigate an ASR-free scoring approach that is derived from the marginal distribution of raw speech signals. The hypothesis is that even if we have no knowledge of the language (so cannot recognize the phones/words), we can still tell how good a pronunciation is, by comparatively listening to some speech data from the target language. Our analysis shows that this new scoring approach provides an interesting correction for the phone-competition problem of GOP. Experimental results on the ERJ dataset demonstrated that combining the ASR-free score and GOP can achieve better performance than the GOP baseline.

معالجة الصوت والكلام

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الشرق الأوسط - الأردن

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Investigation of the Assessment of Infant Vocalizations by Laypersons

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً