Rectified binaural ratio: A complex T-distributed feature for robust sound localization

Posted by Antoine Deleforge

Publication date: 2016

Research field: Informatics Engineering; Mathematical Statistics

Paper language: English

Author: Antoine Deleforge

Categories: Sound; Statistics Applications

Abstract

Most existing methods in binaural sound source localization rely on some kind of aggregation of phase- and level-difference cues in the time-frequency plane. While different aggregation schemes exist, they are often heuristic and suffer in adverse noise conditions. In this paper, we introduce the rectified binaural ratio as a new feature for sound source localization. We show that for Gaussian-process point source signals corrupted by stationary Gaussian noise, this ratio follows a complex t-distribution with explicit parameters. This new formulation provides a principled and statistically sound way to aggregate binaural features in the presence of noise. We subsequently derive two simple and efficient methods for robust relative transfer function and time-delay estimation. Experiments on heavily corrupted simulated and speech signals demonstrate the robustness of the proposed scheme.
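For readers who want to see the raw feature in action, the following is a minimal Python/NumPy sketch, under stated assumptions, of the plain interaural ratio X_L(f,t) / X_R(f,t) computed in every STFT bin, plus a simple time-delay estimate derived from it. It assumes a pure-delay model x_L(t) = x_R(t - τ), for which the ratio reduces to exp(-j2πfτ). It does not implement the rectification or the complex t-distribution aggregation proposed in the paper; a per-frequency circular average of unit-modulus ratios (similar in spirit to GCC-PHAT) is swapped in as a heuristic stand-in. The function names `stft`, `binaural_ratio` and `estimate_delay` and the parameters `n_fft`, `hop`, `max_delay` and `n_grid` are hypothetical.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Minimal Hann-windowed STFT; returns shape (n_fft // 2 + 1, n_frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * win for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)


def binaural_ratio(left, right, n_fft=512, hop=256, eps=1e-12):
    """Plain (unrectified) ratio X_L / X_R in every time-frequency bin.

    Written as X_L * conj(X_R) / (|X_R|^2 + eps) so that bins where the
    right channel is (near) zero do not cause a division by zero.
    """
    XL = stft(left, n_fft, hop)
    XR = stft(right, n_fft, hop)
    return XL * np.conj(XR) / (np.abs(XR) ** 2 + eps)


def estimate_delay(left, right, fs, max_delay=1e-3, n_fft=512, hop=256, n_grid=2001):
    """Grid-search delay tau (seconds) under the model x_L(t) = x_R(t - tau).

    Heuristic stand-in, NOT the paper's method: each ratio is reduced to
    its phase, phases are circularly averaged over frames, and tau is the
    candidate whose phase ramp exp(-j 2 pi f tau) best matches the average.
    """
    r = binaural_ratio(left, right, n_fft, hop)
    unit = r / (np.abs(r) + 1e-12)            # keep phase only
    r_bar = unit.mean(axis=1)                 # circular mean per frequency
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    taus = np.linspace(-max_delay, max_delay, n_grid)
    # Undo each candidate phase ramp and sum coherently over frequency.
    steer = np.exp(1j * 2 * np.pi * np.outer(taus, freqs))
    score = np.real(steer @ r_bar)
    return taus[np.argmax(score)]
```

With `fs = 16000` and the left channel lagging the right by 8 samples, `estimate_delay(left, right, fs)` should return a value close to 8 / 16000 = 0.5 ms, and swapping the channels flips the sign. Note that this average weights every bin equally, which is exactly the kind of ad hoc aggregation the abstract argues can break down under heavy noise.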

Read also

Pierre-Amaury Grumiaux, Srdan Kitic, Laurent Girin (2021)

In this work, we propose to extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network and Ambisonics signals. We significantly improve the performance of the baseline network by changing the layout between convolutional and pooling layers. We propose several configurations with more convolutional layers and smaller pooling sizes in between, so that less information is lost across the layers, leading to better feature extraction. In parallel, we test the system's ability to localize up to three sources, in which case the improved feature extraction provides the most significant boost in accuracy. We evaluate and compare these improved configurations on synthetic and real-world data. The results show a substantial improvement in multiple sound source localization performance over the baseline network.

Categories: Sound; Audio and Speech Processing

Proximal binaural sound can induce subjective frisson

Shiori Honda, Yuri Ishikawa, Rei Konno (2019)

Auditory frisson is the experience of feeling cold or shivering in response to sound, in the absence of a physical cold stimulus. Multiple examples of frisson-inducing sounds have been reported, but the mechanism of auditory frisson remains elusive. Typical frisson-inducing sounds may contain a looming effect, in which a sound appears to approach the listener's peripersonal space. Previous studies on sound in peripersonal space have provided objective measurements of sound-induced effects, but few have investigated the subjective experience of frisson-inducing sounds. Here we explored whether it is possible to produce subjective feelings of frisson by moving a noise stimulus (white noise, rolling-beads noise, or frictional noise produced by rubbing a plastic bag) around a listener's head. Our results demonstrated that sound-induced frisson is experienced more strongly when auditory stimuli are rotated around the head (binaural moving sounds) than when they are not rotated (monaural static sounds), regardless of the noise source. Pearson's correlation analysis showed that several acoustic features of the auditory stimuli, such as the variance of the interaural level difference (ILD), loudness, and sharpness, were correlated with the magnitude of subjective frisson. We also observed that a moving musical sound induced stronger subjective frisson than a static musical sound.

Categories: Sound; Multimedia; Audio and Speech Processing

SLoClas: A Database for Joint Sound Localization and Classification

Xinyuan Qian, Bidisha Sharma, Amine El Abridi (2021)

In this work, we present a new database, the Sound Localization and Classification (SLoClas) corpus, for studying and analyzing sound localization and classification. The corpus contains a total of 23.27 hours of data recorded using a 4-channel microphone array. Ten classes of sounds are played over a loudspeaker at a distance of 1.5 m from the array, with the direction of arrival (DoA) varied from 1° to 360° in 5° steps. To facilitate the study of noise robustness, 6 types of outdoor noise are recorded at 4 DoAs using the same devices. Moreover, we propose a baseline method, the Sound Localization and Classification Network (SLCnet), and present experimental results and analysis on the collected SLoClas database. We achieve accuracies of 95.21% and 80.01% for sound localization and classification, respectively. We publicly release this database and the source code for research purposes.

Categories: Sound; Databases; Audio and Speech Processing

PILOT: Introducing Transformers for Probabilistic Sound Event Localization

Christopher Schymura, Benedikt Bonninghoff, Tsubasa Ochiai (2021)

Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e.g., a microphone array). Recent advances in this domain have most prominently focused on deep recurrent neural networks. Inspired by the success of transformer architectures as an alternative to classical recurrent neural networks, this paper introduces a novel transformer-based sound event localization framework in which temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms. Additionally, the estimated sound event positions are represented as multivariate Gaussian variables, yielding a notion of uncertainty that many previously proposed deep-learning-based systems for this application do not provide. The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy. It outperforms all competing systems on all datasets, with statistically significant differences in performance.

Categories: Sound; Machine Learning; Audio and Speech Processing

Visually Informed Binaural Audio Generation without Binaural Audios

Xudong Xu, Hang Zhou, Ziwei Liu (2021)

Stereophonic audio, especially binaural audio, plays an essential role in immersive viewing environments. Recent research has explored generating visually guided stereophonic audio supervised by multi-channel audio collections. However, because professional recording devices are required, existing datasets are limited in scale and variety, which impedes the generalization of supervised methods in real-world scenarios. In this work, we propose PseudoBinaural, an effective pipeline that is free of binaural recordings. The key insight is to carefully build pseudo visual-stereo pairs from mono data for training. Specifically, we leverage spherical harmonic decomposition and the head-related impulse response (HRIR) to identify the relationship between spatial locations and the received binaural audio. Then, in the visual modality, the corresponding visual cues of the mono data are manually placed at the sound source positions to form the pairs. Compared to fully supervised paradigms, our binaural-recording-free pipeline shows great stability in cross-dataset evaluation and achieves comparable performance under subjective preference. Moreover, when combined with binaural recordings, our method further boosts the performance of binaural audio generation in supervised settings.

Categories: Sound; Computer Vision and Pattern Recognition; Multimedia
