We present a novel learning-based approach to compute the eigenmodes and acoustic transfer data for sound synthesis of arbitrary solid objects. Our approach combines two network-based solutions into a complete learning-based 3D modal sound model: a 3D sparse convolution network serving as the eigendecomposition solver, and an encoder-decoder network for predicting Far-Field Acoustic Transfer maps (FFAT maps). We use our approach to compute the vibration modes (eigenmodes) and the FFAT map of each mode (acoustic data) for arbitrarily shaped objects at interactive rates, without requiring a precomputed dataset for each new object. Our experimental results demonstrate the effectiveness and benefits of our approach, and we compare its accuracy and efficiency with physically based sound synthesis methods.
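To make the pipeline concrete, below is a minimal sketch of the modal synthesis step that consumes the two quantities the abstract mentions: eigenvalues from the eigendecomposition solver and per-mode acoustic gains, e.g. values sampled from an FFAT map at the listener's direction. The function name, the Rayleigh damping coefficients, and the per-mode excitation vector are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def modal_sound(eigenvalues, gains, excitation, sr=44100, dur=1.0,
                alpha=3.0, beta=1e-7):
    """Synthesize an impact sound from modal data (a minimal sketch).

    eigenvalues : squared angular frequencies from the eigensolver
    gains       : per-mode acoustic gains, e.g. sampled from FFAT maps
                  at the listener direction (hypothetical stand-in)
    excitation  : per-mode amplitudes induced by the contact force
    alpha, beta : Rayleigh damping coefficients (material-dependent)
    """
    t = np.arange(int(sr * dur)) / sr
    audio = np.zeros_like(t)
    for lam, g, q in zip(eigenvalues, gains, excitation):
        omega = np.sqrt(lam)                       # undamped angular frequency
        xi = 0.5 * (alpha / omega + beta * omega)  # Rayleigh damping ratio
        if xi >= 1.0:                              # overdamped mode: inaudible
            continue
        omega_d = omega * np.sqrt(1.0 - xi**2)     # damped frequency
        audio += g * q * np.exp(-xi * omega * t) * np.sin(omega_d * t)
    return audio / (np.max(np.abs(audio)) + 1e-12)
```

Each eigenvalue λᵢ of the generalized eigenproblem Ku = λMu yields an angular frequency ωᵢ = √λᵢ, and each underdamped mode contributes one decaying sinusoid; the FFAT-style gain scales how strongly that mode radiates toward the listener.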
Neural audio synthesis is an actively researched topic that has yielded a wide range of techniques leveraging machine-learning architectures. Google Magenta introduced a novel approach called Differentiable Digital Signal Processing (DDSP) that incorporates …
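Since the abstract is cut off, only the core DDSP idea is illustrated here: a harmonic oscillator bank driven by time-varying controls. In DDSP proper those controls are predicted per frame by a neural decoder and the synthesizer is written in differentiable tensor ops so a spectral loss can be backpropagated into the network; the NumPy sketch below shows the signal path only, and all names are hypothetical.

```python
import numpy as np

def harmonic_synth(f0, amps, sr=16000):
    """Minimal harmonic oscillator bank in the spirit of DDSP (sketch).

    f0   : per-sample fundamental frequency in Hz, shape (n,)
    amps : per-sample amplitude of each harmonic, shape (n, k)
    """
    n, k = amps.shape
    harmonics = np.arange(1, k + 1)            # 1st..kth harmonic numbers
    # integrate frequency to instantaneous phase, one column per harmonic
    phase = 2 * np.pi * np.cumsum(f0)[:, None] / sr * harmonics
    # silence harmonics above Nyquist to avoid aliasing
    audible = (f0[:, None] * harmonics) < sr / 2
    return np.sum(amps * audible * np.sin(phase), axis=1)

# toy usage: a 220 Hz tone with a decaying 1/h harmonic spectrum
n, k = 16000, 8
f0 = np.full(n, 220.0)
amps = (1.0 / np.arange(1, k + 1))[None, :] * np.linspace(1, 0, n)[:, None]
audio = harmonic_synth(f0, amps)
```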
This paper addresses the problem of sound-source localization (SSL) with a robot head, which remains a challenge in real-world environments. In particular, we are interested in locating speech sources, as they are of high interest for human-robot interaction …
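The paper's own method is not visible in the truncated abstract, so as background here is the classical non-learning baseline that SSL systems are routinely compared against: GCC-PHAT time-delay estimation between a microphone pair. The function and its defaults are illustrative.

```python
import numpy as np

def gcc_phat(x, y, sr, max_tau=None):
    """Classic GCC-PHAT time-delay-of-arrival estimate (sketch).

    The delay at the cross-correlation peak, with phase-transform
    weighting for robustness to reverberation, gives the TDOA between
    two microphone signals x and y sampled at rate sr.
    """
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                 # PHAT weighting
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(sr * max_tau), n // 2)
    # reorder so negative delays precede positive ones
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / sr                      # delay in seconds
```

For a microphone pair with spacing d, the azimuth then follows from θ = arcsin(c·τ/d) with c ≈ 343 m/s, and a robot head can triangulate by combining several pairs.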
Cross-speaker style transfer is crucial for multi-style and expressive speech synthesis at scale. It does not require the target speakers to be experts in expressing all styles or to collect the corresponding recordings for model training …
This article is a survey of deep learning methods for single and multiple sound source localization. We are particularly interested in sound source localization in indoor/domestic environments, where reverberation and diffuse noise are present. We pro…
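As a concrete instance of the formulation that dominates the surveyed literature, direction-of-arrival estimation is often cast as classification over discrete azimuth bins, with multichannel spectrogram features fed to a CNN. The architecture below is a deliberately tiny, hypothetical example of that pattern, not a model from the survey.

```python
import torch
import torch.nn as nn

class DoaClassifier(nn.Module):
    """Toy CNN treating DOA estimation as azimuth-bin classification.

    Input: (batch, channels, freq, time) features, e.g. stacked real and
    imaginary STFT parts of each microphone; output: logits per bin.
    """
    def __init__(self, in_ch=8, n_bins=72):   # 72 bins = 5 deg resolution
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),           # pool over freq and time
        )
        self.head = nn.Linear(64, n_bins)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# usage: 4 microphones -> 8 channels (real + imaginary per mic)
logits = DoaClassifier()(torch.randn(2, 8, 128, 64))   # (2, 72)
```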
As an indispensable part of modern human-computer interaction systems, speech synthesis technology helps users obtain the output of intelligent machines more easily and intuitively, and has thus attracted increasing attention. Due to the limitations of hig…