In this paper, we present a method for jointly learning a microphone selection mechanism and a speech enhancement network for multi-channel speech enhancement with an ad-hoc microphone array. The attention-based microphone selection mechanism is trained to reduce communication costs through a penalty term that represents a task-performance/communication-cost trade-off. While operating within this trade-off, our method intelligently streams from more microphones in lower-SNR scenes and from fewer microphones in higher-SNR scenes. We evaluate the model in complex echoic acoustic scenes with moving sources and show that it matches the performance of models that stream from a fixed number of microphones while reducing communication costs.
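To make the trade-off concrete, here is a minimal sketch of how such a penalty term could be wired up. It assumes a soft, sigmoid-gated per-channel selector and an L1 enhancement loss; the names (MicSelector, total_loss, lam) and shapes are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a task-performance/communication-cost trade-off loss;
# all names, shapes, and the exact penalty form are assumptions.
import torch
import torch.nn as nn

class MicSelector(nn.Module):
    """Scores each microphone channel and produces soft selection weights."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)

    def forward(self, mic_feats: torch.Tensor) -> torch.Tensor:
        # mic_feats: (batch, n_mics, feat_dim) per-channel summary features
        logits = self.scorer(mic_feats).squeeze(-1)   # (batch, n_mics)
        return torch.sigmoid(logits)                  # soft "stream or not" weights

def total_loss(enhanced, clean, weights, lam=0.01):
    # Task term: enhancement quality (here a simple L1 proxy).
    task = nn.functional.l1_loss(enhanced, clean)
    # Communication term: expected number of streamed microphones.
    comm = weights.sum(dim=-1).mean()
    return task + lam * comm
```

Under this formulation, a larger lam pushes the selector toward fewer active microphones, while in low-SNR scenes the task term grows and can justify streaming from more of them, which matches the adaptive behavior described above.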
Speech separation has been shown to be effective for multi-talker speech recognition. Under the ad hoc microphone array setup, where the array consists of spatially distributed asynchronous microphones, additional challenges must be overcome, as the geometry …
Recently, ad-hoc microphone arrays have been widely studied. Unlike traditional microphone array settings, the spatial arrangement and number of microphones of an ad-hoc microphone array are not known in advance, which hinders the adaptation of traditional …
Multichannel processing is widely used for speech enhancement, but several limitations appear when deploying these solutions in the real world. Distributed sensor arrays consisting of several devices with a few microphones each are a viable alternative …
Recently, there has been a research trend toward ad-hoc microphone arrays. However, most research has been conducted on simulated data. Although some data sets were collected with a small number of distributed devices, they were not synchronized, which hinders the …
We propose BeamTransformer, an efficient architecture that leverages beamformers' edge in spatial filtering and transformers' capability in context sequence modeling. BeamTransformer seeks to optimize the modeling of sequential relationships among signals from …
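As an illustration of the general beamformer-then-transformer idea (not BeamTransformer's actual architecture), the sketch below spatially filters multi-channel STFT frames with a small bank of steering weights and feeds the resulting streams to a standard transformer encoder; all dimensions, the learnable weights, and class names are assumptions.

```python
# Minimal beamformer-front-end + transformer-encoder sketch; the random
# steering weights and every dimension here are placeholder assumptions.
import torch
import torch.nn as nn

class BeamformerFrontEnd(nn.Module):
    """Applies a bank of beamforming weight vectors to multi-channel STFT frames."""
    def __init__(self, n_mics: int, n_beams: int):
        super().__init__()
        # (n_beams, n_mics) complex steering weights, randomly initialized here.
        self.weights = nn.Parameter(torch.randn(n_beams, n_mics, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mics, frames, freq) complex STFT
        # returns (batch, n_beams, frames, freq): one spatially filtered stream per beam
        return torch.einsum("bm...,nm->bn...", x, self.weights)

class BeamTransformerSketch(nn.Module):
    def __init__(self, n_mics=8, n_beams=4, n_freq=257, d_model=256):
        super().__init__()
        self.frontend = BeamformerFrontEnd(n_mics, n_beams)
        self.proj = nn.Linear(n_beams * n_freq, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        beams = self.frontend(x).abs()            # magnitude per beam stream
        b, n, t, f = beams.shape
        feats = beams.permute(0, 2, 1, 3).reshape(b, t, n * f)
        return self.encoder(self.proj(feats))     # (batch, frames, d_model)
```

The design intuition this tries to capture is the one the abstract names: the beamformer bank does the spatial filtering, and the transformer models the sequential relationships among the resulting streams.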