ﻻ يوجد ملخص باللغة العربية
We propose BeamTransformer, an efficient architecture to leverage beamformers edge in spatial filtering and transformers capability in context sequence modeling. BeamTransformer seeks to optimize modeling of sequential relationship among signals from different spatial direction. Overlapping speech detection is one of the tasks where such optimization is favorable. In this paper we effectively apply BeamTransformer to detect overlapping segments. Comparing to single-channel approach, BeamTransformer exceeds in learning to identify the relationship among different beam sequences and hence able to make predictions not only from the acoustic signals but also the localization of the source. The results indicate that a successful incorporation of microphone array signals can lead to remarkable gains. Moreover, BeamTransformer takes one step further, as speech from overlapped speakers have been internally separated into different beams.
Multichannel processing is widely used for speech enhancement but several limitations appear when trying to deploy these solutions to the real-world. Distributed sensor arrays that consider several devices with a few microphones is a viable alternati
Speech separation has been shown effective for multi-talker speech recognition. Under the ad hoc microphone array setup where the array consists of spatially distributed asynchronous microphones, additional challenges must be overcome as the geometry
Since space-domain information can be utilized, microphone array beamforming is often used to enhance the quality of the speech by suppressing directional disturbance. However, with the increasing number of microphone, the complexity would be increas
Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript. Existing methods adopt a two-stage approach: synthesize the input text
In this paper, we present a method for jointly-learning a microphone selection mechanism and a speech enhancement network for multi-channel speech enhancement with an ad-hoc microphone array. The attention-based microphone selection mechanism is trai