We address the problem of estimating the directions of arrival (DOAs) of multiple acoustic sources in a reverberant environment using a spherical microphone array. It is well known that multi-source DOA estimation is challenging in the presence of room reverberation, environmental noise and overlapping sources. In this work, we introduce multiple schemes that improve the robustness of the estimation consistency (EC) approach in reverberant and noisy conditions through redefined and modified parametric weights. Simulation results show that our proposed methods outperform the existing EC approach, especially when the sources are spatially close in a reverberant environment.
In this paper, we show that a multi-mode antenna (MMA) is an interesting alternative to a conventional phased antenna array for direction-of-arrival (DoA) estimation. By MMA we mean a single physical radiator with multiple ports, which excite different characteristic modes. In contrast to phased arrays, a closed-form mathematical model of the antenna response, such as a steering vector, is not straightforward to define for MMAs. Instead, one has to rely on calibration measurements or electromagnetic field (EMF) simulation data, which are discrete. To perform DoA estimation, the array interpolation technique (AIT) and wavefield modeling (WM) are suggested as methods with inherent interpolation capabilities that fully take antenna nonidealities such as mutual coupling into account. We present a non-coherent DoA estimator for low-cost receivers and show how coherent DoA estimation and joint DoA and polarization estimation can be performed with MMAs. Using these methods, we assess the DoA estimation performance of an MMA prototype in simulations for both 2D and 3D cases. The results show that WM outperforms AIT at high SNR. Coherent estimation is superior to non-coherent estimation, especially in 3D, because non-coherent estimation suffers from ambiguities. In conclusion, DoA estimation with a single MMA is feasible and accurate.
In autonomous aerial filming of a moving actor (e.g. a person or a vehicle), it is crucial to estimate the actor's heading direction accurately from the visual input. However, models obtained for similar tasks, such as pedestrian collision risk analysis and human-robot interaction, are very difficult to generalize to the aerial filming task because of the difference in data distributions. To improve generalization with less labeled data, this paper presents a semi-supervised algorithm for the heading direction estimation problem. We use temporal continuity as the unsupervised signal to regularize the model and achieve better generalization. The semi-supervised algorithm is applied in both the training and testing phases, which increases testing performance by a large margin. We show that by leveraging unlabeled sequences, the amount of labeled data required can be significantly reduced. We also discuss several important details for improving performance by balancing the labeled and unlabeled losses and combining them well. Experimental results show that our approach robustly outputs the heading direction for different types of actors. The aesthetic value of the video is also improved in the aerial filming task.
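The idea of balancing a labeled loss against a temporal-continuity loss can be illustrated with a minimal sketch. This is not the paper's implementation: the squared angular error, the representation of headings as angles in radians, the function names and the `weight` value are all assumptions made for illustration.

```python
import math


def angular_difference(a, b):
    """Smallest signed difference a - b between two angles, in [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


def labeled_loss(predicted, target):
    """Mean squared angular error over labeled frames (assumed loss form)."""
    errors = [angular_difference(p, t) ** 2 for p, t in zip(predicted, target)]
    return sum(errors) / len(errors)


def continuity_loss(sequence):
    """Mean squared angular change between consecutive predictions.

    Penalizes predictions that jump between neighboring frames of an
    unlabeled sequence; no ground-truth heading is needed.
    """
    steps = [angular_difference(b, a) ** 2 for a, b in zip(sequence, sequence[1:])]
    return sum(steps) / len(steps)


def total_loss(predicted, target, unlabeled_sequence, weight=0.1):
    """Labeled loss plus a weighted continuity term (weight is hypothetical)."""
    return labeled_loss(predicted, target) + weight * continuity_loss(unlabeled_sequence)
```

In this sketch a smoothly varying prediction sequence incurs a smaller unlabeled penalty than a jittery one, and `weight` controls how strongly that penalty competes with the supervised term.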
With the introduction of shared spectrum sensing and beam-forming based multi-antenna transceivers, 5G networks demand spectrum sensing to identify opportunities in the time, frequency, and spatial domains. Narrow beam-forming makes it difficult to perform spatial sensing (direction-of-arrival, DoA, estimation) in a centralized manner, and with the evolution of paradigms such as the Artificial Intelligence of Things (AIoT), ultra-reliable low latency communication (URLLC) services and distributed networks, intelligence for edge devices (Edge-AI) is highly desirable. It reduces the data-communication overhead compared to cloud-AI-centric networks, is more secure, and is free from scalability limitations. However, achieving the desired functional accuracy is a challenge on edge devices such as microcontroller units (MCUs) due to area, memory, and power constraints. In this work, we propose a low-complexity neural network-based algorithm for accurate DoA estimation and its efficient mapping onto off-the-shelf MCUs. An ad-hoc graphical user interface (GUI) is developed to configure the STM32 NUCLEO-H743ZI2 MCU with the proposed algorithm and to validate its functionality. The performance of the proposed algorithm is analyzed for different signal-to-noise ratios (SNRs), word lengths, numbers of antennas, and DoA resolutions. In-depth experimental results show that it outperforms the conventional statistical spatial sensing approach.
The estimation of the polarization $P$ of extragalactic compact sources in Cosmic Microwave Background images is a very important task, both to clean these images for cosmological purposes -- for example, to constrain the tensor-to-scalar ratio of primordial fluctuations during inflation -- and to obtain relevant astrophysical information about the compact sources themselves in a frequency range, $\nu \sim 10$--$200$ GHz, where observations have only very recently started to become available. In this paper we propose a Bayesian maximum a posteriori (MAP) estimation scheme which incorporates prior information about the distribution of the polarization fraction of extragalactic compact sources between 1 and 100 GHz. We apply this Bayesian scheme to white noise simulations and to more realistic simulations that include CMB intensity, Galactic foregrounds and instrumental noise with the characteristics of the QUIJOTE experiment Wide Survey at 11 GHz. Using these simulations, we also compare our Bayesian method with the frequentist Filtered Fusion method that has already been used on WMAP data and in the \emph{Planck} mission. We find that the Bayesian method allows us to decrease the threshold for a feasible estimation of $P$ to levels below $\sim 100$ mJy (as compared to $\sim 500$ mJy, the equivalent threshold for the frequentist Filtered Fusion). We compare the bias introduced by the Bayesian method and find it to be small in absolute terms. Finally, we test the robustness of the Bayesian estimator against uncertainties in the prior and in the flux density of the sources. We find that the Bayesian estimator is robust against moderate changes in the parameters of the prior and almost insensitive to realistic errors in the estimated photometry of the sources.
Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech. However, the original model requires a fixed (and known) number of speakers, which limits its application to real conversations. In this paper, we extend TS-VAD to speaker diarization with an unknown number of speakers. This is achieved in two steps: first, an initial diarization system is applied to estimate the number of speakers; then, the TS-VAD network outputs are masked according to this estimate. We further investigate different diarization methods, including clustering-based and region proposal networks, for estimating the initial i-vectors. Since these systems have complementary strengths, we propose a fusion-based method that combines frame-level decisions from the systems for an improved initialization. We demonstrate through experiments on variants of the LibriCSS meeting corpus that our proposed approach can reduce the DER by up to 50% relative across varying numbers of speakers. This improvement also results in better downstream ASR performance, approaching that obtained with oracle segments.
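The two-step procedure (estimate the speaker count, then mask the network outputs) can be sketched as follows. This is an illustrative assumption rather than the paper's code: the segment format, the fixed number of output slots, the function names and the 0.5 threshold are all hypothetical.

```python
def estimate_num_speakers(segments):
    """Count distinct speaker labels in an initial diarization result.

    `segments` is assumed to be a list of (start, end, speaker_id) tuples.
    """
    return len({speaker_id for _, _, speaker_id in segments})


def mask_outputs(frame_probs, num_speakers, threshold=0.5):
    """Zero out output slots beyond the estimated speaker count.

    `frame_probs` is a list of frames, each a list of per-slot speech
    probabilities from a network with a fixed number of output slots.
    Slots with index >= num_speakers are always set to 0 (masked); the
    remaining slots are thresholded into binary activity decisions.
    """
    return [
        [1 if slot < num_speakers and prob >= threshold else 0
         for slot, prob in enumerate(frame)]
        for frame in frame_probs
    ]


# Hypothetical initial diarization with two distinct speakers.
initial_segments = [(0.0, 1.5, "spk_a"), (1.2, 3.0, "spk_b"), (3.0, 4.0, "spk_a")]
count = estimate_num_speakers(initial_segments)

# Four output slots per frame; only the first `count` slots can stay active.
decisions = mask_outputs([[0.9, 0.8, 0.7, 0.6], [0.2, 0.9, 0.1, 0.95]], count)
```

With two estimated speakers, the third and fourth slots are forced to zero in every frame, regardless of how high their probabilities are.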