This document describes our submission to the 2018 LOCalization And TrAcking (LOCATA) challenge (Tasks 1, 3, 5). We estimate the 3D position of a speaker using the Global Coherence Field (GCF) computed from multiple microphone pairs of a DICIT planar array. One of the main challenges when using such an array with omnidirectional microphones is the front-back ambiguity, which is particularly evident in Task 5. We address this challenge by post-processing the peaks of the GCF and exploiting the attenuation introduced by the frame of the array. Moreover, the intermittent nature of speech and the changing orientation of the speaker make localization difficult. For Tasks 3 and 5, we also employ a Particle Filter (PF) that favors the spatio-temporal continuity of the localization results.
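As a rough illustration of the Particle Filter component described above (not the authors' implementation), the sketch below shows one predict/update/resample step that favors spatio-temporal continuity of the GCF-based estimates; the motion and observation models, the activity threshold, and names such as `gcf_peak` are assumptions for illustration only.

```python
import numpy as np

def particle_filter_step(particles, weights, gcf_peak, peak_power,
                         motion_std=0.05, obs_std=0.3):
    """One predict/update step favoring spatio-temporal continuity.

    particles : (N, 3) candidate 3D speaker positions [m]
    weights   : (N,)   current particle weights (summing to 1)
    gcf_peak  : (3,)   position of the strongest GCF peak in this frame
    peak_power: float  GCF value at the peak (used to gate the update)
    """
    # Predict: random-walk motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)

    # Update: reweight only when the GCF peak is reliable (speech present);
    # otherwise keep the previous weights and coast through silent frames.
    if peak_power > 0.5:  # illustrative activity threshold
        dist2 = np.sum((particles - gcf_peak) ** 2, axis=1)
        weights = weights * np.exp(-0.5 * dist2 / obs_std ** 2)
        weights = weights / (weights.sum() + 1e-12)

    # Resample when the effective sample size drops too low.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = np.random.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))

    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```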
This paper describes the systems submitted by team HCCL to the Far-Field Speaker Verification Challenge. Our previous work in the AIShell Speaker Verification Challenge 2019 showed that the powerful modeling ability of neural network architectures can provide exceptional performance for this kind of task. Therefore, in this challenge we focus on deep neural network architectures built from TDNN, ResNet and Res2Net blocks. Most of the developed systems consist of neural network embeddings combined with a PLDA back-end. First, the speed perturbation method is applied to augment the data, yielding significant performance improvements. Then, we explore the AM-softmax loss function and propose adding a cross-entropy (CE) loss branch when training with AM-softmax. In addition, the impact of score normalization on performance is also investigated. The final system, a fusion of four systems, achieves minDCF 0.5342 and EER 5.05% on the task 1 evaluation set, and minDCF 0.5193 and EER 5.47% on the task 3 evaluation set.
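The following PyTorch-style sketch illustrates one plausible reading of an AM-softmax loss with an auxiliary CE-loss branch on the same embedding; the layer sizes, margin, scale and branch weight `alpha` are illustrative values, not the challenge configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxWithCE(nn.Module):
    """Additive-margin softmax plus a plain CE branch on the same embedding."""
    def __init__(self, emb_dim=256, n_spk=1000, s=30.0, m=0.2, alpha=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_spk, emb_dim))
        self.ce_head = nn.Linear(emb_dim, n_spk)   # auxiliary CE branch
        self.s, self.m, self.alpha = s, m, alpha

    def forward(self, emb, labels):
        # Cosine logits between normalized embeddings and class weights.
        cosine = F.linear(F.normalize(emb), F.normalize(self.weight))
        # Subtract the additive margin from the target-class cosine only.
        margin = F.one_hot(labels, cosine.size(1)).float() * self.m
        am_loss = F.cross_entropy(self.s * (cosine - margin), labels)
        # Auxiliary cross-entropy branch on the raw embedding.
        ce_loss = F.cross_entropy(self.ce_head(emb), labels)
        return am_loss + self.alpha * ce_loss

# usage sketch
head = AMSoftmaxWithCE()
loss = head(torch.randn(8, 256), torch.randint(0, 1000, (8,)))
```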
In this report, we describe the Beijing ZKJ-NPU team submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We participated in the fully supervised speaker verification track 1 and track 2. In the challenge, we explored various advanced neural network structures with different pooling layers and objective loss functions. In addition, we introduced the ResNet-DTCF, CoAtNet and PyConv networks to improve the performance of CNN-based speaker embedding models. Moreover, we applied embedding normalization and score normalization at the evaluation stage. By fusing 11 and 14 systems respectively, our final best performances (minDCF/EER) on the evaluation trials are 0.1205/2.8160% for track 1 and 0.1175/2.8400% for track 2. With this submission, we placed second in the challenge on both tracks.
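The exact form of score normalization is not specified above; as one common variant, the sketch below shows adaptive symmetric score normalization (AS-norm) of a trial score against a cohort, with the cohort size `top_k` chosen arbitrarily for illustration.

```python
import numpy as np

def as_norm(score, enroll_cohort_scores, test_cohort_scores, top_k=300):
    """Adaptive symmetric score normalization of a single trial score.

    enroll_cohort_scores : scores of the enrollment embedding against a cohort
    test_cohort_scores   : scores of the test embedding against the same cohort
    top_k                : number of highest cohort scores used for the statistics
    """
    e = np.sort(enroll_cohort_scores)[-top_k:]
    t = np.sort(test_cohort_scores)[-top_k:]
    z = (score - e.mean()) / (e.std() + 1e-12)  # normalize w.r.t. enrollment side
    s = (score - t.mean()) / (t.std() + 1e-12)  # normalize w.r.t. test side
    return 0.5 * (z + s)
```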
This paper describes the ByteDance speaker diarization system for the fourth track of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). VoxSRC-21 provides both the dev and test sets of VoxConverse for validation and a standalone test set for evaluation. We first collect the duration and signal-to-noise ratio (SNR) of all audio and find that the distributions of the VoxConverse test set and the VoxSRC-21 test set are closer to each other. Our system consists of voice activity detection (VAD), speaker embedding extraction, spectral clustering followed by a re-clustering step based on agglomerative hierarchical clustering (AHC), and overlapped speech detection and handling. Finally, we integrate systems operating at different time scales using DOVER-Lap. Our best system achieves a diarization error rate (DER) of 5.15% on the evaluation set, ranking second in the diarization track of the challenge.
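A minimal sketch of the clustering stage as described above (spectral clustering on a cosine affinity, followed by AHC re-clustering of the resulting cluster centroids); the thresholds and the exact affinity construction are assumptions, not the ByteDance configuration.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def cluster_segments(embeddings, n_speakers=4, ahc_threshold=0.6):
    """Cluster per-segment speaker embeddings, then refine with AHC.

    embeddings : (n_segments, dim) per-segment speaker embeddings
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Cosine-similarity affinity matrix for spectral clustering.
    affinity = np.clip(emb @ emb.T, 0.0, 1.0)
    spectral = SpectralClustering(n_clusters=n_speakers,
                                  affinity="precomputed", random_state=0)
    labels = spectral.fit_predict(affinity)

    # Re-cluster the spectral centroids with AHC to merge over-split speakers.
    uniq = np.unique(labels)
    centroids = np.stack([emb[labels == k].mean(axis=0) for k in uniq])
    link = linkage(pdist(centroids, metric="cosine"), method="average")
    merged = fcluster(link, t=ahc_threshold, criterion="distance")
    remap = {k: merged[i] - 1 for i, k in enumerate(uniq)}
    return np.array([remap[k] for k in labels])
```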
This short paper presents an efficient, flexible implementation of the SRP-PHAT multichannel sound source localization method. The method is evaluated on the single-source tasks of the LOCATA 2018 development dataset, and an associated Matlab toolbox is made available online.
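The released toolbox is in Matlab; purely as an illustration of the underlying computation, the NumPy sketch below steers PHAT-weighted cross-power spectra of all microphone pairs toward a grid of candidate positions and picks the maximum-power point. Variable names and the grid representation are assumptions, not the toolbox interface.

```python
import numpy as np

def srp_phat(frames, mic_pos, grid, fs, c=343.0):
    """SRP-PHAT localization on one multichannel frame.

    frames  : (n_mics, n_samples) time-domain snapshot
    mic_pos : (n_mics, 3) microphone positions [m]
    grid    : (n_points, 3) candidate source positions [m]
    """
    n_mics, n_samples = frames.shape
    spec = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n_samples, 1.0 / fs)

    power = np.zeros(len(grid))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            # PHAT-weighted cross-power spectrum of the pair (i, j).
            cross = spec[i] * np.conj(spec[j])
            cross = cross / (np.abs(cross) + 1e-12)
            # TDOA of each grid point for this pair, then steer and accumulate.
            tdoa = (np.linalg.norm(grid - mic_pos[i], axis=1)
                    - np.linalg.norm(grid - mic_pos[j], axis=1)) / c
            steering = np.exp(2j * np.pi * np.outer(tdoa, freqs))
            power += np.real(steering @ cross)

    return grid[np.argmax(power)], power
```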
To extract the voice of a target speaker when it is mixed with a variety of other sounds, such as white and ambient noise or the voices of interfering speakers, we extend the Transformer network to attend to the information most relevant to the target speaker, given the characteristics of his or her voice as a form of contextual information. The idea has a natural interpretation in terms of the selective attention theory. Specifically, we propose two models that incorporate the voice characteristics into the Transformer, based on different insights into where the feature selection should take place. Both models yield excellent performance, on par with or better than published state-of-the-art models on the speaker extraction task, including separating the speech of novel speakers not seen during training.
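As an illustration of the general idea (not of either of the two proposed models), the PyTorch sketch below conditions a standard Transformer encoder on a target-speaker embedding by concatenating it to every frame of the mixture features before self-attention; all dimensions and the mask-based output are assumptions.

```python
import torch
import torch.nn as nn

class SpeakerConditionedTransformer(nn.Module):
    """Fuse a target-speaker embedding into each frame before the encoder."""
    def __init__(self, feat_dim=80, spk_dim=256, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.fuse = nn.Linear(feat_dim + spk_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mask_head = nn.Linear(d_model, feat_dim)

    def forward(self, mixture, spk_emb):
        # mixture : (batch, frames, feat_dim) features of the mixed signal
        # spk_emb : (batch, spk_dim) target-speaker embedding
        spk = spk_emb.unsqueeze(1).expand(-1, mixture.size(1), -1)
        hidden = self.encoder(self.fuse(torch.cat([mixture, spk], dim=-1)))
        # Predict a mask over the mixture features for the target speaker.
        return torch.sigmoid(self.mask_head(hidden)) * mixture

# usage sketch
model = SpeakerConditionedTransformer()
extracted = model(torch.randn(2, 100, 80), torch.randn(2, 256))
```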