ﻻ يوجد ملخص باللغة العربية
This paper describes a fast speaker search system to retrieve segments of the same voice identity in the large-scale data. A recent study shows that Locality Sensitive Hashing (LSH) enables quick retrieval of a relevant voice in the large-scale data in conjunction with i-vector while maintaining accuracy. In this paper, we proposed Random Speaker-variability Subspace (RSS) projection to map a data into LSH based hash tables. We hypothesized that rather than projecting on completely random subspace without considering data, projecting on randomly generated speaker variability space would give more chance to put the same speaker representation into the same hash bins, so we can use less number of hash tables. Multiple RSS can be generated by randomly selecting a subset of speakers from a large speaker cohort. From the experimental result, the proposed approach shows 100 times and 7 times faster than the linear search and LSH, respectively
Speaker-conditioned target speaker extraction systems rely on auxiliary information about the target speaker to extract the target speaker signal from a mixture of multiple speakers. Typically, a deep neural network is applied to isolate the relevant
The goal of this paper is to adapt speaker embeddings for solving the problem of speaker diarisation. The quality of speaker embeddings is paramount to the performance of speaker diarisation systems. Despite this, prior works in the field have direct
Speaker diarization relies on the assumption that speech segments corresponding to a particular speaker are concentrated in a specific region of the speaker space; a region which represents that speakers identity. These identities are not known a pri
Speaker extraction aims to mimic humans selective auditory attention by extracting a target speakers voice from a multi-talker environment. It is common to perform the extraction in frequency-domain, and reconstruct the time-domain signal from the ex
In this paper, we study a novel technique that exploits the interaction between speaker traits and linguistic content to improve both speaker verification and utterance verification performance. We implement an idea of speaker-utterance dual attentio