ﻻ يوجد ملخص باللغة العربية
Recent efforts have been made on domestic activities classification from audio recordings, especially the works submitted to the challenge of DCASE (Detection and Classification of Acoustic Scenes and Events) since 2018. In contrast, few studies were done on domestic activities clustering, which is a newly emerging problem. Domestic activities clustering from audio recordings aims at merging audio clips which belong to the same class of domestic activity into a single cluster. Domestic activities clustering is an effective way for unsupervised estimation of daily activities performed in home environment. In this study, we propose a method for domestic activities clustering using a convolutional capsule autoencoder network (CCAN). In the method, the deep embeddings are learned by the autoencoder in the CCAN, while the deep embeddings which belong to the same class of domestic activities are merged into a single cluster by a clustering layer in the CCAN. Evaluated on a public dataset adopted in DCASE-2018 Task 5, the results show that the proposed method outperforms state-of-the-art methods in terms of the metrics of clustering accuracy and normalized mutual information.
In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) backprojection with the help of an encoder network. The proposed diarization system is trained jointly with GAN loss, latent vari
The reliability of using fully convolutional networks (FCNs) has been successfully demonstrated by recent studies in many speech applications. One of the most popular variants of these FCNs is the `U-Net, which is an encoder-decoder network with skip
In this work, we investigate the effectiveness of two techniques for improving variational autoencoder (VAE) based voice conversion (VC). First, we reconsider the relationship between vocoder features extracted using the high quality vocoders adopted
High-quality speech corpora are essential foundations for most speech applications. However, such speech data are expensive and limited since they are collected in professional recording environments. In this work, we propose an encoder-decoder neura
The paper deals with the hitherto neglected topic of audio dequantization. It reviews the state-of-the-art sparsity-based approaches and proposes several new methods. Convex as well as non-convex approaches are included, and all the presented formula