ﻻ يوجد ملخص باللغة العربية
This study investigated the waveform representation for audio signal classification. Recently, many studies on audio waveform classification such as acoustic event detection and music genre classification have been published. Most studies on audio waveform classification have proposed the use of a deep learning (neural network) framework. Generally, a frequency analysis method such as Fourier transform is applied to extract the frequency or spectral information from the input audio waveform before inputting the raw audio waveform into the neural network. In contrast to these previous studies, in this paper, we propose a novel waveform representation method, in which audio waveforms are represented as a bit sequence, for audio classification. In our experiment, we compare the proposed bit representation waveform, which is directly given to a neural network, to other representations of audio waveforms such as a raw audio waveform and a power spectrum with two classification tasks: one is an acoustic event classification task and the other is a sound/music classification task. The experimental results showed that the bit representation waveform achieved the best classification performance for both the tasks.
Deep neural networks can learn complex and abstract representations, that are progressively obtained by combining simpler ones. A recent trend in speech and speaker recognition consists in discovering these representations starting from raw audio sam
The social media revolution has produced a plethora of web services to which users can easily upload and share multimedia documents. Despite the popularity and convenience of such services, the sharing of such inherently personal data, including spee
End-to-end approaches for automatic speech recognition (ASR) benefit from directly modeling the probability of the word sequence given the input audio stream in a single neural network. However, compared to conventional ASR systems, these models typi
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive, and converts the white noise signal into structured waveform through a Markov chain w
In this paper, we present an efficient neural network for end-to-end general purpose audio source separation. Specifically, the backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Feature