ﻻ يوجد ملخص باللغة العربية
In this paper, we describe our contribution to Task 2 of the DCASE 2018 Audio Challenge. While it has become ubiquitous to utilize an ensemble of machine learning methods for classification tasks to obtain better predictive performance, the majority of ensemble methods combine predictions rather than learned features. We propose a single-model method that combines learned high-level features computed from log-scaled mel-spectrograms and raw audio data. These features are learned separately by two Convolutional Neural Networks, one for each input type, and then combined by densely connected layers within a single network. This relatively simple approach along with data augmentation ranks among the best two percent in the Freesound General-Purpose Audio Tagging Challenge on Kaggle.
Convolutional Neural Networks have been extensively explored in the task of automatic music tagging. The problem can be approached by using either engineered time-frequency features or raw audio as input. Modulation filter bank representations that h
Existing automatic music generation approaches that feature deep learning can be broadly classified into two types: raw audio models and symbolic models. Symbolic models, which train and generate at the note level, are currently the more prevalent ap
Audio signals are often represented as spectrograms and treated as 2D images. In this light, deep convolutional architectures are widely used for music audio tasks even though these two data types have very different structures. In this work, we atte
Music, speech, and acoustic scene sound are often handled separately in the audio domain because of their different signal characteristics. However, as the image domain grows rapidly by versatile image classification models, it is necessary to study
This paper shows the susceptibility of spectrogram-based audio classifiers to adversarial attacks and the transferability of such attacks to audio waveforms. Some commonly used adversarial attacks to images have been applied to Mel-frequency and shor