Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Automatic Instrument Recognition in Polyphonic Music Using Convolutional Neural Networks

154 0 0.0 ( 0 )

Download Cite

Added by Tian Wang

Publication date 2015

fields Informatics Engineering

and research's language is English

Authors Peter Li - Jiyuan Qian - Tian Wang

Sound Information Retrieval Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Traditional methods to tackle many music information retrieval tasks typically follow a two-step architecture: feature engineering followed by a simple learning algorithm. In these shallow architectures, feature engineering and learning are typically disjoint and unrelated. Additionally, feature engineering is difficult, and typically depends on extensive domain expertise. In this paper, we present an application of convolutional neural networks for the task of automatic musical instrument identification. In this model, feature extraction and learning algorithms are trained together in an end-to-end fashion. We show that a convolutional neural network trained on raw audio can achieve performance surpassing traditional methods that rely on hand-crafted features.

rate research

Automatic tagging using deep convolutional neural networks

99 - Keunwoo Choi , George Fazekas , Mark Sandler 2016

We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting of 2D convolutional layers and subsampling layers only. In the experiments, we measure the AUC-ROC scores of the architectures with different complexities and input types using the MagnaTagATune dataset, where a 4-layer architecture shows state-of-the-art performance with mel-spectrogram input. Furthermore, we evaluated the performances of the architectures with varying the number of layers on a larger dataset (Million Song Dataset), and found that deeper models outperformed the 4-layer architecture. The experiments show that mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.

Sound Machine Learning

Musical instrument sound classification with deep convolutional neural network using feature fusion approach

138 - Taejin Park , Taejin Lee 2015

A new musical instrument classification method using convolutional neural networks (CNNs) is presented in this paper. Unlike the traditional methods, we investigated a scheme for classifying musical instruments using the learned features from CNNs. To create the learned features from CNNs, we not only used a conventional spectrogram image, but also proposed multiresolution recurrence plots (MRPs) that contain the phase information of a raw input signal. Consequently, we fed the characteristic timbre of the particular instrument into a neural network, which cannot be extracted using a phase-blinded representations such as a spectrogram. By combining our proposed MRPs and spectrogram images with a multi-column network, the performance of our proposed classifier system improves over a system that uses only a spectrogram. Furthermore, the proposed classifier also outperforms the baseline result from traditional handcrafted features and classifiers.

Sound Information Retrieval

Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

155 - Titouan Parcollet , Ying Zhang , Mohamed Morchid 2018

Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN), made it easier to train speech recognition systems in an end-to-end fashion. However in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first and second order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency to process multidimensional inputs as entities, to encode internal dependencies, and to solve many tasks with less learning parameters than real-valued models. This paper proposes to integrate multiple feature views in quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with less learning parameters than a competing model based on real-valued CNNs.

Sound Machine Learning Audio and Speech Processing

Pitch-Informed Instrument Assignment Using a Deep Convolutional Network with Multiple Kernel Shapes

110 - Carlos Lordelo , Emmanouil Benetos , Simon Dixon 2021

This paper proposes a deep convolutional neural network for performing note-level instrument assignment. Given a polyphonic multi-instrumental music signal along with its ground truth or predicted notes, the objective is to assign an instrumental source for each note. This problem is addressed as a pitch-informed classification task where each note is analysed individually. We also propose to utilise several kernel shapes in the convolutional layers in order to facilitate learning of efficient timbre-discriminative feature maps. Experiments on the MusicNet dataset using 7 instrument classes show that our approach is able to achieve an average F-score of 0.904 when the original multi-pitch annotations are used as the pitch information for the system, and that it also excels if the note information is provided using third-party multi-pitch estimation algorithms. We also include ablation studies investigating the effects of the use of multiple kernel shapes and comparing different input representations for the audio and the note-related information.

Sound Information Retrieval Machine Learning

End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model

224 - Daniel Stoller , Simon Durand , Sebastian Ewert 2019

Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval and intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to combine numerous sub-modules including vocal separation and detection in an effort to break down the problem. Furthermore, training required fine-grained annotations to be available in some form. Here, we present a novel system based on a modified Wave-U-Net architecture, which predicts character probabilities directly from raw audio using learnt multi-scale representations of the various signal components. There are no sub-modules whose interdependencies need to be optimized. Our training procedure is designed to work with weak, line-level annotations available in the real world. With a mean alignment error of 0.35s on a standard dataset our system outperforms the state-of-the-art by an order of magnitude.

Sound Machine Learning Audio and Speech Processing

Automatic Instrument Recognition in Polyphonic Music Using Convolutional Neural Networks

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions