
Smart Edition of MIDI Files

Added by Pierre Roy
Publication date: 2019
Language: English





We address the issue of editing musical performance data, in particular MIDI files representing human musical performances. Editing such sequences raises specific issues due to the ambiguous nature of musical objects. The first source of ambiguity is that musicians naturally produce many deviations from the metrical frame. These deviations may be intentional or subconscious, but they play an important role in conveying the groove or feeling of a performance. Relations between musical elements are also usually implicit, creating even more ambiguity. A note relates to the surrounding notes in many possible ways: it can be part of a melodic pattern, play a harmonic role with the simultaneous notes, or act as a pedal tone. All these aspects play an essential role that should be preserved, as much as possible, when editing musical sequences. In this paper, we contribute specifically to the problem of editing non-quantized, metrical musical sequences represented as MIDI files. We first list a number of problems caused by naive edit operations applied to performance data, using a motivating example. We then introduce a model, called Dancing MIDI, based on 1) two desirable, well-defined properties for edit operations and 2) two well-defined operations, Split and Concat, with an implementation. We show that our model formally satisfies the two properties, and that it prevents most of the problems that occur with naive edit operations on our motivating example, as well as on a real-world example using an automatic harmonizer.
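To make the two operations concrete, here is a minimal Python sketch of Split and Concat over a non-quantized note list. The Note fields, the handling of notes that straddle the split point, and all names are illustrative assumptions; the paper defines the actual operations and their properties precisely.

    # Sketch only: the Note fields and straddle handling are assumptions,
    # not the paper's actual Dancing MIDI implementation.
    from dataclasses import dataclass, replace
    from typing import List, Tuple

    @dataclass(frozen=True)
    class Note:
        onset: float      # seconds; may deviate from the metrical grid
        duration: float   # seconds
        pitch: int        # MIDI pitch number
        velocity: int     # MIDI velocity, 0-127

    def split(notes: List[Note], t: float) -> Tuple[List[Note], List[Note]]:
        """Split at time t; a note sounding across t is cut in two so that
        neither segment loses the note's harmonic contribution."""
        left, right = [], []
        for n in notes:
            if n.onset + n.duration <= t:
                left.append(n)
            elif n.onset >= t:
                right.append(replace(n, onset=n.onset - t))
            else:  # note straddles the split point
                left.append(replace(n, duration=t - n.onset))
                right.append(replace(n, onset=0.0,
                                     duration=n.onset + n.duration - t))
        return left, right

    def concat(a: List[Note], b: List[Note], offset: float) -> List[Note]:
        """Append b after a, shifting b by offset; timing deviations inside
        each segment are left untouched."""
        return a + [replace(n, onset=n.onset + offset) for n in b]

Note how split keeps each note's exact, non-quantized timing relative to its segment; a naive cut snapped to the grid would instead lose or truncate the notes that deviate from the metrical frame.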



Related research

Recently, end-to-end (E2E) speech recognition has become popular, since it integrates the acoustic, pronunciation and language models into a single neural network that outperforms conventional models. Among E2E approaches, attention-based models such as the Transformer have emerged as superior. Such models have opened the door to deploying ASR on smart devices, but they still require a large number of model parameters. We propose an extremely low-footprint E2E ASR system for smart devices that satisfies resource constraints without sacrificing recognition accuracy. We design cross-layer weight sharing to improve parameter efficiency and further exploit model compression methods, including sparsification and quantization, to reduce memory storage and boost decoding efficiency. We evaluate our approaches on the public AISHELL-1 and AISHELL-2 benchmarks. On the AISHELL-2 task, the proposed method achieves more than 10x compression (model size reduced from 248 MB to 24 MB) at the cost of only a minor performance loss (CER rising from 6.49% to 6.92%).
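As an illustration of the parameter-sharing idea, the following PyTorch sketch reuses one Transformer encoder layer across the full depth, in the style of ALBERT; the sizes and the dynamic-quantization call are illustrative assumptions, not the paper's configuration.

    # Sketch only: cross-layer weight sharing with assumed sizes.
    import torch
    import torch.nn as nn

    class SharedEncoder(nn.Module):
        def __init__(self, d_model=256, nhead=4, num_passes=12):
            super().__init__()
            # One layer owns the parameters; applying it num_passes times
            # gives 12-layer depth at roughly 1/12 of the encoder weights.
            self.layer = nn.TransformerEncoderLayer(d_model, nhead,
                                                    batch_first=True)
            self.num_passes = num_passes

        def forward(self, x):  # x: (batch, frames, d_model)
            for _ in range(self.num_passes):
                x = self.layer(x)
            return x

    # Post-training dynamic quantization can then shrink storage further:
    model = SharedEncoder()
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8)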
Music classification is the task of classifying a music piece into labels such as genres or composers. We propose large-scale MIDI-based composer classification systems using GiantMIDI-Piano, a transcription-based dataset. We propose piano rolls, onset rolls, and velocity rolls as input representations and use deep neural networks as classifiers. To our knowledge, we are the first to investigate the composer classification problem with up to 100 composers. Using convolutional recurrent neural networks as models, our MIDI-based composer classification system achieves 10-composer and 100-composer classification accuracies of 0.648 and 0.385 (evaluated on 30-second clips) and 0.739 and 0.489 (evaluated on whole music pieces), respectively. Our MIDI-based system outperforms several audio-based baseline classification systems, indicating the effectiveness of compact MIDI representations for composer classification.
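For readers unfamiliar with these input representations, the NumPy sketch below builds the three rolls from a generic (onset, duration, pitch, velocity) note list; the frame rate and stacked-channel layout are assumptions chosen for illustration.

    # Sketch only: piano, onset, and velocity rolls from a note list.
    import numpy as np

    def rolls(notes, fps=100, seconds=30):  # 30 s clips as evaluated above
        frames = fps * seconds
        piano = np.zeros((frames, 128), dtype=np.float32)  # 1 while sounding
        onset = np.zeros((frames, 128), dtype=np.float32)  # 1 at note starts
        veloc = np.zeros((frames, 128), dtype=np.float32)  # scaled velocity
        for on, dur, pitch, vel in notes:
            start = int(round(on * fps))
            end = min(frames, int(round((on + dur) * fps)))
            if start >= frames:
                continue
            piano[start:end, pitch] = 1.0
            onset[start, pitch] = 1.0
            veloc[start:end, pitch] = vel / 127.0
        # Stacked as channels, this is a typical CRNN input tensor.
        return np.stack([piano, onset, veloc], axis=0)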
We introduce the Expanded Groove MIDI dataset (E-GMD), an automatic drum transcription (ADT) dataset that contains 444 hours of audio from 43 drum kits, making it an order of magnitude larger than similar datasets and the first with human-performed velocity annotations. We use E-GMD to optimize classifiers for use in downstream generation by predicting expressive dynamics (velocity), and show with listening tests that they produce outputs with improved perceptual quality, despite similar results on classification metrics. Via the listening tests, we argue that standard classifier metrics, such as accuracy and F-measure, are insufficient proxies of performance in downstream tasks because they do not fully align with the perceptual quality of generated outputs.
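A hypothetical head for such a classifier could predict per-frame drum onsets jointly with the expressive velocities that the dataset annotates; the PyTorch module below is a sketch under assumed feature and class dimensions, not the authors' architecture.

    # Sketch only: joint onset/velocity prediction per frame.
    import torch
    import torch.nn as nn

    class OnsetVelocityHead(nn.Module):
        def __init__(self, d_features=256, n_drums=9):
            super().__init__()
            self.onset = nn.Linear(d_features, n_drums)     # onset logits
            self.velocity = nn.Linear(d_features, n_drums)  # dynamics

        def forward(self, frames):  # frames: (batch, time, d_features)
            # Onsets are classified; velocities are regressed to [0, 1].
            return self.onset(frames), torch.sigmoid(self.velocity(frames))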
Sounds are essential to how humans perceive and interact with the world and are captured in recordings and shared on the Internet on a minute-by-minute basis. These recordings, which are predominantly videos, constitute the largest archive of sounds we know. However, most of these recordings have undescribed content making necessary methods for automatic sound analysis, indexing and retrieval. These methods have to address multiple challenges, such as the relation between sounds and language, numerous and diverse sound classes, and large-scale evaluation. We propose a system that continuously learns from the web relations between sounds and language, improves sound recognition models over time and evaluates its learning competency in the large-scale without references. We introduce the Never-Ending Learner of Sounds (NELS), a project for continuously learning of sounds and their associated knowledge, available on line in nels.cs.cmu.edu
Samuele Giraudo, 2021
We introduce the notion of multi-pattern, a combinatorial abstraction of polyphonic musical phrases. The interest of this approach is that it offers a way to compose two multi-patterns to produce a longer one. This places musical phrases in an algebraic context, since the set of multi-patterns has the structure of an operad; operads are structures that formalize the notion of operators and their composition. Seeing musical phrases as operators allows us to perform computations on phrases and has applications in generative music: given a set of short patterns, we propose various algorithms to randomly generate a new, longer phrase inspired by the input patterns.
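A drastically simplified sketch of this compositional idea, for monophonic patterns encoded as lists of scale degrees: composing q into p at position i grafts q onto p, transposed to start on p's i-th degree. The encoding ignores rests, rhythm and polyphony, and is only meant to convey the flavor of operadic composition.

    # Sketch only: partial composition p o_i q on degree patterns.
    def compose(p, q, i):
        """Replace the i-th degree of p by a copy of q shifted so that
        it starts on that degree."""
        return p[:i] + [p[i] + d for d in q] + p[i + 1:]

    # Example: grafting a neighbor-tone figure onto the second degree.
    phrase = [0, 2, 4]
    ornament = [0, 1, 0]
    print(compose(phrase, ornament, 1))  # [0, 2, 3, 2, 4]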