Do you want to publish a course? Click here

An Augmented Lagrangian Method for Piano Transcription using Equal Loudness Thresholding and LSTM-based Decoding

209   0   0.0 ( 0 )
 Added by Sebastian Ewert
 Publication date 2017
and research's language is English




Ask ChatGPT about the research

A central goal in automatic music transcription is to detect individual note events in music recordings. An important variant is instrument-dependent music transcription where methods can use calibration data for the instruments in use. However, despite the additional information, results rarely exceed an f-measure of 80%. As a potential explanation, the transcription problem can be shown to be badly conditioned and thus relies on appropriate regularization. A recently proposed method employs a mixture of simple, convex regularizers (to stabilize the parameter estimation process) and more complex terms (to encourage more meaningful structure). In this paper, we present two extensions to this method. First, we integrate a computational loudness model to better differentiate real from spurious note detections. Second, we employ (Bidirectional) Long Short Term Memory networks to re-weight the likelihood of detected note constellations. Despite their simplicity, our two extensions lead to a drop of about 35% in note error rate compared to the state-of-the-art.



rate research

Read More

Given a musical audio recording, the goal of automatic music transcription is to determine a score-like representation of the piece underlying the recording. Despite significant interest within the research community, several studies have reported on a glass ceiling effect, an apparent limit on the transcription accuracy that current methods seem incapable of overcoming. In this paper, we explore how much this effect can be mitigated by focusing on a specific instrument class and making use of additional information on the recording conditions available in studio or home recording scenarios. In particular, exploiting the availability of single note recordings for the instrument in use we develop a novel signal model employing variable-length spectro-temporal patterns as its central building blocks - tailored for pitched percussive instruments such as the piano. Temporal dependencies between spectral templates are modeled, resembling characteristics of factorial scaled hidden Markov models (FS-HMM) and other methods combining Non-Negative Matrix Factorization with Markov processes. In contrast to FS-HMMs, our parameter estimation is developed in a global, relaxed form within the extensible alternating direction method of multipliers (ADMM) framework, which enables the systematic combination of basic regularizers propagating sparsity and local stationarity in note activity with more complex regularizers imposing temporal semantics. The proposed method achieves an f-measure of 93-95% for note onsets on pieces recorded on a Yamaha Disklavier (MAPS DB).
Automatic Music Transcription has seen significant progress in recent years by training custom deep neural networks on large datasets. However, these models have required extensive domain-specific design of network architectures, input/output representations, and complex decoding schemes. In this work, we show that equivalent performance can be achieved using a generic encoder-decoder Transformer with standard decoding methods. We demonstrate that the model can learn to translate spectrogram inputs directly to MIDI-like output events for several transcription tasks. This sequence-to-sequence approach simplifies transcription by jointly modeling audio features and language-like output dependencies, thus removing the need for task-specific architectures. These results point toward possibilities for creating new Music Information Retrieval models by focusing on dataset creation and labeling rather than custom model design.
306 - Hongpeng Sun 2019
Augmented Lagrangian method (also called as method of multipliers) is an important and powerful optimization method for lots of smooth or nonsmooth variational problems in modern signal processing, imaging, optimal control and so on. However, one usually needs to solve the coupled and nonlinear system together and simultaneously, which is very challenging. In this paper, we proposed several semismooth Newton methods to solve the nonlinear subproblems arising in image restoration, which leads to several highly efficient and competitive algorithms for imaging processing. With the analysis of the metric subregularities of the corresponding functions, we give both the global convergence and local linear convergence rate for the proposed augmented Lagrangian methods with semismooth Newton solvers.
79 - Yinqiao Yan , Qingna Li 2019
Support vector machine (SVM) has proved to be a successful approach for machine learning. Two typical SVM models are the L1-loss model for support vector classification (SVC) and $epsilon$-L1-loss model for support vector regression (SVR). Due to the nonsmoothness of the L1-loss function in the two models, most of the traditional approaches focus on solving the dual problem. In this paper, we propose an augmented Lagrangian method for the L1-loss model, which is designed to solve the primal problem. By tackling the nonsmooth term in the model with Moreau-Yosida regularization and the proximal operator, the subproblem in augmented Lagrangian method reduces to a nonsmooth linear system, which can be solved via the quadratically convergent semismooth Newtons method. Moreover, the high computational cost in semismooth Newtons method can be significantly reduced by exploring the sparse structure in the generalized Jacobian. Numerical results on various datasets in LIBLINEAR show that the proposed method is competitive with the most popular solvers in both speed and accuracy.
The octagonal shrinkage and clustering algorithm for regression (OSCAR), equipped with the $ell_1$-norm and a pair-wise $ell_{infty}$-norm regularizer, is a useful tool for feature selection and grouping in high-dimensional data analysis. The computational challenge posed by OSCAR, for high dimensional and/or large sample size data, has not yet been well resolved due to the non-smoothness and inseparability of the regularizer involved. In this paper, we successfully resolve this numerical challenge by proposing a sparse semismooth Newton-based augmented Lagrangian method to solve the more general SLOPE (the sorted L-one penalized estimation) model. By appropriately exploiting the inherent sparse and low-rank property of the generalized Jacobian of the semismooth Newton system in the augmented Lagrangian subproblem, we show how the computational complexity can be substantially reduced. Our algorithm presents a notable advantage in the high-dimensional statistical regression settings. Numerical experiments are conducted on real data sets, and the results demonstrate that our algorithm is far superior, in both speed and robustness, than the existing state-of-the-art algorithms based on first-order iterative schemes, including the widely used accelerated proximal gradient (APG) method and the alternating direction method of multipliers (ADMM).
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا