Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness

353 0 0.0 ( 0 )

Download Cite

Added by Taiji Suzuki

Publication date 2012

fields Mathematical Statistics

and research's language is English

Authors Taiji Suzuki - Masashi Sugiyama

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We investigate the learning rate of multiple kernel learning (MKL) with $ell_1$ and elastic-net regularizations. The elastic-net regularization is a composition of an $ell_1$-regularizer for inducing the sparsity and an $ell_2$-regularizer for controlling the smoothness. We focus on a sparse setting where the total number of kernels is large, but the number of nonzero components of the ground truth is relatively small, and show sharper convergence rates than the learning rates have ever shown for both $ell_1$ and elastic-net regularizations. Our analysis reveals some relations between the choice of a regularization function and the performance. If the ground truth is smooth, we show a faster convergence rate for the elastic-net regularization with less conditions than $ell_1$-regularization; otherwise, a faster convergence rate for the $ell_1$-regularization is shown.

rate research

Sparsity-accuracy trade-off in MKL

488 - Ryota Tomioka , Taiji Suzuki 2010

We empirically investigate the best trade-off between sparse and uniformly-weighted multiple kernel learning (MKL) using the elastic-net regularization on real and simulated datasets. We find that the best trade-off parameter depends not only on the sparsity of the true kernel-weight spectrum but also on the linear dependence among kernels and the number of samples.

Machine Learning Applications Methodology

Reconciling modern machine learning practice and the bias-variance trade-off

151 - Mikhail Belkin , Daniel Hsu , Siyuan Ma 2018

Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting spurious patterns. However, in the modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This double descent curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.

Machine Learning Machine Learning

Online Multiple Kernel Learning for Structured Prediction

371 - Andre F.T. Martins , Mario A. T. Figueiredo , Pedro M. Q. Aguiar 2010

Despite the recent progress towards efficient multiple kernel learning (MKL), the structured output case remains an open research front. Current approaches involve repeatedly solving a batch learning problem, which makes them inadequate for large scale scenarios. We propose a new family of online proximal algorithms for MKL (as well as for group-lasso and variants thereof), which overcomes that drawback. We show regret, convergence, and generalization bounds for the proposed method. Experiments on handwriting recognition and dependency parsing testify for the successfulness of the approach.

Machine Learning

Fast learning rate of deep learning via a kernel perspective

64 - Taiji Suzuki 2017

We develop a new theoretical framework to analyze the generalization error of deep learning, and derive a new fast learning rate for two representative algorithms: empirical risk minimization and Bayesian deep learning. The series of theoretical analyses of deep learning has revealed its high expressive power and universal approximation capability. Although these analyses are highly nonparametric, existing generalization error analyses have been developed mainly in a fixed dimensional parametric model. To compensate this gap, we develop an infinite dimensional model that is based on an integral form as performed in the analysis of the universal approximation capability. This allows us to define a reproducing kernel Hilbert space corresponding to each layer. Our point of view is to deal with the ordinary finite dimensional deep neural network as a finite approximation of the infinite dimensional one. The approximation error is evaluated by the degree of freedom of the reproducing kernel Hilbert space in each layer. To estimate a good finite dimensional model, we consider both of empirical risk minimization and Bayesian deep learning. We derive its generalization error bound and it is shown that there appears bias-variance trade-off in terms of the number of parameters of the finite dimensional approximation. We show that the optimal width of the internal layers can be determined through the degree of freedom and the convergence rate can be faster than $O(1/sqrt{n})$ rate which has been shown in the existing studies.

Statistics Theory Machine Learning Machine Learning

Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning

55 - Thomas M. Moerland , Anna Deichler , Simone Baldi 2020

Planning and reinforcement learning are two key approaches to sequential decision making. Multi-step approximate real-time dynamic programming, a recently successful algorithm class of which AlphaZero [Silver et al., 2018] is an example, combines both by nesting planning within a learning loop. However, the combination of planning and learning introduces a new question: how should we balance time spend on planning, learning and acting? The importance of this trade-off has not been explicitly studied before. We show that it is actually of key importance, with computational results indicating that we should neither plan too long nor too short. Conceptually, we identify a new spectrum of planning-learning algorithms which ranges from exhaustive search (long planning) to model-free RL (no planning), with optimal performance achieved midway.

Artificial Intelligence Machine Learning

comments

Fetching comments

University of Mosul

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness

Ask ChatGPT about the research

No Arabic abstract

Read More