Maximum Likelihood-based Online Adaptation of Hyper-parameters in CMA-ES

Added by Loshchilov Ilya

Publication date 2014

Fields: Informatics Engineering

Research language: English

Authors Ilya Loshchilov

Neural and Evolutionary Computing Artificial Intelligence

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is widely accepted as a robust derivative-free continuous optimization algorithm for non-linear and non-convex optimization problems. CMA-ES is well known to be almost parameterless, meaning that only one hyper-parameter, the population size, is proposed to be tuned by the user. In this paper, we propose a principled approach called self-CMA-ES to achieve the online adaptation of CMA-ES hyper-parameters in order to improve its overall performance. Experimental results show that for larger-than-default population sizes, the default hyper-parameter settings of CMA-ES are far from optimal, and that self-CMA-ES makes it possible to approach optimal settings dynamically.

A Computationally Efficient Limited Memory CMA-ES for Large Scale Optimization

Ilya Loshchilov, 2014

We propose a computationally efficient limited memory Covariance Matrix Adaptation Evolution Strategy for large scale optimization, which we call the LM-CMA-ES. The LM-CMA-ES is a stochastic, derivative-free algorithm for numerical optimization of non-linear, non-convex optimization problems in a continuous domain. Inspired by the limited memory BFGS method of Liu and Nocedal (1989), the LM-CMA-ES samples candidate solutions according to a covariance matrix reproduced from $m$ direction vectors selected during the optimization process. The decomposition of the covariance matrix into Cholesky factors makes it possible to reduce the time and memory complexity of the sampling to $O(mn)$, where $n$ is the number of decision variables. When $n$ is large (e.g., $n > 1000$), even relatively small values of $m$ (e.g., $m = 20, 30$) are sufficient to efficiently solve fully non-separable problems and to reduce the overall run-time.

Neural and Evolutionary Computing

Genetic Algorithm based hyper-parameters optimization for transfer Convolutional Neural Network

Chen Li, JinZhe Jiang, YaQian Zhao, 2021

Hyperparameter optimization is a challenging problem in developing deep neural networks. Deciding which layers to transfer and which to train is a major task in the design of transfer convolutional neural networks (CNNs). Conventional transfer CNN models are usually designed manually, based on intuition. In this paper, a genetic algorithm is applied to select the trainable layers of the transfer model. The filter criterion is constructed from the accuracy and the number of trainable layers. The results show that the method is competent in this task. The system converges with a precision of 97% in the classification of the Cats and Dogs dataset in no more than 15 generations. Moreover, backward inference according to the results of the genetic algorithm shows that our method can capture the gradient features in network layers, which helps in understanding transfer AI models.

Neural and Evolutionary Computing Machine Learning

Alternative Restart Strategies for CMA-ES

Ilya Loshchilov, 2012

This paper focuses on the restart strategy of CMA-ES on multi-modal functions. A first alternative strategy proceeds by decreasing the initial step-size of the mutation while doubling the population size at each restart. A second strategy adaptively allocates the computational budget among the restart settings in the BIPOP scheme. Both restart strategies are validated on the BBOB benchmark; their generality is also demonstrated on an independent real-world problem suite related to spacecraft trajectory optimization.

Artificial Intelligence
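The first restart strategy in the abstract above (double the population size and shrink the initial step-size at each restart) can be illustrated with a small schedule generator. This is only a sketch: the function name `restart_schedule` and the parameter `sigma_decay` are hypothetical, and the abstract does not state by how much the step-size is decreased, so the divisor is left to the caller.

```python
def restart_schedule(base_popsize, base_sigma, n_restarts, sigma_decay):
    """Return a list of (popsize, initial_sigma) pairs, one per run.

    Entry 0 is the initial run; each later entry doubles the population
    size and divides the initial step-size by `sigma_decay`.
    `sigma_decay` is an assumed, user-chosen factor (must be > 1 for the
    step-size to actually decrease).
    """
    if base_popsize < 1 or n_restarts < 0:
        raise ValueError("base_popsize must be >= 1 and n_restarts >= 0")
    if sigma_decay <= 1:
        raise ValueError("sigma_decay must be > 1 to decrease the step-size")

    schedule = []
    for k in range(n_restarts + 1):
        popsize = base_popsize * 2 ** k          # doubled at each restart
        sigma = base_sigma / sigma_decay ** k    # decreased at each restart
        schedule.append((popsize, sigma))
    return schedule


# Example: one initial run followed by three restarts.
schedule = restart_schedule(base_popsize=10, base_sigma=2.0,
                            n_restarts=3, sigma_decay=2.0)
for popsize, sigma in schedule:
    print(popsize, sigma)
```

Each pair would be handed to a fresh CMA-ES run; the optimizer itself is deliberately left out of the sketch.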

Maximum likelihood estimation for disk image parameters

Matwey V. Kornilov, 2019

We present a novel technique for estimating a disk's parameters (the centre and the radius) from its 2D image. It is based on the maximum likelihood approach, utilising both edge pixel coordinates and the image intensity gradients. We emphasise the following advantages of our likelihood model. It has closed-form formulae for parameter estimation and therefore requires fewer computational resources than iterative algorithms. The likelihood model naturally distinguishes the outer and inner annulus edges. The proposed technique was evaluated on both synthetic and real data.

Image and Video Processing Instrumentation and Methods for Astrophysics Computer Vision and Pattern Recognition

Maximum Likelihood Estimation for Learning Populations of Parameters

Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, 2019

Consider a setting with $N$ independent individuals, each with an unknown parameter, $p_i \in [0, 1]$ drawn from some unknown distribution $P^\star$. After observing the outcomes of $t$ independent Bernoulli trials, i.e., $X_i \sim \text{Binomial}(t, p_i)$ per individual, our objective is to accurately estimate $P^\star$. This problem arises in numerous domains, including the social sciences, psychology, health-care, and biology, where the size of the population under study is usually large while the number of observations per individual is often limited. Our main result shows that, in the regime where $t \ll N$, the maximum likelihood estimator (MLE) is both statistically minimax optimal and efficiently computable. Precisely, for sufficiently large $N$, the MLE achieves the information theoretic optimal error bound of $\mathcal{O}(\frac{1}{t})$ for $t < c\log{N}$, with regards to the earth mover's distance (between the estimated and true distributions). More generally, in an exponentially large interval of $t$ beyond $c \log{N}$, the MLE achieves the minimax error bound of $\mathcal{O}(\frac{1}{\sqrt{t\log N}})$. In contrast, regardless of how large $N$ is, the naive plug-in estimator for this problem only achieves the sub-optimal error of $\Theta(\frac{1}{\sqrt{t}})$.

Statistics Theory Machine Learning Machine Learning
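To make the estimation problem in the abstract above concrete, the sketch below approximates the maximum likelihood estimate of the mixing distribution by restricting its support to a fixed grid on $[0, 1]$ and running EM updates on the grid weights. This is an illustrative stand-in, not the computation used in the paper: the grid restriction, the EM iteration, and the names `binomial_pmf`, `grid_mle`, `grid_size`, and `n_iter` are all assumptions introduced here.

```python
from math import comb


def binomial_pmf(x, t, p):
    """Probability of x successes in t Bernoulli(p) trials."""
    return comb(t, x) * p ** x * (1.0 - p) ** (t - x)


def grid_mle(counts, t, grid_size=21, n_iter=200):
    """Approximate the MLE of the mixing distribution on a fixed grid.

    counts: observed successes X_i, one per individual (each in 0..t).
    Returns (grid, weights); weights[j] estimates the mass at grid[j].
    """
    grid = [j / (grid_size - 1) for j in range(grid_size)]
    weights = [1.0 / grid_size] * grid_size   # uniform starting point

    # The likelihood depends only on the distinct count values.
    freq = {}
    for x in counts:
        freq[x] = freq.get(x, 0) + 1
    lik = {x: [binomial_pmf(x, t, p) for p in grid] for x in freq}
    n = len(counts)

    for _ in range(n_iter):
        new = [0.0] * grid_size
        for x, row in lik.items():
            # Mixture likelihood of observing x under current weights.
            mix = sum(w * l for w, l in zip(weights, row))
            for j in range(grid_size):
                # Posterior responsibility of grid point j, times frequency.
                new[j] += freq[x] * weights[j] * row[j] / mix
        weights = [v / n for v in new]
    return grid, weights


# Example: half the individuals never succeed, half always succeed.
counts = [0] * 50 + [5] * 50
grid, weights = grid_mle(counts, t=5)
print(round(weights[0], 3), round(weights[-1], 3))
```

In this example the estimated mass concentrates on the two endpoints of the grid, whereas averaging the per-individual plug-in estimates $X_i / t$ would only work here because the data are noise-free.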
