A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization


Added by David Brookes

Publication date: 2019

Fields: Informatics Engineering, Mathematical Statistics

Research language: English

Authors: David H. Brookes, Akosua Busia, Clara Fannjiang

Machine Learning


We show that a large class of Estimation of Distribution Algorithms (EDAs), including, but not limited to, Covariance Matrix Adaptation, can be written as a Monte Carlo Expectation-Maximization (EM) algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new coherent framework with which to reason about EDAs.
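As a rough illustration of the kind of algorithm the abstract refers to, the sketch below implements a simple Gaussian EDA: sample from a search distribution, keep the best-scoring samples, and refit the distribution to them by maximum likelihood. The refit is the sort of step the abstract relates to the maximization step of EM. This is not the authors' formulation; the function name `gaussian_eda`, the elite-fraction selection rule, and all default parameter values are assumptions made for this example.

```python
import numpy as np

def gaussian_eda(f, mean, cov, n_samples=200, elite_frac=0.2, n_iters=50, seed=0):
    """Minimize f with a simple Gaussian EDA (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n_elite = max(2, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # Draw a Monte Carlo sample from the current search distribution.
        x = rng.multivariate_normal(mean, cov, size=n_samples)
        scores = np.array([f(xi) for xi in x])
        # Keep the lowest-scoring samples (assumed selection rule).
        elite = x[np.argsort(scores)[:n_elite]]
        # Refit the Gaussian to the elite set by maximum likelihood.
        mean = elite.mean(axis=0)
        centered = elite - mean
        # Small diagonal jitter keeps the covariance positive definite.
        cov = centered.T @ centered / n_elite + 1e-8 * np.eye(len(mean))
    return mean, cov

# Usage: minimize a shifted quadratic in two dimensions.
target = np.array([1.0, -2.0])
best_mean, best_cov = gaussian_eda(
    lambda x: float(np.sum((x - target) ** 2)),
    mean=[0.0, 0.0],
    cov=4.0 * np.eye(2),
)
```

The hard elite cutoff is only one possible weighting of the samples; other EDAs weight samples differently before the refit.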


Neural Expectation Maximization

Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber (2017)

Many real world tasks such as reasoning and physical interaction require identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed symbol-like representations. In this paper, we explicitly formalize this problem as inference in a spatial mixture model where each component is parametrized by a neural network. Based on the Expectation Maximization framework we then derive a differentiable clustering method that simultaneously learns how to group and represent individual entities. We evaluate our method on the (sequential) perceptual grouping task and find that it is able to accurately recover the constituent objects. We demonstrate that the learned representations are useful for next-step prediction.

Machine Learning, Neural and Evolutionary Computing

Deep Learning Through the Lens of Example Difficulty

Robert J. N. Baldock, Hartmut Maennel, Behnam Neyshabur (2021)

Existing work on understanding deep learning often employs measures that compress all data-dependent information into a few numbers. In this work, we adopt a perspective based on the role of individual examples. We introduce a measure of the computational difficulty of making a prediction for a given input: the (effective) prediction depth. Our extensive investigation reveals surprising yet simple relationships between the prediction depth of a given input and the model's uncertainty, confidence, accuracy and speed of learning for that data point. We further categorize difficult examples into three interpretable groups, demonstrate how these groups are processed differently inside deep models and showcase how this understanding allows us to improve prediction accuracy. Insights from our study lead to a coherent view of a number of separately reported phenomena in the literature: early layers generalize while later layers memorize; early layers converge faster; and networks learn easy data and simple functions first.

Machine Learning

Submodular Maximization Through Barrier Functions

Ashwinkumar Badanidiyuru, Amin Karbasi, Ehsan Kazemi (2020)

In this paper, we introduce a novel technique for constrained submodular maximization, inspired by barrier functions in continuous optimization. This connection not only improves the running time for constrained submodular maximization but also provides the state of the art guarantee. More precisely, for maximizing a monotone submodular function subject to the combination of a $k$-matchoid and $\ell$-knapsack constraint (for $\ell \leq k$), we propose a potential function that can be approximately minimized. Once we minimize the potential function up to an $\epsilon$ error it is guaranteed that we have found a feasible set with a $2(k+1+\epsilon)$-approximation factor which can indeed be further improved to $(k+1+\epsilon)$ by an enumeration technique. We extensively evaluate the performance of our proposed algorithm over several real-world applications, including a movie recommendation system, summarization tasks for YouTube videos, Twitter feeds and Yelp business locations, and a set cover problem.

Machine Learning

A Look at the Effect of Sample Design on Generalization through the Lens of Spectral Analysis

Bhavya Kailkhura, Jayaraman J. Thiagarajan, Qunwei Li (2019)

This paper provides a general framework to study the effect of sampling properties of training data on the generalization error of the learned machine learning (ML) models. Specifically, we propose a new spectral analysis of the generalization error, expressed in terms of the power spectra of the sampling pattern and the function involved. The framework is built in the Euclidean space using Fourier analysis and establishes a connection between some high dimensional geometric objects and optimal spectral form of different state-of-the-art sampling patterns. Subsequently, we estimate the expected error bounds and convergence rate of different state-of-the-art sampling patterns, as the number of samples and dimensions increase. We make several observations about generalization error which are valid irrespective of the approximation scheme (or learning architecture) and training (or optimization) algorithms. Our result also sheds light on ways to formulate design principles for constructing optimal sampling methods for particular problems.

Machine Learning

Gated recurrent units viewed through the lens of continuous time dynamical systems

Ian D. Jordan, Piotr Aleksander Sokol, Il Memming Park (2019)

Gated recurrent units (GRUs) are specialized memory elements for building recurrent neural networks. Despite their incredible success on various tasks, including extracting dynamics underlying neural data, little is understood about the specific dynamics representable in a GRU network. As a result, it is difficult to know a priori how well a GRU network will perform on a given task, and to assess its capacity to mimic the underlying behavior of its biological counterparts. Using a continuous time analysis, we gain intuition on the inner workings of GRU networks. We restrict our presentation to low dimensions, allowing for a comprehensive visualization. We found a surprisingly rich repertoire of dynamical features that includes stable limit cycles (nonlinear oscillations), multi-stable dynamics with various topologies, and homoclinic bifurcations. At the same time we were unable to train GRU networks to produce continuous attractors, which are hypothesized to exist in biological neural networks. We contextualize the usefulness of different kinds of observed dynamics and support our claims experimentally.

Machine Learning
