Manifold Learning for Organizing Unstructured Sets of Process Observations

Added by Felix Dietrich

Publication date 2018

Field Physics

Language English

Authors Felix Dietrich - Mahdi Kooshkbaghi - Erik M. Bollt

Data Analysis Statistics and Probability

Data mining is routinely used to organize ensembles of short temporal observations so as to reconstruct useful, low-dimensional realizations of an underlying dynamical system. In this paper, we use manifold learning to organize unstructured ensembles of observations (trials) of a system's response surface. We have no control over where each trial starts, and during each trial the operating conditions are varied by turning agnostic knobs, which change system parameters in a systematic but unknown way. As one (or more) knobs turn, we record (possibly partial) observations of the system response. We demonstrate how such partial and disorganized observation ensembles can be integrated into coherent response surfaces whose dimension and parametrization can be systematically recovered in a data-driven fashion. The approach can be justified through the Whitney and Takens embedding theorems, which allow reconstruction of manifolds/attractors through different types of observations. We demonstrate our approach by organizing unstructured observations of response surfaces, including the reconstruction of a cusp bifurcation surface for hydrogen combustion in a Continuous Stirred Tank Reactor. Finally, we demonstrate how this observation-based reconstruction naturally leads to informative transport maps between input parameter space and output/state variable spaces.
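The abstract does not say which manifold-learning algorithm is used, so the following is only a minimal sketch of the general idea, assuming diffusion maps as the technique. The toy "response curve", the hidden knob `s`, the median bandwidth heuristic, and all function names are illustrative assumptions, not taken from the paper. Observations arrive in random order, and the first nontrivial diffusion-map coordinate is checked for a monotone relation to the hidden knob.

```python
import numpy as np

def diffusion_map(X, n_coords=1, epsilon=None):
    """Leading nontrivial diffusion-map eigenvalues and coordinates of the rows of X."""
    # Pairwise squared Euclidean distances between observations.
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    if epsilon is None:
        epsilon = np.median(D2)  # heuristic kernel bandwidth (assumption)
    K = np.exp(-D2 / epsilon)
    # Density normalization, so the result depends less on sampling density.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    # Symmetric matrix that shares its eigenvalues with the Markov matrix D^-1 K.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Convert to right eigenvectors of the Markov matrix; drop the trivial first one.
    psi = vecs / np.sqrt(d)[:, None]
    return vals[1:n_coords + 1], psi[:, 1:n_coords + 1]

# Toy data: a hidden knob s drives a 3-D observation; samples are unordered.
rng = np.random.default_rng(0)
s = rng.uniform(0.0, 1.0, size=300)
X = np.column_stack([np.cos(2.0 * s), np.sin(2.0 * s), 0.5 * s])

vals, coords = diffusion_map(X)
phi = coords[:, 0]  # data-driven coordinate along the curve

def ranks(v):
    """Rank of each entry of v (0 = smallest)."""
    return np.argsort(np.argsort(v))

# Rank correlation between the hidden knob and the recovered coordinate.
rank_corr = np.corrcoef(ranks(s), ranks(phi))[0, 1]
print(f"|rank correlation| = {abs(rank_corr):.3f}")
```

The sign of `phi` is arbitrary, so only the absolute rank correlation is meaningful. A value near one suggests the recovered coordinate orders the samples consistently with the knob, without the knob values ever being used.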

An Emergent Space for Distributed Data with Hidden Internal Order through Manifold Learning

Felix P. Kemeth, Sindre W. Haugland, Felix Dietrich, 2017

Manifold-learning techniques are routinely used in mining complex spatiotemporal data to extract useful, parsimonious data representations/parametrizations; these are, in turn, useful in nonlinear model identification tasks. We focus here on the case of time series data that can ultimately be modelled as a spatially distributed system (e.g. a partial differential equation, PDE), but where we do not know the space in which this PDE should be formulated. Hence, even the spatial coordinates for the distributed system themselves need to be identified by (to emerge from) the data mining process. We will first validate this emergent space reconstruction for time series sampled without space labels in known PDEs; this brings up the issue of observability of physical space from temporal observation data, and the transition from spatially resolved to lumped (order-parameter-based) representations by tuning the scale of the data mining kernels. We will then present actual emergent space discovery illustrations. Our illustrative examples include chimera states (states of coexisting coherent and incoherent dynamics), and chaotic as well as quasiperiodic spatiotemporal dynamics, arising in partial differential equations and/or in heterogeneous networks. We also discuss how data-driven spatial coordinates can be extracted in ways invariant to the nature of the measuring instrument. Such gauge-invariant data mining can go beyond the fusion of heterogeneous observations of the same system, to the possible matching of apparently different systems.

Data Analysis Statistics and Probability

Deep learning for Gaussian process tomography model selection using the ASDEX Upgrade SXR system

Francisco Matos, Jakob Svensson, Andrea Pavone, 2020

Gaussian process tomography (GPT) is a method used for obtaining real-time tomographic reconstructions of the plasma emissivity profile in a tokamak, given some model for the underlying physical processes involved. GPT can also be used, thanks to Bayesian formalism, to perform model selection, i.e., comparing different models and choosing the one with maximum evidence. However, the computations involved in this particular step may become slow for data with high dimensionality, especially when comparing the evidence for many different models. Using measurements collected by the ASDEX Upgrade Soft X-ray (SXR) diagnostic, we train a convolutional neural network (CNN) to map SXR tomographic projections to the corresponding GPT model whose evidence is highest. We then compare the network's results, and the time required to calculate them, with those obtained through analytical Bayesian formalism. In addition, we use the network's classifications to produce tomographic reconstructions of the plasma emissivity profile, whose quality we evaluate by comparing their projection into measurement space with the existing measurements themselves.

Data Analysis Statistics and Probability; Image and Video Processing

Supervised learning from noisy observations: Combining machine-learning techniques with data assimilation

Georg A. Gottwald, Sebastian Reich, 2020

Data-driven prediction and physics-agnostic machine-learning methods have attracted increased interest in recent years, achieving forecast horizons going well beyond those to be expected for chaotic dynamical systems. In a separate strand of research, data assimilation has been successfully used to optimally combine forecast models and their inherent uncertainty with incoming noisy observations. The key idea in our work here is to achieve increased forecast capabilities by judiciously combining machine-learning algorithms and data assimilation. We use the physics-agnostic, data-driven approach of random feature maps as a forecast model within an ensemble Kalman filter data assimilation procedure. The machine-learning model is learned sequentially by incorporating incoming noisy observations. We show that the obtained forecast model has remarkably good forecast skill while being computationally cheap once trained. Going beyond the task of forecasting, we show that our method can be used to generate reliable ensembles for probabilistic forecasting as well as to learn effective model closure in multi-scale systems.
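As a rough illustration of the forecast-model ingredient only, the sketch below fits a random feature map by batch ridge regression on noise-free data. This is a simplification and an assumption: the abstract describes sequential learning from noisy observations inside an ensemble Kalman filter, which is not reproduced here. The toy dynamics (a planar rotation), the feature count `D`, the ridge parameter `beta`, and all names are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "truth": a planar rotation by a fixed angle per step (illustrative only).
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
N = 400
U = np.empty((2, N + 1))
U[:, 0] = [1.0, 0.0]
for n in range(N):
    U[:, n + 1] = R @ U[:, n]

# Random, fixed internal weights; only the outer weights W are trained.
D = 200
W_in = rng.uniform(-1.0, 1.0, size=(D, 2))
b_in = rng.uniform(-1.0, 1.0, size=(D, 1))

def features(u):
    """Random feature vector(s) for states u of shape (2, M)."""
    return np.tanh(W_in @ u + b_in)

# Ridge regression for the outer weights: minimize |W Phi - Y|^2 + beta |W|^2.
Phi = features(U[:, :-1])   # shape (D, N)
Y = U[:, 1:]                # shape (2, N): states one step later
beta = 1e-6
W = np.linalg.solve(Phi @ Phi.T + beta * np.eye(D), Phi @ Y.T).T  # shape (2, D)

def forecast(u):
    """One-step forecast for states u of shape (2, M)."""
    return W @ features(u)

# One-step root-mean-square error on the training trajectory.
rmse = np.sqrt(np.mean((forecast(U[:, :-1]) - Y) ** 2))
print(f"one-step training RMSE = {rmse:.2e}")
```

Because the internal weights stay fixed, training reduces to a linear least-squares problem, which is consistent with the abstract's remark that the model is computationally cheap once trained.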

Data Analysis Statistics and Probability; Machine Learning; Computational Physics

Skewed distributions as limits of a formal evolutionary process

F. Sattin, 2017

Time series of observables measured from complex systems often exhibit non-normal statistics: their statistical distributions (PDFs) are not Gaussian and are often skewed, with roughly exponential tails. Departure from Gaussianity is related to the intermittent development of large-scale coherent structures. The existence of these structures is rooted in the nonlinear dynamical equations obeyed by each system; it is therefore expected that some prior knowledge or guessing of these equations is needed if one wishes to infer the corresponding PDF. Conversely, the empirical knowledge of the PDF does provide information about the underlying dynamics. In this work we suggest that such prior knowledge is not always necessary. We show how, under some assumptions, a formal evolution equation for the PDF $p(x)$ can be written down, corresponding to the progressive accumulation of measurements of the generic observable $x$. The limiting solution to this equation is computed analytically, and shown to interpolate between some of the most common distributions: Gamma, Beta and Gaussian PDFs. The control parameter is just the ratio between the rms of the fluctuations and the range of allowed values. Thus, no information about the dynamics is required.

Data Analysis Statistics and Probability

A Markov Process Inspired Cellular Automata Model of Road Traffic

Fa Wang, Li Li, Jianming Hu, 2008

To provide a more accurate description of driving behaviors in vehicle queues, a cellular automata model named Markov-Gap is proposed in this paper. It views the variation of the gap between two consecutive vehicles as a Markov process whose stationary distribution corresponds to the observed distribution of practical gaps. The variety of forms this Markov process can take gives the model enough flexibility to describe various driving behaviors. Two examples are given to show how to specialize it for different scenarios: commonly discussed flows on freeways and start-up flows at signalized intersections. The agreement between the empirical observations and the simulation results suggests the soundness of this new approach.
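The requirement that a Markov process on gaps has a prescribed stationary distribution can be illustrated with a small sketch. The construction below is a generic Metropolis-type nearest-neighbor chain, chosen here as an assumption for illustration; it is not the Markov-Gap model of the paper, and the gap histogram `observed` is made-up example data.

```python
import numpy as np

def metropolis_gap_chain(target):
    """Nearest-neighbor transition matrix whose stationary distribution is `target`.

    State i stands for a gap of i cells; all entries of `target` must be positive.
    """
    target = np.asarray(target, dtype=float)
    target = target / target.sum()
    n = len(target)
    P = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):          # propose shrinking or growing the gap
            if 0 <= j < n:
                P[i, j] = 0.5 * min(1.0, target[j] / target[i])
        P[i, i] = 1.0 - P[i].sum()        # rejected proposals keep the gap unchanged
    return P, target

def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))
    v = np.real(vecs[:, k])
    return v / v.sum()

# Hypothetical observed gap histogram for gaps of 0..6 cells.
observed = [0.05, 0.15, 0.30, 0.25, 0.15, 0.07, 0.03]
P, target = metropolis_gap_chain(observed)
pi_hat = stationary_distribution(P)
print(np.round(pi_hat, 3))
```

The chain satisfies detailed balance with respect to `target`, so the computed stationary distribution matches the input histogram up to floating-point error. A cellular automaton built on such a chain would update each vehicle's gap by sampling from the corresponding row of `P`.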

Data Analysis Statistics and Probability