Geometric Foundations of Data Reduction

169 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ce Ju

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ce Ju

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The purpose of this paper is to write a complete survey of the (spectral) manifold learning methods and nonlinear dimensionality reduction (NLDR) in data reduction. The first two NLDR methods in history were respectively published in Science in 2000 in which they solve the similar reduction problem of high-dimensional data endowed with the intrinsic nonlinear structure. The intrinsic nonlinear structure is always interpreted as a concept in manifolds from geometry and topology in theoretical mathematics by computer scientists and theoretical physicists. In 2001, the concept of Manifold Learning first appears as an NLDR method called Laplacian Eigenmaps purposed by Belkin and Niyogi. In the typical manifold learning setup, the data set, also called the observation set, is distributed on or near a low dimensional manifold $M$ embedded in $mathbb{R}^D$, which yields that each observation has a $D$-dimensional representation. The goal of (spectral) manifold learning is to reduce these observations as a compact lower-dimensional representation based on the geometric information. The reduction procedure is called the (spectral) manifold learning method. In this paper, we derive each (spectral) manifold learning method with the matrix and operator representation, and we then discuss the convergence behavior of each method in a geometric uniform language. Hence, we name the survey Geometric Foundations of Data Reduction.

قيم البحث

102 - Matthew Thorpe , Bao Wang 2021

Graph Laplacian (GL)-based semi-supervised learning is one of the most used approaches for classifying nodes in a graph. Understanding and certifying the adversarial robustness of machine learning (ML) algorithms has attracted large amounts of attent ion from different research communities due to its crucial importance in many security-critical applied domains. There is great interest in the theoretical certification of adversarial robustness for popular ML algorithms. In this paper, we provide the first adversarial robust certification for the GL classifier. More precisely we quantitatively bound the difference in the classification accuracy of the GL classifier before and after an adversarial attack. Numerically, we validate our theoretical certification results and show that leveraging existing adversarial defenses for the $k$-nearest neighbor classifier can remarkably improve the robustness of the GL classifier.

التعلم الآلي التحليل العددي التحليل العددي

A framework for data-driven solution and parameter estimation of PDEs using conditional generative adversarial networks

87 - Teeratorn Kadeethum , Daniel OMalley , Jan Niklas Fuhg 2021

This work is the first to employ and adapt the image-to-image translation concept based on conditional generative adversarial networks (cGAN) towards learning a forward and an inverse solution operator of partial differential equations (PDEs). Even t hough the proposed framework could be applied as a surrogate model for the solution of any PDEs, here we focus on steady-state solutions of coupled hydro-mechanical processes in heterogeneous porous media. Strongly heterogeneous material properties, which translate to the heterogeneity of coefficients of the PDEs and discontinuous features in the solutions, require specialized techniques for the forward and inverse solution of these problems. Additionally, parametrization of the spatially heterogeneous coefficients is excessively difficult by using standard reduced order modeling techniques. In this work, we overcome these challenges by employing the image-to-image translation concept to learn the forward and inverse solution operators and utilize a U-Net generator and a patch-based discriminator. Our results show that the proposed data-driven reduced order model has competitive predictive performance capabilities in accuracy and computational efficiency as well as training time requirements compared to state-of-the-art data-driven methods for both forward and inverse problems.

التعلم الآلي التحليل العددي التحليل العددي

Machine learning-based conditional mean filter: a generalization of the ensemble Kalman filter for nonlinear data assimilation

99 - Truong-Vinh Hoang 2021

Filtering is a data assimilation technique that performs the sequential inference of dynamical systems states from noisy observations. Herein, we propose a machine learning-based ensemble conditional mean filter (ML-EnCMF) for tracking possibly high- dimensional non-Gaussian state models with nonlinear dynamics based on sparse observations. The proposed filtering method is developed based on the conditional expectation and numerically implemented using machine learning (ML) techniques combined with the ensemble method. The contribution of this work is twofold. First, we demonstrate that the ensembles assimilated using the ensemble conditional mean filter (EnCMF) provide an unbiased estimator of the Bayesian posterior mean, and their variance matches the expected conditional variance. Second, we implement the EnCMF using artificial neural networks, which have a significant advantage in representing nonlinear functions over high-dimensional domains such as the conditional mean. Finally, we demonstrate the effectiveness of the ML-EnCMF for tracking the states of Lorenz-63 and Lorenz-96 systems under the chaotic regime. Numerical results show that the ML-EnCMF outperforms the ensemble Kalman filter.

التعلم الآلي التحليل العددي التحليل العددي

Foundations of data imbalance and solutions for a data democracy

85 - Ajay Kulkarni , Deri Chong , Feras A. Batarseh 2021

Dealing with imbalanced data is a prevalent problem while performing classification on the datasets. Many times, this problem contributes to bias while making decisions or implementing policies. Thus, it is vital to understand the factors which cause imbalance in the data (or class imbalance). Such hidden biases and imbalances can lead to data tyranny and a major challenge to a data democracy. In this chapter, two essential statistical elements are resolved: the degree of class imbalance and the complexity of the concept; solving such issues helps in building the foundations of a data democracy. Furthermore, statistical measures which are appropriate in these scenarios are discussed and implemented on a real-life dataset (car insurance claims). In the end, popular data-level methods such as random oversampling, random undersampling, synthetic minority oversampling technique, Tomek link, and others are implemented in Python, and their performance is compared.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Active operator inference for learning low-dimensional dynamical-system models from noisy data

101 - Wayne Isaac Tan Uy , Yuepeng Wang , Yuxiao Wen 2021

Noise poses a challenge for learning dynamical-system models because already small variations can distort the dynamics described by trajectory data. This work builds on operator inference from scientific machine learning to infer low-dimensional mode ls from high-dimensional state trajectories polluted with noise. The presented analysis shows that, under certain conditions, the inferred operators are unbiased estimators of the well-studied projection-based reduced operators from traditional model reduction. Furthermore, the connection between operator inference and projection-based model reduction enables bounding the mean-squared errors of predictions made with the learned models with respect to traditional reduced models. The analysis also motivates an active operator inference approach that judiciously samples high-dimensional trajectories with the aim of achieving a low mean-squared error by reducing the effect of noise. Numerical experiments with high-dimensional linear and nonlinear state dynamics demonstrate that predictions obtained with active operator inference have orders of magnitude lower mean-squared errors than operator inference with traditional, equidistantly sampled trajectory data.

التعلم الآلي التحليل العددي التحليل العددي