No Arabic abstract
The concentration of measure phenomena were discovered as the mathematical background of statistical mechanics at the end of the XIX - beginning of the XX century and were then explored in mathematics of the XX-XXI centuries. At the beginning of the XXI century, it became clear that the proper utilisation of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality. This paper summarises recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the thin structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fishers discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us by such classifiers and a non-iterative (one-shot) procedure for learning.
High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the curse of dimensionality states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the blessing of dimensionality, has attracted much attention. It turns out that generic high-dimensional datasets exhibit fairly simple geometric properties. Thus, there is a fundamental tradeoff between complexity and simplicity in high dimensional spaces. Here we present a brief explanatory review of recent ideas, results and hypotheses about the blessing of dimensionality and related simplifying effects relevant to machine learning and neuroscience.
One-shot anonymous unselfishness in economic games is commonly explained by social preferences, which assume that people care about the monetary payoffs of others. However, during the last ten years, research has shown that different types of unselfish behaviour, including cooperation, altruism, truth-telling, altruistic punishment, and trustworthiness are in fact better explained by preferences for following ones own personal norms - internal standards about what is right or wrong in a given situation. Beyond better organising various forms of unselfish behaviour, this moral preference hypothesis has recently also been used to increase charitable donations, simply by means of interventions that make the morality of an action salient. Here we review experimental and theoretical work dedicated to this rapidly growing field of research, and in doing so we outline mathematical foundations for moral preferences that can be used in future models to better understand selfless human actions and to adjust policies accordingly. These foundations can also be used by artificial intelligence to better navigate the complex landscape of human morality.
The purpose of this paper is to write a complete survey of the (spectral) manifold learning methods and nonlinear dimensionality reduction (NLDR) in data reduction. The first two NLDR methods in history were respectively published in Science in 2000 in which they solve the similar reduction problem of high-dimensional data endowed with the intrinsic nonlinear structure. The intrinsic nonlinear structure is always interpreted as a concept in manifolds from geometry and topology in theoretical mathematics by computer scientists and theoretical physicists. In 2001, the concept of Manifold Learning first appears as an NLDR method called Laplacian Eigenmaps purposed by Belkin and Niyogi. In the typical manifold learning setup, the data set, also called the observation set, is distributed on or near a low dimensional manifold $M$ embedded in $mathbb{R}^D$, which yields that each observation has a $D$-dimensional representation. The goal of (spectral) manifold learning is to reduce these observations as a compact lower-dimensional representation based on the geometric information. The reduction procedure is called the (spectral) manifold learning method. In this paper, we derive each (spectral) manifold learning method with the matrix and operator representation, and we then discuss the convergence behavior of each method in a geometric uniform language. Hence, we name the survey Geometric Foundations of Data Reduction.
I briefly present the foundations of relativistic cosmology, which are, General Relativity Theory and the Cosmological Principle. I discuss some relativistic models, namely, Einstein static universe and Friedmann universes. The classical bibliographic references for the relevant tensorial demonstrations are indicated whenever necessary, although the calculations themselves are not shown.
Dealing with imbalanced data is a prevalent problem while performing classification on the datasets. Many times, this problem contributes to bias while making decisions or implementing policies. Thus, it is vital to understand the factors which cause imbalance in the data (or class imbalance). Such hidden biases and imbalances can lead to data tyranny and a major challenge to a data democracy. In this chapter, two essential statistical elements are resolved: the degree of class imbalance and the complexity of the concept; solving such issues helps in building the foundations of a data democracy. Furthermore, statistical measures which are appropriate in these scenarios are discussed and implemented on a real-life dataset (car insurance claims). In the end, popular data-level methods such as random oversampling, random undersampling, synthetic minority oversampling technique, Tomek link, and others are implemented in Python, and their performance is compared.