Spectral features of the empirical moment matrix constitute a resourceful tool for unveiling properties of a cloud of points, among which density, support and latent structures. It is well known that the empirical moment matrix encodes a great deal of subtle attributes of the underlying measure. Taking this object as the basis of our observations, we combine ideas from statistics, real algebraic geometry, orthogonal polynomials and approximation theory to open new perspectives on Machine Learning (ML) problems with data supported on singular sets. Refined concepts and results from real algebraic geometry and approximation theory empower a simple tool, the empirical moment matrix, to solve non-trivial questions in data analysis. We provide (1) theoretical support, (2) numerical experiments and (3) connections to real-world data as validation of the strength of the empirical moment matrix approach.
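The central construction can be sketched in a few lines. Below is a minimal toy illustration (our own example, not taken from the paper): the empirical moment matrix of a sample, and the resulting inverse Christoffel function, which stays small on the support of the data (here, a singular set: the unit circle) and blows up away from it. The polynomial degree and the regularization constant are arbitrary choices.

```python
import numpy as np

def monomial_basis(X, degree):
    """Evaluate all bivariate monomials x^a * y^b with a + b <= degree
    at each row of X (shape (n, 2)); returns an (n, m) design matrix."""
    return np.column_stack([X[:, 0] ** a * X[:, 1] ** (t - a)
                            for t in range(degree + 1)
                            for a in range(t + 1)])

def inverse_christoffel(X, degree=4, reg=1e-6):
    """Return q(x) = v(x)^T (M + reg*I)^{-1} v(x), where M is the
    empirical moment matrix of the sample X: q is small near the
    support of the data and large away from it."""
    V = monomial_basis(X, degree)
    M = V.T @ V / len(X)                      # empirical moment matrix
    M_inv = np.linalg.inv(M + reg * np.eye(M.shape[0]))
    def q(x):
        v = monomial_basis(np.atleast_2d(np.asarray(x, float)), degree)
        return float(v @ M_inv @ v.T)
    return q

# Data on the unit circle: q is far smaller on the circle than at the
# empty center, and larger still far outside the support.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)
X = np.column_stack([np.cos(theta), np.sin(theta)])
q = inverse_christoffel(X)
print(q((1.0, 0.0)), q((0.0, 0.0)), q((3.0, 0.0)))
```

Note that on this singular support the moment matrix is genuinely rank-deficient (every polynomial multiple of 1 - x^2 - y^2 vanishes on the data), which is exactly why the regularized inverse separates the circle from its interior so sharply.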
Outlier detection methods have become increasingly relevant in recent years due to heightened security concerns and their broad applicability across fields. Recently, Pauwels and Lasserre (2016) observed that the sublevel sets of the inverse Christoffel function accurately capture the shape of a cloud of data using a sum-of-squares polynomial, and can be used to perform outlier detection. In this work, we propose a kernelized variant of the inverse Christoffel function that makes it computationally tractable for data sets with a large number of features. We compare our approach to current methods on 15 different data sets and achieve the best average area under the precision-recall curve (AUPRC), the best average rank and the lowest root mean square deviation.
We present an algorithm for data-driven reachability analysis that estimates finite-horizon forward reachable sets for general nonlinear systems using level sets of a certain class of polynomials known as Christoffel functions. The level sets of Christoffel functions are known empirically to provide good approximations to the support of probability distributions: the algorithm uses this property for reachability analysis by solving a probabilistic relaxation of the reachable set computation problem. We also provide a guarantee that the output of the algorithm is an accurate reachable set approximation in a probabilistic sense, provided that a certain sample size is attained. Finally, we investigate three numerical examples to demonstrate the algorithm's capabilities, such as providing non-convex reachable set approximations and detecting holes in the reachable set.
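The pipeline described above can be sketched as follows (a toy instantiation with our own choices of dynamics, horizon, polynomial degree and level heuristic; it does not reproduce the paper's algorithm or its probabilistic guarantee): sample initial states, propagate them through the dynamics, fit the empirical inverse Christoffel function on the reached states, and report its sublevel set as the reachable set estimate.

```python
import numpy as np

def monomials_2d(X, d):
    """Design matrix of all monomials x^a * y^b with a + b <= d."""
    return np.column_stack([X[:, 0] ** a * X[:, 1] ** (t - a)
                            for t in range(d + 1) for a in range(t + 1)])

def reach_set_estimate(samples, d=4, reg=1e-6):
    """Fit the empirical inverse Christoffel function q on sampled reached
    states; the estimated reachable set is the sublevel set
    {x : q(x) <= level}.  Here `level` is simply the maximum of q over the
    samples (one possible heuristic, not the paper's calibrated choice)."""
    V = monomials_2d(samples, d)
    M_inv = np.linalg.inv(V.T @ V / len(samples) + reg * np.eye(V.shape[1]))
    def q(x):
        v = monomials_2d(np.atleast_2d(np.asarray(x, float)), d)
        return float(v @ M_inv @ v.T)
    level = max(q(s) for s in samples)
    return q, level

def step(x, dt=0.05):
    """One Euler step of a Van der Pol oscillator (toy nonlinear dynamics)."""
    return x + dt * np.array([x[1], (1 - x[0] ** 2) * x[1] - x[0]])

rng = np.random.default_rng(2)
X0 = rng.uniform([-0.1, -0.1], [0.1, 0.1], size=(400, 2))   # initial set
XT = X0
for _ in range(40):                                         # finite horizon
    XT = np.array([step(x) for x in XT])
q, level = reach_set_estimate(XT)
print(level, q(np.array([5.0, 5.0])))
```

Because the level set is that of a polynomial, the resulting approximation can be non-convex (and can exhibit holes), which is the property the abstract highlights.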
In this paper we focus on the problem of assigning uncertainties to single-point predictions. We introduce a cost function that encodes the trade-off between accuracy and reliability in probabilistic forecasts, and we derive analytic formulas for forecasts of continuous scalar variables expressed in terms of Gaussian distributions. The Accuracy-Reliability cost function can be used to empirically estimate the variance in heteroskedastic regression problems (input-dependent noise) by solving a two-objective optimization problem. The simple philosophy behind this strategy is that predictions based on the estimated variances should be both accurate and reliable (i.e. statistically consistent with observations). We show several examples with synthetic data, in both one- and multi-dimensional problems, where the underlying hidden noise function can be accurately recovered. The practical implementation of the method uses a neural network and, in the one-dimensional case, a simple polynomial fit.
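As a simple illustration of the task on synthetic data (using the plain Gaussian negative log-likelihood as a stand-in for the Accuracy-Reliability cost; the polynomial model for log sigma and all constants below are our own choices, not the paper's):

```python
import numpy as np
from scipy.optimize import minimize

def fit_variance_nll(x, y, mu, deg=2):
    """Estimate input-dependent noise sigma(x) by modelling log sigma(x)
    as a polynomial and minimizing the Gaussian negative log-likelihood
    of the residuals r = y - mu.  This folds accuracy and reliability
    into one likelihood; the paper's cost balances them explicitly."""
    P = np.vander(x, deg + 1)                 # polynomial features of x
    r2 = (y - mu) ** 2
    def nll(c):
        log_s = P @ c
        return np.mean(log_s + 0.5 * r2 * np.exp(-2 * log_s))
    c0 = np.zeros(deg + 1)
    c0[-1] = np.log(np.std(y - mu))           # start from a constant sigma
    res = minimize(nll, c0, method="Nelder-Mead",
                   options={"maxiter": 10000, "xatol": 1e-6, "fatol": 1e-10})
    return lambda t: np.exp(np.vander(np.atleast_1d(np.asarray(t, float)),
                                      deg + 1) @ res.x)

# Synthetic data with a known hidden noise function sigma(x) = 0.1 + 0.5 x^2.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 2000)
true_sigma = 0.1 + 0.5 * x ** 2
y = np.sin(2 * x) + true_sigma * rng.normal(size=x.size)
mu = np.sin(2 * x)                            # assume the mean is known
sigma_hat = fit_variance_nll(x, y, mu)
print(sigma_hat(np.array([0.0, 0.9])))
```

The recovered sigma_hat increases away from the origin, tracking the true heteroskedastic noise; a neural network would replace the polynomial model in higher dimensions.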
Inspired by recent measurements with the CLAS detector at Jefferson Lab, we perform a self-consistent analysis of world data on the proton structure function g1 in the range 0.17 < Q^2 < 30 (GeV/c)^2. We compute for the first time low-order moments of g1 and study their evolution from small to large values of Q^2. The analysis includes the latest data on both the unpolarized inclusive cross sections and the ratio R = sigmaL / sigmaT from Jefferson Lab, as well as a new model for the transverse asymmetry A2 in the resonance region. The contributions of both leading and higher twists are extracted, taking into account effects from radiative corrections beyond the next-to-leading order by means of soft-gluon resummation techniques. The leading twist is determined with remarkably good accuracy and is compared with the predictions obtained using various polarized parton distribution sets available in the literature. The contribution of higher twists to the g1 moments is found to be significantly larger than in the case of the unpolarized structure function F2.
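For concreteness, low-order (Cornwall-Norton) moments of a structure function tabulated in x at fixed Q^2 reduce to one-dimensional quadratures; the sketch below uses a hypothetical toy shape for g1, not a fit to the data discussed here.

```python
import numpy as np

def cornwall_norton_moment(x, g1, n):
    """n-th Cornwall-Norton moment of g1 at fixed Q^2,
    Gamma_n(Q^2) = integral_0^1 x^(n-1) g1(x, Q^2) dx,
    evaluated by the trapezoidal rule on the tabulated x grid."""
    y = x ** (n - 1) * g1
    return 0.5 * np.sum((y[1:] + y[:-1]) * (x[1:] - x[:-1]))

# Hypothetical toy parametrization of g1 (illustrative only):
x = np.linspace(1e-3, 1.0, 1000)
g1 = x ** 0.7 * (1 - x) ** 3
print([cornwall_norton_moment(x, g1, n) for n in (1, 3, 5)])
```

Higher moments weight larger x more heavily, which is why they are the natural probe of the large-x resonance region emphasized in the analysis.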
Many complex systems generate multifractal time series that are long-range cross-correlated, and numerous methods have been proposed to characterize the multifractal nature of these long-range cross-correlations. However, several important issues about these methods are not well understood, and most methods consider only one moment order. We study the joint multifractal analysis based on the partition function with two moment orders, which was originally developed to investigate fluid fields, and derive several important properties analytically. We apply the method numerically to binomial measures with multifractal cross-correlations and to bivariate fractional Brownian motions without multifractal cross-correlations. For binomial multifractal measures, we derive explicit expressions for the mass function, the singularity strength and the multifractal spectrum of the cross-correlations, which agree very well with the numerical results. We also apply the method to stock market indexes and unveil intriguing multifractality in the cross-correlations of index volatilities.
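The two-order partition-function analysis can be illustrated on the deterministic binomial cascade, where the joint mass exponent has a closed form (the construction below, with both cascades sharing the same dyadic positions, is our own illustrative choice, not the paper's exact setup):

```python
import numpy as np

def binomial_measure(p, levels):
    """Deterministic binomial cascade on [0, 1]: measure of the
    2**levels dyadic boxes; at each split the left child gets weight p."""
    m = np.array([1.0])
    for _ in range(levels):
        new = np.empty(2 * m.size)
        new[0::2], new[1::2] = p * m, (1 - p) * m   # children stay adjacent
        m = new
    return m

def joint_tau(mu1, mu2, q1, q2):
    """Joint mass exponent tau(q1, q2) from the two-order partition
    function chi(s) = sum_boxes mu1(box)^q1 * mu2(box)^q2 ~ s^tau,
    estimated as the slope of log chi versus log s over dyadic scales."""
    levels = int(np.log2(mu1.size))
    log_s, log_chi = [], []
    a, b = mu1.copy(), mu2.copy()
    for k in range(levels, 2, -1):            # finest to coarser scales
        log_s.append(-k * np.log(2.0))        # box size s = 2^-k
        log_chi.append(np.log(np.sum(a ** q1 * b ** q2)))
        a = a[0::2] + a[1::2]                 # coarse-grain both measures
        b = b[0::2] + b[1::2]
    return np.polyfit(log_s, log_chi, 1)[0]

# Two cascades on the same dyadic positions give multifractal
# cross-correlations with the closed form
# tau(q1, q2) = -log2(p1^q1 * p2^q2 + (1-p1)^q1 * (1-p2)^q2).
p1, p2, q1, q2 = 0.3, 0.4, 2.0, 1.0
mu1, mu2 = binomial_measure(p1, 14), binomial_measure(p2, 14)
tau = joint_tau(mu1, mu2, q1, q2)
tau_exact = -np.log2(p1**q1 * p2**q2 + (1 - p1)**q1 * (1 - p2)**q2)
print(tau, tau_exact)
```

For this exactly self-similar pair, the partition function scales as a perfect power law, so the fitted slope matches the closed form to machine precision; empirical series such as index volatilities only approximate this scaling over a finite range.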