
Noisy data clusters are hollow

Added by Francois Leonard
Publication date: 2015
Language: English





A new vision in multidimensional statistics is proposed, impacting several areas of application. In these applications, a set of noisy measurements characterizing the repeatable response of a process is known as a realization and can be seen as a single point in $\mathbb{R}^N$. The projections of this point on the N axes correspond to the N measurements. The contemporary vision of a diffuse cloud of realizations distributed in $\mathbb{R}^N$ is replaced by a cloud in the shape of a shell surrounding a topological manifold. This manifold corresponds to the process's stabilized-response domain observed without the measurement noise. The measurement noise, which accumulates over several dimensions, distances each realization from the manifold. The probability density function (PDF) of the realization-to-manifold distance creates the shell. Considering the central limit theorem as the number of dimensions increases, the PDF tends toward the normal distribution $N(\mu, \sigma^2)$, where $\mu$ fixes the shell's center location and $\sigma$ fixes the shell thickness. In this vision, the likelihood of a realization is a function of the realization-to-shell distance rather than the realization-to-manifold distance. The demonstration begins with the work of Claude Shannon, followed by the introduction of the shell manifold, and ends with practical applications to equipment monitoring.
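As a quick illustration of the hollow-shell claim, the following minimal simulation sketch (assuming i.i.d. Gaussian measurement noise around a single noise-free response; it is not code from the paper) shows that in high dimension the realization-to-manifold distances concentrate near sigma*sqrt(N) rather than near zero:

import numpy as np

rng = np.random.default_rng(0)
N = 1000        # number of measurements per realization (dimensions)
M = 5000        # number of realizations
sigma = 1.0     # measurement-noise standard deviation

manifold_point = rng.uniform(-1.0, 1.0, size=N)                  # noise-free stabilized response
realizations = manifold_point + sigma * rng.normal(size=(M, N))  # noisy realizations

# Realization-to-manifold distance for every realization.
d = np.linalg.norm(realizations - manifold_point, axis=1)

# The distances pile up near sigma*sqrt(N) with spread of about sigma/sqrt(2):
# a thin shell, hollow at its center, rather than a diffuse cloud.
print("mean distance:", d.mean(), "  sigma*sqrt(N):", sigma * np.sqrt(N))
print("std of distance:", d.std(), "  sigma/sqrt(2):", sigma / np.sqrt(2))
print("smallest distance:", d.min())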




Read More

Multidimensional Scaling (MDS) is a classical technique for embedding data in low dimensions, still in widespread use today. Originally introduced in the 1950s, MDS was not designed with high-dimensional data in mind; while it remains popular with data analysis practitioners, no doubt it should be adapted to the high-dimensional data regime. In this paper we study MDS in a modern setting, specifically, high dimensions and ambient measurement noise. We show that, as the ambient noise level increases, MDS suffers a sharp breakdown that depends on the data dimension and noise level, and we derive an explicit formula for this breakdown point in the case of white noise. We then introduce MDS+, an extremely simple variant of MDS, which applies a carefully derived shrinkage nonlinearity to the eigenvalues of the MDS similarity matrix. Under a loss function measuring the embedding quality, MDS+ is the unique asymptotically optimal shrinkage function. We prove that MDS+ offers improved embedding, sometimes significantly so, compared with classical MDS. Furthermore, MDS+ does not require external estimates of the embedding dimension (a famous difficulty in classical MDS), as it calculates the optimal dimension into which the data should be embedded.
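For context, a minimal sketch of the classical MDS pipeline that the abstract starts from (an illustration under assumed white noise, not the authors' MDS+ implementation): embed from a pairwise distance matrix via double centering and an eigendecomposition; MDS+ is described as additionally shrinking the eigenvalues before embedding.

import numpy as np

def classical_mds(D, k):
    """Embed n points into k dimensions from an n x n Euclidean distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:k]                # k largest eigenvalues
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# A 2-D configuration observed in a noisy 200-dimensional ambient space.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                                  # true low-dimensional points
Y = np.hstack([X, np.zeros((100, 198))]) + 0.3 * rng.normal(size=(100, 200))
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)     # pairwise distances
Z = classical_mds(D, 2)                                        # classical-MDS embedding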
This paper presents and analyzes an approach to cluster-based inference for dependent data. The primary setting considered here is with spatially indexed data in which the dependence structure of observed random variables is characterized by a known, observed dissimilarity measure over spatial indices. Observations are partitioned into clusters with the use of an unsupervised clustering algorithm applied to the dissimilarity measure. Once the partition into clusters is learned, a cluster-based inference procedure is applied to a statistical hypothesis testing problem. The procedure proposed in the paper allows the number of clusters to depend on the data, which gives researchers a principled method for choosing an appropriate clustering level. The paper gives conditions under which the proposed procedure asymptotically attains correct size. A simulation study shows that the proposed procedure attains near nominal size in finite samples in a variety of statistical testing problems with dependent data.
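A minimal sketch of this general workflow (with assumed ingredients: Ward clustering on the observed dissimilarities and a one-sample t-test over cluster averages; this is not the paper's procedure, which in particular chooses the number of clusters from the data):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from scipy import stats

rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(300, 2))     # spatial indices
y = rng.normal(size=300)                           # observations (H0: mean zero is true here)

dissim = pdist(coords)                             # observed dissimilarity measure
labels = fcluster(linkage(dissim, method="ward"), t=8, criterion="maxclust")

# Treat the cluster averages as approximately independent and test H0: E[y] = 0.
cluster_means = np.array([y[labels == c].mean() for c in np.unique(labels)])
t_stat, p_value = stats.ttest_1samp(cluster_means, 0.0)
print("clusters:", len(cluster_means), " t:", round(t_stat, 2), " p:", round(p_value, 3))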
Data observed at high sampling frequency are typically assumed to be an additive composite of a relatively slow-varying continuous-time component, a latent stochastic process or a smooth random function, and measurement error. Supposing that the latent component is an Itô diffusion process, we propose to estimate the measurement error density function by applying a deconvolution technique with appropriate localization. Our estimator, which does not require equally-spaced observation times, is consistent and minimax rate optimal. We also investigate estimators of the moments of the error distribution and their properties, propose a frequency domain estimator for the integrated volatility of the underlying stochastic process, and show that it achieves the optimal convergence rate. Simulations and a real data analysis validate our analysis.
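A simplified moment-based sketch of the setting (a standard back-of-the-envelope device, not the paper's deconvolution estimator): at high sampling frequency the first differences of the observations are dominated by the measurement error, so half of their mean square approximates the error variance.

import numpy as np

rng = np.random.default_rng(3)
n, dt = 100_000, 1.0 / 100_000
sigma_x, sigma_eps = 0.2, 0.005

# Latent Ito diffusion (a Brownian motion here) plus i.i.d. measurement error.
latent = np.cumsum(sigma_x * np.sqrt(dt) * rng.normal(size=n))
obs = latent + sigma_eps * rng.normal(size=n)

# Var(diff of error) = 2 * error variance dominates Var(diff of latent) = sigma_x^2 * dt.
noise_var_hat = 0.5 * np.mean(np.diff(obs) ** 2)
print("estimated error variance:", noise_var_hat, "  true:", sigma_eps ** 2)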
We introduce uncertainty regions to perform inference on partial correlations when data are missing not at random. These uncertainty regions are shown to have the desired asymptotic coverage. Their finite sample performance is illustrated via simulations and a real data example.
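For context, a sketch of the quantity being inferred (assuming complete data; the paper's contribution is the uncertainty regions under data missing not at random, which this does not implement): the partial correlation between two variables given the rest, read off the precision matrix.

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))      # correlated observations
precision = np.linalg.inv(np.cov(X, rowvar=False))           # inverse covariance matrix

def partial_corr(precision, i, j):
    """Partial correlation of variables i and j given all remaining variables."""
    return -precision[i, j] / np.sqrt(precision[i, i] * precision[j, j])

print("partial correlation of columns 0 and 1:", round(partial_corr(precision, 0, 1), 3))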
We consider the "searching for a trail in a maze" composite hypothesis testing problem, in which one attempts to detect an anomalous directed path in a two-dimensional lattice box of side n based on observations on the nodes of the box. Under the signal hypothesis, one observes independent Gaussian variables of unit variance at all nodes, with zero mean off the anomalous path and mean $\mu_n$ on it. Under the null hypothesis, one observes i.i.d. standard Gaussians on all nodes. Arias-Castro et al. (2008) showed that if the unknown directed path under the signal hypothesis has a known initial location, then detection is possible (in the minimax sense) if $\mu_n \gg 1/\sqrt{\log n}$, while it is not possible if $\mu_n \ll 1/(\log n \sqrt{\log \log n})$. In this paper, we show that this result continues to hold even when the initial location of the unknown path is not known. As is the case with Arias-Castro et al. (2008), the upper bound here also applies when the path is undirected. The improvement is achieved by replacing the linear detection statistic used in Arias-Castro et al. (2008) with a polynomial statistic, which is obtained by employing a multi-scale analysis on a quadratic statistic to bootstrap its performance. Our analysis is motivated by ideas developed in the context of the analysis of random polymers in Lacoin (2010).
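A minimal simulation sketch of the detection problem (with assumed ingredients: an n x n Gaussian field, one random monotone anomalous path starting at a known corner, and a brute-force scan over directed paths by dynamic programming; this is a naive scan statistic, not the polynomial statistic developed in the paper):

import numpy as np

rng = np.random.default_rng(5)
n, mu = 200, 0.3

def field(anomalous):
    """An n x n field of i.i.d. N(0,1) noise, optionally with mean mu added along one monotone path."""
    X = rng.normal(size=(n, n))
    if anomalous:
        i = j = 0
        while (i, j) != (n - 1, n - 1):
            X[i, j] += mu
            if i == n - 1:
                j += 1
            elif j == n - 1 or rng.random() < 0.5:
                i += 1
            else:
                j += 1
        X[n - 1, n - 1] += mu
    return X

def best_path_sum(X):
    """Maximum sum over all monotone directed paths from (0, 0) to (n-1, n-1)."""
    S = np.full((n, n), -np.inf)
    S[0, 0] = X[0, 0]
    for i in range(n):
        for j in range(n):
            if i or j:
                S[i, j] = X[i, j] + max(S[i - 1, j] if i else -np.inf,
                                        S[i, j - 1] if j else -np.inf)
    return S[-1, -1]

print("best path sum under the null  :", round(best_path_sum(field(False)), 1))
print("best path sum under the signal:", round(best_path_sum(field(True)), 1))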
