
Multidimensional Scaling of Noisy High Dimensional Data

Posted by Erez Peterfreund
Publication date: 2018
Research field: Mathematical statistics
Paper language: English





Multidimensional Scaling (MDS) is a classical technique for embedding data in low dimensions that is still in widespread use today. Introduced in the 1950s, MDS was not designed with high-dimensional data in mind; while it remains popular with data analysis practitioners, it clearly needs to be adapted to the high-dimensional regime. In this paper we study MDS in a modern setting, specifically high dimensions with ambient measurement noise. We show that, as the ambient noise level increases, MDS suffers a sharp breakdown that depends on the data dimension and the noise level, and we derive an explicit formula for this breakdown point in the case of white noise. We then introduce MDS+, an extremely simple variant of MDS that applies a carefully derived shrinkage nonlinearity to the eigenvalues of the MDS similarity matrix. Under a loss function measuring embedding quality, the shrinkage used by MDS+ is the unique asymptotically optimal shrinkage function. We prove that MDS+ offers improved embedding, sometimes significantly so, compared with classical MDS. Furthermore, MDS+ does not require an external estimate of the embedding dimension (a well-known difficulty in classical MDS), since it computes the optimal dimension into which the data should be embedded.
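For readers who want to experiment, here is a minimal sketch of classical (Torgerson) MDS in Python with NumPy: double-center the squared distances, eigendecompose the resulting similarity matrix, and scale the top eigenvectors. The `shrink` argument is a hypothetical hook for an eigenvalue nonlinearity. The hard threshold in the example (`tau = 100.0`, picked by hand for this toy data) is an assumption for illustration only; it is not the MDS+ shrinker derived in the paper, whose formula is not reproduced here. The example simply shows how an eigenvalue nonlinearity can also fix the embedding dimension.

```python
import numpy as np

def classical_mds(D, k=None, shrink=None):
    """Classical (Torgerson) MDS from an n x n matrix of pairwise distances D.

    k:      embedding dimension; if None, keep every eigenvalue that stays
            (numerically) positive after the optional shrinkage
    shrink: optional function applied to the sorted eigenvalues
            (hypothetical hook, NOT the MDS+ shrinker from the paper)
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered similarity matrix
    evals, evecs = np.linalg.eigh(B)           # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]            # re-sort in descending order
    evals, evecs = evals[order], evecs[:, order]
    if shrink is not None:
        evals = shrink(evals)
    evals = np.clip(evals, 0.0, None)          # negative eigenvalues have no Euclidean embedding
    if k is None:
        # dimension = number of eigenvalues above round-off level
        k = int(np.count_nonzero(evals > 1e-9 * evals[0]))
    return evecs[:, :k] * np.sqrt(evals[:k])

def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Toy data: a 2-D signal placed in p = 100 ambient dimensions plus white noise.
rng = np.random.default_rng(0)
n, p = 200, 100
signal = np.zeros((n, p))
signal[:, :2] = 10.0 * rng.normal(size=(n, 2))
noisy = signal + 0.1 * rng.normal(size=(n, p))

tau = 100.0  # hypothetical hard threshold, chosen by hand for this example
Y = classical_mds(pairwise_distances(noisy), shrink=lambda ev: np.where(ev > tau, ev, 0.0))
print("embedding shape with thresholding:", Y.shape)
print("embedding shape without a dimension choice:", classical_mds(pairwise_distances(noisy)).shape)
```

Without any dimension choice, the plain version keeps every noise direction. The thresholded version keeps only the eigenvalues that stand out from the noise.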




Read also

Lara Kassab, 2019
Multidimensional scaling (MDS) is a popular technique for mapping a finite metric space into a low-dimensional Euclidean space in a way that best preserves pairwise distances. We study a notion of MDS on infinite metric measure spaces, along with its optimality properties and goodness of fit. This allows us to study the MDS embeddings of the geodesic circle $S^1$ into $\mathbb{R}^m$ for all $m$, and to ask questions about the MDS embeddings of the geodesic $n$-spheres $S^n$ into $\mathbb{R}^m$. Furthermore, we address questions on convergence of MDS. For instance, if a sequence of metric measure spaces converges to a fixed metric measure space $X$, then in what sense do the MDS embeddings of these spaces converge to the MDS embedding of $X$? Convergence is understood when each metric space in the sequence has the same finite number of points, or when each metric space has a finite number of points tending to infinity. We are also interested in notions of convergence when each metric space in the sequence has an arbitrary (possibly infinite) number of points.
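As a finite-sample illustration (our own, not the paper's construction on infinite metric measure spaces), one can sample n evenly spaced points on the geodesic circle $S^1$ of circumference $2\pi$ and run classical MDS on the arc-length distances. The sketch checks two things: the top eigenvalue is doubly degenerate, so the 2-D embedding is a round circle, and some eigenvalues of the double-centered matrix are negative, reflecting that the geodesic circle does not embed isometrically in any Euclidean space. The function name and the choice n = 100 are assumptions for illustration.

```python
import numpy as np

def geodesic_circle_mds(n, m=2):
    """Classical MDS of n evenly spaced points on a circle of circumference 2*pi,
    using arc-length (geodesic) distances. Returns all eigenvalues of the
    double-centered matrix (descending) and the m-dimensional embedding."""
    theta = 2 * np.pi * np.arange(n) / n
    gap = np.abs(theta[:, None] - theta[None, :])
    D = np.minimum(gap, 2 * np.pi - gap)           # geodesic distance on S^1
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                    # double-centered matrix
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1]                # descending order
    evals, evecs = evals[order], evecs[:, order]
    Y = evecs[:, :m] * np.sqrt(np.clip(evals[:m], 0.0, None))
    return evals, Y

evals, Y = geodesic_circle_mds(100)
print("top eigenvalues:", np.round(evals[:4], 3))
print("most negative eigenvalue:", round(float(evals.min()), 3))
print("radii of the 2-D embedding (first 3 points):", np.round(np.linalg.norm(Y, axis=1)[:3], 6))
```

Because the points are evenly spaced, the distance matrix is circulant, so its eigenvectors are discrete Fourier modes; the degenerate top pair corresponds to the first cosine/sine pair.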
Data observed at high sampling frequency are typically assumed to be an additive composite of a relatively slowly varying continuous-time component (a latent stochastic process or a smooth random function) and measurement error. Supposing that the latent component is an Itô diffusion process, we propose to estimate the measurement error density function by applying a deconvolution technique with appropriate localization. Our estimator, which does not require equally spaced observation times, is consistent and minimax rate optimal. We also investigate estimators of the moments of the error distribution and their properties, propose a frequency-domain estimator for the integrated volatility of the underlying stochastic process, and show that it achieves the optimal convergence rate. Simulations and a real data analysis validate our analysis.
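The paper's deconvolution estimator is not reproduced here. As a much simpler, swapped-in illustration of estimating a moment of the error distribution, the classical first-difference estimator uses the fact that under dense sampling the increments of the latent diffusion are small, so $E[(\Delta Y_i)^2] \approx 2\,\mathrm{Var}(\varepsilon)$. The sketch assumes a Brownian latent process with unit volatility, i.i.d. Gaussian errors, and irregular (random, sorted) observation times; the function name and all parameter values are hypothetical.

```python
import numpy as np

def noise_variance_first_diff(y):
    """Estimate Var(error) as half the mean squared first difference of y.
    (Simple moment estimator, not the paper's deconvolution method.)
    The bias is half the average squared latent increment, which shrinks
    as sampling gets denser."""
    dy = np.diff(y)
    return 0.5 * np.mean(dy ** 2)

rng = np.random.default_rng(42)
n = 10_000
t = np.sort(rng.uniform(0.0, 1.0, size=n))         # irregular observation times
dt = np.diff(t, prepend=0.0)
latent = np.cumsum(rng.normal(scale=np.sqrt(dt)))  # Brownian motion, unit volatility
noise_sd = 0.1                                     # hypothetical measurement-error sd
y = latent + rng.normal(scale=noise_sd, size=n)

est = noise_variance_first_diff(y)
print(f"estimated error variance {est:.5f} vs true {noise_sd ** 2:.5f}")
```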
A new vision in multidimensional statistics is proposed, with impact on several areas of application. In these applications, a set of noisy measurements characterizing the repeatable response of a process is known as a realization and can be seen as a single point in $\mathbb{R}^N$. The projections of this point on the $N$ axes correspond to the $N$ measurements. The contemporary vision of a diffuse cloud of realizations distributed in $\mathbb{R}^N$ is replaced by a cloud in the shape of a shell surrounding a topological manifold. This manifold corresponds to the process's stabilized-response domain observed without the measurement noise. The measurement noise, which accumulates over several dimensions, distances each realization from the manifold. The probability density function (PDF) of the realization-to-manifold distance creates the shell. By the central limit theorem, as the number of dimensions increases, the PDF tends toward the normal distribution $N(\mu, \sigma^2)$, where $\mu$ fixes the location of the shell's center and $\sigma$ fixes the shell thickness. In this vision, the likelihood of a realization is a function of the realization-to-shell distance rather than the realization-to-manifold distance. The demonstration begins with the work of Claude Shannon, followed by the introduction of the shell manifold, and ends with practical applications to equipment monitoring.
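The shell picture can be checked with a short simulation. Two simplifying assumptions are made here: the noise is isotropic Gaussian with per-coordinate standard deviation `noise_sd`, and the manifold is a single point, so the realization-to-manifold distance is just the norm of the noise vector. Under these assumptions the distance concentrates around `noise_sd * sqrt(N)`, while its spread stays near `noise_sd / sqrt(2)`, so the shell becomes relatively thinner as N grows. The function name and parameters are illustrative, and `noise_sd` is distinct from the shell thickness $\sigma$ in the abstract.

```python
import numpy as np

def shell_stats(N, noise_sd=1.0, samples=2000, seed=0):
    """Mean and standard deviation of ||noise|| for isotropic Gaussian noise in N dimensions.
    With a point-like manifold, ||noise|| is the realization-to-manifold distance."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=noise_sd, size=(samples, N))
    dist = np.linalg.norm(noise, axis=1)
    return dist.mean(), dist.std()

for N in (2, 10, 100, 1000):
    mean, sd = shell_stats(N)
    print(f"N={N:4d}  mean={mean:7.3f}  sqrt(N)={np.sqrt(N):7.3f}  sd={sd:.3f}  sd/mean={sd / mean:.3f}")
```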
Emmanuel Pilliat, 2020
This manuscript makes two contributions to the field of change-point detection. In a general change-point setting, we provide a generic algorithm for aggregating local homogeneity tests into an estimator of change-points in a time series. Interestingly, we establish that the error rates of the collection of tests directly translate into detection properties of the change-point estimator. This generic scheme is then applied to the setting of possibly sparse multivariate mean change-point detection. When the noise is Gaussian, we derive minimax optimal rates that are adaptive to the unknown sparsity and to the distance between change-points. For sub-Gaussian noise, we introduce a variant that is optimal in almost all sparsity regimes.
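To make the idea of aggregating local homogeneity tests concrete, here is a deliberately simplified univariate sketch, not the paper's multivariate procedure. At each scale h it tests whether the means of the adjacent windows [t-h, t) and [t, t+h) differ, assuming Gaussian noise with known standard deviation. It then aggregates rejections from the smallest scale upward, keeping a location only if no already-kept change-point lies within h of it. The fixed critical value `threshold=5.0`, the scale grid, and the function names are hypothetical choices.

```python
import numpy as np

def local_tests(x, scales, sigma=1.0, threshold=5.0):
    """Two-sample mean tests on adjacent windows [t-h, t) and [t, t+h).
    Returns (h, stat, t) for every window pair whose normalized mean
    difference exceeds the (hypothetical) fixed critical value."""
    n = len(x)
    csum = np.concatenate(([0.0], np.cumsum(x)))   # prefix sums for O(1) window means
    rejected = []
    for h in scales:
        for t in range(h, n - h + 1):
            left = (csum[t] - csum[t - h]) / h
            right = (csum[t + h] - csum[t]) / h
            # Var(right - left) = 2 sigma^2 / h, so stat ~ |N(0, 1)| under homogeneity
            stat = abs(right - left) * np.sqrt(h / 2) / sigma
            if stat > threshold:
                rejected.append((h, stat, t))
    return rejected

def aggregate(rejected):
    """Scan rejections from the smallest scale up (largest statistic first
    within a scale); keep t only if no kept change-point lies within h of it."""
    kept = []
    for h, stat, t in sorted(rejected, key=lambda r: (r[0], -r[1])):
        if all(abs(t - c) >= h for c in kept):
            kept.append(t)
    return sorted(kept)

rng = np.random.default_rng(1)
mean = np.concatenate([np.zeros(200), 3.0 * np.ones(200), np.zeros(200)])
x = mean + rng.normal(size=600)                    # true change-points at 200 and 400
cps = aggregate(local_tests(x, scales=(8, 16, 32, 64)))
print("estimated change-points:", cps)
```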
This paper considers maximum generalized empirical likelihood (GEL) estimation and inference on parameters identified by high-dimensional moment restrictions with weakly dependent data, when the dimensions of the moment restrictions and the parameters diverge along with the sample size. Consistency with rates and asymptotic normality of the GEL estimator are obtained by properly restricting the growth rates of the dimensions of the parameters and the moment restrictions, as well as the degree of data dependence. It is shown that even in the high-dimensional time series setting, the GEL ratio can still behave like a chi-square random variable asymptotically. A consistent test for over-identification is proposed. A penalized GEL method is also provided for estimation in a sparse setting.