Clustering methods seek to partition data so that elements are more similar to elements in the same cluster than to elements in different clusters. The main challenge in this task is the lack of a unified definition of a cluster, especially for high-dimensional data. Different methods and approaches have been proposed to address this problem. This paper continues the study initiated by Efimov, Adamyan and Spokoiny (2019), which introduced a novel approach to adaptive nonparametric clustering called Adaptive Weights Clustering (AWC). The method allows analyzing high-dimensional data with an unknown number of unbalanced clusters of arbitrary shape under very weak modeling assumptions. The procedure demonstrates state-of-the-art performance and is very efficient even for large data dimension D. However, the theoretical study in Efimov, Adamyan and Spokoiny (2019) is very limited and does not really address the question of efficiency. This paper makes a significant step toward understanding the remarkable performance of the AWC procedure, particularly in high dimension. The approach is based on combining the ideas of adaptive clustering and manifold learning. The manifold hypothesis means that high-dimensional data can be well approximated by a d-dimensional manifold for small d, which helps to overcome the curse of dimensionality and to obtain sharp bounds on the cluster separation that depend only on the intrinsic dimension d. We also address the problem of parameter tuning. Our general theoretical results are illustrated by numerical experiments.
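The manifold hypothesis mentioned in the abstract can be illustrated with a small numerical sketch. The code below is not the AWC procedure and is not taken from the paper. It uses a swapped-in, standard diagnostic (a two-radius correlation-dimension estimate) on a hypothetical noise-free toy data set: points sampled from a closed curve, so d = 1, embedded in D = 20 ambient dimensions. The function names, the curve, the sample size and the two radii are all assumptions chosen for illustration.

```python
import math
import random


def sample_curve(n, K, rng):
    """Sample n points from a closed curve (intrinsic dimension d = 1) in R^(2K).

    The curve t -> (cos(kt), sin(kt))_{k=1..K} / sqrt(K) is a hypothetical toy
    example; it uses every ambient coordinate but is still one-dimensional.
    """
    points = []
    for _ in range(n):
        t = rng.uniform(0.0, 2.0 * math.pi)
        point = []
        for k in range(1, K + 1):
            point.append(math.cos(k * t) / math.sqrt(K))
            point.append(math.sin(k * t) / math.sqrt(K))
        points.append(point)
    return points


def correlation_dimension(points, r_small, r_large):
    """Two-radius slope of log C(r), where C(r) counts pairs closer than r.

    For data on a d-dimensional manifold, C(r) grows roughly like r^d at
    small scales, so the slope approximates d.
    """
    count_small = 0
    count_large = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = math.dist(points[i], points[j])
            if dist < r_large:
                count_large += 1
                if dist < r_small:
                    count_small += 1
    return math.log(count_large / count_small) / math.log(r_large / r_small)


rng = random.Random(0)  # fixed seed so the run is reproducible
points = sample_curve(n=400, K=10, rng=rng)
ambient_dim = len(points[0])
estimate = correlation_dimension(points, r_small=0.1, r_large=0.3)
print(f"ambient dimension D = {ambient_dim}, "
      f"estimated intrinsic dimension ~ {estimate:.2f}")
```

In this toy setting the estimate should come out close to 1 even though the data live in 20 dimensions, which is the sense in which quantities governed by the intrinsic dimension d can sidestep the ambient dimension D. With noisy observations the radii would have to stay above the noise level, which this sketch does not attempt to handle.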
We consider a problem of manifold estimation from noisy observations. Many manifold learning procedures locally approximate a manifold by a weighted average over a small neighborhood. However, in the presence of large noise, the assigned weights beco
Prediction for high dimensional time series is a challenging task due to the curse of dimensionality problem. Classical parametric models like ARIMA or VAR require strong modeling assumptions and time stationarity and are often overparametrized. This
In the context of computer code experiments, sensitivity analysis of a complicated input-output system is often performed by ranking the so-called Sobol indices. One reason for the popularity of Sobol's approach is the simplicity of the statisti
We discuss parametric estimation of a degenerate diffusion system from time-discrete observations. The first component of the degenerate diffusion system has a parameter $\theta_1$ in a non-degenerate diffusion coefficient and a parameter $\theta_2$ in
We present a geometrical method for analyzing sequential estimating procedures. It is based on the design principle of the second-order efficient sequential estimation provided in Okamoto, Amari and Takeuchi (1991). By introducing a dual conformal cu