When analyzing empirical data, we often find that global linear models overestimate the number of parameters required. In such cases, we may ask whether the data lie on or near a manifold, or a set of manifolds (a so-called multi-manifold), of lower dimension than the ambient space. This question can be phrased as a (multi-)manifold hypothesis. The identification of such intrinsic multi-scale features is a cornerstone of data analysis and representation, and it has given rise to a large body of work on manifold learning. In this work, we review key results on multi-scale data analysis and intrinsic dimension, and then introduce a heuristic, multi-scale framework for testing the multi-manifold hypothesis. Our method implements a hypothesis test on a set of spline-interpolated manifolds constructed from variance-based intrinsic dimensions. The workflow is suitable for empirical data analysis, as we demonstrate on two use cases.
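The "variance-based intrinsic dimension" mentioned above is commonly taken to be the number of local principal directions needed to capture a fixed fraction of the variance in a point's neighbourhood. Below is a minimal Python sketch of that idea, not the authors' implementation: the function name, the neighbourhood size k, and the variance threshold are illustrative choices.

```python
import numpy as np

def local_intrinsic_dimension(X, point_idx, k=20, var_threshold=0.95):
    """Estimate the intrinsic dimension at one point of X (n_samples x n_features)
    as the smallest number of principal directions of the local covariance
    whose cumulative variance exceeds `var_threshold`.
    (Illustrative local-PCA estimator, not the paper's exact procedure.)"""
    x = X[point_idx]
    # k nearest neighbours by Euclidean distance (brute force, for clarity)
    dists = np.linalg.norm(X - x, axis=1)
    nbrs = X[np.argsort(dists)[:k]]
    # eigenvalues of the local covariance, sorted largest first
    centered = nbrs - nbrs.mean(axis=0)
    eigvals = np.linalg.eigvalsh(centered.T @ centered / (k - 1))[::-1]
    # cumulative fraction of variance explained by the top directions
    ratios = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratios, var_threshold) + 1)

# Toy check: points on a noisy circle embedded in R^3 should report d = 1.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 500)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
X += 0.01 * rng.standard_normal(X.shape)
print(local_intrinsic_dimension(X, point_idx=0, k=25))
```

In the toy check, the leading eigenvalue of the local covariance accounts for well over 95% of the neighbourhood variance, so the estimator reports dimension 1 even though the ambient space is R^3. Roughly this kind of pointwise estimate, computed across neighbourhood scales, is what feeds the spline-interpolation step described in the abstract.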