ﻻ يوجد ملخص باللغة العربية
Clustering methods have led to a number of important discoveries in bioinformatics and beyond. A major challenge in their use is determining which clusters represent important underlying structure, as opposed to spurious sampling artifacts. This challenge is especially serious, and very few methods are available when the data are very high in dimension. Statistical Significance of Clustering (SigClust) is a recently developed cluster evaluation tool for high dimensional low sample size data. An important component of the SigClust approach is the very definition of a single cluster as a subset of data sampled from a multivariate Gaussian distribution. The implementation of SigClust requires the estimation of the eigenvalues of the covariance matrix for the null multivariate Gaussian distribution. We show that the original eigenvalue estimation can lead to a test that suffers from severe inflation of type-I error, in the important case where there are huge single spikes in the eigenvalues. This paper addresses this critical challenge using a novel likelihood based soft thresholding approach to estimate these eigenvalues which leads to a much improved SigClust. These major improvements in SigClust performance are shown by both theoretical work and an extensive simulation study. Applications to some cancer genomic data further demonstrate the usefulness of these improvements.
We consider perfect simulation algorithms for locally stable point processes based on dominated coupling from the past, and apply these methods in two different contexts. A new version of the algorithm is developed which is feasible for processes whi
We introduce a new method of Bayesian wavelet shrinkage for reconstructing a signal when we observe a noisy version. Rather than making the common assumption that the wavelet coefficients of the signal are independent, we allow for the possibility th
Mediation analyses are a statistical tool for testing the hypothesis about how the relationship between two variables may be direct or indirect via a third variable. Assessing statistical significance has been an area of active research; however, ass
Functional variables are often used as predictors in regression problems. A commonly-used parametric approach, called {it scalar-on-function regression}, uses the $ltwo$ inner product to map functional predictors into scalar responses. This method ca
Functional data registration is a necessary processing step for many applications. The observed data can be inherently noisy, often due to measurement error or natural process uncertainty, which most functional alignment methods cannot handle. A pair