No Arabic abstract
Medical imaging studies have collected high dimensional imaging data to identify imaging biomarkers for diagnosis, screening, and prognosis, among many others. These imaging data are often represented in the form of a multi-dimensional array, called a tensor. The aim of this paper is to develop a tensor partition regression modeling (TPRM) framework to establish a relationship between low-dimensional clinical outcomes (e.g., diagnosis) and high dimensional tensor covariates. Our TPRM is a hierarchical model and efficiently integrates four components: (i) a partition model, (ii) a canonical polyadic decomposition model, (iii) a principal components model, and (iv) a generalized linear model with a sparse inducing normal mixture prior. This framework not only reduces ultra-high dimensionality to a manageable level, resulting in efficient estimation, but also optimizes prediction accuracy in the search for informative sub-tensors. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. Simulation shows that TPRM outperforms several other competing methods. We apply TPRM to predict disease status (Alzheimer versus control) by using structural magnetic resonance imaging data obtained from the Alzheimers Disease Neuroimaging Initiative (ADNI) study.
We propose an efficient statistical method (denoted as SSR-Tensor) to robustly and quickly detect hot-spots that are sparse and temporal-consistent in a spatial-temporal dataset through the tensor decomposition. Our main idea is first to build an SSR model to decompose the tensor data into a Smooth global trend mean, Sparse local hot-spots, and Residuals. Next, tensor decomposition is utilized as follows: bases are introduced to describe within-dimension correlation, and tensor products are used for between-dimension interaction. Then, a combination of LASSO and fused LASSO is used to estimate the model parameters, where an efficient recursive estimation procedure is developed based on the large-scale convex optimization, where we first transform the general LASSO optimization into regular LASSO optimization and apply FISTA to solve it with the fastest convergence rate. Finally, a CUSUM procedure is applied to detect when and where the hot-spot event occurs. We compare the performance of the proposed method in a numerical simulation study and a real-world case study, which contains a dataset including a collection of three types of crime rates for U.S. mainland states during the year 1965-2014. In both cases, the proposed SSR-Tensor is able to achieve the fast detection and accurate localization of the hot-spots.
This paper proposes a new methodology to predict and update the residual useful lifetime of a system using a sequence of degradation images. The methodology integrates tensor linear algebra with traditional location-scale regression widely used in reliability and prognosis. To address the high dimensionality challenge, the degradation image streams are first projected to a low-dimensional tensor subspace that is able to preserve their information. Next, the projected image tensors are regressed against time-to-failure via penalized location-scale tensor regression. The coefficient tensor is then decomposed using CANDECOMP/PARAFAC (CP) and Tucker decompositions, which enables parameter estimation in a high-dimensional setting. Two optimization algorithms with a global convergence property are developed for model estimation. The effectiveness of our models is validated using a simulated dataset and infrared degradation image streams from a rotating machinery.
The aim of this paper is to develop a class of spatial transformation models (STM) to spatially model the varying association between imaging measures in a three-dimensional (3D) volume (or 2D surface) and a set of covariates. Our STMs include a varying Box-Cox transformation model for dealing with the issue of non-Gaussian distributed imaging data and a Gaussian Markov Random Field model for incorporating spatial smoothness of the imaging data. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. Simulations and real data analysis demonstrate that the STM significantly outperforms the voxel-wise linear model with Gaussian noise in recovering meaningful geometric patterns. Our STM is able to reveal important brain regions with morphological changes in children with attention deficit hyperactivity disorder.
Change-point detection regains much attention recently for analyzing array or sequencing data for copy number variation (CNV) detection. In such applications, the true signals are typically very short and buried in the long data sequence, which makes it challenging to identify the variations efficiently and accurately. In this article, we propose a new change-point detection method, a backward procedure, which is not only fast and simple enough to exploit high-dimensional data but also performs very well for detecting short signals. Although motivated by CNV detection, the backward procedure is generally applicable to assorted change-point problems that arise in a variety of scientific applications. It is illustrated by both simulated and real CNV data that the backward detection has clear advantages over other competing methods especially when the true signal is short.
Partition-wise models offer a flexible approach for modeling complex and multidimensional data that are capable of producing interpretable results. They are based on partitioning the observed data into regions, each of which is modeled with a simple submodel. The success of this approach highly depends on the quality of the partition, as too large a region could lead to a non-simple submodel, while too small a region could inflate estimation variance. This paper proposes an automatic procedure for choosing the partition (i.e., the number of regions and the boundaries between regions) as well as the submodels for the regions. It is shown that, under the assumption of the existence of a true partition, the proposed partition estimator is statistically consistent. The methodology is demonstrated for both regression and classification problems.