No Arabic abstract
Motivated by the analysis of high-dimensional neuroimaging signals located over the cortical surface, we introduce a novel Principal Component Analysis technique that can handle functional data located over a two-dimensional manifold. For this purpose a regularization approach is adopted, introducing a smoothing penalty coherent with the geodesic distance over the manifold. The model introduced can be applied to any manifold topology, can naturally handle missing data and functional samples evaluated in different grids of points. We approach the discretization task by means of finite element analysis and propose an efficient iterative algorithm for its resolution. We compare the performances of the proposed algorithm with other approaches classically adopted in literature. We finally apply the proposed method to resting state functional magnetic resonance imaging data from the Human Connectome Project, where the method shows substantial differential variations between brain regions that were not apparent with other approaches.
High dimensional data has introduced challenges that are difficult to address when attempting to implement classical approaches of statistical process control. This has made it a topic of interest for research due in recent years. However, in many cases, data sets have underlying structures, such as in advanced manufacturing systems. If extracted correctly, efficient methods for process control can be developed. This paper proposes a robust sparse dimensionality reduction approach for correlated high-dimensional process monitoring to address the aforementioned issues. The developed monitoring technique uses robust sparse probabilistic PCA to reduce the dimensionality of the data stream while retaining interpretability. The proposed methodology utilizes Bayesian variational inference to obtain the estimates of a probabilistic representation of PCA. Simulation studies were conducted to verify the efficacy of the proposed methodology. Furthermore, we conducted a case study for change detection for in-line Raman spectroscopy to validate the efficiency of our proposed method in a practical scenario.
There are several cutting edge applications needing PCA methods for data on tori and we propose a novel torus-PCA method with important properties that can be generally applied. There are two existing general methods: tangent space PCA and geodesic PCA. However, unlike tangent space PCA, our torus-PCA honors the cyclic topology of the data space whereas, unlike geodesic PCA, our torus-PCA produces a variety of non-winding, non-dense descriptors. This is achieved by deforming tori into spheres and then using a variant of the recently developed principle nested spheres analysis. This PCA analysis involves a step of small sphere fitting and we provide an improved test to avoid overfitting. However, deforming tori into spheres creates singularities. We introduce a data-adaptive pre-clustering technique to keep the singularities away from the data. For the frequently encountered case that the residual variance around the PCA main component is small, we use a post-mode hunting technique for more fine-grained clustering. Thus in general, there are three successive interrelated key steps of torus-PCA in practice: pre-clustering, deformation, and post-mode hunting. We illustrate our method with two recently studied RNA structure (tori) data sets: one is a small RNA data set which is established as the benchmark for PCA and we validate our method through this data. Another is a large RNA data set (containing the small RNA data set) for which we show that our method provides interpretable principal components as well as giving further insight into its structure.
Positron Emission Tomography (PET) is an imaging technique which can be used to investigate chemical changes in human biological processes such as cancer development or neurochemical reactions. Most dynamic PET scans are currently analyzed based on the assumption that linear first order kinetics can be used to adequately describe the system under observation. However, there has recently been strong evidence that this is not the case. In order to provide an analysis of PET data which is free from this compartmental assumption, we propose a nonparametric deconvolution and analysis model for dynamic PET data based on functional principal component analysis. This yields flexibility in the possible deconvolved functions while still performing well when a linear compartmental model setup is the true data generating mechanism. As the deconvolution needs to be performed on only a relative small number of basis functions rather than voxel by voxel in the entire 3-D volume, the methodology is both robust to typical brain imaging noise levels while also being computationally efficient. The new methodology is investigated through simulations in both 1-D functions and 2-D images and also applied to a neuroimaging study whose goal is the quantification of opioid receptor concentration in the brain.
The association between a persons physical activity and various health outcomes is an area of active research. The National Health and Nutrition Examination Survey (NHANES) data provide a valuable resource for studying these associations. NHANES accelerometry data has been used by many to measure individuals activity levels. A common approach for analyzing accelerometry data is functional principal component analysis (FPCA). The first part of the paper uses Poisson FPCA (PFPCA), Gaussian FPCA (GFPCA), and nonnegative and regularized function decomposition (NARFD) to extract features from the count-valued NHANES accelerometry data. The second part of the paper compares logistic regression, random forests, and AdaBoost models based on GFPCA, NARFD, or PFPCA scores in the context of mortality prediction. The results show that Poisson FPCA is the best FPCA model for the inference of accelerometry data, and the AdaBoost model based on Poisson FPCA scores gives the best mortality prediction results.
We develop a new methodology for spatial regression of aggregated outputs on multi-resolution covariates. Such problems often occur with spatial data, for example in crop yield prediction, where the output is spatially-aggregated over an area and the covariates may be observed at multiple resolutions. Building upon previous work on aggregated output regression, we propose a regression framework to synthesise the effects of the covariates at different resolutions on the output and provide uncertainty estimation. We show that, for a crop yield prediction problem, our approach is more scalable, via variational inference, than existing multi-resolution regression models. We also show that our framework yields good predictive performance, compared to existing multi-resolution crop yield models, whilst being able to provide estimation of the underlying spatial effects.