Aggregating large databases into a specific format is a frequently used way to make the data easily manageable. Interval-valued data is one of the data types generated by such an aggregation process. Using traditional methods to analyze interval-valued data results in a loss of information, and thus several interval-valued data models have been proposed to gather reliable information from such data. On the other hand, recent technological developments have led to high-dimensional and complex data in many application areas, which may not be analyzable by traditional techniques. Functional data analysis is one of the most commonly used techniques to analyze such complex datasets. While functional extensions of many traditional statistical techniques are available, the functional form of interval-valued data has not been studied well. This paper introduces the functional forms of some well-known regression models for interval-valued data. The proposed methods are based on the function-on-function regression model, where both the response and the predictor(s) are functional. The finite-sample performance of the proposed methods is evaluated through several Monte Carlo simulations and an empirical data analysis, and compared with state-of-the-art methods.
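As a minimal illustration of how aggregation produces interval-valued data, the sketch below collapses raw observations into one [lower, upper] interval per group and then converts each interval to a center and half-range, a representation commonly used by interval-valued regression models. The function names and the toy data are hypothetical, and the abstract does not state which interval-valued models the paper extends, so this should not be read as the authors' method.

```python
import numpy as np

def aggregate_to_intervals(values, groups):
    """Collapse raw observations into one [min, max] interval per group."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)  # sorted unique group labels
    lower = np.array([values[groups == g].min() for g in labels])
    upper = np.array([values[groups == g].max() for g in labels])
    return labels, lower, upper

def center_and_half_range(lower, upper):
    """Re-express intervals by their midpoints and half-widths."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (lower + upper) / 2.0, (upper - lower) / 2.0

# Toy data (assumed for illustration): five readings from two groups.
labels, lower, upper = aggregate_to_intervals(
    [1.0, 5.0, 3.0, 10.0, 12.0], ["a", "a", "a", "b", "b"]
)
center, half_range = center_and_half_range(lower, upper)
```

Analyzing only the centers discards the within-group spread, which is the loss of information the abstract refers to; keeping the half-range alongside the center retains it.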
We propose an alternative to $k$-nearest neighbors for functional data whereby the approximating neighboring curves are piecewise functions built from a functional sample. Using a locally defined distance function that satisfies stabilization criteria, we establish pointwise and global approximation results in function spaces when the number of data curves is large enough. We exploit this feature to develop the asymptotic theory when a finite number of curves is observed at time-points given by an i.i.d. sample whose cardinality increases up to infinity. We use these results to investigate the problem of estimating unobserved segments of a partially observed functional data sample as well as to study the problem of functional classification and outlier detection. For such problems, our methods are competitive with and sometimes superior to benchmark predictions in the field.
Task-based functional magnetic resonance imaging (task fMRI) is a non-invasive technique that allows identifying brain regions whose activity changes when individuals are asked to perform a given task. This contributes to the understanding of how the human brain is organized in functionally distinct subdivisions. Task fMRI experiments from high-resolution scans provide hundreds of thousands of longitudinal signals for each individual, corresponding to measurements of brain activity at each voxel of the brain over the duration of the experiment. In this context, we propose some visualization techniques for high-dimensional functional data relying on depth-based notions that allow for computationally efficient two-dimensional representations of task fMRI data and that shed light on sample composition, outlier presence and individual variability. We believe that this step is a crucial precursor to any inferential approach that aims to identify neuroscientific patterns across individuals, tasks and brain regions. We illustrate the proposed technique through a simulation study and demonstrate its application on a motor and language task fMRI experiment.
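To make the idea of a depth notion for curves concrete, the sketch below computes the modified band depth with bands formed by pairs of curves, one widely used functional depth. The abstract does not say which depth the authors rely on, so this choice, the function name, and the toy curves are assumptions for illustration only.

```python
import numpy as np
from itertools import combinations

def modified_band_depth(curves):
    """Modified band depth using bands delimited by pairs of curves.

    curves: array of shape (n_curves, n_timepoints), all sampled on a
    common grid, with n_curves >= 2. Higher values indicate more central
    curves.
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    depth = np.zeros(n)
    for i, j in combinations(range(n), 2):
        lo = np.minimum(curves[i], curves[j])
        hi = np.maximum(curves[i], curves[j])
        # Fraction of time points at which each curve lies inside the band.
        inside = (curves >= lo) & (curves <= hi)
        depth += inside.mean(axis=1)
    return depth / (n * (n - 1) / 2)

# Toy example (assumed): three flat curves at levels 0, 1 and 2.
toy = np.array([[0.0] * 4, [1.0] * 4, [2.0] * 4])
mbd = modified_band_depth(toy)
```

In the toy example the middle curve receives the largest depth, and curves with unusually low depth are natural outlier candidates. The double loop over pairs is quadratic in the number of curves, so a rank-based formulation would be preferable at the scale of voxel-level data.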
During the last decades, many methods for the analysis of functional data, including classification methods, have been developed. Nonetheless, there are issues that have not been addressed satisfactorily by currently available methods, such as feature selection combined with variable selection when using multiple functional covariates. In this paper, a functional ensemble is combined with a penalized and constrained multinomial logit model. It is shown that this synthesis yields a powerful classification tool for functional data (possibly mixed with non-functional predictors), which also provides automatic variable selection. The choice of an appropriate, sparsity-inducing penalty allows most model coefficients to be estimated as exactly zero and permits class-specific coefficients in multiclass problems, such that feature selection is obtained. An additional constraint within the multinomial logit model ensures that the model coefficients can be considered as weights. Thus, the estimation results become interpretable with respect to the discriminative importance of the selected features, which is rated by a feature importance measure. In two application examples, data from a cell chip used for water quality monitoring experiments and phoneme data used for speech recognition, the interpretability as well as the selection results are examined. The classification performance is compared to that of various other classification approaches in common use.
This paper considers the problem of variable selection in regression models in the case of functional variables that may be mixed with other types of variables (scalar, multivariate, directional, etc.). Our proposal begins with a simple null model and sequentially selects a new variable to be incorporated into the model based on the distance correlation proposed by \cite{Szekely2007}. For the sake of simplicity, this paper only uses additive models. However, the proposed algorithm may assess the type of contribution (linear, nonlinear, ...) of each variable. The algorithm has shown quite promising results when applied to simulations and real data sets.
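The selection step described above can be sketched as follows: compute the sample distance correlation between each candidate variable and the current target (the response, or the residuals of the current model) and pick the candidate with the largest value. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function names are hypothetical, functional variables are assumed to be stored as rows of curves discretized on a common grid (so that Euclidean distance stands in for an L2 distance), and model fitting and the stopping rule are omitted.

```python
import numpy as np

def _double_centered_distances(x):
    """Pairwise Euclidean distance matrix, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a scalar variable as one column
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())

def distance_correlation(x, y):
    """Sample distance correlation between two samples of equal size."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()  # squared sample distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0  # one of the samples is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))

def select_next_variable(candidates, target):
    """Return the candidate name with the largest distance correlation
    with the target, together with all scores."""
    scores = {name: distance_correlation(x, target)
              for name, x in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Because distance correlation depends only on pairwise distances, the same function applies to scalar, multivariate and discretized functional variables, which is what makes it convenient for mixed-type selection.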
Graphical models express conditional independence relationships among variables. Although methods for vector-valued data are well established, functional data graphical models remain underdeveloped. We introduce a notion of conditional independence between random functions, and construct a framework for Bayesian inference of undirected, decomposable graphs in the multivariate functional data context. This framework is based on extending Markov distributions and hyper Markov laws from random variables to random processes, providing a principled alternative to naive application of multivariate methods to discretized functional data. Markov properties facilitate the composition of likelihoods and priors according to the decomposition of a graph. Our focus is on Gaussian process graphical models using orthogonal basis expansions. We propose a hyper-inverse-Wishart-process prior for the covariance kernels of the infinite coefficient sequences of the basis expansion, and establish its existence, uniqueness, strong hyper Markov property, and conjugacy. Stochastic search Markov chain Monte Carlo algorithms are developed for posterior inference, assessed through simulations, and applied to a study of brain activity and alcoholism.