During the last decades, many methods for the analysis of functional data, including classification methods, have been developed. Nonetheless, there are issues that have not been addressed satisfactorily by currently available methods, such as feature selection combined with variable selection when multiple functional covariates are used. In this paper, a functional ensemble is combined with a penalized and constrained multinomial logit model. It is shown that this synthesis yields a powerful classification tool for functional data (possibly mixed with non-functional predictors), which also provides automatic variable selection. The choice of an appropriate, sparsity-inducing penalty shrinks most model coefficients to exactly zero and permits class-specific coefficients in multiclass problems, so that feature selection is obtained. An additional constraint within the multinomial logit model ensures that the model coefficients can be interpreted as weights. Thus, the estimation results become interpretable with respect to the discriminative importance of the selected features, which is quantified by a feature importance measure. In two application examples, data from a cell chip used in water quality monitoring experiments and phoneme data used in speech recognition, the interpretability as well as the selection results are examined. The classification performance is compared to various other commonly used classification approaches.
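As a rough illustration of the sparsity mechanism described above, the following minimal sketch fits an L1-penalized multinomial logit with scikit-learn. It merely stands in for the authors' penalized and constrained estimator: the additional weight constraint and the functional ensemble are not implemented, and the data, feature dimension, and regularization strength C=0.5 are hypothetical.

    # Minimal sketch (not the authors' estimator): sparse multinomial logit
    # via an L1 penalty, so that most class-specific coefficients are exactly zero.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))                  # e.g. features extracted from functional covariates
    eta = np.column_stack([X[:, 0], X[:, 1], -X[:, 0] - X[:, 1]])
    y = eta.argmax(axis=1)                          # three classes driven by two informative features

    clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
    clf.fit(X, y)

    # Non-zero entries per class indicate the selected features.
    selected = {k: np.flatnonzero(clf.coef_[k]) for k in range(clf.coef_.shape[0])}
    print(selected)

The non-zero rows of coef_ then act as a crude analogue of the class-specific feature selection discussed in the abstract.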
This article is concerned with the fitting of multinomial regression models using the so-called Poisson Trick. The work is motivated by Chen & Kuo (2001) and Malchow-Møller & Svarer (2003), whose approaches have been criticized for being computationally inefficient and for sometimes producing nonsensical results. We first discuss the case of independent data and offer a parsimonious fitting strategy when all covariates are categorical. We then propose a new approach for modelling correlated responses based on an extension of the Gamma-Poisson model, where the likelihood can be expressed in closed form. The parameters are estimated via an Expectation/Conditional Maximization (ECM) algorithm, which can be implemented using functions for fitting generalized linear models readily available in standard statistical software packages. Compared to existing methods, our approach avoids the need to approximate intractable integrals, and thus the inference is exact with respect to the approximating Gamma-Poisson model. The proposed method is illustrated via a reanalysis of the yogurt data discussed by Chen & Kuo (2001).
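For readers unfamiliar with the Poisson Trick itself, the sketch below (hypothetical data and variable names, using pandas and statsmodels) shows the standard equivalence for categorical covariates: the multinomial logit coefficients can be read off a Poisson log-linear fit that includes nuisance terms for each covariate pattern. It illustrates the generic trick only, not the paper's parsimonious strategy or the Gamma-Poisson extension.

    # Minimal sketch of the Poisson Trick with a single categorical covariate
    # (hypothetical data): aggregate to cell counts, then fit a Poisson
    # log-linear model whose covariate-by-response interaction terms coincide
    # with the multinomial logit coefficients (relative to a baseline category).
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "x": rng.choice(["a", "b"], size=500),          # categorical covariate
        "y": rng.choice(["c1", "c2", "c3"], size=500),  # multinomial response
    })

    counts = df.groupby(["x", "y"]).size().rename("n").reset_index()

    # C(x) absorbs the pattern-specific nuisance intercepts; C(x):C(y) carries
    # the multinomial logit effects of x on the response category.
    fit = smf.glm("n ~ C(x) + C(y) + C(x):C(y)", data=counts,
                  family=sm.families.Poisson()).fit()
    print(fit.params.filter(like=":"))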
Aggregating large databases into a specific format is a frequently used process to make the data more manageable. Interval-valued data is one of the data types generated by such an aggregation process. Using traditional methods to analyze interval-valued data results in loss of information, and thus several interval-valued data models have been proposed to extract reliable information from such data. On the other hand, recent technological developments have led to high-dimensional and complex data in many application areas, which may not be analyzed adequately by traditional techniques. Functional data analysis is one of the most commonly used techniques for analyzing such complex datasets. While functional extensions of many traditional statistical techniques are available, the functional form of interval-valued data has not been studied well. This paper introduces the functional forms of some well-known regression models that accommodate interval-valued data. The proposed methods are based on the function-on-function regression model, where both the response and the predictor(s) are functional. Through several Monte Carlo simulations and an empirical data analysis, the finite-sample performance of the proposed methods is evaluated and compared with the state-of-the-art.
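A minimal sketch of one possible discretized function-on-function fit for interval-valued curves is given below; it assumes a simple center/half-range representation of the intervals and a ridge-regularized least-squares estimate of the coefficient surface, which is only a stand-in for the models proposed in the paper (all data and the penalty lam are hypothetical).

    # Minimal sketch (hypothetical data): discretized function-on-function
    # regression fitted separately to the center and half-range curves of
    # interval-valued functional observations.
    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 100, 50                                    # curves and grid points

    def fof_fit(X, Y, lam=1e-2):
        # Ridge least-squares estimate of beta(s, t) in Y(t) ~ integral X(s) beta(s, t) ds.
        return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ Y / n)

    X_lo = rng.normal(size=(n, p))                    # lower bounds of the predictor curves
    X_up = X_lo + rng.uniform(0.1, 0.5, size=(n, p))  # upper bounds
    Y_lo = (X_lo @ rng.normal(size=(p, p))) / p
    Y_up = Y_lo + rng.uniform(0.1, 0.5, size=(n, p))

    beta_center = fof_fit((X_lo + X_up) / 2, (Y_lo + Y_up) / 2)
    beta_range = fof_fit((X_up - X_lo) / 2, (Y_up - Y_lo) / 2)
    print(beta_center.shape, beta_range.shape)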
Multilevel or hierarchical data structures occur in many areas of research, including economics, psychology, sociology, agriculture, medicine, and public health. Over the last 25 years, there has been increasing interest in developing suitable techniques for the statistical analysis of multilevel data, and this has resulted in a broad class of models known under the generic name of multilevel models. Generally, multilevel models are useful for exploring how relationships vary across higher-level units while taking the within- and between-cluster variation into account. Research scientists often have substantive theories in mind when evaluating data with statistical models, and translating such a theory into a model often involves inequality constraints among the parameters. This chapter shows how the inequality-constrained multilevel linear model can be given a Bayesian formulation, how the model parameters can be estimated using a so-called augmented Gibbs sampler, and how posterior probabilities can be computed to assist the researcher in model selection.
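To give a flavor of how inequality constraints enter a Gibbs sampler, the toy sketch below samples two group means subject to mu1 < mu2 by drawing each from a truncated-normal full conditional. It is far simpler than the chapter's augmented Gibbs sampler for the full multilevel model; the data, flat priors, and known variance are hypothetical simplifications.

    # Toy sketch: Gibbs sampling of two group means under the constraint
    # mu1 < mu2, with each mean drawn from its truncated-normal full
    # conditional (flat priors, known variance; hypothetical data).
    import numpy as np
    from scipy.stats import truncnorm

    rng = np.random.default_rng(3)
    y1 = rng.normal(0.0, 1.0, size=40)
    y2 = rng.normal(0.5, 1.0, size=40)
    sigma2 = 1.0

    def draw_trunc(mean, sd, lower, upper):
        a, b = (lower - mean) / sd, (upper - mean) / sd
        return truncnorm.rvs(a, b, loc=mean, scale=sd, random_state=rng)

    mu1, mu2, draws = 0.0, 1.0, []
    for _ in range(2000):
        mu1 = draw_trunc(y1.mean(), np.sqrt(sigma2 / len(y1)), -np.inf, mu2)
        mu2 = draw_trunc(y2.mean(), np.sqrt(sigma2 / len(y2)), mu1, np.inf)
        draws.append((mu1, mu2))
    print(np.mean(draws[500:], axis=0))               # posterior means after burn-in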
We propose a new method for clustering of functional data using a $k$-means framework. We work within the elastic functional data analysis framework, which allows for decomposition of the overall variation in functional data into amplitude and phase components. We use the amplitude component to partition functions into shape clusters using an automated approach. To select an appropriate number of clusters, we additionally propose a novel Bayesian Information Criterion defined via a mixture model on principal component scores estimated by functional Principal Component Analysis. The proposed method is motivated by the problem of posterior exploration, wherein samples obtained from Markov chain Monte Carlo algorithms are naturally represented as functions. We evaluate our approach using a simulated dataset, and apply it to a study of acute respiratory infection dynamics in San Luis Potosí, Mexico.
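The following sketch mimics the overall recipe on hypothetical data: principal component scores (here ordinary PCA on densely sampled curves, standing in for functional PCA of aligned amplitude functions), a Gaussian-mixture BIC to choose the number of clusters, and k-means on the scores. The elastic amplitude/phase decomposition itself is not implemented here.

    # Sketch: PCA scores of densely observed curves (a surrogate for fPCA of
    # aligned amplitude functions), a Gaussian-mixture BIC to pick the number
    # of clusters, and k-means on the scores (hypothetical data).
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(4)
    t = np.linspace(0, 1, 100)
    curves = np.vstack(
        [np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=100) for _ in range(30)]
        + [np.cos(2 * np.pi * t) + 0.1 * rng.normal(size=100) for _ in range(30)]
    )

    scores = PCA(n_components=3).fit_transform(curves)

    bics = {k: GaussianMixture(n_components=k, random_state=0).fit(scores).bic(scores)
            for k in range(1, 6)}
    k_best = min(bics, key=bics.get)

    labels = KMeans(n_clusters=k_best, n_init=10, random_state=0).fit_predict(scores)
    print(k_best, np.bincount(labels))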
We present a new functional Bayes classifier that uses principal component (PC) or partial least squares (PLS) scores from the common covariance function, that is, the covariance function marginalized over groups. When the groups have different covariance functions, the PC or PLS scores need not be independent or even uncorrelated. We use copulas to model the dependence. Our method is semiparametric; the marginal densities are estimated nonparametrically by kernel smoothing and the copula is modeled parametrically. We focus on Gaussian and t-copulas, but other copulas could be used. The strong performance of our methodology is demonstrated through simulation, real data examples, and asymptotic properties.
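A minimal sketch of the copula idea on hypothetical two-dimensional scores follows: kernel-estimated marginals are combined with a per-class Gaussian copula, and a new observation is assigned to the class with the larger estimated log-density (equal priors assumed; t-copulas and the PC/PLS score extraction from the common covariance function are omitted).

    # Sketch: two-class Bayes classifier on two-dimensional scores with
    # kernel-estimated marginals and a per-class Gaussian copula
    # (hypothetical data; equal class priors assumed).
    import numpy as np
    from scipy.stats import gaussian_kde, norm

    rng = np.random.default_rng(5)
    X0 = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=150)
    X1 = rng.multivariate_normal([1, 1], [[1, -0.4], [-0.4, 1]], size=150)

    def fit_class(X):
        kdes = [gaussian_kde(X[:, j]) for j in range(X.shape[1])]
        # Probability-integral transform of each margin, then the copula correlation.
        U = np.column_stack([
            np.clip([k.integrate_box_1d(-np.inf, x) for x in X[:, j]], 1e-6, 1 - 1e-6)
            for j, k in enumerate(kdes)
        ])
        return kdes, np.corrcoef(norm.ppf(U), rowvar=False)

    def log_density(x, kdes, R):
        u = np.clip([k.integrate_box_1d(-np.inf, xi) for k, xi in zip(kdes, x)], 1e-6, 1 - 1e-6)
        z = norm.ppf(u)
        log_copula = -0.5 * (z @ (np.linalg.inv(R) - np.eye(len(z))) @ z + np.log(np.linalg.det(R)))
        return log_copula + sum(np.log(k(xi)[0]) for k, xi in zip(kdes, x))

    params = [fit_class(X0), fit_class(X1)]
    x_new = np.array([0.5, 0.5])
    print("predicted class:", int(np.argmax([log_density(x_new, *p) for p in params])))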