We present a new functional Bayes classifier that uses principal component (PC) or partial least squares (PLS) scores from the common covariance function, that is, the covariance function marginalized over groups. When the groups have different covariance functions, the PC or PLS scores need not be independent or even uncorrelated. We use copulas to model the dependence. Our method is semiparametric; the marginal densities are estimated nonparametrically by kernel smoothing and the copula is modeled parametrically. We focus on Gaussian and t-copulas, but other copulas could be used. The strong performance of our methodology is demonstrated through simulation, real data examples, and asymptotic properties.
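As a rough illustration of the copula-based class density described above, the following Python sketch evaluates a bivariate Gaussian copula density and combines it with two marginal densities. The function names, the restriction to two scores, and the assumption that the caller supplies the marginal density and CDF values (for example from a kernel estimate) are illustrative choices and are not taken from the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for u, v in (0, 1)."""
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    one_minus = 1.0 - rho * rho
    exponent = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus)
    return math.exp(exponent) / math.sqrt(one_minus)


def class_joint_density(marginal_pdfs, marginal_cdfs, rho):
    """Joint density of two scores: product of marginals times the copula density.

    marginal_pdfs and marginal_cdfs are pairs of numbers already evaluated
    at the observed scores (hypothetical inputs for this sketch).
    """
    f1, f2 = marginal_pdfs
    u, v = marginal_cdfs
    return f1 * f2 * gaussian_copula_density(u, v, rho)
```

Under these assumptions, a Bayes classifier would evaluate `class_joint_density` once per group, multiply by the group's prior probability, and assign the observation to the group with the largest product.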
A partial least squares regression is proposed for estimating the function-on-function regression model in which a functional response and multiple functional predictors consist of random curves with quadratic and interaction effects. Direct estimation of a function-on-function regression model is usually an ill-posed problem. To overcome this difficulty, in practice, functional data, which belong to an infinite-dimensional space, are generally projected onto a finite-dimensional space spanned by basis functions. The function-on-function regression model is thereby converted to a multivariate regression model of the basis expansion coefficients. In the estimation phase of the proposed method, the functional variables are approximated by a finite-dimensional basis function expansion. We show that the partial least squares regression constructed via a functional response, multiple functional predictors, and quadratic/interaction terms of the functional predictors is equivalent to the partial least squares regression constructed using basis expansions of the functional variables. From the partial least squares regression of the basis expansions, we provide an explicit formula for the partial least squares estimate of the coefficient function of the function-on-function regression model. Because the true forms of the models are generally unspecified, we propose a forward procedure for model selection. The finite sample performance of the proposed method is examined using several Monte Carlo experiments and two empirical data analyses, and the results compare favorably with an existing method.
Forecasting the defect proneness of source code has long been a major research concern. An estimate of which parts of a software system most likely contain bugs may help focus testing efforts, reduce costs, and improve product quality. Many prediction models and approaches introduced over the past decades try to forecast buggy code elements based on static source code metrics, change and history metrics, or both. However, there is still no universally best solution to this problem, as the most suitable features and models vary from dataset to dataset and depend on the context in which they are used. Therefore, novel approaches and further studies on this topic are needed. In this paper, we employ a chemometric approach, Partial Least Squares with Discriminant Analysis (PLS-DA), for predicting bug-prone classes in Java programs using static source code metrics. To the best of our knowledge, PLS-DA has not been used before as a statistical approach in the software maintenance domain for predicting software errors. In addition, we apply rigorous statistical treatments, including bootstrap resampling and a randomization (permutation) test, when evaluating and presenting the software engineering results. We show that our PLS-DA based prediction model achieves superior performance compared to state-of-the-art approaches (i.e., an F-measure of 0.44-0.47 at the 90% confidence level) when no data re-sampling is applied, and performance comparable to the others when up-sampling is applied, on the largest open bug dataset. Training the model is also significantly faster, so finding optimal parameters is much easier. In terms of completeness, which measures the number of bugs contained in the Java classes predicted to be defective, PLS-DA outperforms every other algorithm: it found 69.3% and 79.4% of the total bugs with no re-sampling and with up-sampling, respectively.
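The completeness measure mentioned above can be sketched in a few lines of Python. This is a minimal reading of the description in the abstract (share of all bugs that fall in classes predicted defective); the function name and input layout are hypothetical and may differ from the authors' evaluation code.

```python
def completeness(bug_counts, predicted_defective):
    """Fraction of all bugs located in classes flagged as defective.

    bug_counts: number of known bugs per class.
    predicted_defective: booleans, True where the model flags the class.
    """
    if len(bug_counts) != len(predicted_defective):
        raise ValueError("inputs must have the same length")
    total = sum(bug_counts)
    if total == 0:
        return 0.0  # no bugs at all, so nothing to find
    found = sum(b for b, flagged in zip(bug_counts, predicted_defective) if flagged)
    return found / total
```

For example, with bug counts `[3, 0, 1, 2]` and predictions `[True, False, False, True]`, five of the six bugs sit in flagged classes.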
Functional principal component analysis (FPCA) has been widely used to capture major modes of variation and reduce dimensions in functional data analysis. However, standard FPCA based on the sample covariance estimator does not work well in the presence of outliers. To address this challenge, a new robust functional principal component analysis approach based on the functional pairwise spatial sign (PASS) operator, termed PASS FPCA, is introduced, and estimation procedures are proposed for both eigenfunctions and eigenvalues, with and without measurement error. Compared to existing robust FPCA methods, the proposed one requires weaker distributional assumptions to conserve the eigenspace of the covariance function. In particular, a class of distributions called the weakly functional coordinate symmetric (weakly FCS) class is introduced; it allows for severe asymmetry and is strictly larger than the functional elliptical distribution class, which has been widely used in the robust statistics literature. The robustness of PASS FPCA is demonstrated via simulation studies and analyses of accelerometry data from a large-scale epidemiological study of physical activity in older women that partly motivates this work.
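To make the pairwise spatial sign idea concrete, here is a minimal Python sketch that averages the outer products of normalized pairwise differences between curves. It assumes the curves are observed without error on a common grid and uses the plain Euclidean norm as a stand-in for the L2 norm; the function name is hypothetical, and the paper's estimators (including the measurement-error case) are not reproduced here.

```python
from itertools import combinations


def pairwise_spatial_sign_matrix(curves):
    """Average outer product of normalized pairwise differences.

    curves: list of equal-length lists of floats (curves on a common grid).
    """
    m = len(curves[0])
    total = [[0.0] * m for _ in range(m)]
    n_pairs = 0
    for x, y in combinations(curves, 2):
        diff = [a - b for a, b in zip(x, y)]
        norm_sq = sum(d * d for d in diff)
        if norm_sq == 0.0:
            continue  # identical curves carry no direction
        n_pairs += 1
        for s in range(m):
            for t in range(m):
                total[s][t] += diff[s] * diff[t] / norm_sq
    if n_pairs == 0:
        raise ValueError("need at least two distinct curves")
    return [[v / n_pairs for v in row] for row in total]
```

Because each difference is scaled to unit length, a single extreme curve contributes only a direction and not a magnitude, which is the intuition behind the robustness claim. The eigenvectors of the returned matrix would play the role of the estimated eigenfunctions in this discretized setting.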
Functional principal components analysis is a popular tool for inference on functional data. Standard approaches rely on an eigendecomposition of a smoothed covariance surface in order to extract the orthonormal functions representing the major modes of variation. This approach can be a computationally intensive procedure, especially in the presence of large datasets with irregular observations. In this article, we develop a Bayesian approach, which aims to determine the Karhunen-Loève decomposition directly without the need to smooth and estimate a covariance surface. More specifically, we develop a variational Bayesian algorithm via message passing over a factor graph, which is more commonly referred to as variational message passing. Message passing algorithms are a powerful tool for compartmentalizing the algebra and coding required for inference in hierarchical statistical models. Recently, there has been much focus on formulating variational inference algorithms in the message passing framework because it removes the need for rederiving approximate posterior density functions if there is a change to the model. Instead, model changes are handled by changing specific computational units, known as fragments, within the factor graph. We extend the notion of variational message passing to functional principal components analysis. Indeed, this is the first article to address a functional data model via variational message passing. Our approach introduces two new fragments that are necessary for Bayesian functional principal components analysis. We present the computational details, a set of simulations for assessing accuracy and speed and an application to United States temperature data.
This paper proposes capped least squares regression with an adaptive resistance parameter, hence the name adaptive capped least squares regression. The key observation is that, by taking the resistance parameter to be data dependent, the proposed estimator achieves full asymptotic efficiency without losing the resistance property: it achieves the maximum breakdown point asymptotically. Computationally, we formulate the proposed regression problem as a quadratic mixed integer programming problem, which becomes computationally expensive when the sample size gets large. The data-dependent resistance parameter, however, makes the loss function more convex-like for larger-scale problems. This makes a fast, randomly initialized gradient descent algorithm possible for global optimization. Numerical examples indicate the superiority of the proposed estimator compared with classical methods. Three data applications, to cancer cell lines, stationary background recovery in video surveillance, and blind image inpainting, showcase its broad applicability.
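The effect of capping can be illustrated with a short Python sketch. It assumes one common form of a capped squared loss, min(r², τ²), with a fixed threshold `tau`; the exact scaling used in the paper and its data-dependent choice of the resistance parameter are not given in the abstract and are not reproduced here.

```python
def capped_squared_loss(residuals, tau):
    """Sum of squared residuals, each capped at tau**2 (assumed form)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    cap = tau * tau
    return sum(min(r * r, cap) for r in residuals)
```

Under this assumed form, a residual of -3.0 and a residual of -300.0 contribute the same amount once `tau` is 1.0, so a gross outlier cannot dominate the objective the way it does in ordinary least squares.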