In this paper we consider the problems of supervised classification and regression in the case where attributes and labels are functions: each datum is represented by a set of functions, and its label is also a function. We focus on the use of reproducing kernel Hilbert space theory to learn from such functional data. Basic concepts and properties of kernel-based learning are extended to include the estimation of function-valued functions. In this setting, the representer theorem is restated, a set of rigorously defined infinite-dimensional operator-valued kernels suited to functional data is described, and a learning algorithm for nonlinear functional data analysis is introduced. The methodology is illustrated through speech and audio signal processing experiments.
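A minimal sketch of a representer-theorem solution for function-valued regression, assuming the simplest separable operator-valued kernel K(x, x') = k(x, x') I with a scalar Gaussian kernel k, and input and output functions sampled on fixed grids. This is an illustration only, not one of the kernels proposed in the paper; the names `gaussian_kernel`, `solve`, `fit`, and `predict` and the toy data are hypothetical. Minimizing the sum of squared output errors plus λ times the RKHS norm gives f(x) = Σ_i k(x, x_i) c_i, where the coefficient functions c_i solve (G + λI)C = Y.

```python
import math


def gaussian_kernel(x, z, sigma2):
    # Scalar Gaussian kernel between two sampled functions; the squared L2
    # distance is approximated by the mean squared difference on the grid.
    d = sum((a - b) ** 2 for a, b in zip(x, z)) / len(x)
    return math.exp(-d / (2.0 * sigma2))


def solve(A, B):
    # Solve A C = B (A: n x n, B: n x p) by Gaussian elimination with
    # partial pivoting; returns C as a list of n rows.
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    C = [None] * n
    for i in range(n - 1, -1, -1):
        rhs = M[i][n:]
        for j in range(i + 1, n):
            rhs = [a - M[i][j] * b for a, b in zip(rhs, C[j])]
        C[i] = [a / M[i][i] for a in rhs]
    return C


def fit(X, Y, sigma2, lam):
    # Coefficient functions c_i (rows of C) solve (G + lam * I) C = Y.
    n = len(X)
    A = [[gaussian_kernel(X[i], X[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, Y)


def predict(x, X, C, sigma2):
    # f(x) = sum_i k(x, x_i) c_i, returned as a function on the output grid.
    w = [gaussian_kernel(x, xi, sigma2) for xi in X]
    return [sum(w[i] * C[i][t] for i in range(len(X))) for t in range(len(C[0]))]


# Toy data: input curves sin(a t) and output curves cos(a t) on a full-period grid.
m = 32
grid = [2.0 * math.pi * j / m for j in range(m)]
X = [[math.sin(a * t) for t in grid] for a in (1, 2, 3, 4)]
Y = [[math.cos(a * t) for t in grid] for a in (1, 2, 3, 4)]
C = fit(X, Y, sigma2=0.1, lam=1e-6)
y_hat = predict(X[0], X, C, sigma2=0.1)
```

With a small λ, the prediction at a training input should closely reproduce its output curve; a non-identity operator in place of I would couple the output grid points, which this sketch deliberately leaves out.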
Although operator-valued kernels have recently received increasing interest in various machine learning and functional data analysis problems such as multi-task learning or functional regression, little attention has been paid to understanding their associated feature spaces. In this paper, we explore the potential of adopting an operator-valued kernel feature space perspective for the analysis of functional data. We then extend the Regularized Least Squares Classification (RLSC) algorithm to cover situations where there are multiple functions per observation. Experiments on a sound recognition problem show that the proposed method outperforms the classical RLSC algorithm.
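Classical RLSC solves (K + λnI)c = y for labels y_i ∈ {−1, +1} and classifies a new point by the sign of Σ_i c_i k(x, x_i). The sketch below handles several functions per observation with a plain sum of per-function Gaussian kernels, a simplified scalar stand-in rather than the operator-valued kernel approach of the paper; `multi_kernel`, the other helper names, and the toy data are hypothetical. The symmetric positive definite system is solved with conjugate gradient.

```python
import math


def gaussian_kernel(x, z, sigma2):
    # Gaussian kernel on one sampled function (mean squared difference on the grid).
    d = sum((a - b) ** 2 for a, b in zip(x, z)) / len(x)
    return math.exp(-d / (2.0 * sigma2))


def multi_kernel(obs1, obs2, sigma2):
    # Hypothetical kernel for observations made of several functions:
    # the sum of per-function Gaussian kernels.
    return sum(gaussian_kernel(f, g, sigma2) for f, g in zip(obs1, obs2))


def conjugate_gradient(A, b, tol=1e-10, max_iter=500):
    # Solve the symmetric positive definite system A x = b.
    n = len(b)
    x = [0.0] * n
    r = list(b)
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if math.sqrt(rs) < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x


def rlsc_fit(obs, labels, sigma2, lam):
    # Solve (K + lam * n * I) c = y with labels in {-1, +1}.
    n = len(obs)
    K = [[multi_kernel(obs[i], obs[j], sigma2) + (lam * n if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return conjugate_gradient(K, [float(y) for y in labels])


def rlsc_predict(new_obs, obs, c, sigma2):
    # Class = sign of sum_i c_i k(new_obs, obs_i).
    score = sum(ci * multi_kernel(new_obs, o, sigma2) for ci, o in zip(c, obs))
    return 1 if score >= 0 else -1


# Toy data: each observation is a pair of curves (s*sin(a t), s*cos(a t)).
m = 32
grid = [2.0 * math.pi * j / m for j in range(m)]


def make_obs(a, s):
    return ([s * math.sin(a * t) for t in grid], [s * math.cos(a * t) for t in grid])


obs = [make_obs(1, s) for s in (0.9, 1.0, 1.1)] + [make_obs(3, s) for s in (0.9, 1.0, 1.1)]
labels = [1, 1, 1, -1, -1, -1]
c = rlsc_fit(obs, labels, sigma2=0.5, lam=0.1)
pred_pos = rlsc_predict(make_obs(1, 1.05), obs, c, sigma2=0.5)
pred_neg = rlsc_predict(make_obs(3, 0.95), obs, c, sigma2=0.5)
```

Summing kernels treats the functions of an observation as independent views; an operator-valued kernel can additionally model interactions between them, which is what the abstract's feature-space perspective targets.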
Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization. Due to the inherent trade-off between the dimension of the (mapped) feature space and the approximation accuracy, the key problem is to identify promising (explicit) features leading to satisfactory out-of-sample performance. In this work, we tackle this problem by efficiently choosing such features from multiple kernels in a greedy fashion. Our method sequentially selects these explicit features from a set of candidate features using a correlation metric. We establish an out-of-sample error bound capturing the trade-off between the error in terms of explicit features (approximation error) and the error due to spectral properties of the best model in the Hilbert space associated with the combined kernel (spectral error). The result shows that when the (best) underlying data model is sparse enough, i.e., the spectral error is negligible, one can control the test error with a small number of explicit features, which can scale poly-logarithmically with the data size. Our empirical results show that, given a fixed number of explicit features, the method can achieve a lower test error at a smaller time cost than state-of-the-art data-dependent random feature methods.
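One plausible reading of the greedy procedure, sketched under explicit assumptions: the multiple kernels are 1-D Gaussian kernels with several bandwidths, the candidate explicit features are random Fourier features drawn from each of them, and the correlation metric is the absolute normalized inner product with the current residual (the paper's exact metric may differ). After each selection the weights of all chosen features are refit by ridge-stabilized least squares, in the style of orthogonal matching pursuit. All names and settings are hypothetical.

```python
import math
import random


def make_candidates(bandwidths, per_kernel, rng):
    # Random Fourier features for 1-D Gaussian kernels exp(-(x - z)^2 / (2 s^2)):
    # frequency w ~ N(0, 1 / s^2), phase b ~ Uniform[0, 2*pi).
    return [(rng.gauss(0.0, 1.0 / s), rng.uniform(0.0, 2.0 * math.pi))
            for s in bandwidths for _ in range(per_kernel)]


def feature(x, wb):
    w, b = wb
    return math.sqrt(2.0) * math.cos(w * x + b)


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A w = b.
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def greedy_select(xs, ys, cands, k, ridge=1e-6):
    cols = [[feature(x, c) for x in xs] for c in cands]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    selected, history, resid = [], [], list(ys)
    for _ in range(k):
        # Assumed correlation score: |<phi_j, residual>| / ||phi_j||.
        best = max((j for j in range(len(cols)) if j not in selected),
                   key=lambda j: abs(sum(a * r for a, r in zip(cols[j], resid))) / norms[j])
        selected.append(best)
        # Refit all selected weights jointly via ridge-stabilized normal equations.
        S = [cols[j] for j in selected]
        G = [[sum(a * b for a, b in zip(S[i], S[q])) + (ridge if i == q else 0.0)
              for q in range(len(S))] for i in range(len(S))]
        w = solve(G, [sum(a * y for a, y in zip(col, ys)) for col in S])
        resid = [y - sum(w[i] * S[i][t] for i in range(len(S))) for t, y in enumerate(ys)]
        history.append(sum(r * r for r in resid) / len(ys))  # training MSE
    return selected, history


rng = random.Random(0)
xs = [-3.0 + 6.0 * i / 99 for i in range(100)]
ys = [math.sin(2.0 * x) for x in xs]
cands = make_candidates([0.5, 1.0, 2.0], 50, rng)  # 150 candidates from 3 kernels
selected, history = greedy_select(xs, ys, cands, k=10)
baseline = sum(y * y for y in ys) / len(ys)  # MSE of the all-zero model
```

Because each step refits on a superset of the previous features, the training error should be (up to the tiny ridge term) non-increasing in the number of selected features, which is the approximation-error side of the trade-off described above.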
Kernel methods have great promise for learning rich statistical representations of large modern datasets. However, compared to neural networks, kernel methods have been perceived as lacking in scalability and flexibility. We introduce a family of fast, flexible, lightly parametrized and general-purpose kernel learning methods, derived from Fastfood basis function expansions. We provide mechanisms to learn the properties of groups of spectral frequencies in these expansions, which require only O(m log d) time and O(m) memory, for m basis functions and d input dimensions. We show that the proposed methods can learn a wide class of kernels, outperforming the alternatives in accuracy, speed, and memory consumption.
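Fastfood replaces a dense Gaussian projection matrix with V = (1/(σ√d)) S H G Π H B, where H is the Walsh-Hadamard matrix, B a random ±1 diagonal, Π a random permutation, G a Gaussian diagonal, and S a diagonal scaling. Each block of d projections costs O(d log d) time and O(d) memory, so m basis functions cost O(m log d) time and O(m) memory. The sketch below draws all diagonals at random to build fixed random Fourier features for a Gaussian kernel of bandwidth σ; the diagonals (notably S) are the kind of quantities a kernel-learning method could optimize instead. Drawing s_i from a chi distribution with d degrees of freedom and dividing by ‖G‖_F follows the original Fastfood construction; the class and function names are hypothetical.

```python
import math
import random


def fwht(a):
    # Unnormalized fast Walsh-Hadamard transform in O(d log d);
    # len(a) must be a power of two. Returns a new list.
    a = list(a)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


class FastfoodBlock:
    """d projections V x with V = (1 / (sigma * sqrt(d))) S H G Pi H B."""

    def __init__(self, d, sigma, rng):
        self.d, self.sigma = d, sigma
        self.B = [rng.choice((-1.0, 1.0)) for _ in range(d)]  # random signs
        self.perm = rng.sample(range(d), d)                    # permutation Pi
        self.G = [rng.gauss(0.0, 1.0) for _ in range(d)]       # Gaussian diagonal
        g_norm = math.sqrt(sum(g * g for g in self.G))
        # s_i ~ chi(d), so each row of V has the norm distribution of a dense
        # Gaussian row; a kernel-learning method could optimize S instead.
        self.S = [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))) / g_norm
                  for _ in range(d)]

    def project(self, x):
        v = fwht([b * xi for b, xi in zip(self.B, x)])   # H B x
        v = [v[k] for k in self.perm]                    # Pi
        v = fwht([g * vi for g, vi in zip(self.G, v)])   # H G
        scale = 1.0 / (self.sigma * math.sqrt(self.d))
        return [scale * s * vi for s, vi in zip(self.S, v)]  # S / (sigma sqrt d)


def fastfood_features(x, blocks):
    # [cos(Vx), sin(Vx)] / sqrt(m); inner products of these features
    # approximate the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)).
    proj = [p for blk in blocks for p in blk.project(x)]
    m = len(proj)
    return [math.cos(p) / math.sqrt(m) for p in proj] + [math.sin(p) / math.sqrt(m) for p in proj]


rng = random.Random(0)
d, sigma = 8, 1.0
blocks = [FastfoodBlock(d, sigma, rng) for _ in range(128)]  # m = 1024 basis functions
x = [0.0] * d
y = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
zx, zy = fastfood_features(x, blocks), fastfood_features(y, blocks)
approx = sum(a * b for a, b in zip(zx, zy))
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))
```

Only the diagonals B, G, S and the permutation are stored, which is where the O(m) memory figure comes from; no d × m matrix is ever formed.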
Aggregating large databases into a specific format is a common way to make the data easier to manage. Interval-valued data is one of the data types generated by such an aggregation process. Analyzing interval-valued data with traditional methods results in a loss of information, and thus several interval-valued data models have been proposed to extract reliable information from such data. On the other hand, recent technological developments have led to high-dimensional and complex data in many application areas, which often cannot be analyzed with traditional techniques. Functional data analysis is one of the most commonly used techniques for analyzing such complex datasets. While functional extensions of many traditional statistical techniques are available, the functional form of interval-valued data has not been well studied. This paper introduces functional forms of some well-known regression models for interval-valued data. The proposed methods are based on the function-on-function regression model, where both the response and the predictor(s) are functional. Through several Monte Carlo simulations and an empirical data analysis, the finite-sample performance of the proposed methods is evaluated and compared with the state-of-the-art.
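A minimal sketch of a functional center-and-range method (CRM), one classical interval regression model, assuming each interval-valued curve is given by lower and upper bounds on a common grid; whether the paper uses this exact variant is not stated here. Centers and half-ranges are each fitted with a discretized function-on-function linear model y(t) = β₀(t) + ∫ β(s, t) x(s) ds, estimated by ridge-regularized least squares with the quadrature weight absorbed into β. The paper's estimator (for example its basis expansion) is not reproduced; all names and the toy data are hypothetical.

```python
def solve(A, B):
    # Solve A C = B (A: n x n, B: n x p) by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    C = [None] * n
    for i in range(n - 1, -1, -1):
        rhs = M[i][n:]
        for j in range(i + 1, n):
            rhs = [a - M[i][j] * b for a, b in zip(rhs, C[j])]
        C[i] = [a / M[i][i] for a in rhs]
    return C


def fit_fof(X, Y, lam):
    # X: n x m predictor curves, Y: n x p response curves. Returns the means and
    # the m x p matrix B approximating beta(s, t) times the quadrature weight.
    n, m, p = len(X), len(X[0]), len(Y[0])
    xbar = [sum(X[i][k] for i in range(n)) / n for k in range(m)]
    ybar = [sum(Y[i][j] for i in range(n)) / n for j in range(p)]
    Xc = [[X[i][k] - xbar[k] for k in range(m)] for i in range(n)]
    Yc = [[Y[i][j] - ybar[j] for j in range(p)] for i in range(n)]
    A = [[sum(Xc[i][k] * Xc[i][q] for i in range(n)) + (lam if k == q else 0.0)
          for q in range(m)] for k in range(m)]
    R = [[sum(Xc[i][k] * Yc[i][j] for i in range(n)) for j in range(p)] for k in range(m)]
    return xbar, ybar, solve(A, R)


def predict_fof(x, model):
    xbar, ybar, B = model  # beta_0(t) is implied by the centering
    return [ybar[j] + sum((x[k] - xbar[k]) * B[k][j] for k in range(len(x)))
            for j in range(len(ybar))]


def centers(L, U):
    return [[(a + b) / 2.0 for a, b in zip(lo, up)] for lo, up in zip(L, U)]


def half_ranges(L, U):
    return [[(b - a) / 2.0 for a, b in zip(lo, up)] for lo, up in zip(L, U)]


def fit_crm(XL, XU, YL, YU, lam=1e-6):
    # Separate function-on-function models for centers and half-ranges.
    return (fit_fof(centers(XL, XU), centers(YL, YU), lam),
            fit_fof(half_ranges(XL, XU), half_ranges(YL, YU), lam))


def predict_crm(xl, xu, models):
    m_c, m_r = models
    c = predict_fof([(a + b) / 2.0 for a, b in zip(xl, xu)], m_c)
    # Clip negative half-ranges so the interval stays valid (an added safeguard).
    r = [max(v, 0.0) for v in predict_fof([(b - a) / 2.0 for a, b in zip(xl, xu)], m_r)]
    return [ci - ri for ci, ri in zip(c, r)], [ci + ri for ci, ri in zip(c, r)]


grid = [k / 4.0 for k in range(5)]  # shared grid for s and t
XL, XU, YL, YU = [], [], [], []
for i, (a, b) in enumerate([(0, 1), (1, 0), (2, -1), (0.5, 2), (1.5, 0.5), (-1, 1)]):
    xc = [a + b * s for s in grid]                     # predictor center curve
    xr = [0.1 + 0.05 * i * s for s in grid]            # predictor half-range curve
    yc = [t + 0.4 * sum(xc) for t in grid]             # response center, linear in xc
    yr = [0.1 + 0.5 * sum(xr) / len(xr) for _ in grid]  # response half-range, linear in xr
    XL.append([u - v for u, v in zip(xc, xr)])
    XU.append([u + v for u, v in zip(xc, xr)])
    YL.append([u - v for u, v in zip(yc, yr)])
    YU.append([u + v for u, v in zip(yc, yr)])

models = fit_crm(XL, XU, YL, YU)
low, up = predict_crm(XL[0], XU[0], models)
```

Modeling centers and ranges separately keeps the two sources of variation apart; a center-only method would instead reuse the predictor's ranges directly.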
The James-Stein (JS) shrinkage estimator is a biased estimator of the mean of Gaussian random vectors. While it has the desirable statistical property of dominating the maximum likelihood estimator (MLE) in terms of mean squared error (MSE), little progress has been made on extending the estimator to manifold-valued data. We propose C-SURE, a novel Stein's unbiased risk estimate (SURE) of the JS estimator on the manifold of complex-valued data, with a theoretically proven advantage over the MLE. Adapting the architecture of the complex-valued SurReal classifier, we further incorporate C-SURE into a prototype convolutional neural network (CNN) classifier. We compare C-SURE with SurReal and a real-valued baseline on the complex-valued MSTAR and RadioML datasets. C-SURE is more accurate and robust than SurReal, and the shrinkage estimator is always better than the MLE for the same prototype classifier. Like SurReal, C-SURE is much smaller than the real-valued baseline, outperforming it on MSTAR (RadioML) with less than 1 percent (3 percent) of the baseline size.
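For x ~ N(θ, σ²I) in p ≥ 3 dimensions, the James-Stein estimator is θ̂ = (1 − (p − 2)σ²/‖x‖²) x, and Stein's unbiased risk estimate of its MSE is pσ² − (p − 2)²σ⁴/‖x‖². The Monte Carlo sketch below checks both facts in the real-valued Euclidean case only; it does not implement C-SURE on the manifold of complex-valued data. The function names and simulation settings are hypothetical.

```python
import random


def james_stein(x, sigma2):
    # theta_hat = (1 - (p - 2) * sigma2 / ||x||^2) * x, for p >= 3.
    p = len(x)
    shrink = 1.0 - (p - 2) * sigma2 / sum(v * v for v in x)
    return [shrink * v for v in x]


def sure_js(x, sigma2):
    # Stein's unbiased risk estimate of E||theta_hat - theta||^2 for the JS estimator.
    p = len(x)
    return p * sigma2 - (p - 2) ** 2 * sigma2 ** 2 / sum(v * v for v in x)


def monte_carlo(theta, sigma2, trials, seed=0):
    # Average squared error of the MLE (x itself) and of JS, plus the average SURE.
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    mle = js = sure = 0.0
    for _ in range(trials):
        x = [t + rng.gauss(0.0, sd) for t in theta]
        est = james_stein(x, sigma2)
        mle += sum((a - t) ** 2 for a, t in zip(x, theta))
        js += sum((a - t) ** 2 for a, t in zip(est, theta))
        sure += sure_js(x, sigma2)
    return mle / trials, js / trials, sure / trials


theta = [0.5] * 10
mse_mle, mse_js, mean_sure = monte_carlo(theta, sigma2=1.0, trials=20000)
```

With p = 10 and a small ‖θ‖, the MLE's MSE should be close to pσ² = 10, while the JS error should be clearly lower and close to its average SURE, since SURE is unbiased for the risk.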