No Arabic abstract
Sparse classifiers such as the support vector machines (SVM) are efficient in test-phases because the classifier is characterized only by a subset of the samples called support vectors (SVs), and the rest of the samples (non SVs) have no influence on the classification result. However, the advantage of the sparsity has not been fully exploited in training phases because it is generally difficult to know which sample turns out to be SV beforehand. In this paper, we introduce a new approach called safe sample screening that enables us to identify a subset of the non-SVs and screen them out prior to the training phase. Our approach is different from existing heuristic approaches in the sense that the screened samples are guaranteed to be non-SVs at the optimal solution. We investigate the advantage of the safe sample screening approach through intensive numerical experiments, and demonstrate that it can substantially decrease the computational cost of the state-of-the-art SVM solvers such as LIBSVM. In the current big data era, we believe that safe sample screening would be of great practical importance since the data size can be reduced without sacrificing the optimality of the final solution.
Many problems that appear in biomedical decision making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The costs of false positives and false negatives vary across application domains and receiver operating characteristic (ROC) curves provide a visual representation of this trade-off. Nonparametric estimators for the ROC curve, such as a weighted support vector machine (SVM), are desirable because they are robust to model misspecification. While weighted SVMs have great potential for estimating ROC curves, their theoretical properties were heretofore underdeveloped. We propose a method for constructing confidence bands for the SVM ROC curve and provide the theoretical justification for the SVM ROC curve by showing that the risk function of the estimated decision rule is uniformly consistent across the weight parameter. We demonstrate the proposed confidence band method and the superior sensitivity and specificity of the weighted SVM compared to commonly used methods in diagnostic medicine using simulation studies. We present two illustrative examples: diagnosis of hepatitis C and a predictive model for treatment response in breast cancer.
A rapid pattern-recognition approach to characterize drivers curve-negotiating behavior is proposed. To shorten the recognition time and improve the recognition of driving styles, a k-means clustering-based support vector machine ( kMC-SVM) method is developed and used for classifying drivers into two types: aggressive and moderate. First, vehicle speed and throttle opening are treated as the feature parameters to reflect the driving styles. Second, to discriminate driver curve-negotiating behaviors and reduce the number of support vectors, the k-means clustering method is used to extract and gather the two types of driving data and shorten the recognition time. Then, based on the clustering results, a support vector machine approach is utilized to generate the hyperplane for judging and predicting to which types the human driver are subject. Lastly, to verify the validity of the kMC-SVM method, a cross-validation experiment is designed and conducted. The research results show that the $ k $MC-SVM is an effective method to classify driving styles with a short time, compared with SVM method.
A widely-used tool for binary classification is the Support Vector Machine (SVM), a supervised learning technique that finds the maximum margin linear separator between the two classes. While SVMs have been well studied in the batch (offline) setting, there is considerably less work on the streaming (online) setting, which requires only a single pass over the data using sub-linear space. Existing streaming algorithms are not yet competitive with the batch implementation. In this paper, we use the formulation of the SVM as a minimum enclosing ball (MEB) problem to provide a streaming SVM algorithm based off of the blurred ball cover originally proposed by Agarwal and Sharathkumar. Our implementation consistently outperforms existing streaming SVM approaches and provides higher accuracies than libSVM on several datasets, thus making it competitive with the standard SVM batch implementation.
We develop a machine learning framework that can be applied to data sets derived from the trajectories of Hamiltons equations. The goal is to learn the phase space structures that play the governing role for phase space transport relevant to particular applications. Our focus is on learning reactive islands in two degrees-of-freedom Hamiltonian systems. Reactive islands are constructed from the stable and unstable manifolds of unstable periodic orbits and play the role of quantifying transition dynamics. We show that support vector machines (SVM) is an appropriate machine learning framework for this purpose as it provides an approach for finding the boundaries between qualitatively distinct dynamical behaviors, which is in the spirit of the phase space transport framework. We show how our method allows us to find reactive islands directly in the sense that we do not have to first compute unstable periodic orbits and their stable and unstable manifolds. We apply our approach to the Henon-Heiles Hamiltonian system, which is a benchmark system in the dynamical systems community. We discuss different sampling and learning approaches and their advantages and disadvantages.
In this paper we solve support vector machines in reproducing kernel Banach spaces with reproducing kernels defined on nonsymmetric domains instead of the traditional methods in reproducing kernel Hilbert spaces. Using the orthogonality of semi-inner-products, we can obtain the explicit representations of the dual (normalized-duality-mapping) elements of support vector machine solutions. In addition, we can introduce the reproduction property in a generalized native space by Fourier transform techniques such that it becomes a reproducing kernel Banach space, which can be even embedded into Sobolev spaces, and its reproducing kernel is set up by the related positive definite function. The representations of the optimal solutions of support vector machines (regularized empirical risks) in these reproducing kernel Banach spaces are formulated explicitly in terms of positive definite functions, and their finite numbers of coefficients can be computed by fixed point iteration. We also give some typical examples of reproducing kernel Banach spaces induced by Matern functions (Sobolev splines) so that their support vector machine solutions are well computable as the classical algorithms. Moreover, each of their reproducing bases includes information from multiple training data points. The concept of reproducing kernel Banach spaces offers us a new numerical tool for solving support vector machines.