This paper aims to improve the classification accuracy of a Support Vector Machine (SVM) classifier trained with the Sequential Minimal Optimization (SMO) algorithm, so that failure and normal instances in oil and gas equipment data are classified correctly. Recent applications of failure analysis have used SVMs without the SMO training algorithm; our study shows that the proposed solution performs much better when SMO training is used. Furthermore, we implement an ensemble approach, a hybrid of rule-based and neural network classifiers, to improve the performance of the SMO-trained SVM classifier. This optimization study was motivated by the underperformance of the classifier on imbalanced datasets. The best-performing classifiers are combined with the SMO-trained SVM classifier using the stacking ensemble method, yielding an efficient ensemble predictive model that can handle imbalanced data. The classification performance of this predictive model is considerably better than that of the SVM with or without SMO training and of many other conventional classifiers.
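To make the imbalanced-data problem mentioned above concrete, the following minimal sketch compares accuracy with recall on the failure class. The labels are synthetic and the 95:5 class ratio is an assumption chosen for illustration; it is not the oil and gas data or the evaluation protocol of the paper.

```python
# Illustrative only: synthetic labels with an assumed 95:5 imbalance,
# not the equipment data used in the paper.

def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class instances that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# 0 = normal instance, 1 = failure instance (hypothetical encoding).
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts "normal".
always_normal = [0] * len(y_true)

acc = accuracy(y_true, always_normal)
rec = recall(y_true, always_normal)
print(f"accuracy={acc:.2f}, failure recall={rec:.2f}")
```

A classifier that never predicts a failure still scores high accuracy here while detecting no failures at all, which is why accuracy alone can hide poor performance on the minority class.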
We propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our algorithm simultaneously computes support vectors and a proxy kernel matrix used in forming the loss. This can be interpreted as a penalized kernel learning problem in which indefinite kernel matrices are treated as noisy observations of a true Mercer kernel. Our formulation keeps the problem convex, and relatively large problems can be solved efficiently using the projected gradient or analytic center cutting plane methods. We compare the performance of our technique with other methods on several classic data sets.
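One way to write a penalized problem of this kind is sketched below. The notation is assumed here, since the abstract gives none: $K_0$ is the given indefinite kernel matrix, $K$ the positive semidefinite proxy kernel, $\alpha$ the dual SVM variables, $y$ the label vector with $Y = \mathrm{diag}(y)$, $e$ the vector of ones, $C$ the usual box constraint, and $\rho > 0$ a penalty weight on the distance between $K$ and $K_0$.

```latex
\max_{\substack{\alpha^{\top} y = 0 \\ 0 \le \alpha \le C}} \;
\min_{K \succeq 0} \;
\alpha^{\top} e
- \tfrac{1}{2}\,\operatorname{Tr}\!\bigl( K\,(Y\alpha)(Y\alpha)^{\top} \bigr)
+ \rho\,\lVert K - K_0 \rVert_F^{2}
```

Under these assumptions, completing the square in $K$ shows that for fixed $\alpha$ the inner minimizer is the projection of $K_0 + (Y\alpha)(Y\alpha)^{\top} / (4\rho)$ onto the positive semidefinite cone. The inner minimum is a pointwise minimum of functions that are concave in $\alpha$, so the outer maximization remains a concave problem, which is consistent with the convexity claim above.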
In this paper, a data-analytical approach based on support vector machines (SVM) is employed to train a predictive model on an experimental dataset drawn from the most relevant studies on two-phase flow pattern prediction. The database for this study consists of flow patterns, or flow regimes, in gas-liquid two-phase flow. The term flow pattern refers to the geometrical configuration of the gas and liquid phases in the pipe. When gas and liquid flow simultaneously in a pipe, the two phases can distribute themselves in a variety of flow configurations. Gas-liquid two-phase flow occurs widely in major industrial fields, including the petroleum, chemical, nuclear, and geothermal industries. The flow configurations differ from each other in the spatial distribution of the interface, resulting in different flow characteristics. Experimental results obtained by applying the presented methodology to different combinations of flow patterns show that the proposed approach is a state-of-the-art alternative, achieving 97% correct classification. The results suggest that machine learning could be an effective tool for the automatic detection and classification of gas-liquid flow patterns.
We propose several novel methods for enhancing multi-class SVMs, using the generalization performance of the binary classifiers as the core idea. This concept is applied to existing algorithms, namely the Decision Directed Acyclic Graph (DDAG), the Adaptive Directed Acyclic Graph (ADAG), and Max Wins. Previous approaches have often used information such as the margin size and the number of support vectors as performance estimators for binary SVMs, but these may not accurately reflect the actual performance of the binary SVMs. We show that generalization ability evaluated via cross-validation is a more suitable measure of the actual performance of binary SVMs. Our methods are built around this performance measure, and each is designed to overcome a weakness of a previous algorithm. The proposed methods include the Reordering Adaptive Directed Acyclic Graph (RADAG), Strong Elimination of the classifiers (SE), Weak Elimination of the classifiers (WE), and Voting based Candidate Filtering (VCF). Experimental results demonstrate that our methods give significantly higher accuracy than all of the traditional ones. In particular, WE provides significantly better results than Max Wins, which is recognized as the state-of-the-art algorithm, in terms of both accuracy and classification speed, running about twice as fast on average.
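For orientation, the two baseline combination schemes named above, Max Wins and DDAG, can be sketched as follows. This is a minimal illustration, not the proposed RADAG, SE, WE, or VCF methods. The pairwise decision function is a hypothetical stand-in (a nearest-centroid rule on one-dimensional inputs) for trained binary SVMs, and `CENTROIDS`, `CLASSES`, and `make_pairwise` are names invented for this example.

```python
from itertools import combinations

# Hypothetical 1-D class centroids standing in for trained binary SVMs.
CENTROIDS = {"A": 0.0, "B": 5.0, "C": 10.0}
CLASSES = ["A", "B", "C"]


def make_pairwise(centroids):
    """Build a stand-in pairwise classifier: pick the class with the nearer centroid."""
    def decide(a, b, x):
        return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b
    return decide


def max_wins(classes, decide, x):
    """Evaluate all k(k-1)/2 pairwise classifiers and return the class with most votes."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[decide(a, b, x)] += 1
    # Ties are broken by the order of `classes`.
    return max(classes, key=lambda c: votes[c])


def ddag(classes, decide, x):
    """Compare the first and last remaining classes and drop the loser each round."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = decide(a, b, x)
        remaining.remove(b if winner == a else a)
    return remaining[0]


decide = make_pairwise(CENTROIDS)
print(max_wins(CLASSES, decide, 4.2), ddag(CLASSES, decide, 4.2))
```

Max Wins evaluates every pairwise classifier, while DDAG needs only k - 1 evaluations for k classes, which is the accuracy versus speed trade-off that the methods above aim to improve.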
We propose $\ell_1$ norm regularized quadratic surface support vector machine models for binary classification in supervised learning. We establish their desired theoretical properties, including the existence and uniqueness of the optimal solution, reduction to the standard SVMs over (almost) linearly separable data sets, and detection of true sparsity pattern over (almost) quadratically separable data sets if the penalty parameter of $\ell_1$ norm is large enough. We also demonstrate their promising practical efficiency by conducting various numerical experiments on both synthetic and publicly available benchmark data sets.
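As a rough illustration, a soft-margin model of this type could take the form below. This is a sketch under assumed notation, not necessarily the exact objective of the paper: the separating surface is $\tfrac{1}{2} x^{\top} W x + b^{\top} x + c = 0$ with symmetric $W$, $\xi_i$ are slack variables, $C > 0$ weights the slack, and $\lambda > 0$ is the $\ell_1$ penalty parameter on the entries of $W$.

```latex
\begin{aligned}
\min_{W = W^{\top},\, b,\, c,\, \xi} \quad
  & \sum_{i=1}^{n} \lVert W x_i + b \rVert_2^{2}
    + \lambda \sum_{j \le k} \lvert W_{jk} \rvert
    + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad
  & y_i \Bigl( \tfrac{1}{2}\, x_i^{\top} W x_i + b^{\top} x_i + c \Bigr) \ge 1 - \xi_i,
    \qquad \xi_i \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

In this form, a large enough $\lambda$ drives $W$ to zero, leaving a linear surface and an objective of $n \lVert b \rVert_2^{2} + C \sum_i \xi_i$, which matches a standard soft-margin SVM up to scaling.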
The twin support vector machine and its extensions have achieved strong results on binary classification problems, but they still face difficulties such as model selection and solving multi-class problems quickly. This paper is devoted to a fast regularization parameter tuning algorithm for the twin multi-class support vector machine. A new method for dividing the sample dataset is adopted, and the Lagrangian multipliers are proved to be piecewise linear with respect to the regularization parameters by combining linear equations with block matrix theory. Eight kinds of events are defined to find the starting event, and a solution path algorithm is then designed, which greatly reduces the computational cost. In addition, only a few points are needed to complete the initialization, and the Lagrangian multipliers are proved to equal 1 as the regularization parameter tends to infinity. Simulation results on UCI datasets show that the proposed method achieves good classification performance while reducing the computational cost of the grid search method from an exponential level to a constant level.