ﻻ يوجد ملخص باللغة العربية
In this paper, we consider linear prediction models in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyperrectangle in the input space. Since the number of all possible rules generated from the training dataset becomes extremely large, it has been difficult to consider all of them when fitting a sparse model. In this paper, we propose Safe Optimal Rule Fit (SORF) as an approach to resolve this problem, which is formulated as a convex optimization problem with sparse regularization. The proposed SORF method utilizes the fact that the set of all possible rules can be represented as a tree. By extending a recently popularized convex optimization technique called safe screening, we develop a novel method for pruning the tree such that pruned nodes are guaranteed to be irrelevant to the prediction model. This approach allows us to efficiently learn a prediction model constructed from an exponentially large number of all possible rules. We demonstrate the usefulness of the proposed method by numerical experiments using several benchmark datasets.
We present the design and implementation of a custom discrete optimization technique for building rule lists over a categorical feature space. Our algorithm produces rule lists with optimal training performance, according to the regularized empirical
The problem of learning a sparse model is conceptually interpreted as the process of identifying active features/samples and then optimizing the model over them. Recently introduced safe screening allows us to identify a part of non-active features/s
Popular machine learning estimators involve regularization parameters that can be challenging to tune, and standard strategies rely on grid search for this task. In this paper, we revisit the techniques of approximating the regularization path up to
We investigate implicit regularization schemes for gradient descent methods applied to unpenalized least squares regression to solve the problem of reconstructing a sparse signal from an underdetermined system of linear measurements under the restric
Sparse classifiers such as the support vector machines (SVM) are efficient in test-phases because the classifier is characterized only by a subset of the samples called support vectors (SVs), and the rest of the samples (non SVs) have no influence on