Decision trees as partitioning machines to characterize their generalization properties

208 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jean-Samuel Leboeuf

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jean-Samuel Leboeuf - Frederic LeBlanc - Mario Marchand

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Decision trees are popular machine learning models that are simple to build and easy to interpret. Even though algorithms to learn decision trees date back to almost 50 years, key properties affecting their generalization error are still weakly bounded. Hence, we revisit binary decision trees on real-valued features from the perspective of partitions of the data. We introduce the notion of partitioning function, and we relate it to the growth function and to the VC dimension. Using this new concept, we are able to find the exact VC dimension of decision stumps, which is given by the largest integer $d$ such that $2ell ge binom{d}{leftlfloorfrac{d}{2}rightrfloor}$, where $ell$ is the number of real-valued features. We provide a recursive expression to bound the partitioning functions, resulting in a upper bound on the growth function of any decision tree structure. This allows us to show that the VC dimension of a binary tree structure with $N$ internal nodes is of order $N log(Nell)$. Finally, we elaborate a pruning algorithm based on these results that performs better than the CART algorithm on a number of datasets, with the advantage that no cross-validation is required.

قيم البحث

122 - Pittipol Kantavat , Boonserm Kijsirikul , Patoomsiri Songsiri 2017

We propose new methods for Support Vector Machines (SVMs) using tree architecture for multi-class classi- fication. In each node of the tree, we select an appropriate binary classifier using entropy and generalization error estimation, then group the examples into positive and negative classes based on the selected classi- fier and train a new classifier for use in the classification phase. The proposed methods can work in time complexity between O(log2N) to O(N) where N is the number of classes. We compared the performance of our proposed methods to the traditional techniques on the UCI machine learning repository using 10-fold cross-validation. The experimental results show that our proposed methods are very useful for the problems that need fast classification time or problems with a large number of classes as the proposed methods run much faster than the traditional techniques but still provide comparable accuracy.

التعلم الآلي التعلم الالي

Learning stochastic decision trees

67 - Guy Blanc , Jane Lange , Li-Yang Tan 2021

We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an $eta$-corrupted set of uniform random samples labeled by a size-$s$ stochastic decision tree, our algorithm run s in time $n^{O(log(s/varepsilon)/varepsilon^2)}$ and returns a hypothesis with error within an additive $2eta + varepsilon$ of the Bayes optimal. An additive $2eta$ is the information-theoretic minimum. Previously no non-trivial algorithm with a guarantee of $O(eta) + varepsilon$ was known, even for weaker noise models. Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree; previously no such algorithm was known even in the noiseless setting.

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي

Multi-Layered Gradient Boosting Decision Trees

89 - Ji Feng , Yang Yu , Zhi-Hua Zhou 2018

Multi-layered representation is believed to be the key ingredient of deep neural networks especially in cognitive tasks like computer vision. While non-differentiable models such as gradient boosting decision trees (GBDTs) are the dominant methods fo r modeling discrete or tabular data, they are hard to incorporate with such representation learning ability. In this work, we propose the multi-layered GBDT forest (mGBDTs), with an explicit emphasis on exploring the ability to learn hierarchical representations by stacking several layers of regression GBDTs as its building block. The model can be jointly trained by a variant of target propagation across layers, without the need to derive back-propagation nor differentiability. Experiments and visualizations confirmed the effectiveness of the model in terms of performance and representation learning ability.

التعلم الآلي التعلم الالي

Generalized and Scalable Optimal Sparse Decision Trees

72 - Jimmy Lin , Chudi Zhong , Diane Hu 2020

Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been made that h ave allowed practical algorithms to find optimal decision trees. These new techniques have the potential to trigger a paradigm shift where it is possible to construct sparse decision trees to efficiently optimize a variety of objective functions without relying on greedy splitting and pruning heuristics that often lead to suboptimal solutions. The contribution in this work is to provide a general framework for decision tree optimization that addresses the two significant open problems in the area: treatment of imbalanced data and fully optimizing over continuous variables. We present techniques that produce optimal decision trees over a variety of objectives including F-score, AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables and speeds up decision tree construction by several orders of magnitude relative to the state-of-the art.

التعلم الآلي التعلم الالي

Deep Neural Network as an alternative to Boosted Decision Trees for PID

58 - Denis Stanev , Riccardo Riva , Michele Umassi 2021

In this paper we recreate, and improve, the binary classification method for particles proposed in Roe et al. (2005) paper Boosted decision trees as an alternative to artificial neural networks for particle identification. Such particles are tau neut rinos, which we will refer to as background, and electronic neutrinos: the signal we are interested in. In the original paper the preferred algorithm is a Boosted decision tree. This is due to its low effort tuning and good overall performance at the time. Our choice for implementation is a deep neural network, faster and more promising in performance. We will show how, using modern techniques, we are able to improve on the original result, both in accuracy and in training time.

تحليل البيانات والإحصاءات والاحتمال التعلم الآلي الفيزياء الحسابية