
Decision tree heuristics can fail, even in the smoothed setting

Published by: Mingda Qiao
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets $f$ that are $k$-juntas, they showed that these heuristics successfully learn $f$ with depth-$k$ decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-$k$ decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-$k$ decision trees and show that even in the smoothed setting, these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to $k$-juntas, for which these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy.
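To make the object of study concrete, here is a minimal sketch, in the style of ID3/CART, of the kind of greedy top-down heuristic in question: at every node it splits on the coordinate whose split most reduces an impurity measure (Gini impurity here) and recurses until a depth budget is exhausted. This is not the paper's construction or analysis, only an illustrative toy implementation; the names grow_tree, gini, and max_depth are our own.

```python
import numpy as np

def gini(y):
    """Gini impurity of a {0,1} label vector."""
    if len(y) == 0:
        return 0.0
    p = float(np.mean(y))
    return 2.0 * p * (1.0 - p)

def grow_tree(X, y, depth, max_depth):
    """Greedy top-down growth: split each node on the feature whose split
    most reduces Gini impurity, then recurse on the two halves."""
    # Stop on an empty node, a pure node, or an exhausted depth budget.
    if len(y) == 0:
        return {"leaf": 0}
    if depth == max_depth or gini(y) == 0.0:
        return {"leaf": int(round(float(np.mean(y))))}

    n, d = X.shape
    best_gain, best_i = 0.0, None
    for i in range(d):
        left, right = y[X[:, i] == 0], y[X[:, i] == 1]
        # Impurity reduction ("purity gain") from splitting on feature i.
        gain = gini(y) - (len(left) * gini(left) + len(right) * gini(right)) / n
        if gain > best_gain:
            best_gain, best_i = gain, i

    if best_i is None:  # no split helps: emit a leaf
        return {"leaf": int(round(float(np.mean(y))))}

    mask = X[:, best_i] == 1
    return {"feature": best_i,
            "left": grow_tree(X[~mask], y[~mask], depth + 1, max_depth),
            "right": grow_tree(X[mask], y[mask], depth + 1, max_depth)}
```

A call such as grow_tree(X, y, 0, max_depth) on Boolean features returns a nested dictionary representing the tree. The paper's question is how large this depth budget must be before purity-gain-driven heuristics of this form reach high accuracy when the target is itself a depth-$k$ decision tree; the counterexample shows that even under smoothing the answer can be $2^{\Omega(k)}$.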




Read also

We prove novel algorithmic guarantees for several online problems in the smoothed analysis model. In this model, at each time an adversary chooses an input distribution with density function bounded above by $\tfrac{1}{\sigma}$ times that of the uniform distribution; nature then samples an input from this distribution. Crucially, our results hold for \emph{adaptive} adversaries that can choose an input distribution based on the decisions of the algorithm and the realizations of the inputs in the previous time steps. This paper presents a general technique for proving smoothed algorithmic guarantees against adaptive adversaries, in effect reducing the setting of adaptive adversaries to the simpler case of oblivious adversaries. We apply this technique to prove strong smoothed guarantees for three problems:
- Online learning: We consider the online prediction problem, where instances are generated from an adaptive sequence of $\sigma$-smooth distributions and the hypothesis class has VC dimension $d$. We bound the regret by $\tilde{O}\big(\sqrt{Td\ln(1/\sigma)} + d\sqrt{\ln(T/\sigma)}\big)$. This answers open questions of [RST11, Hag18].
- Online discrepancy minimization: We consider the online Komlós problem, where the input is generated from an adaptive sequence of $\sigma$-smooth and isotropic distributions on the $\ell_2$ unit ball. We bound the $\ell_\infty$ norm of the discrepancy vector by $\tilde{O}\big(\ln^2\!\big(\tfrac{nT}{\sigma}\big)\big)$.
- Dispersion in online optimization: We consider online optimization of piecewise Lipschitz functions where functions with $\ell$ discontinuities are chosen by a smoothed adaptive adversary and show that the resulting sequence is $\big(\sigma/\sqrt{T\ell},\, \tilde{O}\big(\sqrt{T\ell}\big)\big)$-dispersed. This matches the parameters of [BDV18] for oblivious adversaries, up to log factors.
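As a concrete illustration of the $\sigma$-smooth constraint described in the abstract above (not code from any of these papers), the most concentrated distribution a $\sigma$-smooth adversary on $[0,1]$ may play is uniform on an interval of length $\sigma$, whose density is exactly $1/\sigma$ times the uniform density. The function and variable names below are our own.

```python
import numpy as np

def smooth_adversary_sample(sigma, center, rng):
    """One round against a sigma-smooth adversary on [0, 1]: the adversary may
    concentrate its distribution, but its density can never exceed 1/sigma
    times the uniform density, so the most concentrated legal choice is a
    uniform distribution on an interval of length sigma around its target."""
    lo = float(np.clip(center - sigma / 2.0, 0.0, 1.0 - sigma))
    return rng.uniform(lo, lo + sigma)

rng = np.random.default_rng(0)
# An adaptive adversary could pick `center` each round based on the algorithm's
# past decisions; smoothness only limits how peaked each round's density is.
xs = [smooth_adversary_sample(sigma=0.1, center=0.42, rng=rng) for _ in range(5)]
```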
Practical and pervasive needs for robustness and privacy in algorithms have inspired the design of online adversarial and differentially private learning algorithms. The primary quantity that characterizes learnability in these settings is the Littlestone dimension of the class of hypotheses [Ben-David et al., 2009, Alon et al., 2019]. This characterization is often interpreted as an impossibility result because classes such as linear thresholds and neural networks have infinite Littlestone dimension. In this paper, we apply the framework of smoothed analysis [Spielman and Teng, 2004], in which adversarially chosen inputs are perturbed slightly by nature. We show that fundamentally stronger regret and error guarantees are possible with smoothed adversaries than with worst-case adversaries. In particular, we obtain regret and privacy error bounds that depend only on the VC dimension and the bracketing number of a hypothesis class, and on the magnitudes of the perturbations.
We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an $\eta$-corrupted set of uniform random samples labeled by a size-$s$ stochastic decision tree, our algorithm runs in time $n^{O(\log(s/\varepsilon)/\varepsilon^2)}$ and returns a hypothesis with error within an additive $2\eta + \varepsilon$ of the Bayes optimal. An additive $2\eta$ is the information-theoretic minimum. Previously no non-trivial algorithm with a guarantee of $O(\eta) + \varepsilon$ was known, even for weaker noise models. Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree; previously no such algorithm was known even in the noiseless setting.
The class of all even-hole-free graphs has unbounded tree-width, as it contains all complete graphs. Recently, a class of (even-hole, $K_4$)-free graphs was constructed that still has unbounded tree-width [Sintiari and Trotignon, 2019]. The class has unbounded degree and contains arbitrarily large clique-minors. We ask whether this is necessary. We prove that for every graph $G$, if $G$ excludes a fixed graph $H$ as a minor, then $G$ either has small tree-width, or $G$ contains a large wall or the line graph of a large wall as an induced subgraph. This can be seen as a strengthening of Robertson and Seymour's excluded grid theorem for the case of minor-free graphs. Our theorem implies that every class of even-hole-free graphs excluding a fixed graph as a minor has bounded tree-width. In fact, our theorem applies to a more general class: (theta, prism)-free graphs. This implies the known result that planar even-hole-free graphs have bounded tree-width [da Silva and Linhares Sales, Discrete Applied Mathematics 2010]. We conjecture that even-hole-free graphs of bounded degree have bounded tree-width. If true, this would mean that even-hole-freeness is testable in the bounded-degree graph model of property testing. We prove the conjecture for subcubic graphs and we give a bound on the tree-width of the class of (even hole, pyramid)-free graphs of degree at most 4.
This paper introduces the first provably accurate algorithms for differentially private, top-down decision tree learning in the distributed setting (Balcan et al., 2012). We propose DP-TopDown, a general privacy preserving decision tree learning algorithm, and present two distributed implementations. Our first method NoisyCounts naturally extends the single machine algorithm by using the Laplace mechanism. Our second method LocalRNM significantly reduces communication and added noise by performing local optimization at each data holder. We provide the first utility guarantees for differentially private top-down decision tree learning in both the single machine and distributed settings. These guarantees show that the error of the privately-learned decision tree quickly goes to zero provided that the dataset is sufficiently large. Our extensive experiments on real datasets illustrate the trade-offs of privacy, accuracy and generalization when learning private decision trees in the distributed setting.
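As a small, hedged illustration of the Laplace mechanism invoked by the NoisyCounts variant above (this is not the paper's DP-TopDown code, and the names are ours): a counting query has sensitivity 1, so adding Laplace noise of scale $1/\varepsilon$ to each released count makes that release $\varepsilon$-differentially private.

```python
import numpy as np

def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query: adding or removing one record
    changes a count by at most 1 (sensitivity 1), so releasing the count plus
    Laplace(1/epsilon) noise is epsilon-differentially private."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
# e.g., a privatized version of "how many label-1 examples reach this tree node"
private_count = noisy_count(true_count=128, epsilon=0.5, rng=rng)
```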
