Properly learning decision trees in almost polynomial time

339 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jane Lange

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Guy Blanc - Jane Lange - Mingda Qiao

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We give an $n^{O(loglog n)}$-time membership query algorithm for properly and agnostically learning decision trees under the uniform distribution over ${pm 1}^n$. Even in the realizable setting, the previous fastest runtime was $n^{O(log n)}$, a consequence of a classic algorithm of Ehrenfeucht and Haussler. Our algorithm shares similarities with practical heuristics for learning decision trees, which we augment with additional ideas to circumvent known lower bounds against these heuristics. To analyze our algorithm, we prove a new structural result for decision trees that strengthens a theorem of ODonnell, Saks, Schramm, and Servedio. While the OSSS theorem says that every decision tree has an influential variable, we show how every decision tree can be pruned so that every variable in the resulting tree is influential.

قيم البحث

137 - Guy Blanc , Jane Lange , Li-Yang Tan 2020

We study sublinear and local computation algorithms for decision trees, focusing on testing and reconstruction. Our first result is a tester that runs in $mathrm{poly}(log s, 1/varepsilon)cdot nlog n$ time, makes $mathrm{poly}(log s,1/varepsilon)cdot log n$ queries to an unknown function $f$, and: $circ$ Accepts if $f$ is $varepsilon$-close to a size-$s$ decision tree; $circ$ Rejects if $f$ is $Omega(varepsilon)$-far from decision trees of size $s^{tilde{O}((log s)^2/varepsilon^2)}$. Existing testers distinguish size-$s$ decision trees from those that are $varepsilon$-far from from size-$s$ decision trees in $mathrm{poly}(s^s,1/varepsilon)cdot n$ time with $tilde{O}(s/varepsilon)$ queries. We therefore solve an incomparable problem, but achieve doubly-exponential-in-$s$ and exponential-in-$s$ improvements in time and query complexities respectively. We obtain our tester by designing a reconstruction algorithm for decision trees: given query access to a function $f$ that is close to a small decision tree, this algorithm provides fast query access to a small decision tree that is close to $f$. By known relationships, our results yield reconstruction algorithms for numerous other boolean function properties -- Fourier degree, randomized and quantum query complexities, certificate complexity, sensitivity, etc. -- which in turn yield new testers for these properties. Finally, we give a hardness result for testing whether an unknown function is $varepsilon$-close-to or $Omega(varepsilon)$-far-from size-$s$ decision trees. We show that an efficient algorithm for this task would yield an efficient algorithm for properly learning decision trees, a central open problem of learning theory. It has long been known that proper learning algorithms for any class $mathcal{H}$ yield property testers for $mathcal{H}$; this provides an example of a converse.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي التعلم الآلي

Solving Vertex Cover in Polynomial Time on Hyperbolic Random Graphs

69 - Thomas Blasius , Philipp Fischbeck , Tobias Friedrich 2019

The VertexCover problem is proven to be computationally hard in different ways: It is NP-complete to find an optimal solution and even NP-hard to find an approximation with reasonable factors. In contrast, recent experiments suggest that on many real -world networks the run time to solve VertexCover is way smaller than even the best known FPT-approaches can explain. Similarly, greedy algorithms deliver very good approximations to the optimal solution in practice. We link these observations to two properties that are observed in many real-world networks, namely a heterogeneous degree distribution and high clustering. To formalize these properties and explain the observed behavior, we analyze how a branch-and-reduce algorithm performs on hyperbolic random graphs, which have become increasingly popular for modeling real-world networks. In fact, we are able to show that the VertexCover problem on hyperbolic random graphs can be solved in polynomial time, with high probability. The proof relies on interesting structural properties of hyperbolic random graphs. Since these predictions of the model are interesting in their own right, we conducted experiments on real-world networks showing that these properties are also observed in practice. When utilizing the same structural properties in an adaptive greedy algorithm, further experiments suggest that, on real instances, this leads to better approximations than the standard greedy approach within reasonable time.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي

Reliably Learning the ReLU in Polynomial Time

68 - Surbhi Goel , Varun Kanade , Adam Klivans 2016

We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $mathbf{x} mapsto max(0, mathbf{w} cdot mathbf{x})$ with $mathbf{w} in mathbb{S}^{n-1}$. Our algorithm works in the challeng ing Reliable Agnostic learning model of Kalai, Kanade, and Mansour (2009) where the learner is given access to a distribution $cal{D}$ on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $cal{D}$, for any convex, bounded, and Lipschitz loss function. The algorithm runs in polynomial-time (in $n$) with respect to any distribution on $mathbb{S}^{n-1}$ (the unit sphere in $n$ dimensions) and for any error parameter $epsilon = Omega(1/log n)$ (this yields a PTAS for a question raised by F. Bach on the complexity of maximizing ReLUs). These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where $epsilon$ must be $Omega(1)$ and strong assumptions are required on the marginal distribution. We can compose our results to obtain the first set of efficient algorithms for learning constant-depth networks of ReLUs. Our techniques combine kernel methods and polynomial approximations with a dual-loss approach to convex programming. As a byproduct we obtain a number of applications including the first set of efficient algorithms for convex piecewise-linear fitting and the first efficient algorithms for noisy polynomial reconstruction of low-weight polynomials on the unit sphere.

التعلم الآلي التعقيد الحسابي التعلم الالي

Business Process Full Compliance with Respect to a Set of Conditional Obligation in Polynomial Time

60 - Silvano Colombo Tosatto , Guido Governatori , Nick Van Beest 2020

In this paper, we present a new methodology to evaluate whether a business process model is fully compliant with a regulatory framework composed of a set of conditional obligations. The methodology is based failure delta-constraints that are evaluate d on bottom-up aggregations of a tree-like representation of business process models. While the generic problem of proving full compliance is in coNP-complete, we show that verifying full compliance can be done in polynomial time using our methodology, for an acyclic structured process model given a regulatory framework composed by a set of conditional obligations, whose elements are restricted to be represented by propositional literals

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي

Learning stochastic decision trees

67 - Guy Blanc , Jane Lange , Li-Yang Tan 2021

We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an $eta$-corrupted set of uniform random samples labeled by a size-$s$ stochastic decision tree, our algorithm run s in time $n^{O(log(s/varepsilon)/varepsilon^2)}$ and returns a hypothesis with error within an additive $2eta + varepsilon$ of the Bayes optimal. An additive $2eta$ is the information-theoretic minimum. Previously no non-trivial algorithm with a guarantee of $O(eta) + varepsilon$ was known, even for weaker noise models. Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree; previously no such algorithm was known even in the noiseless setting.

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي