Testing and reconstruction via decision trees


Published by: Li-Yang Tan

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Guy Blanc, Jane Lange, Li-Yang Tan

Topics: Data structures and algorithms; Computational complexity; Machine learning


We study sublinear and local computation algorithms for decision trees, focusing on testing and reconstruction. Our first result is a tester that runs in $\mathrm{poly}(\log s, 1/\varepsilon)\cdot n\log n$ time, makes $\mathrm{poly}(\log s,1/\varepsilon)\cdot \log n$ queries to an unknown function $f$, and: $\circ$ Accepts if $f$ is $\varepsilon$-close to a size-$s$ decision tree; $\circ$ Rejects if $f$ is $\Omega(\varepsilon)$-far from decision trees of size $s^{\tilde{O}((\log s)^2/\varepsilon^2)}$. Existing testers distinguish size-$s$ decision trees from functions that are $\varepsilon$-far from size-$s$ decision trees in $\mathrm{poly}(s^s,1/\varepsilon)\cdot n$ time with $\tilde{O}(s/\varepsilon)$ queries. We therefore solve an incomparable problem, but achieve doubly-exponential-in-$s$ and exponential-in-$s$ improvements in time and query complexities respectively. We obtain our tester by designing a reconstruction algorithm for decision trees: given query access to a function $f$ that is close to a small decision tree, this algorithm provides fast query access to a small decision tree that is close to $f$. By known relationships, our results yield reconstruction algorithms for numerous other boolean function properties -- Fourier degree, randomized and quantum query complexities, certificate complexity, sensitivity, etc. -- which in turn yield new testers for these properties. Finally, we give a hardness result for testing whether an unknown function is $\varepsilon$-close-to or $\Omega(\varepsilon)$-far-from size-$s$ decision trees. We show that an efficient algorithm for this task would yield an efficient algorithm for properly learning decision trees, a central open problem of learning theory. It has long been known that proper learning algorithms for any class $\mathcal{H}$ yield property testers for $\mathcal{H}$; this provides an example of a converse.
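To make the notion of $\varepsilon$-closeness concrete, the following minimal Python sketch estimates the fraction of uniformly random inputs on which a function $f$ disagrees with a given decision tree. It is not the tester or the reconstruction algorithm from the abstract; the tuple encoding of trees, the convention that size means the number of leaves, and the names `tree_eval`, `tree_size`, `estimate_distance`, and `AND_TREE` are all assumptions made for illustration.

```python
import random

# Hypothetical encoding: a tree is a leaf (0 or 1) or a tuple
# (i, left, right) meaning "query x[i]; go left on 0, right on 1".
AND_TREE = (0, 0, (1, 0, 1))  # computes x[0] AND x[1]


def tree_eval(tree, x):
    """Follow the queried bits of x from the root down to a leaf."""
    while isinstance(tree, tuple):
        i, left, right = tree
        tree = right if x[i] else left
    return tree


def tree_size(tree):
    """Size of the tree, taken here to be its number of leaves."""
    if not isinstance(tree, tuple):
        return 1
    _, left, right = tree
    return tree_size(left) + tree_size(right)


def estimate_distance(f, tree, n, samples=2000, seed=0):
    """Estimate Pr_x[f(x) != tree(x)] for uniform x in {0,1}^n by sampling."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        if f(x) != tree_eval(tree, x):
            disagreements += 1
    return disagreements / samples
```

For instance, the constant-zero function disagrees with `AND_TREE` exactly on inputs where both queried bits are 1, so its true distance is 1/4 and the estimate should land near that value.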


Guy Blanc, Jane Lange, Mingda Qiao (2021)

We give an $n^{O(\log\log n)}$-time membership query algorithm for properly and agnostically learning decision trees under the uniform distribution over $\{\pm 1\}^n$. Even in the realizable setting, the previous fastest runtime was $n^{O(\log n)}$, a consequence of a classic algorithm of Ehrenfeucht and Haussler. Our algorithm shares similarities with practical heuristics for learning decision trees, which we augment with additional ideas to circumvent known lower bounds against these heuristics. To analyze our algorithm, we prove a new structural result for decision trees that strengthens a theorem of O'Donnell, Saks, Schramm, and Servedio. While the OSSS theorem says that every decision tree has an influential variable, we show how every decision tree can be pruned so that every variable in the resulting tree is influential.
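As a small illustration of what "influential variable" means, the sketch below computes the influence of a coordinate by brute force, using the standard definition $\mathrm{Inf}_i(f) = \Pr_x[f(x) \neq f(x^{\oplus i})]$ under the uniform distribution. It does not implement the learning algorithm or the pruning result; it uses $\{0,1\}^n$ rather than $\{\pm 1\}^n$ for convenience, and the function names are hypothetical.

```python
from itertools import product


def influence(f, n, i):
    """Fraction of inputs x in {0,1}^n where flipping bit i changes f(x)."""
    flips = 0
    for x in product((0, 1), repeat=n):
        y = list(x)
        y[i] ^= 1  # flip coordinate i
        if f(x) != f(tuple(y)):
            flips += 1
    return flips / 2 ** n


def and_of_first_two(x):
    """Example function computed by a 3-leaf decision tree: x[0] AND x[1]."""
    return x[0] & x[1]
```

With `n = 3`, coordinate 0 of `and_of_first_two` has influence 1/2 (flipping it matters exactly when `x[1]` is 1), while coordinate 2 is never queried and has influence 0.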

Topics: Data structures and algorithms; Computational complexity; Machine learning

Graph Streaming Lower Bounds for Parameter Estimation and Property Testing via a Streaming XOR Lemma

Sepehr Assadi, Vishvajeet N (2021)

We study space-pass tradeoffs in graph streaming algorithms for parameter estimation and property testing problems such as estimating the size of maximum matchings and maximum cuts, weight of minimum spanning trees, or testing if a graph is connected or cycle-free versus being far from these properties. We develop a new lower bound technique that proves that for many problems of interest, including all the above, obtaining a $(1+\epsilon)$-approximation requires either $n^{\Omega(1)}$ space or $\Omega(1/\epsilon)$ passes, even on highly restricted families of graphs such as bounded-degree planar graphs. For several of these problems, this bound matches those of existing algorithms and is thus (asymptotically) optimal. Our results considerably strengthen prior lower bounds even for arbitrary graphs: starting from the influential work of [Verbin, Yu; SODA 2011], there has been a plethora of lower bounds for single-pass algorithms for these problems; however, the only multi-pass lower bound, proven very recently in [Assadi, Kol, Saxena, Yu; FOCS 2020], rules out sublinear-space algorithms with exponentially smaller $o(\log{(1/\epsilon)})$ passes for these problems. One key ingredient of our proofs is a simple streaming XOR Lemma, a generic hardness amplification result, that we prove: informally speaking, if a $p$-pass $s$-space streaming algorithm can only solve a decision problem with advantage $\delta > 0$ over random guessing, then it cannot solve XOR of $\ell$ independent copies of the problem with advantage much better than $\delta^{\ell}$. This result can be of independent interest and useful for other streaming lower bounds as well.
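The $\delta^{\ell}$ figure matches what happens when the $\ell$ copies are simply answered independently and the answers are XORed together. The sketch below computes that baseline exactly; it is only the easy direction (what independent guessing achieves), not the lemma's lower-bound argument. It assumes, as a convention not stated in the abstract, that advantage $\delta$ means a success probability of $(1+\delta)/2$; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def xor_success_probability(delta, copies):
    """Exact probability that the XOR of `copies` independent answers is right.

    Assumed convention: each answer is correct with probability (1 + delta) / 2.
    The XOR is correct exactly when an even number of answers are wrong.
    """
    p = (1 + Fraction(delta)) / 2
    total = Fraction(0)
    for errors in product((0, 1), repeat=copies):
        if sum(errors) % 2 == 0:
            prob = Fraction(1)
            for wrong in errors:
                prob *= (1 - p) if wrong else p
            total += prob
    return total
```

Under this convention the result equals $(1+\delta^{\ell})/2$, so the advantage of the XOR is exactly $\delta^{\ell}$; for example, $\delta = 1/2$ and $\ell = 3$ give a success probability of $9/16$.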

Topics: Data structures and algorithms; Computational complexity

Best Match Graphs with Binary Trees

David Schaller, Manuela Geiß, Marc Hellmuth (2020)

Best match graphs (BMG) are a key intermediate in graph-based orthology detection and contain a large amount of information on the gene tree. We provide a near-cubic algorithm to determine whether a BMG is binary-explainable, i.e., whether it can be explained by a fully resolved gene tree and, if so, to construct such a tree. Moreover, we show that all such binary trees are refinements of the unique binary-resolvable tree (BRT), which in general is a substantial refinement of the also unique least resolved tree of a BMG. Finally, we show that the problem of editing an arbitrary vertex-colored graph to a binary-explainable BMG is NP-complete and provide an integer linear program formulation for this task.

Topics: Data structures and algorithms; Computational complexity; Discrete mathematics

Hardness of Token Swapping on Trees

Oswin Aichholzer, Erik D. Demaine, Matias Korman (2021)

Given a graph where every vertex has exactly one labeled token, how can we most quickly execute a given permutation on the tokens? In (sequential) token swapping, the goal is to use the shortest possible sequence of swaps, each of which exchanges the tokens at the two endpoints of an edge of the graph. In parallel token swapping, the goal is to use the fewest rounds, each of which consists of one or more swaps on the edges of a matching. We prove that both of these problems remain NP-hard when the graph is restricted to be a tree. These token swapping problems have been studied by disparate groups of researchers in discrete mathematics, theoretical computer science, robot motion planning, game theory, and engineering. Previous work establishes NP-completeness on general graphs (for both problems); polynomial-time algorithms for simple graph classes such as cliques, stars, paths, and cycles; and constant-factor approximation algorithms in some cases. The two natural cases of sequential and parallel token swapping in trees were first studied over thirty years ago (as sorting with a transposition tree) and over twenty-five years ago (as routing permutations via matchings), yet their complexities were previously unknown. We also show limitations on approximation of sequential token swapping on trees: we identify a broad class of algorithms that encompass all three known polynomial-time algorithms that achieve the best known approximation factor (which is $2$) and show that no such algorithm can achieve an approximation factor less than $2$.
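One of the easy graph classes mentioned above, the path, can be sketched in a few lines: on a path, sequential token swapping is sorting by adjacent transpositions, and the minimum number of swaps equals the number of inversions, which bubble sort attains. The sketch below is an illustration of that special case only and says nothing about trees; the convention that `tokens[i]` is the label at vertex `i` and that label `j` must end at vertex `j`, along with the function names, are assumptions.

```python
def inversions(tokens):
    """Number of pairs (i, j) with i < j and tokens[i] > tokens[j]."""
    n = len(tokens)
    return sum(
        1 for i in range(n) for j in range(i + 1, n) if tokens[i] > tokens[j]
    )


def path_token_swaps(tokens):
    """Bubble sort on a path: return the list of edge swaps (i, i + 1) used.

    Each swap exchanges an out-of-order adjacent pair, which removes exactly
    one inversion, so the sequence length equals inversions(tokens).
    """
    tokens = list(tokens)
    swaps = []
    n = len(tokens)
    for _ in range(n):
        for i in range(n - 1):
            if tokens[i] > tokens[i + 1]:
                tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
                swaps.append((i, i + 1))
    return swaps
```

For the arrangement `[2, 0, 1]` there are two inversions, and the sketch returns the two swaps `(0, 1)` and `(1, 2)`.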

Topics: Data structures and algorithms; Computational complexity

On the Complexity of Role Colouring Planar Graphs, Trees and Cographs

Christopher Purcell, M. Puck Rombach (2014)

We prove several results about the complexity of the role colouring problem. A role colouring of a graph $G$ is an assignment of colours to the vertices of $G$ such that two vertices of the same colour have identical sets of colours in their neighbourhoods. We show that the problem of finding a role colouring with $1 < k < n$ colours is NP-hard for planar graphs. We show that restricting the problem to trees yields a polynomially solvable case, as long as $k$ is either constant or has a constant difference with $n$, the number of vertices in the tree. Finally, we prove that cographs are always $k$-role-colourable for $1 < k \leq n$ and construct such a colouring in polynomial time.
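The definition of a role colouring is easy to check even though finding one is hard in general. The following sketch verifies a given colouring against the definition above; it is a checker only, not any of the algorithms from the abstract, and the adjacency-dictionary representation and the names `is_role_colouring` and `PATH4` are assumptions for illustration.

```python
# Hypothetical example graph: the path a - b - c - d as an adjacency dict.
PATH4 = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}


def is_role_colouring(adj, colour):
    """Check that same-coloured vertices see the same set of neighbour colours."""
    signature = {}  # colour -> set of colours seen in the neighbourhood
    for v, neighbours in adj.items():
        seen = frozenset(colour[u] for u in neighbours)
        # The first vertex of each colour fixes the signature; later ones must match.
        if signature.setdefault(colour[v], seen) != seen:
            return False
    return True
```

On `PATH4`, colouring the endpoints 1 and the inner vertices 2 is a valid role colouring, whereas colouring `a` and `b` with 1 and `c` and `d` with 2 is not, because `a` sees only colour 1 while `b` sees colours 1 and 2.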

Topics: Data structures and algorithms; Computational complexity; Discrete mathematics