For dendrite graphs obtained from biological experiments on mouse retinal ganglion cells, a paper by Nakatsukasa, Saito and Woei reveals a mysterious phase transition phenomenon in the spectra of the corresponding graph Laplacian matrices. While the bulk of the spectrum can be well understood through structures resembling starlike trees, the spikes, that is, the isolated eigenvalues lying outside the bulk spectrum, remain unexplained. In this paper, we bring new insights to these mysteries by considering a class of uniform trees. Exact relationships between the number of such spikes and the number of T-junctions are established as a function of the number of vertices separating the T-junctions. Using these theoretical results, we propose predictions for the number of spikes observed in real-life dendrite graphs. Interestingly, these predictions match the observed numbers of spikes well, confirming the practical relevance of our theoretical results.
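To make the computation concrete, here is a minimal sketch in Python. The construction below (a path with a pendant vertex at every T-junction, junctions separated by `sep` path vertices) is an invented stand-in for the paper's family of uniform trees, and the parameters are illustrative; the threshold 4 is the value around which the phase transition in the Nakatsukasa-Saito-Woei paper occurs.

```python
import numpy as np

def uniform_tree_spikes(n_junctions=5, sep=3):
    """Count Laplacian eigenvalues above 4 for a toy 'uniform tree':
    a path with one pendant vertex per T-junction, junctions separated
    by `sep` path vertices. Illustrative construction, not the paper's."""
    m = (n_junctions + 1) * (sep + 1)            # backbone path length
    edges = [(i, i + 1) for i in range(m - 1)]   # path edges
    nv = m
    for j in range(1, n_junctions + 1):          # attach one pendant per junction
        junction = j * (sep + 1)
        edges.append((junction, nv))
        nv += 1
    A = np.zeros((nv, nv))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A               # combinatorial Laplacian
    eig = np.linalg.eigvalsh(L)
    return int(np.sum(eig > 4.0))

# Compare the spike count with the number of T-junctions for various separations.
for sep in (1, 2, 3, 5):
    print(sep, uniform_tree_spikes(n_junctions=5, sep=sep))
```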
In this paper, we address the problem of comparing populations of trees. We study a statistical test based on the distance between empirical mean trees, as an analog of the two-sample z statistic for comparing two means. Despite its simplicity, the test is quite powerful at separating distributions with different means; it does not, however, distinguish between different populations with the same mean, and a more elaborate test should be applied in that setting. The performance of the test is studied via simulations on Galton-Watson branching processes. We also show an application to a real data problem in genomics.
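A minimal sketch of such a test follows, under strong simplifying assumptions that are not taken from the paper: trees are encoded as generation-size vectors so that the "mean tree" is a coordinatewise average, and the normalization of the statistic is heuristic (large values indicate different means, but the null distribution is not standard normal).

```python
import numpy as np
rng = np.random.default_rng(0)

def gw_generation_sizes(lam, max_gen=8):
    """Generation sizes of a Poisson(lam) Galton-Watson tree, truncated."""
    sizes, z = [], 1
    for _ in range(max_gen):
        sizes.append(z)
        z = rng.poisson(lam, size=z).sum() if z > 0 else 0
    return np.array(sizes, dtype=float)

def two_sample_tree_test(lam1, lam2, n=200):
    """z-type statistic: distance between empirical mean 'trees', with
    trees crudely encoded as generation-size vectors (an assumption)."""
    X = np.array([gw_generation_sizes(lam1) for _ in range(n)])
    Y = np.array([gw_generation_sizes(lam2) for _ in range(n)])
    diff = X.mean(axis=0) - Y.mean(axis=0)
    pooled = X.var(axis=0, ddof=1) / n + Y.var(axis=0, ddof=1) / n
    return np.linalg.norm(diff) / np.sqrt(pooled.sum())

print(two_sample_tree_test(1.0, 1.0))  # same offspring mean: statistic stays small
print(two_sample_tree_test(1.0, 1.3))  # different means: statistic is large
```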
Many machine learning tools for regression are based on recursive partitioning of the covariate space into smaller regions, where the regression function can be estimated locally. Among these, regression trees and their ensembles have demonstrated impressive empirical performance. In this work, we shed light on the machinery behind Bayesian variants of these methods. In particular, we study Bayesian regression histograms, such as Bayesian dyadic trees, in the simple regression case with just one predictor. We focus on the reconstruction of regression surfaces that are piecewise constant, where the number of jumps is unknown. We show that with suitably designed priors, posterior distributions concentrate around the true step regression function at a near-minimax rate. These results do not require knowledge of the true number of steps, nor of the width of the true partitioning cells. Thus, Bayesian dyadic regression trees are fully adaptive and can recover the true piecewise regression function nearly as well as if we knew the exact number and location of jumps. Our results constitute the first step towards understanding why Bayesian trees and their ensembles have worked so well in practice. As an aside, we discuss prior distributions on balanced interval partitions and how they relate to an old problem in geometric probability. Namely, we relate these priors to the probability of covering the circumference of a circle with random arcs whose endpoints are confined to a grid, a new variant of the original covering problem.
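The grid-constrained covering probability can at least be estimated by simulation. The sketch below (with illustrative parameters, and assuming arcs of fixed length extend clockwise from left endpoints drawn uniformly from the grid) uses the standard fact that the circle is covered if and only if every gap between consecutive sorted arc starts, including the wrap-around gap, is at most the arc length.

```python
import numpy as np
rng = np.random.default_rng(1)

def covering_probability(n_arcs, arc_len, grid_size, trials=20_000):
    """Monte Carlo estimate of the probability that n_arcs arcs of length
    arc_len (circumference normalized to 1), with left endpoints drawn
    uniformly from a grid of grid_size points, cover the whole circle."""
    covered = 0
    for _ in range(trials):
        starts = np.sort(rng.integers(grid_size, size=n_arcs) / grid_size)
        # gaps between consecutive starts, wrapping around the circle
        gaps = np.diff(np.append(starts, starts[0] + 1.0))
        covered += np.all(gaps <= arc_len)
    return covered / trials

print(covering_probability(n_arcs=30, arc_len=0.1, grid_size=64))
```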
Since their inception in the 1980s, regression trees have been one of the more widely used non-parametric prediction methods. Tree-structured methods yield a histogram reconstruction of the regression surface, where the bins correspond to terminal nodes of recursive partitioning. Trees are powerful, yet susceptible to overfitting. Strategies against overfitting have traditionally relied on pruning greedily grown trees. The Bayesian framework offers an alternative remedy against overfitting through priors. Roughly speaking, a good prior charges smaller trees where overfitting does not occur. While the consistency of random histograms, trees and their ensembles has been studied quite extensively, the theoretical understanding of the Bayesian counterparts has been missing. In this paper, we take a step towards understanding why/when Bayesian trees and their ensembles do not overfit. To address this question, we study the speed at which the posterior concentrates around the true smooth regression function. We propose a spike-and-tree variant of the popular Bayesian CART prior and establish new theoretical results showing that regression trees (and their ensembles) (a) are capable of recovering smooth regression surfaces, achieving optimal rates up to a log factor, (b) can adapt to the unknown level of smoothness and (c) can perform effective dimension reduction when p > n. These results provide a piece of missing theoretical evidence explaining why Bayesian trees (and additive variants thereof) have worked so well in practice.
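For orientation on the rates in question: the minimax benchmark for estimating an $\alpha$-smooth regression function that depends on only $p_0$ of the $p$ covariates is $n^{-\alpha/(2\alpha + p_0)}$, so "optimal up to a log factor" here should mean posterior contraction at $\epsilon_n \asymp n^{-\alpha/(2\alpha + p_0)} (\log n)^{t}$ for some $t > 0$; the exact power $t$ of the logarithm is not specified in the abstract, and the role of $p_0$ is our reading of the dimension-reduction claim (c).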
In this paper, we use a new and correct method to determine the $n$-vertex $k$-trees with the three largest signless Laplacian indices.
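For reference, the signless Laplacian of a graph $G$ is $Q(G) = D(G) + A(G)$, where $D(G)$ is the diagonal matrix of vertex degrees and $A(G)$ is the adjacency matrix; the signless Laplacian index of $G$ is the largest eigenvalue $q_1(G)$ of $Q(G)$.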
Brouwer's conjecture states that, for any graph $G$, the sum of the $k$ largest (combinatorial) Laplacian eigenvalues of $G$ is at most $|E(G)| + \binom{k+1}{2}$, $1 \leq k \leq n$. We present several interrelated results establishing Brouwer's conjecture $\text{BC}_k(G)$ for a wide range of graphs $G$ and parameters $k$. In particular, we show that (1) $\text{BC}_k(G)$ is true for low-arboricity graphs, and in particular for planar $G$ when $k \geq 11$; (2) $\text{BC}_k(G)$ is true whenever the variance of the degree sequence is not very high, generalizing previous results for $G$ regular or random; (3) $\text{BC}_k(G)$ is true if $G$ belongs to a hereditarily spectrally-bounded class and $k$ is sufficiently large as a function of $n$, in particular $k \geq \sqrt{32n}$ for bipartite graphs; (4) $\text{BC}_k(G)$ holds unless $G$ has edge-edit distance $< k\sqrt{2n} = O(n^{3/2})$ from a split graph; (5) no $G$ violates the conjectured upper bound by more than $O(n^{5/4})$, and bipartite $G$ by no more than $O(n)$; and (6) $\text{BC}_k(G)$ holds for all $k$ outside an interval of length $O(n^{3/4})$. Furthermore, we present a surprising negative result: asymptotically almost surely, a uniform random signed complete graph violates the conjectured bound by $\Omega(n)$.
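The conjectured bound is easy to test numerically for any given graph. A minimal sketch (the Erdős-Rényi input and its parameters are chosen purely for illustration): for each $k$ it computes the slack $|E(G)| + \binom{k+1}{2}$ minus the sum of the $k$ largest Laplacian eigenvalues, which Brouwer's conjecture asserts is nonnegative.

```python
import numpy as np
rng = np.random.default_rng(2)

def brouwer_slack(A):
    """Slack |E| + k(k+1)/2 - (sum of k largest Laplacian eigenvalues)
    for k = 1..n; Brouwer's conjecture asserts every entry is >= 0."""
    L = np.diag(A.sum(axis=1)) - A                 # combinatorial Laplacian
    eig = np.sort(np.linalg.eigvalsh(L))[::-1]     # eigenvalues, descending
    m = A.sum() / 2                                # number of edges
    ks = np.arange(1, len(A) + 1)
    return m + ks * (ks + 1) / 2 - np.cumsum(eig)

# Erdos-Renyi G(n, p) example with illustrative parameters
n, p = 40, 0.3
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1)
A = A + A.T
print(np.all(brouwer_slack(A) >= -1e-8))  # True if BC_k(G) holds for all k
```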