We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an $eta$-corrupted set of uniform random samples labeled by a size-$s$ stochastic decision tree, our algorithm runs in time $n^{O(log(s/varepsilon)/varepsilon^2)}$ and returns a hypothesis with error within an additive $2eta + varepsilon$ of the Bayes optimal. An additive $2eta$ is the information-theoretic minimum. Previously no non-trivial algorithm with a guarantee of $O(eta) + varepsilon$ was known, even for weaker noise models. Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree; previously no such algorithm was known even in the noiseless setting.