Reachable set computation is an important technique for the verification of safety properties of dynamical systems. In this paper, we investigate reachable set computation for discrete nonlinear systems based on parallelotope bundles. The algorithm relies on computing an upper bound on the supremum of a nonlinear function over a rectangular domain, which has been traditionally done using Bernstein polynomials. We strive to remove the manual step of parallelotope template selection to make the method fully automatic. Furthermore, we show that changing templates dynamically during computations cans improve accuracy. To this end, we investigate two techniques for generating the template directions. The first technique approximates the dynamics as a linear transformation and generates templates using this linear transformation. The second technique uses Principal Component Analysis (PCA) of sample trajectories for generating templates. We have implemented our approach in a Python-based tool called Kaa and improve its performance by two main enhancements. The tool is modular and use two types of global optimization solvers, the first using Bernstein polynomials and the second using NASAs Kodiak nonlinear optimization library. Second, we leverage the natural parallelism of the reachability algorithm and parallelize the Kaa implementation. We demonstrate the improved accuracy of our approach on several standard nonlinear benchmark systems.