Recovering 3D shape and pose from 2D landmarks drawn from a large ensemble of images can be viewed as a non-rigid structure from motion (NRSfM) problem. Classical NRSfM approaches, however, are problematic because they rely on heuristic priors on the 3D structure (e.g. low rank) that do not scale well to large datasets. Learning-based methods show the potential to reconstruct a much broader set of 3D structures than classical methods, dramatically expanding the importance of NRSfM to atemporal, unsupervised 2D-to-3D lifting. Hitherto, these learning approaches have not been able to effectively model perspective cameras or handle missing/occluded points, which limits their applicability to in-the-wild datasets. In this paper, we present a generalized strategy for improving learning-based NRSfM methods that addresses both limitations. Our approach, Deep NRSfM++, achieves state-of-the-art performance across numerous large-scale benchmarks, outperforming both classical and learning-based 2D-to-3D lifting methods.