We present a novel method that jointly learns a parametric 3D face model and 3D face reconstruction from diverse sources. Previous methods usually learn 3D face models from a single kind of source, such as scanned data or in-the-wild images. 3D scans contain accurate geometric information about face shape, but the capture systems are expensive and such datasets usually cover only a small number of subjects. In-the-wild face images, on the other hand, are easy to obtain in large numbers but contain no explicit geometric information. We therefore learn a unified face model from both kinds of data and, to bridge the gap between them, additionally use a large number of RGB-D images captured with an iPhone X. Experimental results demonstrate that training on data from more sources yields a more powerful face model.