We propose a novel method for massive Multiple-Input Multiple-Output (massive MIMO) in Frequency Division Duplexing (FDD) systems. Due to the large frequency separation between Uplink (UL) and Downlink (DL), channel reciprocity does not hold in FDD systems. Hence, in order to provide DL Channel State Information (CSI) to the Base Station (BS), closed-loop DL channel probing and CSI feedback are needed. In massive MIMO this typically incurs a large training overhead. For example, in a typical configuration with M ≈ 200 BS antennas and a fading coherence block of T ≈ 200 symbols, the resulting rate penalty factor due to the DL training overhead, given by max{0, 1 - M/T}, is close to 0, i.e., almost no part of the coherence block is left for data transmission. To reduce this overhead, we build upon the well-known fact that the Angular Scattering Function (ASF) of the user channels is invariant over frequency intervals whose size is small with respect to the carrier frequency (as in current FDD cellular standards). This allows the BS to estimate the users' DL channel covariance matrices from UL pilots without additional overhead. Based on this covariance information, we propose a novel sparsifying precoder that maximizes the rank of the effective sparsified channel matrix subject to the condition that each effective user channel has sparsity not larger than some desired DL pilot dimension T_{dl}. This results in the DL training overhead factor max{0, 1 - T_{dl}/T} and a CSI feedback cost of T_{dl} pilot measurements. The optimization of the sparsifying precoder is formulated as a Mixed Integer Linear Program, which can be solved efficiently. Extensive simulation results demonstrate the superiority of the proposed approach with respect to competing state-of-the-art schemes based on compressed sensing or UL/DL dictionary learning.
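As a quick numerical check of the two overhead factors, the Python sketch below evaluates max{0, 1 - (pilot dimension)/T} for full DL probing (one pilot dimension per BS antenna) and for a sparsified effective channel with pilot dimension T_{dl}. The function name `dl_rate_factor` and the value T_{dl} = 40 are hypothetical choices for illustration, not values taken from the paper. The sketch only reproduces the arithmetic; it does not show how the sparsifying precoder achieves the target sparsity.

```python
def dl_rate_factor(pilot_dim: int, coherence_block: int) -> float:
    """Fraction of a coherence block of T symbols left for data after
    spending pilot_dim symbols on DL training: max{0, 1 - pilot_dim / T}."""
    if coherence_block <= 0:
        raise ValueError("coherence block length T must be positive")
    return max(0.0, 1.0 - pilot_dim / coherence_block)


# Full DL probing: the pilot dimension equals the number of BS antennas M.
M, T = 200, 200
full = dl_rate_factor(M, T)

# Sparsified effective channel: pilot dimension limited to T_dl.
# T_dl = 40 is a hypothetical value chosen only for illustration.
T_dl = 40
sparse = dl_rate_factor(T_dl, T)

print(f"full probing factor: {full:.2f}")    # 0.00
print(f"sparsified factor:   {sparse:.2f}")  # 0.80
```

With M at or above T the factor is clipped to 0, so essentially the whole coherence block is spent on training; limiting the pilot dimension to T_{dl} keeps the factor bounded away from 0 independently of M.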