In this paper, we consider massive multiple-input multiple-output (MIMO) communication systems with a uniform planar array (UPA) at the base station (BS) and investigate downlink precoding with imperfect channel state information (CSI). By exploiting both instantaneous and statistical CSI, we aim to design precoding vectors that maximize the ergodic rate (e.g., the sum rate or the minimum rate) subject to a total transmit power constraint. To maximize an upper bound of the ergodic rate, we leverage the corresponding Lagrangian formulation and characterize the structure of the optimal precoder as the solution to a generalized eigenvalue problem. As a result, the high-dimensional precoder design problem reduces to a low-dimensional power control problem. The Lagrange multipliers play a crucial role in determining both the precoder directions and the power parameters, yet they are difficult to compute directly. To obtain the Lagrange multipliers, we develop a general framework underpinned by a properly designed neural network that learns directly from CSI. To further reduce the computational burden, we propose a low-complexity framework that decomposes the original problem into computationally efficient subproblems in which instantaneous and statistical CSI are handled separately. With the neural network pretrained offline, the online computational complexity of precoding is substantially reduced compared with the existing iterative algorithm, while nearly the same performance is maintained.
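As a rough illustration of how a generalized eigenvalue structure turns precoder design into a low-dimensional problem, the sketch below computes each user's precoding direction as the principal generalized eigenvector of a matrix pencil and then scales it by a scalar power. The pencil (R_k, sigma2*I + sum_{j != k} mult_j * R_j), the random channel matrices, the multiplier values, and the equal power split are all hypothetical placeholders assumed for illustration; the paper's exact matrices, its learned Lagrange multipliers, and its power-control step are not reproduced here.

```python
import numpy as np
from scipy.linalg import eigh

def principal_gen_eigvec(A, B):
    """Largest generalized eigenpair of A v = lam B v (A Hermitian PSD, B Hermitian PD)."""
    lam, V = eigh(A, B)              # scipy returns eigenvalues in ascending order
    v = V[:, -1]                     # eigenvector of the largest eigenvalue
    return lam[-1], v / np.linalg.norm(v)   # rescale to unit Euclidean norm

def random_psd(n, rank, rng):
    """Hypothetical stand-in for a user's channel second-moment matrix."""
    G = (rng.standard_normal((n, rank)) + 1j * rng.standard_normal((n, rank))) / np.sqrt(2)
    return G @ G.conj().T

def build_precoders(R, mult, sigma2, powers):
    """Column k = sqrt(p_k) * principal generalized eigenvector of the assumed pencil
    (R_k, sigma2*I + sum_{j != k} mult_j R_j)."""
    K, M = len(R), R[0].shape[0]
    W = np.zeros((M, K), dtype=complex)
    for k in range(K):
        # Assumed interference-plus-noise matrix weighted by the multipliers
        B = sigma2 * np.eye(M) + sum(mult[j] * R[j] for j in range(K) if j != k)
        _, v = principal_gen_eigvec(R[k], B)
        W[:, k] = np.sqrt(powers[k]) * v   # direction fixed; only a scalar power remains
    return W

rng = np.random.default_rng(0)
M, K, P, sigma2 = 16, 4, 10.0, 1.0        # e.g. a 4x4 UPA gives 16 BS antennas (assumed)
R = [random_psd(M, 3, rng) for _ in range(K)]
mult = np.full(K, 0.5)                    # placeholder multipliers (the paper learns these)
powers = np.full(K, P / K)                # placeholder equal power split
W = build_precoders(R, mult, sigma2, powers)
print(np.isclose(np.linalg.norm(W) ** 2, P))  # True: total power constraint is met
```

Once the multipliers are fixed, each M-dimensional precoder is determined by one eigenvector and one scalar power, so only K power parameters remain to be optimized, which mirrors the dimensionality reduction described above.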