In the wind energy industry, it is important to develop models that accurately forecast the power output of a wind turbine, as such predictions are used for wind farm site assessment, power pricing and bidding, monitoring, and preventive maintenance. As a first step, and following the existing literature, we use supervisory control and data acquisition (SCADA) data to model the wind turbine power curve (WTPC). We explore various parametric and non-parametric approaches to modeling the WTPC, such as parametric logistic functions and non-parametric piecewise linear, polynomial, or cubic spline interpolation functions. We show that all of these model classes are, relative to their complexity, rich enough to model the WTPC accurately: their mean squared error (MSE) is close to the MSE lower bound computed from the historical data. We further improve the accuracy of our proposed model by incorporating additional environmental factors that affect the power output, such as the ambient temperature and the wind direction. For forecasting, however, all of these models share an intrinsic limitation: they cannot capture the autocorrelation inherent in the data. To overcome this limitation, we show that adding a properly scaled ARMA modeling layer improves short-term prediction performance while preserving the model's long-term prediction capability.
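To make the main quantities above concrete, here is a minimal Python sketch under stated assumptions, not the method used in this work. The three-parameter logistic form and the names `logistic_power`, `binned_mse_lower_bound`, `ar1_coefficient` and `one_step_forecast` are hypothetical. The MSE "lower bound" is assumed here to be the within-bin variance of power around each wind-speed bin's mean, that is, the error of the best predictor that depends only on the wind-speed bin; the bound in this work may be computed differently. A plain AR(1) model on the power-curve residuals stands in for the scaled ARMA layer, and its scaling is not reproduced.

```python
import math
from collections import defaultdict


def logistic_power(v, p_max, k, v0):
    """Hypothetical 3-parameter logistic power curve:
    P(v) = p_max / (1 + exp(-k * (v - v0)))."""
    return p_max / (1.0 + math.exp(-k * (v - v0)))


def binned_mse_lower_bound(speeds, powers, bin_width=0.5):
    """Assumed MSE bound: mean squared deviation of each power value from the
    mean power of its wind-speed bin (best bin-wise constant predictor)."""
    bins = defaultdict(list)
    for v, p in zip(speeds, powers):
        bins[int(v // bin_width)].append(p)
    sse = 0.0
    for ps in bins.values():
        mean_p = sum(ps) / len(ps)
        sse += sum((p - mean_p) ** 2 for p in ps)
    return sse / len(powers)


def ar1_coefficient(residuals):
    """Least-squares estimate of phi in r_t = phi * r_{t-1} + e_t
    (AR(1) is an assumed stand-in for the paper's ARMA layer)."""
    num = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum(a * a for a in residuals[:-1])
    return num / den if den else 0.0


def one_step_forecast(v_next, last_residual, phi, curve):
    """Power-curve prediction plus the AR(1) correction of the last residual."""
    return curve(v_next) + phi * last_residual


if __name__ == "__main__":
    curve = lambda v: logistic_power(v, p_max=2000.0, k=1.0, v0=8.0)
    print(one_step_forecast(8.0, 100.0, 0.5, curve))
```

The AR(1) correction for an h-step horizon is phi^h times the last residual, which decays toward zero when |phi| < 1, so the forecast reverts to the power curve at long horizons. This is one way to read the claim that the autoregressive layer helps short-term prediction without degrading long-term behavior.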