Short-term traffic forecasting based on deep learning methods, especially long short-term memory (LSTM) neural networks, has received much attention in recent years. However, the potential of deep learning methods in traffic forecasting has not yet fully been exploited in terms of the depth of the model architecture, the spatial scale of the prediction area, and the predictive power of spatial-temporal data. In this paper, a deep stacked bidirectional and unidirectional LSTM (SBU- LSTM) neural network architecture is proposed, which considers both forward and backward dependencies in time series data, to predict network-wide traffic speed. A bidirectional LSTM (BDLSM) layer is exploited to capture spatial features and bidirectional temporal dependencies from historical data. To the best of our knowledge, this is the first time that BDLSTMs have been applied as building blocks for a deep architecture model to measure the backward dependency of traffic data for prediction. The proposed model can handle missing values in input data by using a masking mechanism. Further, this scalable model can predict traffic speed for both freeway and complex urban traffic networks. Comparisons with other classical and state-of-the-art models indicate that the proposed SBU-LSTM neural network achieves superior prediction performance for the whole traffic network in both accuracy and robustness.