As a crucial component of intelligent transportation systems, traffic flow prediction has recently attracted widespread research interest in the artificial intelligence (AI) community, driven by the increasing availability of massive traffic mobility data. Its key challenge lies in how to integrate diverse factors (such as temporal rules and spatial dependencies) to infer the evolution trend of traffic flow. To address this problem, we propose a unified neural network called Attentive Traffic Flow Machine (ATFM), which can effectively learn spatial-temporal feature representations of traffic flow with an attention mechanism. In particular, our ATFM is composed of two progressive Convolutional Long Short-Term Memory (ConvLSTM~\cite{xingjian2015convolutional}) units connected by a convolutional layer. Specifically, the first ConvLSTM unit takes normal traffic flow features as input and generates a hidden state at each time-step, which is then fed into the connected convolutional layer to infer a spatial attention map. The second ConvLSTM unit learns dynamic spatial-temporal representations from the attention-weighted traffic flow features. Further, we develop two deep learning frameworks based on ATFM to predict citywide short-term and long-term traffic flow by adaptively incorporating sequential and periodic data as well as other external influences. Extensive experiments on two standard benchmarks demonstrate the superiority of the proposed method for traffic flow prediction. Moreover, to verify the generalization of our method, we also apply the customized framework to forecast passenger pickup/dropoff demands and show its superior performance. Our code and data are available at {\color{blue}\url{https://github.com/liulingbo918/ATFM}}.
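
The attention step described above, in which a hidden state is passed through a convolutional layer to infer a spatial attention map that then reweights the traffic flow features, can be illustrated with a minimal sketch. This is not the released implementation: the two ConvLSTM units are omitted, and the $1 \times 1$ convolution, the sigmoid normalization, and the function names \texttt{spatial\_attention} and \texttt{apply\_attention} are assumptions made only for illustration.

```python
import math


def spatial_attention(hidden, weights, bias=0.0):
    """Infer an [H][W] attention map from a [C][H][W] hidden state.

    Assumption: a 1x1 convolution (one weight per channel) followed by a
    sigmoid; the actual layer configuration may differ.
    """
    channels = len(hidden)
    height, width = len(hidden[0]), len(hidden[0][0])
    attn = []
    for i in range(height):
        row = []
        for j in range(width):
            # 1x1 convolution: weighted sum over channels at location (i, j)
            score = bias + sum(weights[c] * hidden[c][i][j] for c in range(channels))
            # Sigmoid squashes the score into (0, 1)
            row.append(1.0 / (1.0 + math.exp(-score)))
        attn.append(row)
    return attn


def apply_attention(features, attn):
    """Scale every channel of [C][H][W] features by the shared [H][W] map."""
    return [
        [[v * a for v, a in zip(frow, arow)] for frow, arow in zip(chan, attn)]
        for chan in features
    ]


# Hypothetical toy input: one channel on a 2x2 grid
hidden = [[[0.0, 0.0], [0.0, 0.0]]]
features = [[[2.0, 4.0], [6.0, 8.0]]]
attn = spatial_attention(hidden, weights=[1.0])
weighted = apply_attention(features, attn)
```

In the full model, the reweighted features would serve as the input to the second ConvLSTM unit; here they are simply returned as nested lists.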