Forecasting future traffic flows from previous ones is a challenging problem because of their complex and dynamic nature of spatio-temporal structures. Most existing graph-based CNNs attempt to capture the static relations while largely neglecting the dynamics underlying sequential data. In this paper, we present dynamic spatio-temporal graph-based CNNs (DST-GCNNs) by learning expressive features to represent spatio-temporal structures and predict future traffic flows from surveillance video data. In particular, DST-GCNN is a two stream network. In the flow prediction stream, we present a novel graph-based spatio-temporal convolutional layer to extract features from a graph representation of traffic flows. Then several such layers are stacked together to predict future flows over time. Meanwhile, the relations between traffic flows in the graph are often time variant as the traffic condition changes over time. To capture the graph dynamics, we use the graph prediction stream to predict the dynamic graph structures, and the predicted structures are fed into the flow prediction stream. Experiments on real datasets demonstrate that the proposed model achieves competitive performances compared with the other state-of-the-art methods.