Crossfire attack is a recently proposed threat designed to disconnect whole geographical areas, such as cities or states, from the Internet. Orchestrated in multiple phases, the attack uses a massively distributed botnet to generate low-rate benign traffic aiming to congest selected network links, so-called target links. The adoption of benign traffic, while simultaneously targeting multiple network links, makes the detection of the Crossfire attack a serious challenge. In this paper, we propose a framework for early detection of Crossfire attack, i.e., detection in the warm-up period of the attack. We propose to monitor traffic at the potential decoy servers and discuss the advantages comparing with other monitoring approaches. Since the low-rate attack traffic is very difficult to distinguish from the background traffic, we investigate several deep learning methods to mine the spatiotemporal features for attack detection. We investigate Autoencoder, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) Network to detect the Crossfire attack during its warm-up period. We report encouraging experiment results.