Predicting the popularity of online content is a fundamental problem in many application areas. One practical challenge is that popularity prediction tasks come with different settings in different situations, e.g., varying lengths of the observation time window or the prediction horizon. A good popularity prediction model should therefore handle tasks with different settings. However, the conventional paradigm trains a separate model for each prediction task, so a model obtained for one task is hard to generalize to others, which wastes training time and computational resources. To address this issue, we propose a novel pre-training framework for popularity prediction that pre-trains a general deep representation model by learning intrinsic knowledge about popularity dynamics from readily available diffusion cascades. We design a novel pretext task for pre-training, namely temporal context prediction for two randomly sampled time slices of popularity dynamics, which pushes the deep prediction model to capture the characteristics of popularity dynamics. We instantiate the framework with a state-of-the-art deep model, the temporal convolutional neural network, and experiments on both Sina Weibo and Twitter datasets demonstrate the effectiveness and efficiency of the proposed pre-training framework across multiple popularity prediction tasks.
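
To make the pretext task concrete, the following is a minimal sketch of how a training pair of time slices might be drawn from one popularity series. The abstract does not specify the label, so the sketch assumes a binary temporal-order label (1 if the first slice starts earlier than the second). The function name `sample_slice_pair`, the parameter `slice_len`, and the hourly-count representation are hypothetical choices made for illustration only.

```python
import random
from typing import List, Tuple


def sample_slice_pair(
    series: List[int], slice_len: int, rng: random.Random
) -> Tuple[List[int], List[int], int]:
    """Draw two fixed-length time slices from one popularity series.

    Assumption (not stated in the abstract): the pretext label is the
    temporal order of the two slices, 1 if slice A starts before slice B.
    """
    max_start = len(series) - slice_len
    if slice_len <= 0 or max_start < 1:
        raise ValueError("series must be longer than slice_len")

    # Two distinct start positions, sampled without replacement.
    start_a, start_b = rng.sample(range(max_start + 1), 2)

    slice_a = series[start_a:start_a + slice_len]
    slice_b = series[start_b:start_b + slice_len]
    label = 1 if start_a < start_b else 0
    return slice_a, slice_b, label


if __name__ == "__main__":
    # Hypothetical hourly repost counts for a single cascade.
    counts = [0, 3, 9, 14, 22, 18, 11, 7, 5, 4, 2, 1]
    a, b, y = sample_slice_pair(counts, slice_len=4, rng=random.Random(0))
    print(a, b, y)
```

In such a setup, a shared encoder would embed both slices and a small classification head would predict the label, so no manually annotated popularity targets are needed during pre-training.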