Semantic segmentation for aerial imagery is a challenging and important problem in remotely sensed imagery analysis. In recent years, with the success of deep learning, various convolutional neural network (CNN) based models have been developed. However, due to the varying sizes of the objects and imbalanced class labels, it can be challenging to obtain accurate pixel-wise semantic segmentation results. To address those challenges, we develop a novel semantic segmentation method and call it Contextual Hourglass Network. In our method, in order to improve the robustness of the prediction, we design a new contextual hourglass module which incorporates attention mechanism on processed low-resolution featuremaps to exploit the contextual semantics. We further exploit the stacked encoder-decoder structure by connecting multiple contextual hourglass modules from end to end. This architecture can effectively extract rich multi-scale features and add more feedback loops for better learning contextual semantics through intermediate supervision. To demonstrate the efficacy of our semantic segmentation method, we test it on Potsdam and Vaihingen datasets. Through the comparisons to other baseline methods, our method yields the best results on overall performance.