In recent years, the convolutional neural networks (CNNs) have received a lot of interest in the side-channel community. The previous work has shown that CNNs have the potential of breaking the cryptographic algorithm protected with masking or desynchronization. Before, several CNN models have been exploited, reaching the same or even better level of performance compared to the traditional side-channel attack (SCA). In this paper, we investigate the architecture of Residual Network and build a new CNN model called attention network. To enhance the power of the attention network, we introduce an attention mechanism - Convolutional Block Attention Module (CBAM) and incorporate CBAM into the CNN architecture. CBAM points out the informative points of the input traces and makes the attention network focus on the relevant leakages of the measurements. It is able to improve the performance of the CNNs. Because the irrelevant points will introduce the extra noises and cause a worse performance of attacks. We compare our attention network with the one designed for the masking AES implementation called ASCAD network in this paper. We show that the attention network has a better performance than the ASCAD network. Finally, a new visualization method, named Class Gradient Visualization (CGV) is proposed to recognize which points of the input traces have a positive influence on the predicted result of the neural networks. In another aspect, it can explain why the attention network is superior to the ASCAD network. We validate the attention network through extensive experiments on four public datasets and demonstrate that the attention network is efficient in different AES implementations.