Imaging sonars have shown better flexibility than optical cameras in underwater localization and navigation for autonomous underwater vehicles (AUVs). However, the sparsity of underwater acoustic features and the loss of elevation angle in sonar frames have imposed degeneracy cases, namely under-constrained or unobservable cases according to optimization-based or EKF-based simultaneous localization and mapping (SLAM). In these cases, the relative ambiguous sensor poses and landmarks cannot be triangulated. To handle this, this paper proposes a robust imaging sonar SLAM approach based on sonar keyframes (KFs) and an elastic sliding window. The degeneracy cases are further analyzed and the triangulation property of 2D landmarks in arbitrary motion has been proved. These degeneracy cases are discriminated and the sonar KFs are selected via saliency criteria to extract and save the informative constraints from previous sonar measurements. Incorporating the inertial measurements, an elastic sliding windowed back-end optimization is proposed to mostly utilize the past salient sonar frames and also restrain the optimization scale. Comparative experiments validate the effectiveness of the proposed method and its robustness to outliers from the wrong data association, even without loop closure.