Inertial measurement units (IMUs) are widely used across many fields for attitude estimation. Many algorithms have been proposed to improve estimation performance. However, most of them still suffer from 1) an inaccurate initial estimate, 2) an inaccurate initial filter gain, and 3) non-Gaussian process and/or measurement noise. In this paper, we use reinforcement learning to compensate the classical extended Kalman filter (EKF) estimate, i.e., to learn the filter gain from sensor measurements. We also analyse the convergence of the estimation error. The effectiveness of the proposed algorithm is validated on both simulated and real data.
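The abstract does not specify how the learned gain enters the filter. Purely as an illustration, the sketch below shows a scalar EKF-style measurement update in which the classical gain is augmented by an additive correction. It assumes a linear measurement model z = h * x + v for simplicity, and `gain_correction` is a hypothetical stand-in for a learned policy, not the authors' method. The covariance is updated in Joseph form because that form stays valid when the gain is not the optimal Kalman gain.

```python
from typing import Callable, Tuple


def ekf_measurement_update(
    x_pred: float,
    p_pred: float,
    z: float,
    h: float,
    r: float,
    gain_correction: Callable[[float], float] = lambda innovation: 0.0,
) -> Tuple[float, float]:
    """Scalar measurement update with an additive gain correction (sketch).

    x_pred, p_pred: predicted state and variance (e.g. from gyro integration).
    z: measurement (e.g. an accelerometer-derived tilt angle).
    h: measurement Jacobian; assumed linear model z = h * x + v.
    r: measurement noise variance.
    gain_correction: HYPOTHETICAL placeholder for a learned policy that maps
    the innovation to a correction added to the classical gain.
    """
    innovation = z - h * x_pred            # measurement residual
    s = h * p_pred * h + r                 # innovation variance
    k_classical = p_pred * h / s           # standard Kalman gain
    k = k_classical + gain_correction(innovation)
    x_new = x_pred + k * innovation
    # Joseph form: a valid variance for any gain, not only the optimal one.
    p_new = (1.0 - k * h) ** 2 * p_pred + k * k * r
    return x_new, p_new


if __name__ == "__main__":
    # Classical update (zero correction) versus a constant toy correction.
    print(ekf_measurement_update(0.0, 1.0, 1.0, 1.0, 1.0))
    print(ekf_measurement_update(0.0, 1.0, 1.0, 1.0, 1.0,
                                 gain_correction=lambda y: 0.1))
```

With a zero correction this reduces to the ordinary Kalman update; a nonzero correction moves the estimate further toward the measurement at the cost of a larger posterior variance under the stated Gaussian model.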