The Hamiltonian Monte Carlo (HMC) sampling algorithm exploits Hamiltonian dynamics to construct efficient Markov Chain Monte Carlo (MCMC), which has become increasingly popular in machine learning and statistics. Since HMC uses the gradient information of the target distribution, it can explore the state space much more efficiently than the random-walk proposals. However, probabilistic inference involving multi-modal distributions is very difficult for standard HMC method, especially when the modes are far away from each other. Sampling algorithms are then often incapable of traveling across the places of low probability. In this paper, we propose a novel MCMC algorithm which aims to sample from multi-modal distributions effectively. The method improves Hamiltonian dynamics to reduce the autocorrelation of the samples and uses a variational distribution to explore the phase space and find new modes. A formal proof is provided which shows that the proposed method can converge to target distributions. Both synthetic and real datasets are used to evaluate its properties and performance. The experimental results verify the theory and show superior performance in multi-modal sampling.