Despite its best performance in image denoising, the supervised deep denoising methods require paired noise-clean data, which are often unavailable. To address this challenge, Noise2Noise was designed based on the fact that paired noise-clean images can be replaced by paired noise-noise images that are easier to collect. However, in many scenarios the collection of paired noise-noise images is still impractical. To bypass labeled images, Noise2Void methods predict masked pixels from their surroundings with single noisy images only and give improved denoising results that still need improvements. An observation on classic denoising methods is that non-local mean (NLM) outcomes are typically superior to locally denoised results. In contrast, Noise2Void and its variants do not utilize self-similarities in an image as the NLM-based methods do. Here we propose Noise2Sim, an NLM-inspired self-learning method for image denoising. Specifically, Noise2Sim leverages the self-similarity of image pixels to train the denoising network, requiring single noisy images only. Our theoretical analysis shows that Noise2Sim tends to be equivalent to Noise2Noise under mild conditions. To efficiently manage the computational burden for globally searching similar pixels, we design a two-step procedure to provide data for Noise2Sim training. Extensive experiments demonstrate the superiority of Noise2Sim on common benchmark datasets.