Like light, gravitational waves (GWs) can be lensed. Lensing can magnify the waves, create multiple images observable as repeated events, and superpose several waveforms, imprinting potentially discernible patterns on the signal. In particular, when the lens is small, $\lesssim 10^5 M_\odot$, it can produce lensed images with time delays shorter than the typical gravitational-wave signal length, and these images superpose to form ``beating patterns''. We present a proof-of-principle study that uses deep learning to identify such a lensing signature. We apply the strength of state-of-the-art deep learning models at recognizing foreground objects against background noise to the identification of lensed GWs in noisy spectrograms. We assume a lens mass of around $10^3 M_\odot$ -- $10^5 M_\odot$, which can produce time delays of the order of milliseconds between two images of a lensed GW. We discuss the feasibility of distinguishing lensed GWs from unlensed ones and of estimating physical and lensing parameters. The suggested method may be of interest for the study of more complicated lensing configurations for which accurate waveform templates are not available.
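
To make the origin of the beating pattern concrete, the following minimal sketch evaluates the frequency-dependent amplification that results when two images of the same signal are superposed with a short time delay. It assumes the geometric-optics form $F(f) = \sqrt{|\mu_+|} - i\sqrt{|\mu_-|}\,e^{2\pi i f \Delta t}$, which is one common convention for a point-mass lens and is not taken from the text above. The function names and the numerical values (magnifications and a 5 ms delay) are hypothetical and chosen only for illustration.

```python
import cmath
import math


def two_image_factor(f_hz, mu_plus, mu_minus, delay_s):
    """Amplification factor for two superposed images (assumed form).

    F(f) = sqrt(|mu_+|) - i * sqrt(|mu_-|) * exp(2*pi*i * f * delay)
    This geometric-optics expression is an assumption of this sketch.
    """
    a = math.sqrt(abs(mu_plus))
    b = math.sqrt(abs(mu_minus))
    return a - 1j * b * cmath.exp(2j * math.pi * f_hz * delay_s)


def beat_spacing_hz(delay_s):
    """Frequency spacing between successive maxima of |F(f)|."""
    return 1.0 / delay_s


# Hypothetical illustrative values, not results from the study.
MU_PLUS, MU_MINUS, DELAY_S = 1.5, 0.5, 5e-3

if __name__ == "__main__":
    # |F| oscillates with frequency, so a chirping signal that sweeps
    # through the band is alternately enhanced and suppressed.
    for f in (50.0, 100.0, 150.0, 200.0, 250.0):
        amp = abs(two_image_factor(f, MU_PLUS, MU_MINUS, DELAY_S))
        print(f"f = {f:6.1f} Hz  |F| = {amp:.3f}")
    print(f"beat spacing = {beat_spacing_hz(DELAY_S):.1f} Hz")
```

Under these assumptions the squared modulus is $|F|^2 = |\mu_+| + |\mu_-| + 2\sqrt{|\mu_+\mu_-|}\,\sin(2\pi f \Delta t)$, so the modulation repeats every $1/\Delta t$ in frequency. A millisecond-scale delay therefore places the modulation at spacings of hundreds of hertz, within the band swept by a typical compact-binary signal.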