Facing the complex marine environment, it is extremely challenging to conduct underwater acoustic target recognition (UATR) using ship-radiated noise. Inspired by neural mechanism of auditory perception, this paper provides a new deep neural network trained by original underwater acoustic signals with depthwise separable convolution (DWS) and time-dilated convolution neural network, named auditory perception inspired time-dilated convolution neural network (ATCNN), and then implements detection and classification for underwater acoustic signals. The proposed ATCNN model consists of learnable features extractor and integration layer inspired by auditory perception, and time-dilated convolution inspired by language model. This paper decomposes original time-domain ship-radiated noise signals into different frequency components with depthwise separable convolution filter, and then extracts signal features based on auditory perception. The deep features are integrated on integration layer. The time-dilated convolution is used for long-term contextual modeling. As a result, like language model, intra-class and inter-class information can be fully used for UATR. For UATR task, the classification accuracy reaches 90.9%, which is the highest in contrast experiment. Experimental results show that ATCNN has great potential to improve the performance of UATR classification.