Given the complexity of the marine environment, underwater acoustic target recognition (UATR) from ship-radiated noise is extremely challenging. Inspired by the neural mechanisms of auditory perception, this paper presents a new deep neural network trained on raw underwater acoustic signals, built from depthwise separable convolution (DWS) and time-dilated convolution, named the auditory perception inspired time-dilated convolution neural network (ATCNN), and applies it to the detection and classification of underwater acoustic signals. The proposed ATCNN consists of a learnable feature extractor and an integration layer inspired by auditory perception, together with time-dilated convolutions inspired by language models. The model decomposes the original time-domain ship-radiated noise into different frequency components with depthwise separable convolution filters, extracts signal features based on auditory perception, and integrates the deep features in the integration layer. Time-dilated convolution is then used for long-term contextual modeling, so that, as in a language model, intra-class and inter-class information can be fully exploited for UATR. On the UATR task, the classification accuracy reaches 90.9%, the highest among the compared methods. Experimental results show that ATCNN has great potential to improve UATR classification performance.
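As a minimal sketch (not the authors' implementation, and with hypothetical kernel shapes), the two building blocks named above, depthwise separable convolution and time-dilated convolution, can be illustrated for 1-D signals in NumPy:

```python
import numpy as np

def depthwise_separable_conv1d(x, depthwise_kernels, pointwise_weights):
    """1-D depthwise separable convolution (valid padding).

    x:                 (channels, time) input signal
    depthwise_kernels: (channels, k) one filter per input channel
    pointwise_weights: (out_channels, channels) 1x1 channel-mixing weights
    """
    c, _ = x.shape
    # Depthwise stage: each channel is filtered independently.
    dw = np.stack([np.convolve(x[i], depthwise_kernels[i], mode="valid")
                   for i in range(c)])
    # Pointwise stage: a 1x1 convolution mixes the filtered channels.
    return pointwise_weights @ dw

def dilated_conv1d(x, kernel, dilation):
    """Time-dilated convolution: kernel taps spaced `dilation` samples apart,
    enlarging the temporal receptive field without adding parameters."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # temporal extent covered by the kernel
    out_len = len(x) - span + 1
    return np.array([sum(kernel[j] * x[i + j * dilation] for j in range(k))
                     for i in range(out_len)])
```

Stacking dilated layers with growing dilation rates (1, 2, 4, ...) is the standard way such models capture long-term context cheaply.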
We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models in one-pass decoding for large-vocabulary continuous speech recognition. Unlike standard one-pass decoding with on-the-fly compos
This paper describes an acoustic scene classification method which achieved the 4th ranking result in the IEEE AASP challenge of Detection and Classification of Acoustic Scenes and Events 2016. In order to accomplish the ensuing task, several methods
Target-speaker speech recognition aims to recognize target-speaker speech from noisy environments with background noise and interfering speakers. This work presents a joint framework that combines time-domain target-speaker speech extraction and Recu
Acoustic scene recordings are represented by different types of handcrafted or neural-network-derived features. These features, typically of thousands of dimensions, are classified in state-of-the-art approaches using kernel machines, such as the Sup
In this work, we propose to extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network and Ambisonics signals. We significantly improve the performance of the baseline network by changing the layout b