Detailed Dense Inference with Convolutional Neural Networks via Discrete Wavelet Transform

62 0 0.0 ( 0 )

Download Cite

Added by Lingni Ma

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Lingni Ma - Jorg Stuckler - Tao Wu

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Dense pixelwise prediction such as semantic segmentation is an up-to-date challenge for deep convolutional neural networks (CNNs). Many state-of-the-art approaches either tackle the loss of high-resolution information due to pooling in the encoder stage, or use dilated convolutions or high-resolution lanes to maintain detailed feature maps and predictions. Motivated by the structural analogy between multi-resolution wavelet analysis and the pooling/unpooling layers of CNNs, we introduce discrete wavelet transform (DWT) into the CNN encoder-decoder architecture and propose WCNN. The high-frequency wavelet coefficients are computed at encoder, which are later used at the decoder to unpooled jointly with coarse-resolution feature maps through the inverse DWT. The DWT/iDWT is further used to develop two wavelet pyramids to capture the global context, where the multi-resolution DWT is applied to successively reduce the spatial resolution and increase the receptive field. Experiment with the Cityscape dataset, the proposed WCNNs are computationally efficient and yield improvements the accuracy for high-resolution dense pixelwise prediction.

rate research

Learnable Discrete Wavelet Pooling (LDW-Pooling) For Convolutional Networks

316 - Jun-Wei Hsieh , Ming-Ching Chang , Bor-Shiun Wang 2021

Pooling is a simple but essential layer in modern deep CNN architectures for feature aggregation and extraction. Typical CNN design focuses on the conv layers and activation functions, while leaving the pooling layers with fewer options. We introduce the Learning Discrete Wavelet Pooling (LDW-Pooling) that can be applied universally to replace standard pooling operations to better extract features with improved accuracy and efficiency. Motivated from the wavelet theory, we adopt the low-pass (L) and high-pass (H) filters horizontally and vertically for pooling on a 2D feature map. Feature signals are decomposed into four (LL, LH, HL, HH) subbands to retain features better and avoid information dropping. The wavelet transform ensures features after pooling can be fully preserved and recovered. We next adopt an energy-based attention learning to fine-select crucial and representative features. LDW-Pooling is effective and efficient when compared with other state-of-the-art pooling techniques such as WaveletPooling and LiftPooling. Extensive experimental validation shows that LDW-Pooling can be applied to a wide range of standard CNN architectures and consistently outperform standard (max, mean, mixed, and stochastic) pooling operations.

Computer Vision and Pattern Recognition Machine Learning Image and Video Processing

Wavelet based edge feature enhancement for convolutional neural networks

55 - D. D. N. De Silva , S. Fernando , I. T. S. Piyatilake 2018

Convolutional neural networks are able to perform a hierarchical learning process starting with local features. However, a limited attention is paid to enhancing such elementary level features like edges. We propose and evaluate two wavelet-based edge feature enhancement methods to preprocess the input images to convolutional neural networks. The first method develops feature enhanced representations by decomposing the input images using wavelet transform and limited reconstructing subsequently. The second method develops such feature enhanced inputs to the network using local modulus maxima of wavelet coefficients. For each method, we have developed a new preprocessing layer by implementing each purposed method and have appended to the network architecture. Our empirical evaluations demonstrate that the proposed methods are outperforming the baselines and previously published work with significant accuracy gains.

Computer Vision and Pattern Recognition

Transform-Invariant Convolutional Neural Networks for Image Classification and Search

148 - Xu Shen , Xinmei Tian , Anfeng He 2019

Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models still exhibit a poor ability to be invariant to spatial transformations of images. Intuitively, with sufficient layers and parameters, hierarchical combinations of convolution (matrix multiplication and non-linear activation) and pooling operations should be able to learn a robust mapping from transformed input images to transform-invariant representations. In this paper, we propose randomly transforming (rotation, scale, and translation) feature maps of CNNs during the training stage. This prevents complex dependencies of specific rotation, scale, and translation levels of training images in CNN models. Rather, each convolutional kernel learns to detect a feature that is generally helpful for producing the transform-invariant answer given the combinatorially large variety of transform levels of its input feature maps. In this way, we do not require any extra training supervision or modification to the optimization process and training images. We show that random transformation provides significant improvements of CNNs on many benchmark tasks, including small-scale image recognition, large-scale image recognition, and image retrieval. The code is available at https://github.com/jasonustc/caffe-multigpu/tree/TICNN.

Computer Vision and Pattern Recognition Machine Learning Image and Video Processing

Interpretable Convolutional Neural Networks via Feedforward Design

194 - C.-C. Jay Kuo , Min Zhang , Siyang Li 2018

The model parameters of convolutional neural networks (CNNs) are determined by backpropagation (BP). In this work, we propose an interpretable feedforward (FF) design without any BP as a reference. The FF design adopts a data-centric approach. It derives network parameters of the current layer based on data statistics from the output of the previous layer in a one-pass manner. To construct convolutional layers, we develop a new signal transform, called the Saab (Subspace Approximation with Adjusted Bias) transform. It is a variant of the principal component analysis (PCA) with an added bias vector to annihilate activations nonlinearity. Multiple Saab transforms in cascade yield multiple convolutional layers. As to fully-connected (FC) layers, we construct them using a cascade of multi-stage linear least squared regressors (LSRs). The classification and robustness (against adversarial attacks) performances of BP- and FF-designed CNNs applied to the MNIST and the CIFAR-10 datasets are compared. Finally, we comment on the relationship between BP and FF designs.

Computer Vision and Pattern Recognition

A Hybrid Traffic Speed Forecasting Approach Integrating Wavelet Transform and Motif-based Graph Convolutional Recurrent Neural Network

69 - Na Zhang , Xuefeng Guan , Jun Cao 2019

Traffic forecasting is crucial for urban traffic management and guidance. However, existing methods rarely exploit the time-frequency properties of traffic speed observations, and often neglect the propagation of traffic flows from upstream to downstream road segments. In this paper, we propose a hybrid approach that learns the spatio-temporal dependency in traffic flows and predicts short-term traffic speeds on a road network. Specifically, we employ wavelet transform to decompose raw traffic data into several components with different frequency sub-bands. A Motif-based Graph Convolutional Recurrent Neural Network (Motif-GCRNN) and Auto-Regressive Moving Average (ARMA) are used to train and predict low-frequency components and high-frequency components, respectively. In the Motif-GCRNN framework, we integrate Graph Convolutional Networks (GCNs) with local sub-graph structures - Motifs - to capture the spatial correlations among road segments, and apply Long Short-Term Memory (LSTM) to extract the short-term and periodic patterns in traffic speeds. Experiments on a traffic dataset collected in Chengdu, China, demonstrate that the proposed hybrid method outperforms six state-of-art prediction methods.

Computer Vision and Pattern Recognition Machine Learning Signal Processing