Is Each Layer Non-trivial in CNN?

57 0 0.0 ( 0 )

Download Cite

Added by Wei Wang

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Wei Wang - Yanjie Zhu - Zhuoxu Cui

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Convolutional neural network (CNN) models have achieved great success in many fields. With the advent of ResNet, networks used in practice are getting deeper and wider. However, is each layer non-trivial in networks? To answer this question, we trained a network on the training set, then we replace the network convolution kernels with zeros and test the result models on the test set. We compared experimental results with baseline and showed that we can reach similar or even the same performances. Although convolution kernels are the cores of networks, we demonstrate that some of them are trivial and regular in ResNet.

rate research

Is Faster R-CNN Doing Well for Pedestrian Detection?

69 - Liliang Zhang , Liang Lin , Xiaodan Liang 2016

Detecting pedestrian has been arguably addressed as a special topic beyond general object detection. Although recent deep learning object detectors such as Fast/Faster R-CNN [1, 2] have shown excellent performance for general object detection, they have limited success for detecting pedestrian, and previous leading pedestrian detectors were in general hybrid methods combining hand-crafted and deep convolutional features. In this paper, we investigate issues involving Faster R-CNN [2] for pedestrian detection. We discover that the Region Proposal Network (RPN) in Faster R-CNN indeed performs well as a stand-alone pedestrian detector, but surprisingly, the downstream classifier degrades the results. We argue that two reasons account for the unsatisfactory accuracy: (i) insufficient resolution of feature maps for handling small instances, and (ii) lack of any bootstrapping strategy for mining hard negative examples. Driven by these observations, we propose a very simple but effective baseline for pedestrian detection, using an RPN followed by boosted forests on shared, high-resolution convolutional feature maps. We comprehensively evaluate this method on several benchmarks (Caltech, INRIA, ETH, and KITTI), presenting competitive accuracy and good speed. Code will be made publicly available.

Computer Vision and Pattern Recognition

Is it Enough to Optimize CNN Architectures on ImageNet?

478 - Lukas Tuggener , Jurgen Schmidhuber , Thilo Stadelmann 2021

An implicit but pervasive hypothesis of modern computer vision research is that convolutional neural network (CNN) architectures that perform better on ImageNet will also perform better on other vision datasets. We challenge this hypothesis through an extensive empirical study for which we train 500 sampled CNN architectures on ImageNet as well as 8 other image classification datasets from a wide array of application domains. The relationship between architecture and performance varies wildly, depending on the datasets. For some of them, the performance correlation with ImageNet is even negative. Clearly, it is not enough to optimize architectures solely for ImageNet when aiming for progress that is relevant for all applications. Therefore, we identify two dataset-specific performance indicators: the cumulative width across layers as well as the total depth of the network. Lastly, we show that the range of dataset variability covered by ImageNet can be significantly extended by adding ImageNet subsets restricted to few classes.

Computer Vision and Pattern Recognition Machine Learning

Data mining when each data point is a network

372 - Karthikeyan Rajendran , Assimakis A. Kattis , Alexander Holiday 2016

We discuss the problem of extending data mining approaches to cases in which data points arise in the form of individual graphs. Being able to find the intrinsic low-dimensionality in ensembles of graphs can be useful in a variety of modeling contexts, especially when coarse-graining the detailed graph information is of interest. One of the main challenges in mining graph data is the definition of a suitable pairwise similarity metric in the space of graphs. We explore two practical solutions to solving this problem: one based on finding subgraph densities, and one using spectral information. The approach is illustrated on three test data sets (ensembles of graphs); two of these are obtained from standard graph generating algorithms, while the graphs in the third example are sampled as dynamic snapshots from an evolving network simulation. We further incorporate these approaches with equation free techniques, demonstrating how such data mining approaches can enhance scientific computation of network evolution dynamics.

Social and Information Networks Data Analysis Statistics and Probability

Symmetric Orthogonal Tensor Decomposition is Trivial

622 - Tamara G. Kolda 2015

We consider the problem of decomposing a real-valued symmetric tensor as the sum of outer products of real-valued, pairwise orthogonal vectors. Such decompositions do not generally exist, but we show that some symmetric tensor decomposition problems can be converted to orthogonal problems following the whitening procedure proposed by Anandkumar et al. (2012). If an orthogonal decomposition of an $m$-way $n$-dimensional symmetric tensor exists, we propose a novel method to compute it that reduces to an $n times n$ symmetric matrix eigenproblem. We provide numerical results demonstrating the effectiveness of the method.

Numerical Analysis Numerical Analysis

Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM

262 - Lai Jiang , Mai Xu , Zulin Wang 2017

Over the past few years, deep neural networks (DNNs) have exhibited great success in predicting the saliency of images. However, there are few works that apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train the DNN models for predicting video saliency. Through the statistical analysis of our LEDOV database, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) to learn spatio-temporal features for predicting the intra-frame saliency via exploring the information of both objectness and object motion. We further find from our database that there exists a temporal correlation of human attention with a smooth saliency transition across video frames. Therefore, we develop a two-layer convolutional long short-term memory (2C-LSTM) network in our DNN-based method, using the extracted features of OM-CNN as the input. Consequently, the inter-frame saliency maps of videos can be generated, which consider the transition of attention across video frames. Finally, the experimental results show that our method advances the state-of-the-art in video saliency prediction.

Computer Vision and Pattern Recognition