No Arabic abstract
Recent image classification algorithms, by learning deep features from large-scale datasets, have achieved significantly better results comparing to the classic feature-based approaches. However, there are still various challenges of image classifications in practice, such as classifying noisy image or image-set queries and training deep image classification models over the limited-scale dataset. Instead of applying generic deep features, the model-based approaches can be more effective and data-efficient for robust image and image-set classification tasks, as various image priors are exploited for modeling the inter- and intra-set data variations while preventing over-fitting. In this work, we propose a novel Joint Statistical and Spatial Sparse representation, dubbed textit{J3S}, to model the image or image-set data for classification, by reconciling both their local patch structures and global Gaussian distribution mapped into Riemannian manifold. To the best of our knowledge, no work to date utilized both global statistics and local patch structures jointly via joint sparse representation. We propose to solve the joint sparse coding problem based on the J3S model, by coupling the local and global image representations using joint sparsity. The learned J3S models are used for robust image and image-set classification. Experiments show that the proposed J3S-based image classification scheme outperforms the popular or state-of-the-art competing methods over FMD, UIUC, ETH-80 and YTC databases.
We propose a novel image set classification technique using linear regression models. Downsampled gallery image sets are interpreted as subspaces of a high dimensional space to avoid the computationally expensive training step. We estimate regression models for each test image using the class specific gallery subspaces. Images of the test set are then reconstructed using the regression models. Based on the minimum reconstruction error between the reconstructed and the original images, a weighted voting strategy is used to classify the test set. We performed extensive evaluation on the benchmark UCSD/Honda, CMU Mobo and YouTube Celebrity datasets for face classification, and ETH-80 dataset for object classification. The results demonstrate that by using only a small amount of training data, our technique achieved competitive classification accuracy and superior computational speed compared with the state-of-the-art methods.
Nowadays, deep learning methods, especially the convolutional neural networks (CNNs), have shown impressive performance on extracting abstract and high-level features from the hyperspectral image. However, general training process of CNNs mainly considers the pixel-wise information or the samples correlation to formulate the penalization while ignores the statistical properties especially the spectral variability of each class in the hyperspectral image. These samples-based penalizations would lead to the uncertainty of the training process due to the imbalanced and limited number of training samples. To overcome this problem, this work characterizes each class from the hyperspectral image as a statistical distribution and further develops a novel statistical loss with the distributions, not directly with samples for deep learning. Based on the Fisher discrimination criterion, the loss penalizes the sample variance of each class distribution to decrease the intra-class variance of the training samples. Moreover, an additional diversity-promoting condition is added to enlarge the inter-class variance between different class distributions and this could better discriminate samples from different classes in hyperspectral image. Finally, the statistical estimation form of the statistical loss is developed with the training samples through multi-variant statistical analysis. Experiments over the real-world hyperspectral images show the effectiveness of the developed statistical loss for deep learning.
The inclusion of spatial information into spectral classifiers for fine-resolution hyperspectral imagery has led to significant improvements in terms of classification performance. The task of spectral-spatial hyperspectral image classification has remained challenging because of high intraclass spectrum variability and low interclass spectral variability. This fact has made the extraction of spatial information highly active. In this work, a novel hyperspectral image classification framework using the fusion of dual spatial information is proposed, in which the dual spatial information is built by both exploiting pre-processing feature extraction and post-processing spatial optimization. In the feature extraction stage, an adaptive texture smoothing method is proposed to construct the structural profile (SP), which makes it possible to precisely extract discriminative features from hyperspectral images. The SP extraction method is used here for the first time in the remote sensing community. Then, the extracted SP is fed into a spectral classifier. In the spatial optimization stage, a pixel-level classifier is used to obtain the class probability followed by an extended random walker-based spatial optimization technique. Finally, a decision fusion rule is utilized to fuse the class probabilities obtained by the two different stages. Experiments performed on three data sets from different scenes illustrate that the proposed method can outperform other state-of-the-art classification techniques. In addition, the proposed feature extraction method, i.e., SP, can effectively improve the discrimination between different land covers.
Convolutional Neural Networks (CNNs) are generally prone to noise interruptions, i.e., small image noise can cause drastic changes in the output. To suppress the noise effect to the final predication, we enhance CNNs by replacing max-pooling, strided-convolution, and average-pooling with Discrete Wavelet Transform (DWT). We present general DWT and Inverse DWT (IDWT) layers applicable to various wavelets like Haar, Daubechies, and Cohen, etc., and design wavelet integrated CNNs (WaveCNets) using these layers for image classification. In WaveCNets, feature maps are decomposed into the low-frequency and high-frequency components during the down-sampling. The low-frequency component stores main information including the basic object structures, which is transmitted into the subsequent layers to extract robust high-level features. The high-frequency components, containing most of the data noise, are dropped during inference to improve the noise-robustness of the WaveCNets. Our experimental results on ImageNet and ImageNet-C (the noisy version of ImageNet) show that WaveCNets, the wavelet integrat
In this paper, a novel statistical metric learning is developed for spectral-spatial classification of the hyperspectral image. First, the standard variance of the samples of each class in each batch is used to decrease the intra-class variance within each class. Then, the distances between the means of different classes are used to penalize the inter-class variance of the training samples. Finally, the standard variance between the means of different classes is added as an additional diversity term to repulse different classes from each other. Experiments have conducted over two real-world hyperspectral image datasets and the experimental results have shown the effectiveness of the proposed statistical metric learning.