No Arabic abstract
In recent years, improvements in various image acquisition techniques gave rise to the need for adaptive processing methods, aimed particularly for large datasets corrupted by noise and deformations. In this work, we consider datasets of images sampled from a low-dimensional manifold (i.e. an image-valued manifold), where the images can assume arbitrary planar rotations. To derive an adaptive and rotation-invariant framework for processing such datasets, we introduce a graph Laplacian (GL)-like operator over the dataset, termed ${textit{steerable graph Laplacian}}$. Essentially, the steerable GL extends the standard GL by accounting for all (infinitely-many) planar rotations of all images. As it turns out, similarly to the standard GL, a properly normalized steerable GL converges to the Laplace-Beltrami operator on the low-dimensional manifold. However, the steerable GL admits an improved convergence rate compared to the GL, where the improved convergence behaves as if the intrinsic dimension of the underlying manifold is lower by one. Moreover, it is shown that the steerable GL admits eigenfunctions of the form of Fourier modes (along the orbits of the images rotations) multiplied by eigenvectors of certain matrices, which can be computed efficiently by the FFT. For image datasets corrupted by noise, we employ a subset of these eigenfunctions to filter the dataset via a Fourier-like filtering scheme, essentially using all images and their rotations simultaneously. We demonstrate our filtering framework by de-noising simulated single-particle cryo-EM image datasets.
Retinal fundus photographs have been used in the diagnosis of many ocular diseases such as glaucoma, pathological myopia, age-related macular degeneration and diabetic retinopathy. With the development of computer science, computer aided diagnosis has been developed to process and analyse the retinal images automatically. One of the challenges in the analysis is that the quality of the retinal image is often degraded. For example, a cataract in human lens will attenuate the retinal image, just as a cloudy camera lens which reduces the quality of a photograph. It often obscures the details in the retinal images and posts challenges in retinal image processing and analysing tasks. In this paper, we approximate the degradation of the retinal images as a combination of human-lens attenuation and scattering. A novel structure-preserving guided retinal image filtering (SGRIF) is then proposed to restore images based on the attenuation and scattering model. The proposed SGRIF consists of a step of global structure transferring and a step of global edge-preserving smoothing. Our results show that the proposed SGRIF method is able to improve the contrast of retinal images, measured by histogram flatness measure, histogram spread and variability of local luminosity. In addition, we further explored the benefits of SGRIF for subsequent retinal image processing and analysing tasks. In the two applications of deep learning based optic cup segmentation and sparse learning based cup-to-disc ratio (CDR) computation, our results show that we are able to achieve more accurate optic cup segmentation and CDR measurements from images processed by SGRIF.
Invertible image representation methods (transforms) are routinely employed as low-level image processing operations based on which feature extraction and recognition algorithms are developed. Most transforms in current use (e.g. Fourier, Wavelet, etc.) are linear transforms, and, by themselves, are unable to substantially simplify the representation of image classes for classification. Here we describe a nonlinear, invertible, low-level image processing transform based on combining the well known Radon transform for image data, and the 1D Cumulative Distribution Transform proposed earlier. We describe a few of the properties of this new transform, and with both theoretical and experimental results show that it can often render certain problems linearly separable in transform space.
The current practice in land cover/land use change analysis relies heavily on the individually classified maps of the multitemporal data set. Due to varying acquisition conditions (e.g., illumination, sensors, seasonal differences), the classification maps yielded are often inconsistent through time for robust statistical analysis. 3D geometric features have been shown to be stable for assessing differences across the temporal data set. Therefore, in this article we investigate he use of a multitemporal orthophoto and digital surface model derived from satellite data for spatiotemporal classification. Our approach consists of two major steps: generating per-class probability distribution maps using the random-forest classifier with limited training samples, and making spatiotemporal inferences using an iterative 3D spatiotemporal filter operating on per-class probability maps. Our experimental results demonstrate that the proposed methods can consistently improve the individual classification results by 2%-6% and thus can be an important postclassification refinement approach.
Supervised classification and representation learning are two widely used classes of methods to analyze multivariate images. Although complementary, these methods have been scarcely considered jointly in a hierarchical modeling. In this paper, a method coupling these two approaches is designed using a matrix cofactorization formulation. Each task is modeled as a factorization matrix problem and a term relating both coding matrices is then introduced to drive an appropriate coupling. The link can be interpreted as a clustering operation over a low-dimensional representation vectors. The attribution vectors of the clustering are then used as features vectors for the classification task, i.e., the coding vectors of the corresponding factorization problem. A proximal gradient descent algorithm, ensuring convergence to a critical point of the objective function, is then derived to solve the resulting non-convex non-smooth optimization problem. An evaluation of the proposed method is finally conducted both on synthetic and real data in the specific context of hyperspectral image interpretation, unifying two standard analysis techniques, namely unmixing and classification.
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views. In such cases, the visual evidence is usually insufficient to identify a 3D reconstruction uniquely, so we aim at recovering several plausible reconstructions compatible with the input data. We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses via a suitable 3D model, such as SMPL for humans. We propose to learn a multi-hypothesis neural network regressor using a best-of-M loss, where each of the M hypotheses is constrained to lie on a manifold of plausible human poses by means of a generative model. We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans, and in heavily occlud