No Arabic abstract
Single-image-based view generation (SIVG) is important for producing 3D stereoscopic content. Here, handling different spatial resolutions as input and optimizing both reconstruction accuracy and processing speed is desirable. Latest approaches are based on convolutional neural network (CNN), and they generate promising results. However, their use of fully connected layers as well as pre-trained VGG forces a compromise between reconstruction accuracy and processing speed. In addition, this approach is limited to the use of a specific spatial resolution. To remedy these problems, we propose exploiting fully convolutional networks (FCN) for SIVG. We present two FCN architectures for SIVG. The first one is based on combination of an FCN and a view-rendering network called DeepView$_{ren}$. The second one consists of decoupled networks for luminance and chrominance signals, denoted by DeepView$_{dec}$. To train our solutions we present a large dataset of 2M stereoscopic images. Results show that both of our architectures improve accuracy and speed over the state of the art. DeepView$_{ren}$ generates competitive accuracy to the state of the art, however, with the fastest processing speed of all. That is x5 times faster speed and x24 times lower memory consumption compared to the state of the art. DeepView$_{dec}$ has much higher accuracy, but with x2.5 times faster speed and x12 times lower memory consumption. We evaluated our approach with both objective and subjective studies.
We introduce the problem of perpetual view generation - long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image. This is a challenging problem that goes far beyond the capabilities of current view synthesis methods, which quickly degenerate when presented with large camera motions. Methods for video generation also have limited ability to produce long sequences and are often agnostic to scene geometry. We take a hybrid approach that integrates both geometry and image synthesis in an iterative `emph{render}, emph{refine} and emph{repeat} framework, allowing for long-range generation that cover large distances after hundreds of frames. Our approach can be trained from a set of monocular video sequences. We propose a dataset of aerial footage of coastal scenes, and compare our method with recent view synthesis and conditional video generation baselines, showing that it can generate plausible scenes for much longer time horizons over large camera trajectories compared to existing methods. Project page at https://infinite-nature.github.io/.
In recent years, single image dehazing models (SIDM) based on atmospheric scattering model (ASM) have achieved remarkable results. However, it is noted that ASM-based SIDM degrades its performance in dehazing real world hazy images due to the limited modelling ability of ASM where the atmospheric light factor (ALF) and the angular scattering coefficient (ASC) are assumed as constants for one image. Obviously, the hazy images taken in real world cannot always satisfy this assumption. Such generating modelling mismatch between the real-world images and ASM sets up the upper bound of trained ASM-based SIDM for dehazing. Bearing this in mind, in this study, a new fully non-homogeneous atmospheric scattering model (FNH-ASM) is proposed for well modeling the hazy images under complex conditions where ALF and ASC are pixel dependent. However, FNH-ASM brings difficulty in practical application. In FNH-ASM based SIDM, the estimation bias of parameters at different positions lead to different distortion of dehazing result. Hence, in order to reduce the influence of parameter estimation bias on dehazing results, two new cost sensitive loss functions, beta-Loss and D-Loss, are innovatively developed for limiting the parameter bias of sensitive positions that have a greater impact on the dehazing result. In the end, based on FNH-ASM, an end-to-end CNN-based dehazing network, FNHD-Net, is developed, which applies beta-Loss and D-Loss. Experimental results demonstrate the effectiveness and superiority of our proposed FNHD-Net for dehazing on both synthetic and real-world images. And the performance improvement of our method increases more obviously in dense and heterogeneous haze scenes.
Undersampling the k-space data is widely adopted for acceleration of Magnetic Resonance Imaging (MRI). Current deep learning based approaches for supervised learning of MRI image reconstruction employ real-valued operations and representations by treating complex valued k-space/spatial-space as real values. In this paper, we propose complex dense fully convolutional neural network ($mathbb{C}$DFNet) for learning to de-alias the reconstruction artifacts within undersampled MRI images. We fashioned a densely-connected fully convolutional block tailored for complex-valued inputs by introducing dedicated layers such as complex convolution, batch normalization, non-linearities etc. $mathbb{C}$DFNet leverages the inherently complex-valued nature of input k-space and learns richer representations. We demonstrate improved perceptual quality and recovery of anatomical structures through $mathbb{C}$DFNet in contrast to its real-valued counterparts.
Unmanned Aerial Vehicles (UAVs) have recently attracted significant attention due to their outstanding ability to be used in different sectors and serve in difficult and dangerous areas. Moreover, the advancements in computer vision and artificial intelligence have increased the use of UAVs in various applications and solutions, such as forest fires detection and borders monitoring. However, using deep neural networks (DNNs) with UAVs introduces several challenges of processing deeper networks and complex models, which restricts their on-board computation. In this work, we present a strategy aiming at distributing inference requests to a swarm of resource-constrained UAVs that classifies captured images on-board and finds the minimum decision-making latency. We formulate the model as an optimization problem that minimizes the latency between acquiring images and making the final decisions. The formulated optimization solution is an NP-hard problem. Hence it is not adequate for online resource allocation. Therefore, we introduce an online heuristic solution, namely DistInference, to find the layers placement strategy that gives the best latency among the available UAVs. The proposed approach is general enough to be used for different low decision-latency applications as well as for all CNN types organized into the pipeline of layers (e.g., VGG) or based on residual blocks (e.g., ResNet).
Learning latent representations of registered meshes is useful for many 3D tasks. Techniques have recently shifted to neural mesh autoencoders. Although they demonstrate higher precision than traditional methods, they remain unable to capture fine-grained deformations. Furthermore, these methods can only be applied to a template-specific surface mesh, and is not applicable to more general meshes, like tetrahedrons and non-manifold meshes. While more general graph convolution methods can be employed, they lack performance in reconstruction precision and require higher memory usage. In this paper, we propose a non-template-specific fully convolutional mesh autoencoder for arbitrary registered mesh data. It is enabled by our novel convolution and (un)pooling operators learned with globally shared weights and locally varying coefficients which can efficiently capture the spatially varying contents presented by irregular mesh connections. Our model outperforms state-of-the-art methods on reconstruction accuracy. In addition, the latent codes of our network are fully localized thanks to the fully convolutional structure, and thus have much higher interpolation capability than many traditional 3D mesh generation models.