أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Chengjiang Long

Luminance Attentive Networks for HDR Image and Panorama Reconstruction

108 - Hanning Yu , Wentao Liu , Chengjiang Long 2021

It is very challenging to reconstruct a high dynamic range (HDR) from a low dynamic range (LDR) image as an ill-posed problem. This paper proposes a luminance attentive network named LANet for HDR reconstruction from a single LDR image. Our method is based on two fundamental observations: (1) HDR images stored in relative luminance are scale-invariant, which means the HDR images will hold the same information when multiplied by any positive real number. Based on this observation, we propose a novel normalization method called HDR calibration for HDR images stored in relative luminance, calibrating HDR images into a similar luminance scale according to the LDR images. (2) The main difference between HDR images and LDR images is in under-/over-exposed areas, especially those highlighted. Following this observation, we propose a luminance attention module with a two-stream structure for LANet to pay more attention to the under-/over-exposed areas. In addition, we propose an extended network called panoLANet for HDR panorama reconstruction from an LDR panorama and build a dualnet structure for panoLANet to solve the distortion problem caused by the equirectangular panorama. Extensive experiments show that our proposed approach LANet can reconstruct visually convincing HDR images and demonstrate its superiority over state-of-the-art approaches in terms of all metrics in inverse tone mapping. The image-based lighting application with our proposed panoLANet also demonstrates that our method can simulate natural scene lighting using only LDR panorama. Our source code is available at https://github.com/LWT3437/LANet.

الرؤية الحاسوبية وتمييز الأنماط معالجة الصور والفيديو

CANet: A Context-Aware Network for Shadow Removal

79 - Zipei Chen , Chengjiang Long , Ling Zhang 2021

In this paper, we propose a novel two-stage context-aware network named CANet for shadow removal, in which the contextual information from non-shadow regions is transferred to shadow regions at the embedded feature spaces. At Stage-I, we propose a co ntextual patch matching (CPM) module to generate a set of potential matching pairs of shadow and non-shadow patches. Combined with the potential contextual relationships between shadow and non-shadow regions, our well-designed contextual feature transfer (CFT) mechanism can transfer contextual information from non-shadow to shadow regions at different scales. With the reconstructed feature maps, we remove shadows at L and A/B channels separately. At Stage-II, we use an encoder-decoder to refine current results and generate the final shadow removal results. We evaluate our proposed CANet on two benchmark datasets and some real-world shadow images with complex scenes. Extensive experimental results strongly demonstrate the efficacy of our proposed CANet and exhibit superior performance to state-of-the-arts.

الرؤية الحاسوبية وتمييز الأنماط

DRB-GAN: A Dynamic ResBlock Generative Adversarial Network for Artistic Style Transfer

223 - Wenju Xu , Chengjiang Long , Ruisheng Wang 2021

The paper proposes a Dynamic ResBlock Generative Adversarial Network (DRB-GAN) for artistic style transfer. The style code is modeled as the shared parameters for Dynamic ResBlocks connecting both the style encoding network and the style transfer net work. In the style encoding network, a style class-aware attention mechanism is used to attend the style feature representation for generating the style codes. In the style transfer network, multiple Dynamic ResBlocks are designed to integrate the style code and the extracted CNN semantic feature and then feed into the spatial window Layer-Instance Normalization (SW-LIN) decoder, which enables high-quality synthetic images with artistic style transfer. Moreover, the style collection conditional discriminator is designed to equip our DRB-GAN model with abilities for both arbitrary style transfer and collection style transfer during the training stage. No matter for arbitrary style transfer or collection style transfer, extensive experiments strongly demonstrate that our proposed DRB-GAN outperforms state-of-the-art methods and exhibits its superior performance in terms of visual quality and efficiency. Our source code is available at color{magenta}{url{https://github.com/xuwenju123/DRB-GAN}}.

الرؤية الحاسوبية وتمييز الأنماط معالجة الصور والفيديو

Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning

80 - Xinzhi Dong , Chengjiang Long , Wenju Xu 2021

Existing image captioning methods just focus on understanding the relationship between objects or instances in a single image, without exploring the contextual correlation existed among contextual image. In this paper, we propose Dual Graph Convoluti onal Networks (Dual-GCN) with transformer and curriculum learning for image captioning. In particular, we not only use an object-level GCN to capture the object to object spatial relation within a single image, but also adopt an image-level GCN to capture the feature information provided by similar images. With the well-designed Dual-GCN, we can make the linguistic transformer better understand the relationship between different objects in a single image and make full use of similar images as auxiliary information to generate a reasonable caption description for a single image. Meanwhile, with a cross-review strategy introduced to determine difficulty levels, we adopt curriculum learning as the training strategy to increase the robustness and generalization of our proposed model. We conduct extensive experiments on the large-scale MS COCO dataset, and the experimental results powerfully demonstrate that our proposed method outperforms recent state-of-the-art approaches. It achieves a BLEU-1 score of 82.2 and a BLEU-2 score of 67.6. Our source code is available at {em color{magenta}{url{https://github.com/Unbear430/DGCN-for-image-captioning}}}.

الرؤية الحاسوبية وتمييز الأنماط

Scene Inference for Object Illumination Editing

92 - Zhongyun Bao , Chengjiang Long , Gang Fu 2021

The seamless illumination integration between a foreground object and a background scene is an important but challenging task in computer vision and augmented reality community. However, to our knowledge, there is no publicly available high-quality d ataset that meets the illumination seamless integration task, which greatly hinders the development of this research direction. To this end, we apply a physically-based rendering method to create a large-scale, high-quality dataset, named IH dataset, which provides rich illumination information for seamless illumination integration task. In addition, we propose a deep learning-based SI-GAN method, a multi-task collaborative network, which makes full use of the multi-scale attention mechanism and adversarial learning strategy to directly infer mapping relationship between the inserted foreground object and corresponding background environment, and edit object illumination according to the proposed illumination exchange mechanism in parallel network. By this means, we can achieve the seamless illumination integration without explicit estimation of 3D geometric information. Comprehensive experiments on both our dataset and real-world images collected from the Internet show that our proposed SI-GAN provides a practical and effective solution for image-based object illumination editing, and validate the superiority of our method against state-of-the-art methods.

الرؤية الحاسوبية وتمييز الأنماط

CRD-CGAN: Category-Consistent and Relativistic Constraints for Diverse Text-to-Image Generation

77 - Tao Hu , Chengjiang Long , Chunxia Xiao 2021

Generating photo-realistic images from a text description is a challenging problem in computer vision. Previous works have shown promising performance to generate synthetic images conditional on text by Generative Adversarial Networks (GANs). In this paper, we focus on the category-consistent and relativistic diverse constraints to optimize the diversity of synthetic images. Based on those constraints, a category-consistent and relativistic diverse conditional GAN (CRD-CGAN) is proposed to synthesize $K$ photo-realistic images simultaneously. We use the attention loss and diversity loss to improve the sensitivity of the GAN to word attention and noises. Then, we employ the relativistic conditional loss to estimate the probability of relatively real or fake for synthetic images, which can improve the performance of basic conditional loss. Finally, we introduce a category-consistent loss to alleviate the over-category issues between K synthetic images. We evaluate our approach using the Birds-200-2011, Oxford-102 flower and MSCOCO 2014 datasets, and the extensive experiments demonstrate superiority of the proposed method in comparison with state-of-the-art methods in terms of photorealistic and diversity of the generated synthetic images.

الرؤية الحاسوبية وتمييز الأنماط

A Two-Stage Attentive Network for Single Image Super-Resolution

368 - Jiqing Zhang , Chengjiang Long , Yuxin Wang 2021

Recently, deep convolutional neural networks (CNNs) have been widely explored in single image super-resolution (SISR) and contribute remarkable progress. However, most of the existing CNNs-based SISR methods do not adequately explore contextual infor mation in the feature extraction stage and pay little attention to the final high-resolution (HR) image reconstruction step, hence hindering the desired SR performance. To address the above two issues, in this paper, we propose a two-stage attentive network (TSAN) for accurate SISR in a coarse-to-fine manner. Specifically, we design a novel multi-context attentive block (MCAB) to make the network focus on more informative contextual features. Moreover, we present an essential refined attention block (RAB) which could explore useful cues in HR space for reconstructing fine-detailed HR image. Extensive evaluations on four benchmark datasets demonstrate the efficacy of our proposed TSAN in terms of quantitative metrics and visual effects. Code is available at https://github.com/Jee-King/TSAN.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

L2GSCI: Local to Global Seam Cutting and Integrating for Accurate Face Contour Extraction

82 - Yongwei Nie , Xu Cao , Chengjiang Long 2017

Current face alignment algorithms can robustly find a set of landmarks along face contour. However, the landmarks are sparse and lack curve details, especially in chin and cheek areas where a lot of concave-convex bending information exists. In this paper, we propose a local to global seam cutting and integrating algorithm (L2GSCI) to extract continuous and accurate face contour. Our method works in three steps with the help of a rough initial curve. First, we sample small and overlapped squares along the initial curve. Second, the seam cutting part of L2GSCI extracts a local seam in each square region. Finally, the seam integrating part of L2GSCI connects all the redundant seams together to form a continuous and complete face curve. Overall, the proposed method is much more straightforward than existing face alignment algorithms, but can achieve pixel-level continuous face curves rather than discrete and sparse landmarks. Moreover, experiments on two face benchmark datasets (i.e., LFPW and HELEN) show that our method can precisely reveal concave-convex bending details of face contours, which has significantly improved the performance when compared with the state-ofthe- art face alignment approaches.

الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد